Files
claude-mnemonic/pkg/models/observation.go
T
lukaszraczylo 5c2685c7b6 feat(leann-phase2): implement hybrid vector storage and graph-based search (#20)
* feat(leann-phase2): implement hybrid vector storage and graph-based search

- [x] Add AST-aware code chunking for Go, Python, and TypeScript using tree-sitter
- [x] Implement LEANN-inspired hybrid vector storage with hub detection and selective embedding storage (60-80% savings)
- [x] Add observation relationship graph with CSR format and edge detection (file overlap, semantic similarity, temporal, concept)
- [x] Implement graph-aware search with two-level traversal and relationship-based ranking
- [x] Add auto-tuning system for dynamic hub threshold adjustment based on query performance
- [x] Add comprehensive metrics tracking for vector storage, queries, latency, and graph traversals
- [x] Update configuration system with graph and hybrid storage settings
- [x] Add graph stats and vector metrics endpoints to worker service
- [x] Enhance UI sidebar with advanced metrics display and graph visualization
- [x] Optimize struct field alignment throughout codebase for memory efficiency
- [x] Update documentation with LEANN Phase 2 features and performance benefits
- [x] Add tree-sitter dependency for AST parsing

* fix: add fts5 build tag to CI workflow

Pass build-tags: "fts5" to shared workflow to properly compile
sqlite-vec-go-bindings with SQLite FTS5 support.

This fixes test failures in hybrid vector storage tests that require
CGO and FTS5 build tags.

Requires shared-actions@8f7f235 or later.

* docs: add testing documentation and macOS ARM64 known issue

Document the macOS ARM64 CGO linking issue with sqlite-vec-go-bindings
that prevents hybrid package tests from compiling locally.

Added:
- .github/TESTING.md: Comprehensive testing guide with platform-specific
  issues, workarounds, and CI configuration details
- internal/vector/hybrid/README.md: Package-specific documentation
  explaining the macOS limitation
- .github/CI_FIX_SUMMARY.md: Technical details of the CI fix

Key points:
- 41 out of 42 packages test successfully on all platforms
- hybrid package tests fail only on macOS ARM64 (local dev issue)
- Linux CI tests pass with proper build-tags: "fts5" configuration
- Production builds and runtime functionality unaffected

This is a known limitation of sqlite-vec-go-bindings on macOS ARM64
and does not impact CI/CD or production deployments.

* fix: add SQLite busy_timeout to prevent database locked errors

Set PRAGMA busy_timeout=5000 (5 seconds) to allow SQLite to retry
when the database is locked instead of failing immediately.

This fixes race conditions when multiple goroutines try to write
simultaneously, particularly in tests where StoreObservation spawns
async cleanup goroutines.

Root cause:
- StoreObservation launches goroutine -> CleanupOldObservations
- Multiple concurrent cleanups caused "database is locked" errors
- Without busy_timeout, SQLite fails immediately on lock contention

Solution:
- Add 5-second busy timeout for automatic retry on lock
- Standard practice for concurrent SQLite usage
- Works with existing WAL mode configuration

Fixes TestObservationStore_CleanupOldObservations in CI.

* docs: complete summary of all CI test fixes

Comprehensive documentation of all fixes applied:
1. Missing build tags (fts5)
2. Database locked errors (busy_timeout)

All 41/42 packages now pass tests. The hybrid package has a known
macOS ARM64 limitation that doesn't affect CI or production.

No functionality was removed - all fixes are additive only.

* fix: add SQLite driver import to hybrid tests for CGO linking

Add blank import of mattn/go-sqlite3 to hybrid test files to ensure
the SQLite driver is linked into the test binary. This provides the
SQLite symbols that sqlite-vec-go-bindings requires.

Root cause:
- hybrid package imports sqlitevec (transitively depends on sqlite-vec CGO)
- Test binary needs SQLite symbols for linking
- sqlitevec tests already had this import, but hybrid tests didn't
- Without the driver import, linker fails with "undefined symbols"

This fix enables hybrid tests to run with -race flag on all platforms.

Before: 41/42 packages pass (hybrid failed to link)
After:  42/42 packages pass 

Fixes hybrid test compilation on macOS ARM64, Linux, and Windows.

* docs: remove outdated macOS limitation documentation

The hybrid test linking issue has been fixed by adding the SQLite
driver import. All tests now pass on all platforms including macOS.

Removed:
- internal/vector/hybrid/README.md (documented workaround no longer needed)
- .github/TESTING.md (macOS limitation section obsolete)

All 42/42 packages now test successfully with -race flag.

* docs: final comprehensive summary of all CI fixes

All three issues now resolved:
1. Missing fts5 build tags
2. Database busy_timeout for concurrent writes
3. Missing SQLite driver import in hybrid tests

Result: 42/42 packages pass with -race on all platforms.

Credit to reviewer for identifying the race detector concern.
2026-01-07 22:03:59 +00:00

335 lines
11 KiB
Go

// Package models contains domain models for claude-mnemonic.
package models
import (
"database/sql"
"database/sql/driver"
"encoding/json"
"fmt"
"time"
)
// ObservationType represents the type of observation.
type ObservationType string
const (
ObsTypeDecision ObservationType = "decision"
ObsTypeBugfix ObservationType = "bugfix"
ObsTypeFeature ObservationType = "feature"
ObsTypeRefactor ObservationType = "refactor"
ObsTypeDiscovery ObservationType = "discovery"
ObsTypeChange ObservationType = "change"
)
// ObservationScope defines the visibility scope of an observation.
type ObservationScope string
const (
// ScopeProject means the observation is only visible within the same project.
ScopeProject ObservationScope = "project"
// ScopeGlobal means the observation is visible across all projects.
// Used for best practices, advanced patterns, and generalizable knowledge.
ScopeGlobal ObservationScope = "global"
)
// GlobalizableConcepts are concept tags that indicate an observation
// should be considered for global scope (best practices, patterns, etc.)
var GlobalizableConcepts = []string{
"best-practice",
"pattern",
"anti-pattern",
"architecture",
"security",
"performance",
"testing",
"debugging",
"workflow",
"tooling",
}
// JSONStringArray is a custom type for handling JSON string arrays in SQLite.
type JSONStringArray []string
// Scan implements sql.Scanner for JSONStringArray.
func (j *JSONStringArray) Scan(src interface{}) error {
if src == nil {
*j = nil
return nil
}
var data []byte
switch v := src.(type) {
case string:
data = []byte(v)
case []byte:
data = v
default:
return fmt.Errorf("JSONStringArray: unsupported type %T", src)
}
if len(data) == 0 {
*j = nil
return nil
}
return json.Unmarshal(data, j)
}
// Value implements driver.Valuer for JSONStringArray.
func (j JSONStringArray) Value() (driver.Value, error) {
if j == nil {
return nil, nil
}
return json.Marshal(j)
}
// JSONInt64Map is a custom type for handling JSON int64 maps in SQLite.
type JSONInt64Map map[string]int64
// Scan implements sql.Scanner for JSONInt64Map.
func (j *JSONInt64Map) Scan(src interface{}) error {
if src == nil {
*j = nil
return nil
}
var data []byte
switch v := src.(type) {
case string:
data = []byte(v)
case []byte:
data = v
default:
return fmt.Errorf("JSONInt64Map: unsupported type %T", src)
}
if len(data) == 0 {
*j = nil
return nil
}
return json.Unmarshal(data, j)
}
// Value implements driver.Valuer for JSONInt64Map.
func (j JSONInt64Map) Value() (driver.Value, error) {
if j == nil {
return nil, nil
}
return json.Marshal(j)
}
// Observation represents a learning extracted from a Claude Code session.
type Observation struct {
FileMtimes JSONInt64Map `db:"file_mtimes" json:"file_mtimes,omitempty"`
SDKSessionID string `db:"sdk_session_id" json:"sdk_session_id"`
Project string `db:"project" json:"project"`
Scope ObservationScope `db:"scope" json:"scope"`
Type ObservationType `db:"type" json:"type"`
CreatedAt string `db:"created_at" json:"created_at"`
Subtitle sql.NullString `db:"subtitle" json:"subtitle,omitempty"`
Title sql.NullString `db:"title" json:"title,omitempty"`
Narrative sql.NullString `db:"narrative" json:"narrative,omitempty"`
Concepts JSONStringArray `db:"concepts" json:"concepts,omitempty"`
FilesRead JSONStringArray `db:"files_read" json:"files_read,omitempty"`
FilesModified JSONStringArray `db:"files_modified" json:"files_modified,omitempty"`
Facts JSONStringArray `db:"facts" json:"facts,omitempty"`
PromptNumber sql.NullInt64 `db:"prompt_number" json:"prompt_number,omitempty"`
LastRetrievedAt sql.NullInt64 `db:"last_retrieved_at_epoch" json:"last_retrieved_at_epoch,omitempty"`
ScoreUpdatedAt sql.NullInt64 `db:"score_updated_at_epoch" json:"score_updated_at_epoch,omitempty"`
DiscoveryTokens int64 `db:"discovery_tokens" json:"discovery_tokens"`
ID int64 `db:"id" json:"id"`
CreatedAtEpoch int64 `db:"created_at_epoch" json:"created_at_epoch"`
ImportanceScore float64 `db:"importance_score" json:"importance_score"`
UserFeedback int `db:"user_feedback" json:"user_feedback"`
RetrievalCount int `db:"retrieval_count" json:"retrieval_count"`
IsStale bool `db:"-" json:"is_stale,omitempty"`
IsSuperseded bool `db:"is_superseded" json:"is_superseded,omitempty"`
}
// ParsedObservation represents an observation parsed from SDK response XML.
type ParsedObservation struct {
FileMtimes map[string]int64
Type ObservationType
Title string
Subtitle string
Narrative string
Scope ObservationScope
Facts []string
Concepts []string
FilesRead []string
FilesModified []string
}
// ToStoredObservation converts a ParsedObservation to the stored Observation format.
// Used for similarity comparison before storage.
func (p *ParsedObservation) ToStoredObservation() *Observation {
return &Observation{
Type: p.Type,
Title: sql.NullString{String: p.Title, Valid: p.Title != ""},
Subtitle: sql.NullString{String: p.Subtitle, Valid: p.Subtitle != ""},
Facts: p.Facts,
Narrative: sql.NullString{String: p.Narrative, Valid: p.Narrative != ""},
Concepts: p.Concepts,
FilesRead: p.FilesRead,
FilesModified: p.FilesModified,
FileMtimes: p.FileMtimes,
}
}
// DetermineScope determines the appropriate scope based on observation concepts.
// Returns ScopeGlobal if any concept matches globalizable patterns, else ScopeProject.
func DetermineScope(concepts []string) ObservationScope {
for _, concept := range concepts {
for _, globalConcept := range GlobalizableConcepts {
if concept == globalConcept {
return ScopeGlobal
}
}
}
return ScopeProject
}
// ObservationJSON is a JSON-friendly representation of Observation.
// It converts sql.NullString to plain strings for clean JSON output.
type ObservationJSON struct {
FileMtimes map[string]int64 `json:"file_mtimes,omitempty"`
Subtitle string `json:"subtitle,omitempty"`
SDKSessionID string `json:"sdk_session_id"`
Scope ObservationScope `json:"scope"`
Type ObservationType `json:"type"`
Title string `json:"title,omitempty"`
CreatedAt string `json:"created_at"`
Narrative string `json:"narrative,omitempty"`
Project string `json:"project"`
Concepts []string `json:"concepts,omitempty"`
Facts []string `json:"facts,omitempty"`
FilesRead []string `json:"files_read,omitempty"`
FilesModified []string `json:"files_modified,omitempty"`
CreatedAtEpoch int64 `json:"created_at_epoch"`
DiscoveryTokens int64 `json:"discovery_tokens"`
ID int64 `json:"id"`
PromptNumber int64 `json:"prompt_number,omitempty"`
ImportanceScore float64 `json:"importance_score"`
UserFeedback int `json:"user_feedback"`
RetrievalCount int `json:"retrieval_count"`
LastRetrievedAt int64 `json:"last_retrieved_at_epoch,omitempty"`
ScoreUpdatedAt int64 `json:"score_updated_at_epoch,omitempty"`
IsStale bool `json:"is_stale,omitempty"`
IsSuperseded bool `json:"is_superseded,omitempty"`
}
// MarshalJSON implements json.Marshaler for Observation.
// Converts sql.NullString fields to plain strings.
func (o *Observation) MarshalJSON() ([]byte, error) {
j := ObservationJSON{
ID: o.ID,
SDKSessionID: o.SDKSessionID,
Project: o.Project,
Scope: o.Scope,
Type: o.Type,
Facts: o.Facts,
Concepts: o.Concepts,
FilesRead: o.FilesRead,
FilesModified: o.FilesModified,
FileMtimes: o.FileMtimes,
DiscoveryTokens: o.DiscoveryTokens,
CreatedAt: o.CreatedAt,
CreatedAtEpoch: o.CreatedAtEpoch,
IsStale: o.IsStale,
// Importance scoring fields
ImportanceScore: o.ImportanceScore,
UserFeedback: o.UserFeedback,
RetrievalCount: o.RetrievalCount,
// Conflict detection fields
IsSuperseded: o.IsSuperseded,
}
if o.Title.Valid {
j.Title = o.Title.String
}
if o.Subtitle.Valid {
j.Subtitle = o.Subtitle.String
}
if o.Narrative.Valid {
j.Narrative = o.Narrative.String
}
if o.PromptNumber.Valid {
j.PromptNumber = o.PromptNumber.Int64
}
if o.LastRetrievedAt.Valid {
j.LastRetrievedAt = o.LastRetrievedAt.Int64
}
if o.ScoreUpdatedAt.Valid {
j.ScoreUpdatedAt = o.ScoreUpdatedAt.Int64
}
return json.Marshal(j)
}
// NewObservation creates a new observation from parsed data.
func NewObservation(sdkSessionID, project string, parsed *ParsedObservation, promptNumber int, discoveryTokens int64) *Observation {
now := time.Now()
// Determine scope: use parsed scope if set, otherwise auto-determine from concepts
scope := parsed.Scope
if scope == "" {
scope = DetermineScope(parsed.Concepts)
}
return &Observation{
SDKSessionID: sdkSessionID,
Project: project,
Scope: scope,
Type: parsed.Type,
Title: sql.NullString{String: parsed.Title, Valid: parsed.Title != ""},
Subtitle: sql.NullString{String: parsed.Subtitle, Valid: parsed.Subtitle != ""},
Facts: parsed.Facts,
Narrative: sql.NullString{String: parsed.Narrative, Valid: parsed.Narrative != ""},
Concepts: parsed.Concepts,
FilesRead: parsed.FilesRead,
FilesModified: parsed.FilesModified,
FileMtimes: parsed.FileMtimes,
PromptNumber: sql.NullInt64{Int64: int64(promptNumber), Valid: promptNumber > 0},
DiscoveryTokens: discoveryTokens,
CreatedAt: now.Format(time.RFC3339),
CreatedAtEpoch: now.UnixMilli(),
// Importance scoring: new observations start with score 1.0
ImportanceScore: 1.0,
UserFeedback: 0,
RetrievalCount: 0,
}
}
// ToMap converts the observation to a map for JSON response building.
// This allows adding extra fields like similarity scores.
func (o *Observation) ToMap() map[string]interface{} {
// Marshal to JSON then unmarshal to map (uses MarshalJSON for proper conversion)
data, err := json.Marshal(o)
if err != nil {
return map[string]interface{}{"id": o.ID, "error": err.Error()}
}
var result map[string]interface{}
if err := json.Unmarshal(data, &result); err != nil {
return map[string]interface{}{"id": o.ID, "error": err.Error()}
}
return result
}
// CheckStaleness checks if an observation is stale based on current file mtimes.
// Returns true if any tracked file has been modified since the observation was created.
func (o *Observation) CheckStaleness(currentMtimes map[string]int64) bool {
if len(o.FileMtimes) == 0 {
return false // No file tracking, assume fresh
}
for path, recordedMtime := range o.FileMtimes {
if currentMtime, exists := currentMtimes[path]; exists {
if currentMtime > recordedMtime {
return true // File was modified since observation was created
}
}
// If file doesn't exist in currentMtimes, it may have been deleted
// We don't mark as stale for missing files - they might just not be checked
}
return false
}