librarian

Architecture

Librarian uses hexagonal architecture so the core can run through a CLI, an API service, tests, or future workers without changing business logic.

Design Rules

Domain code has no framework, database, filesystem, or model-provider imports.
Application services depend on domain entities and ports.
Infrastructure adapters implement ports.
CLI and API adapters are thin request/response translators.
Long-running work is represented as persisted processing runs, not transient function calls.
Every generated output is traceable to source hash, chunk IDs, prompt version, model provider, model name, and pipeline version.

Layers

Adapters
  CLI: Typer
  API: FastAPI
  Mac app: SwiftUI client of the API (apps/macos) that embeds the backend in release builds
  Storage: SQLite repository and SQLite-backed content store
  LLM: OpenAI-compatible, mock
  Extraction: txt, md, csv, json, docx, pdf, OCR images

Application
  IngestDocument
  ProcessDocument
  SearchLibrary
  ExportDocument

Domain
  Document
  SourceFile
  Chunk
  ProcessingRun
  CleanedOutput
  Classification
  Taxonomy

Ports
  DocumentRepository
  RunRepository
  ContentStore
  TextExtractor
  LLMProvider
  TaxonomyProvider
  SearchIndex
  EventSink
  RunQueue

RunQueue is the durable job backend port. Queue adapters must support enqueue, claim, heartbeat, complete, fail/retry, cancel, and paginated list operations so CLI/API operator views keep the same shape when SQLite is replaced by a networked backend.
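The following is a minimal sketch of what the RunQueue port could look like as a Python Protocol, with a non-durable in-memory adapter of the kind useful in tests. The method names (enqueue, claim, heartbeat, complete, fail, cancel, list_jobs), the Job fields, and the state strings are hypothetical. They map the operations listed above and are not Librarian's actual signatures.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Job:
    id: int
    run_id: str
    state: str = "queued"  # hypothetical states: queued | running | done | failed | cancelled
    attempts: int = 0


class RunQueue(Protocol):
    """Hypothetical port: every queue backend exposes the same operations."""

    def enqueue(self, run_id: str) -> Job: ...
    def claim(self) -> Optional[Job]: ...
    def heartbeat(self, job_id: int) -> None: ...
    def complete(self, job_id: int) -> None: ...
    def fail(self, job_id: int, retry: bool) -> None: ...
    def cancel(self, job_id: int) -> None: ...
    def list_jobs(self, limit: int, offset: int = 0) -> list[Job]: ...


class InMemoryRunQueue:
    """Single-process adapter for tests; state is lost when the process exits."""

    def __init__(self, max_attempts: int = 3) -> None:
        self._jobs: dict[int, Job] = {}
        self._ids = itertools.count(1)
        self._max_attempts = max_attempts

    def enqueue(self, run_id: str) -> Job:
        job = Job(id=next(self._ids), run_id=run_id)
        self._jobs[job.id] = job
        return job

    def claim(self) -> Optional[Job]:
        # Dicts preserve insertion order, so the oldest queued job is claimed first.
        for job in self._jobs.values():
            if job.state == "queued":
                job.state = "running"
                job.attempts += 1
                return job
        return None

    def heartbeat(self, job_id: int) -> None:
        # No leases exist in memory; a durable adapter would extend the lease here.
        pass

    def complete(self, job_id: int) -> None:
        self._jobs[job_id].state = "done"

    def fail(self, job_id: int, retry: bool) -> None:
        job = self._jobs[job_id]
        can_retry = retry and job.attempts < self._max_attempts
        job.state = "queued" if can_retry else "failed"

    def cancel(self, job_id: int) -> None:
        job = self._jobs[job_id]
        if job.state in ("queued", "running"):
            job.state = "cancelled"

    def list_jobs(self, limit: int, offset: int = 0) -> list[Job]:
        # Stable ordering by ID keeps pages consistent between calls.
        jobs = sorted(self._jobs.values(), key=lambda j: j.id)
        return jobs[offset : offset + limit]
```

A CLI or API operator view written against RunQueue would work unchanged with this adapter or with a SQLite-backed one. That interchangeability is the point of defining the port.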

Fast Processing Model

The pipeline is a resumable DAG:

Extract source text.
Normalize text.
Build deterministic chunks.
Clean chunks concurrently where the selected coherence mode allows it.
Validate chunk outputs.
Assemble the document.
Classify and tag.
Index for search.
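Resumability here means that a restarted run skips any stage whose output was already persisted. The sketch below models the steps above as a dependency graph and walks it with the standard library's graphlib.TopologicalSorter (Python 3.9+). It assumes a plain dict stands in for the persisted run state. The stage names and the run_stage callback are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical stage names; each stage maps to the stages it depends on.
PIPELINE: dict[str, set[str]] = {
    "extract": set(),
    "normalize": {"extract"},
    "chunk": {"normalize"},
    "clean": {"chunk"},
    "validate": {"clean"},
    "assemble": {"validate"},
    "classify": {"assemble"},
    "index": {"classify"},
}


def run_pipeline(
    graph: dict[str, set[str]],
    store: dict[str, Any],
    run_stage: Callable[[str, dict[str, Any]], Any],
) -> list[str]:
    """Run stages in dependency order, skipping any already present in `store`."""
    executed = []
    for stage in TopologicalSorter(graph).static_order():
        if stage in store:
            continue  # finished in an earlier attempt: resume past it
        inputs = {dep: store[dep] for dep in graph[stage]}
        # Persist immediately so a crash in a later stage does not lose this work.
        store[stage] = run_stage(stage, inputs)
        executed.append(stage)
    return executed
```

If the clean stage fails on the first attempt, a second call with the same store picks up at clean. Extract, normalize, and chunk are not repeated.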

The default execution model should favor throughput:

async LLM calls
bounded provider concurrency
chunk-result caching by source hash, prompt version, model, and chunk hash
SQLite WAL mode
immediate persistence after each stage
event streaming for CLI/API progress
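The sketch below combines two items from the list above: bounded async concurrency and the chunk-result cache key. It assumes the LLM call is an async function taking chunk text, and that a dict stands in for the persistent cache. The names clean_chunks, cache_key, and max_concurrency are hypothetical.

```python
import asyncio
import hashlib
from typing import Awaitable, Callable


def cache_key(source_hash: str, prompt_version: str, model: str, chunk_text: str) -> str:
    # Any change to source, prompt version, model, or chunk content yields a new key.
    chunk_hash = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()
    return "|".join((source_hash, prompt_version, model, chunk_hash))


async def clean_chunks(
    chunks: list[str],
    clean: Callable[[str], Awaitable[str]],
    *,
    source_hash: str,
    prompt_version: str,
    model: str,
    cache: dict[str, str],
    max_concurrency: int = 4,
) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def clean_one(text: str) -> str:
        key = cache_key(source_hash, prompt_version, model, text)
        if key in cache:
            return cache[key]  # cache hit: no provider call
        async with semaphore:  # at most max_concurrency provider calls in flight
            result = await clean(text)
        cache[key] = result
        return result

    # gather preserves input order, so results line up with chunk order.
    return await asyncio.gather(*(clean_one(c) for c in chunks))
```

One limitation of this sketch: two identical chunks in the same batch can both miss the cache and each trigger a provider call.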

Coherence Modes

fast: flat parallel chunk cleaning using the configured chunk overlap for boundary context.
balanced: parallel chunk groups with local carry-forward inside each group.
max-coherence: sequential carry-forward across the full document.

The default production mode is balanced, which keeps local context while preserving parallelism.
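The three modes can be read as different ways of grouping chunks. Groups run in parallel, while chunks inside one group run sequentially and carry context forward. The sketch below assumes a synchronous cleaner of the form clean(chunk, carried_context) and uses a thread pool for brevity, although the document describes async LLM calls. The group size and the 200-character carry-forward window are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

Cleaner = Callable[[str, Optional[str]], str]  # (chunk, carried context) -> cleaned chunk


def plan_groups(n_chunks: int, mode: str, group_size: int = 4) -> list[list[int]]:
    indices = list(range(n_chunks))
    if mode == "fast":
        return [[i] for i in indices]  # every chunk independent; overlap supplies context
    if mode == "balanced":
        return [indices[i : i + group_size] for i in range(0, n_chunks, group_size)]
    if mode == "max-coherence":
        return [indices] if indices else []  # one sequential group for the whole document
    raise ValueError(f"unknown coherence mode: {mode}")


def clean_group(chunks: list[str], group: list[int], clean: Cleaner) -> list[str]:
    carry: Optional[str] = None
    out = []
    for i in group:
        cleaned = clean(chunks[i], carry)
        out.append(cleaned)
        carry = cleaned[-200:]  # hypothetical: carry the tail of the previous output
    return out


def clean_document(chunks: list[str], mode: str, clean: Cleaner,
                   group_size: int = 4, workers: int = 4) -> list[str]:
    groups = plan_groups(len(chunks), mode, group_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in group order, so chunk order is preserved.
        results = pool.map(lambda g: clean_group(chunks, g, clean), groups)
    return [text for group_out in results for text in group_out]
```

Under balanced, only the first chunk of each group starts without carried context. This is the trade-off the mode makes between coherence and parallelism.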

Content Storage

The content store persists raw text, chunks, cleaned chunks, final outputs, and the FTS mirror in SQLite. This keeps the first release portable and easy to back up, but it duplicates large text payloads. A filesystem or object-store content adapter remains a future option for very large hosted deployments; SQLite is the supported 1.0 backend.

Search goes through the application-layer SearchIndex port. The default adapter is SQLite FTS over cleaned and raw outputs, with snippets, facets, pagination, and filters. Results use BM25 ranking with deterministic created-at and document-ID tie-breakers so pagination is stable. User queries are normalized before they reach MATCH, so ordinary punctuation and hyphenated terms behave like word queries instead of being parsed as raw FTS syntax. Future semantic or hybrid indexes should implement the same port rather than changing API or CLI route code.
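One way to normalize queries is to turn each whitespace-separated term into a quoted FTS5 phrase built from its word characters. Operators such as AND, punctuation, and stray quotes then become plain search words. The sketch below assumes SQLite FTS5. The table and column names in SEARCH_SQL, and the ascending direction of the tie-breakers, are hypothetical. bm25() in FTS5 returns lower values for better matches, so the score is sorted ascending.

```python
import re

_WORD = re.compile(r"\w+")


def normalize_query(raw: str) -> str:
    """Turn free text into an FTS5 MATCH expression made only of quoted phrases."""
    phrases = []
    for term in raw.split():
        words = _WORD.findall(term)  # drops punctuation; splits "state-of-the-art"
        if words:
            phrases.append('"' + " ".join(words) + '"')
    # Adjacent phrases are combined with FTS5's implicit AND.
    return " ".join(phrases)


# Hypothetical schema: documents_fts is an FTS5 table keyed by documents.rowid.
SEARCH_SQL = """
SELECT documents.id, bm25(documents_fts) AS score
FROM documents_fts
JOIN documents ON documents.rowid = documents_fts.rowid
WHERE documents_fts MATCH ?
ORDER BY score, documents.created_at, documents.id
LIMIT ? OFFSET ?
"""
```

An empty normalized query should skip the MATCH clause entirely, because FTS5 rejects an empty expression.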

Prompt Governance

Prompts live under src/librarian/prompts. Prompt text is versioned and recorded in run metadata. The default cleaning prompt is cmos_v2, which preserves the prototype’s CMOS copy-editing intent while adding explicit instructions for OCR cleanup, structure preservation, context-marker handling, and chunk-local fidelity. cmos_v1 remains bundled so older run provenance and cache keys stay resolvable. Classification prompts are versioned the same way with dewey_v1 and dewey_v2. Startup settings reject prompt versions that are not bundled with the package.
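Rejecting unknown prompt versions at startup can be as small as a check against the bundled set. The sketch below hard-codes the versions named above. It assumes the real package discovers them from src/librarian/prompts instead, so the PromptSettings class and the BUNDLED_PROMPTS registry are hypothetical. The classification prompt has no default because this page does not say which dewey version is the default.

```python
from dataclasses import dataclass

# Hypothetical registry of bundled prompt versions, keyed by prompt kind.
BUNDLED_PROMPTS: dict[str, frozenset[str]] = {
    "cleaning": frozenset({"cmos_v1", "cmos_v2"}),
    "classification": frozenset({"dewey_v1", "dewey_v2"}),
}


@dataclass(frozen=True)
class PromptSettings:
    classification_prompt: str
    cleaning_prompt: str = "cmos_v2"

    def __post_init__(self) -> None:
        # Fail fast at startup rather than mid-run with an unresolvable prompt.
        checks = (("cleaning", self.cleaning_prompt),
                  ("classification", self.classification_prompt))
        for kind, version in checks:
            if version not in BUNDLED_PROMPTS[kind]:
                allowed = ", ".join(sorted(BUNDLED_PROMPTS[kind]))
                raise ValueError(f"unknown {kind} prompt version {version!r}; bundled: {allowed}")
```

Keeping cmos_v1 in the registry is what lets older runs, whose provenance and cache keys name it, still resolve.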

Migrations

SQLite schema changes live in src/librarian/storage/migrations and are applied in filename order. Applied versions are recorded in schema_migrations.
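A minimal migration runner in this style is shown below, using only the standard library's sqlite3 module. It also shows how the WAL mode mentioned earlier is enabled on a connection. The file layout (one .sql file per version, named with zero-padded prefixes so lexicographic order equals apply order) and the schema_migrations columns are assumptions. In this sketch a migration script and its bookkeeping row are committed separately, so it is not fully atomic.

```python
import sqlite3
from pathlib import Path


def connect(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while a writer appends; the mode persists in the file.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def apply_migrations(conn: sqlite3.Connection, migrations_dir: Path) -> list[str]:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        " version TEXT PRIMARY KEY,"
        " applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.commit()
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    # Filename order: zero-padded prefixes make lexicographic order the apply order.
    for path in sorted(Path(migrations_dir).glob("*.sql")):
        version = path.stem
        if version in applied:
            continue
        conn.executescript(path.read_text(encoding="utf-8"))
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied
```

Running the function twice is safe: the second call finds every version in schema_migrations and returns an empty list.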

Jobs And Events

The API submits processing work through an application-level job runner instead of FastAPI BackgroundTasks. The default runner is bounded and in-process for local use. Production deployments can set LIBRARIAN_JOB_BACKEND=sqlite and run librarian worker as a separate process. The SQLite queue uses leases, retry backoff, attempt limits, and persisted state so API processes can restart independently of workers.
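The sketch below shows how leases, retry backoff, and attempt limits can fit together in a single SQLite table. The schema, function names, default lease length, and exponential backoff formula are all hypothetical. A worker that stops heartbeating lets its lease expire, and another worker may then reclaim the job. Claims use BEGIN IMMEDIATE so that concurrent workers serialize on SQLite's write lock. A fuller version would also mark expired jobs that have exhausted their attempts as failed.

```python
import sqlite3
import time
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id           INTEGER PRIMARY KEY,
    run_id       TEXT    NOT NULL,
    state        TEXT    NOT NULL DEFAULT 'queued',  -- queued | running | done | failed
    attempts     INTEGER NOT NULL DEFAULT 0,
    max_attempts INTEGER NOT NULL DEFAULT 3,
    available_at REAL    NOT NULL DEFAULT 0,         -- earliest claim time (retry backoff)
    lease_until  REAL                                -- set only while running
)
"""


def open_queue(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit; BEGIN is explicit below
    conn.execute(SCHEMA)
    return conn


def enqueue(conn: sqlite3.Connection, run_id: str, max_attempts: int = 3) -> int:
    cur = conn.execute("INSERT INTO jobs (run_id, max_attempts) VALUES (?, ?)", (run_id, max_attempts))
    return cur.lastrowid


def claim(conn: sqlite3.Connection, lease_seconds: float = 30.0,
          now: Optional[float] = None) -> Optional[tuple[int, str]]:
    now = time.time() if now is None else now
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before choosing a row
    try:
        row = conn.execute(
            """SELECT id, run_id FROM jobs
               WHERE (state = 'queued' AND available_at <= ?)
                  OR (state = 'running' AND lease_until < ? AND attempts < max_attempts)
               ORDER BY id LIMIT 1""",
            (now, now),
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET state = 'running', attempts = attempts + 1, lease_until = ? WHERE id = ?",
                (now + lease_seconds, row[0]),
            )
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    return row


def heartbeat(conn: sqlite3.Connection, job_id: int, lease_seconds: float = 30.0,
              now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    cur = conn.execute("UPDATE jobs SET lease_until = ? WHERE id = ? AND state = 'running'",
                       (now + lease_seconds, job_id))
    return cur.rowcount == 1  # False means the job is no longer ours


def complete(conn: sqlite3.Connection, job_id: int) -> None:
    conn.execute("UPDATE jobs SET state = 'done', lease_until = NULL WHERE id = ?", (job_id,))


def backoff_seconds(attempts: int, base: float = 2.0, cap: float = 300.0) -> float:
    return min(cap, base * 2 ** (attempts - 1))  # 2s, 4s, 8s, ... capped


def fail(conn: sqlite3.Connection, job_id: int, now: Optional[float] = None) -> None:
    now = time.time() if now is None else now
    attempts, max_attempts = conn.execute(
        "SELECT attempts, max_attempts FROM jobs WHERE id = ?", (job_id,)).fetchone()
    if attempts < max_attempts:
        conn.execute("UPDATE jobs SET state = 'queued', lease_until = NULL, available_at = ? WHERE id = ?",
                     (now + backoff_seconds(attempts), job_id))
    else:
        conn.execute("UPDATE jobs SET state = 'failed', lease_until = NULL WHERE id = ?", (job_id,))
```

Because all of this state lives in the database file, an API process can restart without losing queued or leased work.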

Run events can be fetched as JSON or streamed over server-sent events.
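Server-sent events use a simple text framing. Each event is a block of `field: value` lines ending in a blank line, and the `id` field lets a reconnecting client send Last-Event-ID. The sketch below formats run events this way and resumes after a given ID. The event dict keys (id, type, payload) are hypothetical, not Librarian's event schema.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_frame(event_id: int, event_type: str, payload: dict) -> str:
    # Compact JSON contains no raw newlines, so it fits on a single data: line.
    data = json.dumps(payload, separators=(",", ":"))
    return f"id: {event_id}\nevent: {event_type}\ndata: {data}\n\n"


def stream_events(events: Iterable[dict], last_event_id: Optional[int] = None) -> Iterator[str]:
    for ev in events:
        if last_event_id is not None and ev["id"] <= last_event_id:
            continue  # already delivered before the client reconnected
        yield sse_frame(ev["id"], ev["type"], ev["payload"])
```

In a FastAPI app, a generator like this could be wrapped in a StreamingResponse with media type text/event-stream.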

Benchmarking

The librarian maintainer benchmark command uses deterministic synthetic text and the configured cleaner to measure chunking and cleaning throughput. This is the baseline harness for comparing chunking policies, coherence modes, providers, and concurrency settings.
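The two ingredients of such a harness are input that is identical on every run and a wall-clock measurement around the step being compared. The sketch below uses a seeded random.Random for the synthetic text and a word-window chunker with overlap. Both the vocabulary and the chunking policy are hypothetical stand-ins for Librarian's own.

```python
import random
import time
from typing import Any, Callable

_VOCAB = ["archive", "ledger", "margin", "folio", "index", "volume", "scribe", "codex"]


def synthetic_text(n_words: int, seed: int = 0) -> str:
    rng = random.Random(seed)  # fixed seed: identical text on every run
    return " ".join(rng.choice(_VOCAB) for _ in range(n_words))


def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    assert 0 <= overlap < size, "overlap must be smaller than chunk size"
    words = text.split()
    step = size - overlap
    # Each window shares `overlap` words with the previous one.
    return [" ".join(words[i : i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def measure(fn: Callable[..., Any], *args: Any, repeats: int = 3) -> tuple[Any, float]:
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)  # best-of-N reduces timer noise
    return result, best
```

For example, `measure(chunk_words, synthetic_text(100_000), 512, 64)` returns the chunks and the best time. Changing only one argument at a time keeps comparisons between policies fair.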

Evaluation

The librarian maintainer eval command runs JSON eval suites against the configured chunking, prompt, and provider stack. Evals are intentionally file-based so contributors can add sanitized cases without coupling the harness to private corpora. Operational tuning guidance lives in docs/OPERATIONS.md.
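File-based suites keep cases reviewable as plain JSON. The sketch below assumes a hypothetical suite shape in which each case has an id, an input, and optional must_contain and must_not_contain string lists. It scores a cleaner against those cases. Librarian's real suite schema is not described on this page.

```python
import json
from typing import Callable


def run_suite(suite_json: str, clean: Callable[[str], str]) -> dict[str, bool]:
    suite = json.loads(suite_json)
    results = {}
    for case in suite["cases"]:
        output = clean(case["input"])
        has_required = all(s in output for s in case.get("must_contain", []))
        has_forbidden = any(s in output for s in case.get("must_not_contain", []))
        results[case["id"]] = has_required and not has_forbidden
    return results
```

Per-case booleans make regressions easy to spot when a prompt version, chunking policy, or provider changes.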
