ETR Study API — System design

Overview

This document defines the system design of the ETR Study API: a single, versioned HTTP surface backed by a relational model that supports Extract–Transform–Retrieve study flows, spaced retrieval, and auditable history. It is written for engineers who implement, integrate, or operate the service.

It brings together:

scope and explicit boundaries (what we intentionally leave out);
a methodology traceability matrix and explicit non-goals (what the API does not model);
the review suggestion model—how due work is computed, ordered, and distinguished from client-only habits;
C4-style structural views (context, containers, components);
a relational persistence model with a first-class schedule_policies catalogue, aligned with spaced retrieval;
tabular architecture decisions with alternatives and the selected option;
functional and non-functional requirements, with pointers to OpenAPI and ADRs;
references to sequence diagrams and the internal API catalogue.

The human workflow in Methodology remains the pedagogical source for vocabulary and intent. This document turns that intent into engineering contracts: which rules are enforced by the API, which are defaults carried by SchedulePolicy, and which stay client-side. Authoritative HTTP shapes and operation-level detail live in the internal API hub once OpenAPI matches this design.

Scope and boundaries

The design is scoped to the HTTP API and the relational data model that underpins it. At the boundary we assume a single abstract actor, the integrator: any standards-compliant HTTP client. The deployable unit is modelled as one application process with one primary relational database. Horizontal scaling, connection pooling, and read replicas are deployment concerns once load and SLOs justify them.

Category	In scope	Out of scope
Product	Users, conspectuses, versioned schedule policies, due retrieval state, review logs, learning errors, schedule summaries	End-user UX; specific clients (browser, mobile, bots); offline sync; voice-memo / walk workflows unless explicitly added
Platform	One API process, database, observability hooks	Kubernetes topology, ingress, service mesh, node pools
Security	API key, rate limits, CORS, idempotent writes	Enterprise IdP (e.g. Okta, Azure AD, Google Workspace)

ETR methodology mapping

Level: Bridge to engineering. Methodology describes how humans study; this document assigns each concept to storage, policy, or client behaviour. Rows below name the primary artefact; see Methodology traceability for enforcement level.

Methodology	System artefact	Notes
Three-column cue sheet (Extract)	`conspectuses.cue_sheet` + `cue_sheet_schema_version`	Schema in Cue sheet JSON; validation tier in Content validation.
Transform (dense paragraph, 5–7 bullets)	`dense_paragraph`, `bullets`	Counts and sentence limits are defaults / advisory unless product turns them into hard API validation.
Retrieve — four slots A–D, tags easy / hard / forgot	`conspectus_schedules`, `schedule_policies`, `conspectus_review_logs`	Slot ladder and delays are data-driven via `SchedulePolicy` (see Reference policy).
What to review next (“due” work)	`next_review_at`, due queries, ordering rules	Mechanism in Review suggestion and scheduling logic.
Evening random review from Slot A	—	Not a server obligation: client samples from listed conspectuses (see Non-goals).
“Succeed twice in a row” before advancing	optional future: `review_streak` or session metadata	Not in the baseline schema; SRS tags drive schedule transitions today.
Error log → next Transform pass	`learning_errors`; content edits are still recorded as `CONTENT_PATCHED` events	Remediation workflow (open loops, tasks) is product/UX, not implied by the error row alone.
Out-of-home walk / voice memo loop	—	Out of scope unless a dedicated capture API is added; content still lands in Transform fields.

Methodology traceability matrix

Each methodology idea is classified so implementers do not guess: invariant (must hold in production data), default (shipped policy or OpenAPI default), advisory (documented, soft validation), client (learner or app behaviour only), out of scope.

Methodology element	Class	Where it lives
Separate cues from full notes (anchors, not transcripts)	Advisory	API may warn on payload size; cue sheet shape validates against `cue_sheet_schema_version`.
Three columns: keywords, questions/gaps, hints	Invariant (shape)	`cue_sheet.rows[].keyword`, `.question`, `.hint` for schema v1 (see Cue sheet JSON).
Pause every 5–10 minutes; ~50 min Extract block	Client	Timers and UX; not stored server-side.
Single master conspectus after Transform	Invariant	One schedule row per conspectus; content snapshot on `conspectuses`.
Paragraph ≤ 5 sentences; 5–7 bullets	Advisory → optional invariant	Document as limits; promote to strict validation via ADR if product requires.
Slot A today → B tomorrow → C +3d → D +7d then 14, 30…	Default	`schedule_policies.rules` for the reference policy; other products may ship different policies.
Tag easy / hard / forgot; hard/forgot reset toward A	Invariant (behaviour)	Deterministic transitions in the active `SchedulePolicy` for the row.
Evening random drill	Client	Client queries a pool (e.g. by slot) and samples; API lists, does not randomize for the user.
Error log drives next Transform	Client / product	Server stores errors; prioritising study sessions is UX.

Explicit non-goals (methodology vs API)

To avoid silent mismatch between Methodology and this service, the following are not required behaviours of the HTTP API in the baseline design:

Random “evening review” selection — the API exposes due times and filters; random sampling of older items is a client algorithm.
Consecutive success counts (“twice in a row”) — not represented in conspectus_review_logs; adding it is a schema and product change.
Walk / audio capture pipeline — voice memos and offline capture are out of scope until dedicated endpoints and storage exist.
Automatic remediation scheduling after learning_errors — errors are data; scheduling the next Transform session is UX.
Teaching quality scoring — no NLP or rubric evaluation of paragraph quality.

If product later needs any of the above, add ADRs and extend OpenAPI; do not assume they are implied by methodology text alone.

Review suggestion and scheduling logic

“What should I review now?” splits into server-backed due retrieval (grounded in next_review_at and policy) and client habits (random drills, time-boxing) that the API does not own.

Due set (server)

For a learner, the due conspectuses at instant t (UTC) are those owned by the user where conspectus_schedules.next_review_at <= t, subject to soft-delete or archive flags if introduced later. This is the primary input to any “suggestion” API: temporal ordering, not pedagogical ranking beyond policy.

Calendar “today” (e.g. “due today in my timezone”) converts next_review_at to the user’s IANA timezone (see users.timezone) and compares the result with the local calendar date. Until timezone is set, due filtering should use UTC day boundaries only, and API responses should document that fallback (see Risks).
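The two due views above can be sketched as follows. This is a minimal illustration, not the service implementation: the function names are hypothetical, and treating overdue items as "due today" is an assumption. A fixed UTC+2 offset stands in for an IANA zone so the example does not depend on the host timezone database.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional
from zoneinfo import ZoneInfo


def resolve_tz(name: Optional[str]) -> tzinfo:
    """Map users.timezone (IANA string or None) to a tzinfo; None means UTC."""
    return ZoneInfo(name) if name else timezone.utc


def is_due(next_review_at: datetime, t: datetime) -> bool:
    """Server due rule: next_review_at <= t (both values timezone-aware)."""
    return next_review_at <= t


def is_due_by_local_day(next_review_at: datetime, now: datetime, tz: tzinfo) -> bool:
    """Calendar-day view: compare local dates instead of instants.

    Assumption: items with an earlier local date also count as due today.
    """
    return next_review_at.astimezone(tz).date() <= now.astimezone(tz).date()


now = datetime(2026, 4, 21, 21, 0, tzinfo=timezone.utc)
next_review_at = datetime(2026, 4, 21, 22, 30, tzinfo=timezone.utc)
utc_plus_2 = timezone(timedelta(hours=2))  # fixed offset standing in for an IANA zone

print(is_due(next_review_at, now))                                  # instant comparison
print(is_due_by_local_day(next_review_at, now, resolve_tz(None)))   # UTC calendar day
print(is_due_by_local_day(next_review_at, now, utc_plus_2))         # local calendar day
```

The same row is "due today" under UTC boundaries but not under the UTC+2 calendar, which is the off-by-one risk the UTC-only fallback has to document.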

Ordering (stable tie-breakers)

When multiple conspectuses are due, return order should be deterministic:

next_review_at ascending (most overdue first);
then conspectuses.created_at ascending (older content first);
then conspectus_uuid lexicographic (total order for pagination).

Product may additionally boost “new” or “forgotten-heavy” items in the client; the server baseline stays explainable and replayable from timestamps alone.
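The tie-breakers above map directly onto a tuple sort key. The sketch below assumes an in-memory row type (DueRow is a hypothetical name) and that conspectus_uuid is compared in its canonical lowercase string form; in the service the same order would normally come from an ORDER BY clause.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DueRow:
    conspectus_uuid: str
    created_at: datetime
    next_review_at: datetime


def due_order_key(row: DueRow):
    """next_review_at, then created_at, then conspectus_uuid (all ascending)."""
    return (row.next_review_at, row.created_at, row.conspectus_uuid)


def order_due(rows):
    return sorted(rows, key=due_order_key)


def utc(day: int) -> datetime:
    return datetime(2026, 4, day, tzinfo=timezone.utc)


rows = [
    DueRow("bbbb", created_at=utc(1), next_review_at=utc(20)),
    DueRow("aaaa", created_at=utc(1), next_review_at=utc(20)),  # tie broken by uuid
    DueRow("cccc", created_at=utc(5), next_review_at=utc(10)),  # most overdue
    DueRow("dddd", created_at=utc(3), next_review_at=utc(20)),  # newer content
]
ordered_ids = [row.conspectus_uuid for row in order_due(rows)]
print(ordered_ids)
```

Because the key ends in a unique identifier, the order is total, which keeps cursor-based pagination stable between requests.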

Applying a review (state transition)

A review is one learner decision captured as tag ∈ { easy, hard, forgot } at time reviewed_at. The server:

Loads the conspectus schedule and resolves the active policy row in schedule_policies (matching schedule_policy_id and algorithm_version on the schedule).
Computes (slot', slot_d_ladder_index', next_review_at') from (slot, slot_d_ladder_index, tag) using immutable rules JSON for that policy version.
Updates conspectus_schedules atomically, increments schedule_revision (optimistic concurrency; see Persistence), and appends one row to conspectus_review_logs with schedule_before / schedule_after snapshots.

Idempotency makes a replay of the same HTTP request safe; optimistic concurrency rejects two different reviews racing on the same conspectus. These are distinct concerns (see Cross-cutting notes).
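A minimal in-memory sketch of the review transition with a revision check follows. The names Schedule, StaleScheduleError and apply_review are hypothetical, the policy is passed in as a plain callable, and a Python list stands in for conspectus_review_logs; a real implementation would run inside one database transaction.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone


class StaleScheduleError(Exception):
    """Raised when the caller's expected schedule_revision is out of date."""


@dataclass(frozen=True)
class Schedule:
    slot: str
    slot_d_ladder_index: int
    next_review_at: datetime
    schedule_revision: int


def apply_review(schedule, expected_revision, tag, reviewed_at, transition, log):
    """Apply one review; `transition` maps (slot, index, tag) to (slot', index', delay)."""
    if schedule.schedule_revision != expected_revision:
        raise StaleScheduleError(
            f"expected {expected_revision}, found {schedule.schedule_revision}"
        )
    slot, index, delay = transition(schedule.slot, schedule.slot_d_ladder_index, tag)
    updated = Schedule(slot, index, reviewed_at + delay, schedule.schedule_revision + 1)
    # Append-only log entry with before/after snapshots for audit.
    log.append({
        "tag": tag,
        "reviewed_at": reviewed_at,
        "schedule_before": asdict(schedule),
        "schedule_after": asdict(updated),
    })
    return updated


def toy_transition(slot, index, tag):
    """Stand-in policy for the demo: always move to slot B, due in one day."""
    return "B", 0, timedelta(days=1)


review_log = []
t0 = datetime(2026, 4, 21, 9, 0, tzinfo=timezone.utc)
before = Schedule("A", 0, t0, schedule_revision=1)
after = apply_review(before, 1, "easy", t0, toy_transition, review_log)
print(after.slot, after.schedule_revision, len(review_log))
```

A second device that still holds revision 1 would get StaleScheduleError and no log row, whereas an idempotent replay of the first request would be answered from the stored response without reaching this function.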

What the API does not compute

Random evening sample — clients list candidates (e.g. filter by slot = 'A' via summary API) and shuffle locally.
Interleaving subjects — cross-topic ordering is UX.
“Next best” across competing study goals — would require goals and priorities not in the baseline model.
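For contrast, the random evening drill is a few lines of client code. The sketch below assumes the client has already listed candidate identifiers (for example Slot A items); sample_evening_drill is a hypothetical helper, not part of the API.

```python
import random
from typing import Optional


def sample_evening_drill(candidates: list, k: int, rng: Optional[random.Random] = None) -> list:
    """Pick up to k distinct candidates at random, on the client side."""
    rng = rng or random.Random()
    return rng.sample(candidates, min(k, len(candidates)))


slot_a_ids = ["c-01", "c-02", "c-03", "c-04", "c-05"]
drill = sample_evening_drill(slot_a_ids, 3, random.Random(7))  # seeded for repeatability
print(len(drill), set(drill) <= set(slot_a_ids))
```

Keeping the shuffle in the client means the server response stays deterministic and replayable from timestamps alone.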

Reference default `SchedulePolicy` (methodology-aligned)

Illustrative. Product may ship different delays; history remains interpretable because each review log stores policy identifiers and snapshots. The table below matches the four-slot cadence in broad strokes.

Field	Example value	Role
`schedule_policy_id`	`etr_methodology_four_slot`	Stable name for the policy family.
`algorithm_version`	`1.0.0`	Bump when transition rules change; never rewrite old log rows.
`rules` (JSON)	JSON object (see Role)	Encodes: allowed `slot` values `A`–`D`; for each `(slot, tag)` the next slot, optional `slot_d_ladder_index` for D-tier rungs, and `delay_from_review_at` (e.g. `PT1H` for first retrieval, `P1D`, `P3D`, then D ladder `P7D → P14D → P30D → …` for `easy`); `hard` and `forgot` map to reset toward Slot A per product decision (see Forgot vs hard).

New conspectuses receive this policy by default at creation time unless the integrator passes another schedule_policy_id that already exists in schedule_policies. Seeding the reference row is a deployment concern (migration or admin task).
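One possible in-memory reading of the reference rules is sketched below. Everything about the shape is an assumption: the RULES layout is hypothetical, delays are stored as timedelta values rather than ISO 8601 duration strings, hard and forgot are given an identical reset with a one-hour delay, and the D ladder is capped at its last rung.

```python
from datetime import timedelta

# Hypothetical in-memory form of schedule_policies.rules for the reference policy.
RULES = {
    "easy": {  # slot -> (next slot, delay from reviewed_at)
        "A": ("B", timedelta(days=1)),
        "B": ("C", timedelta(days=3)),
        "C": ("D", timedelta(days=7)),
    },
    "d_ladder": [timedelta(days=7), timedelta(days=14), timedelta(days=30)],
    "reset": ("A", timedelta(hours=1)),  # assumed target for hard and forgot
}


def next_state(slot: str, ladder_index: int, tag: str):
    """Return (slot', slot_d_ladder_index', delay_from_review_at)."""
    if tag in ("hard", "forgot"):
        next_slot, delay = RULES["reset"]
        return next_slot, 0, delay
    if tag != "easy":
        raise ValueError(f"unknown tag: {tag!r}")
    if slot == "D":
        ladder = RULES["d_ladder"]
        index = min(ladder_index + 1, len(ladder) - 1)  # stay on the last rung
        return "D", index, ladder[index]
    next_slot, delay = RULES["easy"][slot]
    return next_slot, 0, delay


# Walk the ladder with five consecutive easy reviews starting in Slot A.
state = ("A", 0)
for _ in range(5):
    slot, index, delay = next_state(state[0], state[1], "easy")
    print(slot, index, delay)
    state = (slot, index)
```

Because the function depends only on (slot, slot_d_ladder_index, tag) and the rules data, replaying conspectus_review_logs against the same policy version should reproduce every schedule_after snapshot.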

Content validation policy (cue sheet, paragraph, bullets)

Methodology recommends bounds (sentence and bullet counts, lean cues). The API should validate in layers:

Hard — JSON schema for cue_sheet matches cue_sheet_schema_version; request rejected on parse failure.
Soft — optional warnings in logs or response extensions when bullets fall outside 5–7 or paragraph length exceeds guidance (feature-flagged).
None — free text allowed where product does not enable pedagogy mode.

Exact numeric limits belong in OpenAPI once product selects strict vs advisory mode; this document requires only that cue_sheet_schema_version exists before evolving cue_sheet shape.
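The soft tier could look like the sketch below. It is illustrative only: soft_warnings is a hypothetical helper, the limits are the methodology defaults quoted above, and the sentence count is a naive split on terminal punctuation that will miscount abbreviations.

```python
import re


def soft_warnings(dense_paragraph: str, bullets: list[str]) -> list[str]:
    """Return advisory messages; an empty list means the payload is within guidance."""
    warnings = []
    if not 5 <= len(bullets) <= 7:
        warnings.append(f"bullets: expected 5-7, got {len(bullets)}")
    # Naive sentence split; good enough for an advisory check.
    sentences = [s for s in re.split(r"[.!?]+", dense_paragraph) if s.strip()]
    if len(sentences) > 5:
        warnings.append(f"dense_paragraph: expected at most 5 sentences, got {len(sentences)}")
    return warnings


within = soft_warnings("One. Two. Three.", ["b"] * 6)
outside = soft_warnings("One. Two. Three. Four. Five. Six.", ["b"] * 3)
print(within)
print(outside)
```

Returning messages instead of raising keeps the request accepted; promoting a check to the hard tier would turn the same condition into a validation error.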

Problem statement

Goal: a versioned HTTP API that durably supports ETR learning workflows.

Problem: retention collapses when rehearsal and state are not persisted with clear history.

Approach: a transactional service that separates content from schedule state, maintains append-only review and event logs, and exposes stable error semantics to integrators.

Domain model (aggregates and lifecycle)

The following aggregates map to relational tables (see the conspectus ER diagram):

User (users) — tenant boundary. Primary key client_uuid. External identifiers (system_user_id, system_uuid) resolve to this row, consistent with the User API. Optional timezone (IANA string, e.g. Europe/Berlin) supports calendar-day due views; if null, due endpoints document UTC-only semantics.
Conspectus (conspectuses, PK conspectus_uuid) — the canonical note after Transform: cue_sheet, dense_paragraph, bullets, optional title, cue_sheet_schema_version (integer, default 1), ownership via owner_client_uuid, monotonic content_version, and optional fields for large-body or hybrid storage.
SchedulePolicy (schedule_policies) — versioned catalogue of spaced-repetition rules. Composite natural key (schedule_policy_id, algorithm_version); carries immutable rules JSON and metadata. Referenced by conspectus_schedules so transitions stay auditable after policy updates.
ConspectusSchedule (conspectus_schedules, 1:1) — mutable retrieval state only:
- slot — coarse position on the A–D ladder (aligns conceptually with Methodology: Retrieve);
- slot_d_ladder_index — policy-specific sub-step (e.g. rung within the D tier), updated with slot according to the active (schedule_policy_id, algorithm_version) row;
- next_review_at — drives due lists and any scheduling UX;
- schedule_policy_id + algorithm_version — foreign key to schedule_policies; together they select the immutable rules used at review time.
- schedule_revision — monotonic integer incremented on each successful schedule write; clients send expected revision on review to detect concurrent sessions (see Cross-cutting notes).
ConspectusReviewLog (conspectus_review_logs) — append-only: one row per review with outcome tag, reviewed_at, schedule_policy_id + algorithm_version (denormalized from the policy used), and immutable schedule_before / schedule_after JSON for audit. This is the system of record for review outcomes.
ConspectusEvent (conspectus_events) — append-only facts that are not review outcomes: creation, content patches, title changes, and (per D6) manual schedule adjustments such as SCHEDULE_ADJUSTED.
LearningError (learning_errors) — records weak cues or mistakes for remediation; semantically distinct from SRS tags. Optional review_log_id ties an error to a review session when both are captured together. Conflating this with review outcomes requires an explicit product decision and schema discriminant.

Lifecycle (summary): resolve default schedule_policies row → create conspectus, initial schedule (with policy FK + schedule_revision = 1), and a CREATED-class event → query by next_review_at per Review suggestion → review command inserts into conspectus_review_logs and updates conspectus_schedules → content PATCH appends to conspectus_events → learning errors stand alone or commit in the same transaction as a review (with review_log_id when applicable).

C4 decomposition

The service is a modular monolith: FastAPI, one database primary, Redis for distributed runtime concerns (rate limits, idempotency cache, short-lived keys), optional metrics and log pipelines. Regenerate diagrams with make docs-fix.

C4 L1. Source: `docs/uml/architecture/system_context_view.puml`

C4 L2. Source: `docs/uml/architecture/container_view.puml`

C4 L3. Source: `docs/uml/architecture/system_component_view.puml`

Persistence and schema (relational baseline)

The schema separates note content (cue sheet, paragraph, bullets) from schedule state. SchedulePolicy is a first-class catalogue table so schedule_policy_id is never a dangling string. Each review (retrieval outcome) is append-only in conspectus_review_logs, with immutable schedule_before / schedule_after JSON plus schedule_policy_id and algorithm_version copied for audit (matching the policy row used for the transition). Non-review mutations append to conspectus_events (e.g. CREATED, CONTENT_PATCHED, TITLE_CHANGED).

Canonical rule: do not mirror every review into conspectus_events unless an ADR explicitly requires a BI-oriented duplicate—the review log is the source of truth for scheduling history.

Read models (conspectuses, conspectus_schedules) update in the same transaction as the corresponding log insert(s).

Naming: In prose the aggregate is conspectus (singular). Physical tables use plural snake_case—e.g. conspectuses, conspectus_schedules, conspectus_review_logs—as in the ER diagram. Sections may refer to conspectus_schedule logically; the table is conspectus_schedules.

Entity–relationship

Tables, keys, and roles

Table	Keys / FK	Role
`users`	PK `client_uuid`; unique `(system_user_id, system_uuid)`	Identity and tenant boundary. Optional `timezone` for local “due today” semantics (see Review suggestion).
`schedule_policies`	PK `(schedule_policy_id, algorithm_version)`; optional unique `policy_uuid` for external references	Immutable versioned rules: `name`, `rules` (JSON), `created_at`. Seeded with the reference policy; new versions add rows, never mutate history.
`conspectuses`	PK `conspectus_uuid`; FK `owner_client_uuid` → `users`	Content snapshot: `title`, `cue_sheet` (JSON), `cue_sheet_schema_version` (int), `dense_paragraph`, `bullets` (JSON), `content_version`, `created_at`, `updated_at` (see due ordering). For hybrid storage: `body_storage`, `external_document_id`, `content_sha256`, `sync_status` (see Hybrid storage).
`conspectus_schedules`	PK/FK `conspectus_uuid` → `conspectuses` ON DELETE CASCADE; FK `(schedule_policy_id, algorithm_version)` → `schedule_policies`	`slot`, `slot_d_ladder_index`, `next_review_at`, `schedule_revision` (bigint, ≥ 1), `schedule_updated_at`. Optional denormalized `owner_client_uuid` for index-only due queries (trades extra storage for avoiding a join).
`conspectus_review_logs`	PK `id`; FK `conspectus_uuid`	Append-only: `tag`, `reviewed_at`, `schedule_before` / `schedule_after`, `schedule_policy_id`, `algorithm_version` (denormalized from the policy row used for the transition).
`conspectus_events`	PK `id`; FK `conspectus_uuid`	Append-only lifecycle: `event_type`, `payload`, `schema_version`, optional `correlation_id`.
`learning_errors`	PK `error_uuid`; FK `owner_client_uuid`; optional `conspectus_uuid` → `conspectuses` ON DELETE SET NULL; optional `review_log_id` → `conspectus_review_logs`	Pedagogical mistake log—distinct from SRS tags in `conspectus_review_logs`. Use `review_log_id` to correlate with a specific review when both are recorded.
`idempotency_keys`	Unique `(owner_client_uuid, endpoint_path, idempotency_key)` (or equivalent scoped to the authenticated principal)	Deduplication of critical writes; keys must not collide across users.

Indexes (typical queries)

Due workload: composite (owner_client_uuid, next_review_at) on conspectus_schedules if owner_client_uuid is denormalized; otherwise join conspectuses for ownership and index conspectus_schedules(conspectus_uuid, next_review_at).
Conspectus listing: (owner_client_uuid, updated_at DESC) on conspectuses.
Review history: (conspectus_uuid, reviewed_at DESC) on conspectus_review_logs.
Learning errors: (owner_client_uuid, created_at DESC); optional (conspectus_uuid, created_at DESC).
Idempotency: unique scope must include owner_client_uuid (or API principal), not only path + key.
Policy catalogue: (schedule_policy_id, algorithm_version) on schedule_policies is already the primary key.
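A trimmed SQLite sketch of the composite foreign key and the due index is shown below, using Python's sqlite3 module. Column types, the index name and the helper names are assumptions, the users table and most columns are omitted, and the denormalized owner_client_uuid variant is used.

```python
import sqlite3

DDL = """
CREATE TABLE schedule_policies (
    schedule_policy_id TEXT NOT NULL,
    algorithm_version  TEXT NOT NULL,
    rules              TEXT NOT NULL,
    PRIMARY KEY (schedule_policy_id, algorithm_version)
);
CREATE TABLE conspectuses (
    conspectus_uuid   TEXT PRIMARY KEY,
    owner_client_uuid TEXT NOT NULL
);
CREATE TABLE conspectus_schedules (
    conspectus_uuid     TEXT PRIMARY KEY
        REFERENCES conspectuses (conspectus_uuid) ON DELETE CASCADE,
    owner_client_uuid   TEXT NOT NULL,
    slot                TEXT NOT NULL,
    slot_d_ladder_index INTEGER NOT NULL DEFAULT 0,
    next_review_at      TEXT NOT NULL,
    schedule_revision   INTEGER NOT NULL DEFAULT 1 CHECK (schedule_revision >= 1),
    schedule_policy_id  TEXT NOT NULL,
    algorithm_version   TEXT NOT NULL,
    FOREIGN KEY (schedule_policy_id, algorithm_version)
        REFERENCES schedule_policies (schedule_policy_id, algorithm_version)
);
CREATE INDEX ix_schedules_owner_due
    ON conspectus_schedules (owner_client_uuid, next_review_at);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn


def insert_schedule(conn, conspectus_uuid, owner, policy_id, version):
    """Insert a conspectus plus its initial Slot A schedule row."""
    conn.execute("INSERT INTO conspectuses VALUES (?, ?)", (conspectus_uuid, owner))
    conn.execute(
        "INSERT INTO conspectus_schedules (conspectus_uuid, owner_client_uuid, slot,"
        " next_review_at, schedule_policy_id, algorithm_version)"
        " VALUES (?, ?, 'A', '2026-04-21T00:00:00Z', ?, ?)",
        (conspectus_uuid, owner, policy_id, version),
    )


conn = open_db()
conn.execute(
    "INSERT INTO schedule_policies VALUES ('etr_methodology_four_slot', '1.0.0', '{}')"
)
insert_schedule(conn, "c1", "u1", "etr_methodology_four_slot", "1.0.0")
```

With the catalogue row seeded, a schedule that names an unknown (schedule_policy_id, algorithm_version) pair fails with sqlite3.IntegrityError instead of storing a dangling string.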

Schema design notes (fixes vs earlier drafts)

schedule_policies was missing. Referencing schedule_policy_id without a parent row breaks referential integrity and makes migrations non-replayable; the catalogue table is required.
algorithm_version on the schedule. The schedule row must carry both schedule_policy_id and algorithm_version to reference exactly one immutable policy row (composite FK).
schedule_revision. Reviews need optimistic concurrency independent of idempotency keys; bump on every schedule mutation.
cue_sheet_schema_version. Prevents silent JSON drift; pair with migrations when cue_sheet shape changes.
idempotency_keys scope. Global uniqueness on (endpoint_path, idempotency_key) would let two users collide; scope by owner / principal.
learning_errors.conspectus_uuid. Prefer ON DELETE SET NULL so deleting a conspectus keeps error rows that should remain for analytics (with a null conspectus reference), or choose CASCADE if errors must disappear with the note. This is a product decision, documented in migrations.

Cue sheet, bullets, paragraph: SQL JSON vs external document

Default: store cue_sheet, bullets, and dense_paragraph inline (JSON / TEXT). This respects typical body-size limits, keeps backup and transactions straightforward, and allows JSON evolution through migrations and validation.

Cue sheet JSON (schema v1)

For cue_sheet_schema_version = 1, cue_sheet is an object with a rows array. Each row aligns with the three-column mental model in Methodology · ETR at home:

keyword — short anchor (one to three words).
question — question or gap (“What are the three steps of X?”).
hint — brief answer cue (optional on early rows; methodology allows one- to three-word hints).

Validation: reject unknown keys or missing rows when strict mode is on; future schema versions add columns rather than overloading strings. See Content validation policy.
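A strict-mode check for schema v1 might be sketched as follows. The function name and error strings are hypothetical, and requiring keyword and question while leaving hint optional is an assumption drawn from the field notes above.

```python
ALLOWED_ROW_KEYS = {"keyword", "question", "hint"}
REQUIRED_ROW_KEYS = ("keyword", "question")  # assumption: hint stays optional


def validate_cue_sheet_v1(cue_sheet: object) -> list[str]:
    """Return a list of strict-mode errors; an empty list means the sheet is valid."""
    if not isinstance(cue_sheet, dict):
        return ["cue_sheet must be an object"]
    errors = [f"unknown key: {key}" for key in sorted(set(cue_sheet) - {"rows"})]
    rows = cue_sheet.get("rows")
    if not isinstance(rows, list):
        return errors + ["rows must be an array"]
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            errors.append(f"rows[{i}] must be an object")
            continue
        for key in sorted(set(row) - ALLOWED_ROW_KEYS):
            errors.append(f"rows[{i}]: unknown key {key}")
        for key in REQUIRED_ROW_KEYS:
            if not isinstance(row.get(key), str) or not row[key].strip():
                errors.append(f"rows[{i}].{key} must be a non-empty string")
        if "hint" in row and not isinstance(row["hint"], str):
            errors.append(f"rows[{i}].hint must be a string")
    return errors


valid_sheet = {
    "rows": [
        {"keyword": "steps", "question": "What are the three steps of X?", "hint": "three steps"},
        {"keyword": "gap", "question": "Why does step two matter?"},
    ]
}
invalid_sheet = {"rows": [{"keyword": "", "question": "q", "note": "x"}], "extra": 1}
print(validate_cue_sheet_v1(valid_sheet))
print(validate_cue_sheet_v1(invalid_sheet))
```

In production the same rules would more likely live in a JSON Schema document keyed by cue_sheet_schema_version, so that a future version can add columns without touching this code path.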

Criterion	Inline in RDBMS	External blob or document store
Size / SLO	Appropriate while under configured maximum body size.	Prefer when payloads grow large or binary attachments appear.
Versioning	`content_version` plus event payloads; migrate JSON with scripts.	Object version / ETag in store; SQL holds pointer and hash.
Full-text search	PostgreSQL FTS or generated columns; SQLite is more constrained.	Often a dedicated search tier (operational cost).

Hybrid SQL + object or document store (optional)

When bodies are externalised, SQL must still anchor: conspectus_uuid, ownership, body_storage (inline vs external), external_document_id, content_version or etag, content_sha256, and optionally sync_status. Viable options span self-hosted S3-compatible storage, managed object tiers with free allowances, or document databases—selection is driven by operational cost, egress, and consistency semantics, not by a single vendor.

Risks and mitigations:

Split writes — use outbox or staged upload, compensating deletes, TTL-based GC for orphaned blobs.
Migrations — dual-write phases and backfills; version payloads in conspectus_events.
Search — if indexing leaves the database, treat the search pipeline as a first-class operational dependency.

Failure modes (SQL vs blob)

There is no distributed transaction between the RDBMS and an external store; recovery relies on status fields and compensating actions.

Scenario	Detection	Mitigation
SQL committed; blob write failed	`sync_status` in `pending`/`failed`; missing hash/etag	Retry upload with the same `Idempotency-Key`; GC must not delete the SQL row
Blob written; SQL rolled back	Orphan object key	Key prefixing, TTL GC, no user attachment until SQL commits
Drift between SQL and blob	`content_sha256` mismatch	Fail read or serve stale per policy; repair job re-fetch or re-upload
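Drift detection reduces to comparing a stored digest with a freshly computed one. The sketch below uses hashlib from the standard library; check_blob and its status strings are hypothetical, and what to do on "drift" (fail the read or serve stale) remains the policy choice described in the table.

```python
import hashlib
from typing import Optional


def content_sha256(body: bytes) -> str:
    """Hex digest as it would be stored in conspectuses.content_sha256."""
    return hashlib.sha256(body).hexdigest()


def check_blob(expected_sha256: str, blob: Optional[bytes]) -> str:
    """Classify an external body as 'ok', 'missing' or 'drift'."""
    if blob is None:
        return "missing"
    return "ok" if content_sha256(blob) == expected_sha256 else "drift"


stored_body = b'{"rows": []}'
stored_hash = content_sha256(stored_body)
print(check_blob(stored_hash, stored_body))
print(check_blob(stored_hash, b'{"rows": [1]}'))
print(check_blob(stored_hash, None))
```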

Architectural decisions

Each subsection compares credible alternatives. Rows with class="decision-chosen" (highlighted) record the option adopted for this codebase. Rejected options remain visible to avoid re-litigating settled trade-offs.

D1 — Review history vs general conspectus history

Option	Description	Pros	Cons
A Dual journals	`conspectus_review_logs` for reviews; `conspectus_events` for other facts	SRS-aligned; straightforward review queries	Two append-only streams
B Unified stream	Single `conspectus_events` including `REVIEW_APPLIED`	One table	Mixed access patterns and filtering cost
C Dual + mirror	A plus duplicate schedule transitions in events	Unified BI timeline	Duplication and consistency risk

Rationale (D1): A dedicated conspectus_review_logs matches how production SRS systems isolate high-volume retrieval traces. Non-review edits remain in conspectus_events without forcing reviews through a generic envelope.

D2 — Body storage

Option	Description	Pros	Cons
Inline JSON/TEXT	Columns on `conspectuses`	Single transaction; simple backup	Larger rows
Object store	Pointer and etag	Scales large blobs	Two-phase writes
Document DB	External document + FK in SQL	Flexible schema	Second system

Rationale (D2): Default inline storage fits expected limits; externalise when measurements prove it necessary.

D3 — Schedule shape

Option	Description	Pros	Cons
`conspectus_schedules` table	1:1 with conspectus	Clear separation of concerns	Join on read
Wide row	All columns on `conspectuses`	No join	Blurs content and schedule evolution

D4 — Database engine

Option	Description	Pros	Cons
SQLite	Development and small deployments	Minimal operations	Write throughput limits
PostgreSQL	Scale-out and concurrency path	Rich concurrency model	Higher operational burden

Rationale (D4): The schema stays portable; SQLite is the default; PostgreSQL when load or HA demand it.

D5 — Learning errors vs review outcomes

Option	Description	Pros	Cons
A Separate tables	`conspectus_review_logs` for tags; `learning_errors` for detail; optional `review_log_id`	Clear semantics and queries	Two inserts when both are captured
B Single stream	One table for all events including mistakes	Single append path	Mixed schemas; heavier filtering

Rationale (D5): SRS tags drive scheduling; learning errors capture remediation detail. Correlation is optional via learning_errors.review_log_id.

D6 — Manual schedule change vs review

Option	Description	Pros	Cons
A Events only	Manual reschedule appends `conspectus_events` (e.g. `SCHEDULE_ADJUSTED`), not `conspectus_review_logs`	No synthetic review rows	Schedule changes read from two sources
B Duplicate into review log	Every move also logged as review	Single “movement” table	Conflates retrieval with editorial edits

Rationale (D6): Keep conspectus_review_logs strictly for retrieval outcomes unless a future ADR mandates a BI mirror.

D7 — `schedule_policies` catalogue vs code-only rules

Option	Description	Pros	Cons
A Catalogue table + seed	`schedule_policies` holds `(schedule_policy_id, algorithm_version)` and `rules` JSON; services join for validation	Referential integrity; auditable defaults; reproducible transitions	Extra table and seed migrations
B Code-only	Policy IDs in rows but rules live only in application memory	Simple DDL	History depends on deploy version; harder to explain snapshots
C Unversioned JSON blob on schedule	Copy full rules onto each `conspectus_schedules` row	Self-contained rows	Large rows; policy drift across conspectuses

Rationale (D7): A catalogue row is the smallest structure that supports composite FKs from conspectus_schedules, keeps methodology-aligned defaults seedable, and matches how review logs store policy identifiers for audit.

Load and capacity

The figures below are illustrative—they support early sizing and storage discussions, not customer-facing SLAs. Replace them with product forecasts before publishing formal targets. Burst factors reflect bursty review sessions rather than uniform request rates.

Parameter	Example	Note
Active learners	10 000	MAU-style order of magnitude
Reviews / learner / day	5	SRS-shaped load
Mean review insert rate	~0.6/s	50k/day ÷ 86 400 s
Burst factor	10–50×	Session peaks

At roughly 0.5 KB per review log row, 50k reviews/day ≈ 25 MB/day append-only (~9 GB/year before indexes)—validate in staging.
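The sizing arithmetic can be reproduced in a few lines, which makes it easy to swap in product forecasts later. The inputs are the illustrative figures from the table; decimal units (1 MB = 1 000 000 bytes) are an assumption.

```python
active_learners = 10_000
reviews_per_learner_per_day = 5
row_bytes = 500  # ~0.5 KB per review log row (decimal units)

reviews_per_day = active_learners * reviews_per_learner_per_day
mean_rate = reviews_per_day / 86_400                    # inserts per second
burst_low, burst_high = mean_rate * 10, mean_rate * 50  # 10-50x session peaks
mb_per_day = reviews_per_day * row_bytes / 1_000_000
gb_per_year = mb_per_day * 365 / 1_000

print(f"{reviews_per_day} reviews/day, mean {mean_rate:.2f}/s")
print(f"burst {burst_low:.1f}-{burst_high:.1f}/s")
print(f"{mb_per_day:.0f} MB/day, {gb_per_year:.1f} GB/year before indexes")
```

At the stated burst factors the peak insert rate is roughly 6 to 29 per second, which is the figure to compare against SQLite write throughput in staging.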

Target p95 < 300 ms on SQLite under light load; move to PostgreSQL and stateless replicas when concurrency requires it.

Functional requirements

Behaviour the service must expose to integrators; concrete request shapes are defined in OpenAPI and the internal API hub.

ID	Requirement	Description
FR-1	User management	Create and resolve users by `system_user_id` and `system_uuid`; preserve stable `client_uuid` for ownership checks.
FR-2	Conspectus create	Accept ETR-shaped payloads (`cue_sheet`, `dense_paragraph`, `bullets`, optional `title`); persist snapshot and initial schedule per `SchedulePolicy`; append a `CREATED` domain event.
FR-3	Retrieve and review	List due conspectuses; apply deterministic transitions from tags (`easy`/`hard`/`forgot`); append `conspectus_review_logs` with schedule snapshots.
FR-4	Learning error log	Store and list weak-cue records; optional links to conspectus and (via `review_log_id`) to a review session—semantically distinct from review tags (see Domain model).
FR-5	Schedule insight	Expose aggregate slot distribution and schedule guidance (e.g. summary endpoint) for clients and dashboards.
FR-6	Due conspectuses (“what to review”)	List conspectuses due at or before a reference time, ordered per Review suggestion; respect optional user `timezone` for calendar-day filters when specified in OpenAPI.
FR-7	Schedule policy resolution	On create and review, resolve `(schedule_policy_id, algorithm_version)` against `schedule_policies`; reject unknown or retired pairs with a stable error (see ADR error contract).

Non-functional requirements

Quality attributes for operators and integrators; security and observability ADRs apply in full.

ID	Requirement	Description
NFR-1	Performance	Typical requests complete under 300 ms p95 with a local DB and light load; capacity figures remain illustrative (see Load and capacity).
NFR-2	Reliability	Writes are transactional; failures roll back and return stable errors per ADR 0003.
NFR-3	Maintainability	Strict layering (routers → services → repositories); automated contract and endpoint tests are mandatory.
NFR-4	API governance	OpenAPI and error contracts evolve in an additive, backward-compatible manner per versioning policy.
NFR-5	Observability	Structured logs; Prometheus on `/metrics` (config-gated); `/health` and `/ready`; optional local stack per ADR 0009 and SLO guidance in ADR 0011.
NFR-6	Security by default	API-key auth, per-route rate limits, body size limit, CORS allowlist, security headers per ADR 0005.
NFR-7	Idempotency	Critical writes support safe retries via `Idempotency-Key` and persisted deduplication (ADR 0006).
NFR-8	Packaging	Production `Dockerfile` and OCI workflow per ADR 0015 (`make docker-build`).

API style and transactional boundaries

Use resource-oriented REST under /api/v1 (conspectuses, schedule summary, learning errors, users) with command-style sub-resources where appropriate (e.g. …/actions/review). This aligns with OpenAPI governance, resource ownership, and HTTP caching semantics. Read models are current snapshot rows; history is exposed only where product needs justify dedicated endpoints.

Critical writes (single transaction)

Operation	Transaction touches	Idempotency
Create conspectus	Insert conspectus + schedule + `CREATED` event	Required `Idempotency-Key` (POST collection)
Review	Update schedule (bump `schedule_revision`) + insert `conspectus_review_logs` (no duplicate schedule event in `conspectus_events` by default)	Required; scoped to resource; body carries expected `schedule_revision`
Review + learning error	Same as review + insert `learning_errors` with `review_log_id` pointing at the new log row	Prefer one request and one transaction; alternatively two calls with explicit correlation
PATCH conspectus	Update body + bump `content_version` + content event	Required per resource
Create learning error	Insert `learning_errors` (optional `review_log_id`)	Required
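The "Create conspectus" row can be illustrated with sqlite3, whose connection context manager commits on success and rolls back on an exception. The schema is cut down to three columns per table, the initial slot 'A' is an assumption, and create_conspectus is a hypothetical helper rather than the service code.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE conspectuses (conspectus_uuid TEXT PRIMARY KEY, title TEXT);
CREATE TABLE conspectus_schedules (
    conspectus_uuid TEXT PRIMARY KEY,
    slot TEXT NOT NULL,
    schedule_revision INTEGER NOT NULL
);
CREATE TABLE conspectus_events (
    id INTEGER PRIMARY KEY,
    conspectus_uuid TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL
);
"""


def create_conspectus(conn, conspectus_uuid, title, event_type="CREATED"):
    """Insert snapshot, initial schedule and event in one transaction."""
    with conn:  # commit if the block succeeds, roll back if it raises
        conn.execute("INSERT INTO conspectuses VALUES (?, ?)", (conspectus_uuid, title))
        conn.execute(
            "INSERT INTO conspectus_schedules VALUES (?, 'A', 1)", (conspectus_uuid,)
        )
        conn.execute(
            "INSERT INTO conspectus_events (conspectus_uuid, event_type, payload)"
            " VALUES (?, ?, ?)",
            (conspectus_uuid, event_type, json.dumps({"title": title})),
        )


def count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
create_conspectus(conn, "c1", "Krebs cycle")
print(count(conn, "conspectuses"), count(conn, "conspectus_events"))
```

If the event insert fails (for example with a NULL event_type), the earlier two inserts are rolled back as well, so a conspectus never exists without its schedule and CREATED event.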

API contracts and security defaults

Interactive schema: Swagger UI (local). Governance: ADR 0005, ADR 0003.

Constraint	Default	Impact
Authentication	`X-API-Key` on `/api/v1/*`	Unauthorized calls return `401`.
Rate limiting	60 requests / 60 s per client and path	Overflow returns `429`; clients must back off.
Request body size	1 MB (`API_BODY_MAX_BYTES`)	`413` above limit; large assets require a dedicated flow.
CORS	Allowlist origins	Browser clients from non-allowed origins cannot call the API directly.
Idempotency	Required for critical writes	Safe retries; reusing a key with a different body yields `409`; deduplication rows are persisted in the database.
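The idempotency row can be sketched with an in-memory store. The class and exception names are hypothetical, a dict stands in for the idempotency_keys table, and hashing a canonical JSON form of the body is one assumed way to detect "same key, different body".

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """Would be mapped to HTTP 409 by the API layer."""


class IdempotencyStore:
    def __init__(self):
        self._rows = {}  # (owner_client_uuid, endpoint_path, key) -> (fingerprint, response)

    def execute(self, owner, path, key, body, handler):
        scope = (owner, path, key)  # scoped per principal so users cannot collide
        fingerprint = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()
        stored = self._rows.get(scope)
        if stored is not None:
            if stored[0] != fingerprint:
                raise IdempotencyConflict("key reused with a different body")
            return stored[1]  # replay: return the first response, skip the handler
        response = handler(body)
        self._rows[scope] = (fingerprint, response)
        return response


calls = []


def create_handler(body):
    calls.append(body)
    return {"status": 201, "title": body["title"]}


store = IdempotencyStore()
path = "/api/v1/conspectuses"
first = store.execute("u1", path, "k-1", {"title": "T"}, create_handler)
replay = store.execute("u1", path, "k-1", {"title": "T"}, create_handler)
other_user = store.execute("u2", path, "k-1", {"title": "T"}, create_handler)
print(first == replay, len(calls))
```

The replay returns the stored response without running the handler again, while the same key under a different owner is treated as an unrelated request.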

Cross-cutting specification notes

Level: engineering rules that should surface in OpenAPI or ADRs before implementation freeze; they complement the domain model.

SchedulePolicy. schedule_policies is the source of truth for immutable rules. Treat (schedule_policy_id, algorithm_version) on schedules and logs as a foreign key to that table. Policy changes add new rows; historical conspectus_review_logs stay interpretable via stored snapshots and denormalized policy ids.
Concurrency on the same conspectus. Use content_version (If-Match / conditional PATCH) for content. For reviews, require an expected schedule_revision (or equivalent) on conspectus_schedules to reject stale double-submits from two devices—distinct from idempotent replay of the same request.
Cue sheet JSON evolution. Persist cue_sheet_schema_version (column or embedded metadata) so row migrations can transform legacy JSON; validate at the API boundary.
Multi-device / offline sync. Out of scope for a minimal API; if introduced later, define conflict policies separately for content and schedule (e.g. server-wins on schedule unless an ADR specifies otherwise).

Risks and open questions

Schedule policy details — encode delays in schedule_policies.rules; seed the reference policy for methodology alignment, then validate in QA with replay tests from conspectus_review_logs.
Forgot vs hard — identical reset behaviour or not; must be fixed in rules JSON before UX commitments.
Timezone — users.timezone should drive calendar-day due filters; until populated, document UTC-only behaviour in list endpoints to avoid silent off-by-one “today” bugs.
Concurrency — ordering between PATCH and review; optimistic locking via content_version is recommended for content.
Retention / GDPR — policies for append-only logs, export, and deletion.

ADR roadmap (toward a canonical document)

Promote the items below into numbered ADRs when ready to lock behaviour; until then, Architectural decisions on this page is the working record.

Dual journals (D1) — conspectus_review_logs vs conspectus_events; no duplicate review rows in events unless a BI ADR requires it.
Body storage default (D2) — inline JSON/TEXT; externalise only when measured.
Schedule table shape (D3) — 1:1 conspectus_schedules.
RDBMS engine (D4) — SQLite for development; PostgreSQL for scale.
Distributed runtime store — Redis for shared rate limits, idempotency cache, and short-lived keys.
Learning errors (D5) — separate from review tags; optional review_log_id.
Manual schedule edits (D6) — via events, not synthetic reviews.
Optional external content — pointer columns, sync_status, outbox/GC, failure modes (see Hybrid storage).
Schedule policy catalogue (D7) — schedule_policies seeding, retirement, and invalidation of unknown policy ids at create/review time.
Schedule policy versioning — how ladder changes affect interpretation of historical conspectus_review_logs rows (snapshots + policy ids on each log).
Content pedagogy mode — when to turn soft cue-sheet checks into hard API validation.

Human workflow text stays in Methodology; engineering truth is this page plus OpenAPI and migrations. Remove or reconcile conflicting draft text elsewhere.

Sequences

Implementations append logs in the same transaction as snapshot updates. Sources: docs/uml/sequences/.

Page history

Date	Change	Author
2026-04-21	Added Redis to system stack description and C4 container schema references.	Ivan Boyarkin
2026-04-21	Added Page history section (repository baseline).	Ivan Boyarkin