ETR Study API — System design

Overview

This document defines the system design of the ETR Study API: a single, versioned HTTP surface backed by a relational model that supports Extract–Transform–Retrieve study flows, spaced retrieval, and auditable history. It is written for engineers who implement, integrate, or operate the service.

It brings together:

  • scope and explicit boundaries (what we intentionally leave out);
  • a methodology traceability matrix and explicit non-goals (what the API does not model);
  • the review suggestion model—how due work is computed, ordered, and distinguished from client-only habits;
  • C4-style structural views (context, containers, components);
  • a relational persistence model with a first-class schedule_policies catalogue, aligned with spaced retrieval;
  • tabular architecture decisions with alternatives and the selected option;
  • functional and non-functional requirements, with pointers to OpenAPI and ADRs;
  • references to sequence diagrams and the internal API catalogue.

The human workflow in Methodology remains the pedagogical source for vocabulary and intent. This document turns that intent into engineering contracts: which rules are enforced by the API, which are defaults carried by SchedulePolicy, and which stay client-side. Authoritative HTTP shapes and operation-level detail live in the internal API hub once OpenAPI matches this design.

Scope and boundaries

The design is scoped to the HTTP API and the relational data model that underpins it. At the boundary we assume a single abstract actor, the integrator: any standards-compliant HTTP client. The deployable unit is modelled as one application process with one primary relational database. Horizontal scaling, connection pooling, and read replicas are deployment concerns once load and SLOs justify them.

Category | In scope | Out of scope
Product | Users, conspectuses, versioned schedule policies, due retrieval state, review logs, learning errors, schedule summaries | End-user UX; specific clients (browser, mobile, bots); offline sync; voice-memo / walk workflows unless explicitly added
Platform | One API process, database, observability hooks | Kubernetes topology, ingress, service mesh, node pools
Security | API key, rate limits, CORS, idempotent writes | Enterprise IdP (e.g. Okta, Azure AD, Google Workspace)

ETR methodology mapping

Level: Bridge to engineering. Methodology describes how humans study; this document assigns each concept to storage, policy, or client behaviour. Rows below name the primary artefact; see Methodology traceability for enforcement level.

Methodology | System artefact | Notes
Three-column cue sheet (Extract) | conspectuses.cue_sheet + cue_sheet_schema_version | Schema in Cue sheet JSON; validation tier in Content validation.
Transform (dense paragraph, 5–7 bullets) | dense_paragraph, bullets | Counts and sentence limits are defaults / advisory unless product turns them into hard API validation.
Retrieve: four slots A–D, tags easy / hard / forgot | conspectus_schedules, schedule_policies, conspectus_review_logs | Slot ladder and delays are data-driven via SchedulePolicy (see Reference policy).
What to review next (“due” work) | next_review_at, due queries, ordering rules | Mechanism in Review suggestion and scheduling logic.
Evening random review from Slot A | (none) | Not a server obligation: client samples from listed conspectuses (see Non-goals).
“Succeed twice in a row” before advancing | optional future: review_streak or session metadata | Not in the baseline schema; SRS tags drive schedule transitions today.
Error log → next Transform pass | learning_errors; content edits are still recorded as CONTENT_PATCHED events | Remediation workflow (open loops, tasks) is product/UX, not implied by the error row alone.
Out-of-home walk / voice memo loop | (none) | Out of scope unless a dedicated capture API is added; content still lands in Transform fields.

Methodology traceability matrix

Each methodology idea is classified so implementers do not guess: invariant (must hold in production data), default (shipped policy or OpenAPI default), advisory (documented, soft validation), client (learner or app behaviour only), out of scope.

Methodology element | Class | Where it lives
Separate cues from full notes (anchors, not transcripts) | Advisory | API may warn on payload size; cue sheet shape validates against cue_sheet_schema_version.
Three columns: keywords, questions/gaps, hints | Invariant (shape) | cue_sheet.rows[].keyword, .question, .hint for schema v1 (see Cue sheet JSON).
Pause every 5–10 minutes; ~50 min Extract block | Client | Timers and UX; not stored server-side.
Single master conspectus after Transform | Invariant | One schedule row per conspectus; content snapshot on conspectuses.
Paragraph ≤ 5 sentences; 5–7 bullets | Advisory → optional invariant | Document as limits; promote to strict validation via ADR if product requires.
Slot A today → B tomorrow → C +3d → D +7d then 14, 30… | Default | schedule_policies.rules for the reference policy; other products may ship different policies.
Tag easy / hard / forgot; hard/forgot reset toward A | Invariant (behaviour) | Deterministic transitions in the active SchedulePolicy for the row.
Evening random drill | Client | Client queries a pool (e.g. by slot) and samples; API lists, does not randomize for the user.
Error log drives next Transform | Client / product | Server stores errors; prioritising study sessions is UX.

Explicit non-goals (methodology vs API)

To avoid silent mismatch between Methodology and this service, the following are not required behaviours of the HTTP API in the baseline design:

  • Random “evening review” selection — the API exposes due times and filters; random sampling of older items is a client algorithm.
  • Consecutive success counts (“twice in a row”) — not represented in conspectus_review_logs; adding it is a schema and product change.
  • Walk / audio capture pipeline — voice memos and offline capture are out of scope until dedicated endpoints and storage exist.
  • Automatic remediation scheduling after learning_errors — errors are data; scheduling the next Transform session is UX.
  • Teaching quality scoring — no NLP or rubric evaluation of paragraph quality.

If product later needs any of the above, add ADRs and extend OpenAPI; do not assume they are implied by methodology text alone.

Review suggestion and scheduling logic

“What should I review now?” splits into server-backed due retrieval (grounded in next_review_at and policy) and client habits (random drills, time-boxing) that the API does not own.

Due set (server)

For a learner, the due conspectuses at instant t (UTC) are those owned by the user where conspectus_schedules.next_review_at <= t, subject to soft-delete or archive flags if introduced later. This is the primary input to any “suggestion” API: temporal ordering, not pedagogical ranking beyond policy.

A calendar-day view (e.g. “due today in my timezone”) converts next_review_at to the user’s IANA timezone (see users.timezone) and compares the resulting local date with the user’s current local date. Until a timezone is set, due filtering should use UTC day boundaries only, and API responses should document this (see Risks).
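The two due views described here can be sketched in a few lines. This is a minimal illustration, not the service's implementation: ScheduleRow, resolve_tz, due_at, end_of_local_day, and due_today are hypothetical names, and "due today" is assumed to mean "next_review_at falls before the end of the user's current local calendar day".

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Iterable, List, Optional
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class ScheduleRow:
    # Hypothetical read model of one conspectus_schedules row.
    conspectus_uuid: str
    next_review_at: datetime  # timezone-aware, stored in UTC


def resolve_tz(iana_name: Optional[str]) -> tzinfo:
    """Map users.timezone to a tzinfo; a null timezone means UTC-only semantics."""
    return ZoneInfo(iana_name) if iana_name else timezone.utc


def due_at(rows: Iterable[ScheduleRow], t: datetime) -> List[ScheduleRow]:
    """Instant-based due set: every row with next_review_at <= t."""
    return [r for r in rows if r.next_review_at <= t]


def end_of_local_day(now_utc: datetime, tz: tzinfo) -> datetime:
    """UTC instant at which the user's current calendar day ends (exclusive bound)."""
    local_today = now_utc.astimezone(tz).date()
    next_midnight = datetime.combine(local_today + timedelta(days=1), time.min, tzinfo=tz)
    return next_midnight.astimezone(timezone.utc)


def due_today(rows: Iterable[ScheduleRow], now_utc: datetime, tz: tzinfo) -> List[ScheduleRow]:
    """Calendar-day view (assumed semantics): due before the local day ends."""
    cutoff = end_of_local_day(now_utc, tz)
    return [r for r in rows if r.next_review_at < cutoff]
```

A caller would pass resolve_tz(user.timezone) as tz, so a user without a timezone automatically gets UTC day boundaries.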

Ordering (stable tie-breakers)

When multiple conspectuses are due, return order should be deterministic:

  1. next_review_at ascending (most overdue first);
  2. then conspectuses.created_at ascending (older content first);
  3. then conspectus_uuid lexicographic (total order for pagination).

Product may additionally boost “new” or “forgotten-heavy” items in the client; the server baseline stays explainable and replayable from timestamps alone.
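The three tie-breakers form a total order, which is what makes keyset pagination safe. A minimal sketch, assuming an in-memory list of due items; DueItem, due_order_key, and page are hypothetical names, and a real implementation would express the same key as an ORDER BY clause.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional, Tuple


class DueItem(NamedTuple):
    conspectus_uuid: str
    created_at: datetime
    next_review_at: datetime


OrderKey = Tuple[datetime, datetime, str]


def due_order_key(item: DueItem) -> OrderKey:
    """next_review_at ASC, then created_at ASC, then conspectus_uuid lexicographic."""
    return (item.next_review_at, item.created_at, item.conspectus_uuid)


def page(items: List[DueItem], limit: int, after: Optional[OrderKey] = None) -> List[DueItem]:
    """Keyset pagination: return up to `limit` items strictly after the cursor key."""
    ordered = sorted(items, key=due_order_key)
    if after is not None:
        ordered = [i for i in ordered if due_order_key(i) > after]
    return ordered[:limit]
```

Because the key ends in the unique conspectus_uuid, no two items compare equal, so a cursor built from the last item of one page never skips or repeats rows on the next page.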

Applying a review (state transition)

A review is one learner decision captured as tag ∈ { easy, hard, forgot } at time reviewed_at. The server:

  1. Loads the conspectus schedule and resolves the active policy row in schedule_policies (matching schedule_policy_id and algorithm_version on the schedule).
  2. Computes (slot', slot_d_ladder_index', next_review_at') from (slot, slot_d_ladder_index, tag) using immutable rules JSON for that policy version.
  3. Atomically updates conspectus_schedules, increments schedule_revision (optimistic concurrency; see Persistence), and appends one row to conspectus_review_logs with schedule_before / schedule_after snapshots.

Idempotency and optimistic concurrency are distinct concerns: idempotency makes a replay of the same HTTP request safe, while optimistic concurrency rejects the loser when two different reviews race on the same conspectus (see Cross-cutting notes).
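The transition step can be illustrated as a pure function over the schedule state. This is a sketch under stated assumptions: the delays mirror the reference cadence (B +1d, C +3d, D ladder 7/14/30 days), hard and forgot are both assumed to reset to Slot A with a one-hour delay, and the ladder is assumed to cap at its last rung. The names Schedule, transition, apply_review, and StaleScheduleRevision are hypothetical; real rules come from schedule_policies.rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed rules, hard-coded here only for illustration.
EASY_NEXT = {"A": ("B", timedelta(days=1)), "B": ("C", timedelta(days=3))}
D_LADDER = [timedelta(days=7), timedelta(days=14), timedelta(days=30)]
RESET_DELAY = timedelta(hours=1)
TAGS = {"easy", "hard", "forgot"}


@dataclass(frozen=True)
class Schedule:
    slot: str
    slot_d_ladder_index: int
    next_review_at: datetime
    schedule_revision: int


class StaleScheduleRevision(Exception):
    """The client's expected schedule_revision no longer matches the stored row."""


def transition(slot: str, ladder_index: int, tag: str):
    """Return (slot', slot_d_ladder_index', delay) for one review outcome."""
    if tag not in TAGS:
        raise ValueError(f"unknown tag: {tag}")
    if tag != "easy":
        return "A", 0, RESET_DELAY  # assumption: hard and forgot reset identically
    if slot in EASY_NEXT:
        next_slot, delay = EASY_NEXT[slot]
        return next_slot, 0, delay
    if slot == "C":
        index = 0  # entering the D tier at its first rung
    elif slot == "D":
        index = min(ladder_index + 1, len(D_LADDER) - 1)  # cap at the last rung
    else:
        raise ValueError(f"unknown slot: {slot}")
    return "D", index, D_LADDER[index]


def apply_review(before: Schedule, tag: str, reviewed_at: datetime, expected_revision: int):
    """Compute the new schedule and the log row; persistence is out of scope here."""
    if expected_revision != before.schedule_revision:
        raise StaleScheduleRevision(before.schedule_revision)
    slot, index, delay = transition(before.slot, before.slot_d_ladder_index, tag)
    after = Schedule(slot, index, reviewed_at + delay, before.schedule_revision + 1)
    log_row = {"tag": tag, "reviewed_at": reviewed_at,
               "schedule_before": before, "schedule_after": after}
    return after, log_row
```

Keeping the computation free of I/O makes replay tests straightforward: feeding the logged (schedule_before, tag, reviewed_at) triples back through the function should reproduce each schedule_after.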

What the API does not compute

  • Random evening sample — clients list candidates (e.g. filter by slot = 'A' via summary API) and shuffle locally.
  • Interleaving subjects — cross-topic ordering is UX.
  • “Next best” across competing study goals — would require goals and priorities not in the baseline model.
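For the first item in this list, the client-side half might look like the sketch below. It assumes the client has already listed candidate conspectus identifiers (for example Slot A items from the summary API); evening_sample is a hypothetical helper, not part of the API.

```python
import random
from typing import List, Optional, Sequence


def evening_sample(candidates: Sequence[str], k: int,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick up to k distinct candidates at random, entirely on the client."""
    rng = rng or random.Random()
    pool = list(candidates)
    return rng.sample(pool, min(k, len(pool)))
```

Passing a seeded random.Random makes a drill reproducible in tests, while the server stays limited to deterministic listing.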

Reference default SchedulePolicy (methodology-aligned)

Illustrative. Product may ship different delays; history remains interpretable because each review log stores policy identifiers and snapshots. The table below matches the four-slot cadence in broad strokes.

Field | Example value | Role
schedule_policy_id | etr_methodology_four_slot | Stable name for the policy family.
algorithm_version | 1.0.0 | Bump when transition rules change; never rewrite old log rows.
rules (JSON) | Encodes: allowed slot values A–D; for each (slot, tag) the next slot, optional slot_d_ladder_index for D-tier rungs, and delay_from_review_at (e.g. PT1H for first retrieval, P1D, P3D, then D ladder 7d → 14d → 30d → … for easy); hard and forgot map to reset toward Slot A per product decision (see Forgot vs hard).

New conspectuses receive this policy by default at creation time unless the integrator passes another schedule_policy_id that already exists in schedule_policies. Seeding the reference row is a deployment concern (migration or admin task).
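Policy resolution at creation time can be sketched as a lookup against the catalogue keys. The helper below is illustrative only: resolve_policy and UnknownSchedulePolicy are hypothetical names, the catalogue is modelled as a set of (schedule_policy_id, algorithm_version) pairs, and picking the highest dotted version when only an id is supplied is an assumption rather than documented behaviour.

```python
from typing import Optional, Set, Tuple

PolicyKey = Tuple[str, str]  # (schedule_policy_id, algorithm_version)
DEFAULT_POLICY: PolicyKey = ("etr_methodology_four_slot", "1.0.0")


class UnknownSchedulePolicy(LookupError):
    """The requested policy pair is not present in schedule_policies."""


def _version_tuple(version: str) -> Tuple[int, ...]:
    # Compare "1.10.0" numerically rather than as a string.
    return tuple(int(part) for part in version.split("."))


def resolve_policy(catalogue: Set[PolicyKey],
                   schedule_policy_id: Optional[str] = None,
                   algorithm_version: Optional[str] = None) -> PolicyKey:
    """Return the catalogue key a new conspectus_schedules row should reference."""
    if schedule_policy_id is None:
        key = DEFAULT_POLICY
    elif algorithm_version is not None:
        key = (schedule_policy_id, algorithm_version)
    else:
        versions = [v for (pid, v) in catalogue if pid == schedule_policy_id]
        if not versions:
            raise UnknownSchedulePolicy(schedule_policy_id)
        key = (schedule_policy_id, max(versions, key=_version_tuple))  # assumption: latest wins
    if key not in catalogue:
        raise UnknownSchedulePolicy(key)
    return key
```

The explicit failure when the default row is absent surfaces a missing seed migration early instead of creating schedules that point at no policy.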

Content validation policy (cue sheet, paragraph, bullets)

Methodology recommends bounds (sentence and bullet counts, lean cues). The API should validate in layers:

  • Hard — JSON schema for cue_sheet matches cue_sheet_schema_version; request rejected on parse failure.
  • Soft — optional warnings in logs or response extensions when bullets fall outside 5–7 or paragraph length exceeds guidance (feature-flagged).
  • None — free text allowed where product does not enable pedagogy mode.

Exact numeric limits belong in OpenAPI once product selects strict vs advisory mode; this document requires only that cue_sheet_schema_version exists before evolving cue_sheet shape.
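The soft tier could be sketched as a function that returns warnings instead of rejecting the request. The limits below (5 to 7 bullets, at most 5 sentences) are the methodology guidance quoted in this document; the sentence splitter is a naive heuristic, and soft_warnings, count_sentences, and the pedagogy_mode flag are hypothetical names.

```python
import re
from typing import List

MIN_BULLETS, MAX_BULLETS = 5, 7
MAX_SENTENCES = 5


def count_sentences(paragraph: str) -> int:
    """Naive heuristic: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return len([p for p in parts if p])


def soft_warnings(dense_paragraph: str, bullets: List[str],
                  pedagogy_mode: bool = True) -> List[str]:
    """Return advisory messages; an empty list means nothing to flag."""
    if not pedagogy_mode:
        return []  # the "None" tier: free text allowed
    warnings = []
    if not MIN_BULLETS <= len(bullets) <= MAX_BULLETS:
        warnings.append(
            f"bullets: expected {MIN_BULLETS}-{MAX_BULLETS}, got {len(bullets)}")
    sentences = count_sentences(dense_paragraph)
    if sentences > MAX_SENTENCES:
        warnings.append(
            f"dense_paragraph: expected at most {MAX_SENTENCES} sentences, got {sentences}")
    return warnings
```

Promoting a check from soft to hard would then mean turning a non-empty result into a rejected request, which is the decision this section defers to product and OpenAPI.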

Problem statement

Goal: a versioned HTTP API that durably supports ETR learning workflows.

Problem: retention collapses when rehearsal and state are not persisted with clear history.

Approach: a transactional service that separates content from schedule state, maintains append-only review and event logs, and exposes stable error semantics to integrators.

Domain model (aggregates and lifecycle)

The following aggregates map to relational tables (see the conspectus ER diagram):

  • User (users) — tenant boundary. Primary key client_uuid. External identifiers (system_user_id, system_uuid) resolve to this row, consistent with the User API. Optional timezone (IANA string, e.g. Europe/Berlin) supports calendar-day due views; if null, due endpoints document UTC-only semantics.
  • Conspectus (conspectuses, PK conspectus_uuid) — the canonical note after Transform: cue_sheet, dense_paragraph, bullets, optional title, cue_sheet_schema_version (integer, default 1), ownership via owner_client_uuid, monotonic content_version, and optional fields for large-body or hybrid storage.
  • SchedulePolicy (schedule_policies) — versioned catalogue of spaced-repetition rules. Composite natural key (schedule_policy_id, algorithm_version); carries immutable rules JSON and metadata. Referenced by conspectus_schedules so transitions stay auditable after policy updates.
  • ConspectusSchedule (conspectus_schedules, 1:1) — mutable retrieval state only:
    • slot — coarse position on the A–D ladder (aligns conceptually with Methodology: Retrieve);
    • slot_d_ladder_index — policy-specific sub-step (e.g. rung within the D tier), updated with slot according to the active (schedule_policy_id, algorithm_version) row;
    • next_review_at — drives due lists and any scheduling UX;
    • schedule_policy_id + algorithm_version — foreign key to schedule_policies; together they select the immutable rules used at review time.
    • schedule_revision — monotonic integer incremented on each successful schedule write; clients send expected revision on review to detect concurrent sessions (see Cross-cutting notes).
  • ConspectusReviewLog (conspectus_review_logs) — append-only: one row per review with outcome tag, reviewed_at, schedule_policy_id + algorithm_version (denormalized from the policy used), and immutable schedule_before / schedule_after JSON for audit. This is the system of record for review outcomes.
  • ConspectusEvent (conspectus_events) — append-only facts that are not review outcomes: creation, content patches, title changes, and (per D6) manual schedule adjustments such as SCHEDULE_ADJUSTED.
  • LearningError (learning_errors) — records weak cues or mistakes for remediation; semantically distinct from SRS tags. Optional review_log_id ties an error to a review session when both are captured together. Conflating this with review outcomes requires an explicit product decision and schema discriminant.

Lifecycle (summary): resolve default schedule_policies row → create conspectus, initial schedule (with policy FK + schedule_revision = 1), and a CREATED-class event → query by next_review_at per Review suggestion → review command inserts into conspectus_review_logs and updates conspectus_schedules → content PATCH appends to conspectus_events → learning errors stand alone or commit in the same transaction as a review (with review_log_id when applicable).

C4 decomposition

The service is a modular monolith: FastAPI, one database primary, Redis for distributed runtime concerns (rate limits, idempotency cache, short-lived keys), optional metrics and log pipelines. Regenerate diagrams with make docs-fix.

C4 L1: System context
Integrator, API, database, OCI image, optional observability.
Source: docs/uml/architecture/system_context_view.puml

C4 L2: Containers
API process, database, Redis runtime store, migrations, optional pipelines.
Source: docs/uml/architecture/container_view.puml

C4 L3: Components
User, Conspectus, error log, SchedulePolicy.
Source: docs/uml/architecture/system_component_view.puml

Persistence and schema (relational baseline)

The schema separates note content (cue sheet, paragraph, bullets) from schedule state. SchedulePolicy is a first-class catalogue table so schedule_policy_id is never a dangling string. Each review (retrieval outcome) is append-only in conspectus_review_logs, with immutable schedule_before / schedule_after JSON plus schedule_policy_id and algorithm_version copied for audit (matching the policy row used for the transition). Non-review mutations append to conspectus_events (e.g. CREATED, CONTENT_PATCHED, TITLE_CHANGED).

Canonical rule: do not mirror every review into conspectus_events unless an ADR explicitly requires a BI-oriented duplicate—the review log is the source of truth for scheduling history.

Read models (conspectuses, conspectus_schedules) update in the same transaction as the corresponding log insert(s).

Naming: In prose the aggregate is conspectus (singular). Physical tables use plural snake_case—e.g. conspectuses, conspectus_schedules, conspectus_review_logs—as in the ER diagram. Sections may refer to conspectus_schedule logically; the table is conspectus_schedules.

Entity–relationship

ER: Core tables
Ownership, 1:1 schedule, append-only logs and events, optional learning-error link.
Diagram: users, conspectuses, schedule, review logs, events, learning_errors.
Source: docs/uml/architecture/conspectus_er.puml

Tables, keys, and roles

Table | Keys / FK | Role
users | PK client_uuid; unique (system_user_id, system_uuid) | Identity and tenant boundary. Optional timezone for local “due today” semantics (see Review suggestion).
schedule_policies | PK (schedule_policy_id, algorithm_version); optional unique policy_uuid for external references | Immutable versioned rules: name, rules (JSON), created_at. Seeded with the reference policy; new versions add rows, never mutate history.
conspectuses | PK conspectus_uuid; FK owner_client_uuid → users | Content snapshot: title, cue_sheet (JSON), cue_sheet_schema_version (int), dense_paragraph, bullets (JSON), content_version, created_at, updated_at (see due ordering). For hybrid storage: body_storage, external_document_id, content_sha256, sync_status (see Hybrid storage).
conspectus_schedules | PK/FK conspectus_uuid → conspectuses ON DELETE CASCADE; FK (schedule_policy_id, algorithm_version) → schedule_policies | slot, slot_d_ladder_index, next_review_at, schedule_revision (bigint, ≥ 1), schedule_updated_at. Optional denormalized owner_client_uuid for index-only due queries (space vs join).
conspectus_review_logs | PK id; FK conspectus_uuid | Append-only: tag, reviewed_at, schedule_before / schedule_after, schedule_policy_id, algorithm_version (denormalized from the policy row used for the transition).
conspectus_events | PK id; FK conspectus_uuid | Append-only lifecycle: event_type, payload, schema_version, optional correlation_id.
learning_errors | PK error_uuid; FK owner_client_uuid; optional conspectus_uuid → conspectuses ON DELETE SET NULL; optional review_log_id → conspectus_review_logs | Pedagogical mistake log, distinct from SRS tags in conspectus_review_logs. Use review_log_id to correlate with a specific review when both are recorded.
idempotency_keys | Unique (owner_client_uuid, endpoint_path, idempotency_key) (or equivalent scoped to the authenticated principal) | Deduplication of critical writes; keys must not collide across users.

Indexes (typical queries)

  • Due workload: composite (owner_client_uuid, next_review_at) on conspectus_schedules if owner_client_uuid is denormalized; otherwise join conspectuses for ownership and index conspectus_schedules(conspectus_uuid, next_review_at).
  • Conspectus listing: (owner_client_uuid, updated_at DESC) on conspectuses.
  • Review history: (conspectus_uuid, reviewed_at DESC) on conspectus_review_logs.
  • Learning errors: (owner_client_uuid, created_at DESC); optional (conspectus_uuid, created_at DESC).
  • Idempotency: unique scope must include owner_client_uuid (or API principal), not only path + key.
  • Policy catalogue: (schedule_policy_id, algorithm_version) on schedule_policies is already the primary key.
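To make the composite foreign key and the due-workload index concrete, the sketch below creates a trimmed-down version of the schema in an in-memory SQLite database. Column types, the reduced column set, the index name ix_schedules_due, and the open_db helper are assumptions for illustration; the authoritative DDL lives in migrations.

```python
import sqlite3

DDL = """
CREATE TABLE users (
    client_uuid    TEXT PRIMARY KEY,
    system_user_id TEXT NOT NULL,
    system_uuid    TEXT NOT NULL,
    timezone       TEXT,
    UNIQUE (system_user_id, system_uuid)
);
CREATE TABLE schedule_policies (
    schedule_policy_id TEXT NOT NULL,
    algorithm_version  TEXT NOT NULL,
    rules              TEXT NOT NULL,
    created_at         TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (schedule_policy_id, algorithm_version)
);
CREATE TABLE conspectuses (
    conspectus_uuid   TEXT PRIMARY KEY,
    owner_client_uuid TEXT NOT NULL REFERENCES users (client_uuid),
    created_at        TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE conspectus_schedules (
    conspectus_uuid     TEXT PRIMARY KEY
        REFERENCES conspectuses (conspectus_uuid) ON DELETE CASCADE,
    owner_client_uuid   TEXT NOT NULL,  -- denormalized for the due index
    slot                TEXT NOT NULL CHECK (slot IN ('A', 'B', 'C', 'D')),
    slot_d_ladder_index INTEGER NOT NULL DEFAULT 0,
    next_review_at      TEXT NOT NULL,
    schedule_policy_id  TEXT NOT NULL,
    algorithm_version   TEXT NOT NULL,
    schedule_revision   INTEGER NOT NULL DEFAULT 1 CHECK (schedule_revision >= 1),
    FOREIGN KEY (schedule_policy_id, algorithm_version)
        REFERENCES schedule_policies (schedule_policy_id, algorithm_version)
);
CREATE INDEX ix_schedules_due
    ON conspectus_schedules (owner_client_uuid, next_review_at);
"""


def open_db() -> sqlite3.Connection:
    """Open an in-memory database with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(DDL)
    return conn
```

With enforcement on, inserting a schedule row whose (schedule_policy_id, algorithm_version) pair is absent from schedule_policies fails with sqlite3.IntegrityError, which is the dangling-reference protection the catalogue table is meant to provide.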

Schema design notes (fixes vs earlier drafts)

  • schedule_policies was missing. Referencing schedule_policy_id without a parent row breaks referential integrity and makes migrations non-replayable; the catalogue table is required.
  • algorithm_version on the schedule. The schedule row must carry both schedule_policy_id and algorithm_version to reference exactly one immutable policy row (composite FK).
  • schedule_revision. Reviews need optimistic concurrency independent of idempotency keys; bump on every schedule mutation.
  • cue_sheet_schema_version. Prevents silent JSON drift; pair with migrations when cue_sheet shape changes.
  • idempotency_keys scope. Global uniqueness on (endpoint_path, idempotency_key) would let two users collide; scope by owner / principal.
  • learning_errors.conspectus_uuid. Prefer ON DELETE SET NULL so that deleting a conspectus keeps its error rows for analytics, with the link nulled; choose CASCADE if errors must disappear with the note. Either way this is a product decision, documented in migrations.

Cue sheet, bullets, paragraph: SQL JSON vs external document

Default: store cue_sheet, bullets, and dense_paragraph inline (JSON / TEXT). This respects typical body-size limits, keeps backup and transactions straightforward, and allows JSON evolution through migrations and validation.

Cue sheet JSON (schema v1)

For cue_sheet_schema_version = 1, cue_sheet is an object with a rows array. Each row aligns with the three-column mental model in Methodology · ETR at home:

  • keyword — short anchor (one to three words).
  • question — question or gap (“What are the three steps of X?”).
  • hint — brief answer cue (optional on early rows; methodology allows one- to three-word hints).

Validation: reject unknown keys or missing rows when strict mode is on; future schema versions add columns rather than overloading strings. See Content validation policy.
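A hard-tier check for schema v1 might look like the following sketch. It assumes keyword and question are required non-empty strings and hint is an optional string, and it always rejects a missing rows array; validate_cue_sheet_v1 is a hypothetical name, and a JSON Schema document would be an equally valid way to express the same rules.

```python
from typing import Any, List

REQUIRED_ROW_KEYS = ("keyword", "question")  # assumption: hint stays optional
ALLOWED_ROW_KEYS = {"keyword", "question", "hint"}


def validate_cue_sheet_v1(cue_sheet: Any, strict: bool = True) -> List[str]:
    """Return a list of validation errors; an empty list means the shape is valid."""
    if not isinstance(cue_sheet, dict) or not isinstance(cue_sheet.get("rows"), list):
        return ["cue_sheet must be an object with a 'rows' array"]
    errors: List[str] = []
    if strict:
        for key in sorted(set(cue_sheet) - {"rows"}):
            errors.append(f"unknown key: {key}")
    for i, row in enumerate(cue_sheet["rows"]):
        if not isinstance(row, dict):
            errors.append(f"rows[{i}] must be an object")
            continue
        for key in REQUIRED_ROW_KEYS:
            value = row.get(key)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"rows[{i}].{key} must be a non-empty string")
        if "hint" in row and not isinstance(row["hint"], str):
            errors.append(f"rows[{i}].hint must be a string")
        if strict:
            for key in sorted(set(row) - ALLOWED_ROW_KEYS):
                errors.append(f"rows[{i}]: unknown key {key}")
    return errors
```

Returning every error rather than stopping at the first lets an integrator fix a whole payload in one round trip.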

Criterion | Inline in RDBMS | External blob or document store
Size / SLO | Appropriate while under configured maximum body size. | Prefer when payloads grow large or binary attachments appear.
Versioning | content_version plus event payloads; migrate JSON with scripts. | Object version / ETag in store; SQL holds pointer and hash.
Full-text search | PostgreSQL FTS or generated columns; SQLite is more constrained. | Often a dedicated search tier (operational cost).

Hybrid SQL + object or document store (optional)

When bodies are externalised, SQL must still anchor: conspectus_uuid, ownership, body_storage (inline vs external), external_document_id, content_version or etag, content_sha256, and optionally sync_status. Viable options span self-hosted S3-compatible storage, managed object tiers with free allowances, or document databases—selection is driven by operational cost, egress, and consistency semantics, not by a single vendor.

Risks and mitigations:

  • Split writes — use outbox or staged upload, compensating deletes, TTL-based GC for orphaned blobs.
  • Migrations — dual-write phases and backfills; version payloads in conspectus_events.
  • Search — if indexing leaves the database, treat the search pipeline as a first-class operational dependency.

Failure modes (SQL vs blob)

There is no distributed transaction between the RDBMS and an external store; recovery relies on status fields and compensating actions.

Scenario | Detection | Mitigation
SQL committed; blob write failed | sync_status in pending/failed; missing hash/etag | Retry upload with the same Idempotency-Key; do not delete the SQL row from GC
Blob written; SQL rolled back | Orphan object key | Key prefixing, TTL GC, no user attachment until SQL commits
Drift between SQL and blob | content_sha256 mismatch | Fail read or serve stale per policy; repair job re-fetch or re-upload
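Detection for the first and third scenarios reduces to comparing status fields and hashes. A minimal sketch, assuming the blob has already been fetched (or found missing); check_external_body and its return labels are hypothetical, and only the pending and failed status values come from the table above.

```python
import hashlib
from typing import Optional


def content_sha256(body: bytes) -> str:
    """Hex digest stored in conspectuses.content_sha256."""
    return hashlib.sha256(body).hexdigest()


def check_external_body(sync_status: str,
                        expected_sha256: Optional[str],
                        blob: Optional[bytes]) -> str:
    """Classify the state of an externalised body: 'retry_upload', 'drift', or 'ok'."""
    if sync_status in ("pending", "failed") or expected_sha256 is None or blob is None:
        return "retry_upload"  # SQL committed but the blob is missing or unconfirmed
    if content_sha256(blob) != expected_sha256:
        return "drift"  # SQL and blob disagree; fail the read or schedule a repair
    return "ok"
```

What a caller does with "drift" (fail the read or serve stale content) remains the per-policy choice named in the table.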

Architectural decisions

Each subsection compares credible alternatives. Rows with class="decision-chosen" (highlighted) record the option adopted for this codebase. Rejected options remain visible to avoid re-litigating settled trade-offs.

D1 — Review history vs general conspectus history

Option Description Pros Cons
A Dual journals | conspectus_review_logs for reviews; conspectus_events for other facts | SRS-aligned; straightforward review queries | Two append-only streams
B Unified stream | Single conspectus_events including REVIEW_APPLIED | One table | Mixed access patterns and filtering cost
C Dual + mirror | A plus duplicate schedule transitions in events | Unified BI timeline | Duplication and consistency risk

Rationale (D1): A dedicated conspectus_review_logs matches how production SRS systems isolate high-volume retrieval traces. Non-review edits remain in conspectus_events without forcing reviews through a generic envelope.

D2 — Body storage

Option Description Pros Cons
Inline JSON/TEXT | Columns on conspectuses | Single transaction; simple backup | Larger rows
Object store | Pointer and etag | Scales large blobs | Two-phase writes
Document DB | External document + FK in SQL | Flexible schema | Second system

Rationale (D2): Default inline storage fits expected limits; externalise when measurements prove it necessary.

D3 — Schedule shape

Option Description Pros Cons
conspectus_schedules table | 1:1 with conspectus | Clear separation of concerns | Join on read
Wide row | All columns on conspectuses | No join | Blurs content and schedule evolution

D4 — Database engine

Option Description Pros Cons
SQLite | Development and small deployments | Minimal operations | Write throughput limits
PostgreSQL | Scale-out and concurrency path | Rich concurrency model | Higher operational burden

Rationale (D4): The schema stays portable. SQLite is the default; PostgreSQL is adopted when load or HA requirements demand it.

D5 — Learning errors vs review outcomes

Option Description Pros Cons
A Separate tables | conspectus_review_logs for tags; learning_errors for detail; optional review_log_id | Clear semantics and queries | Two inserts when both are captured
B Single stream | One table for all events including mistakes | Single append path | Mixed schemas; heavier filtering

Rationale (D5): SRS tags drive scheduling; learning errors capture remediation detail. Correlation is optional via learning_errors.review_log_id.

D6 — Manual schedule change vs review

Option Description Pros Cons
A Events only | Manual reschedule appends conspectus_events (e.g. SCHEDULE_ADJUSTED), not conspectus_review_logs | No synthetic review rows | Schedule changes read from two sources
B Duplicate into review log | Every move also logged as review | Single “movement” table | Conflates retrieval with editorial edits

Rationale (D6): Keep conspectus_review_logs strictly for retrieval outcomes unless a future ADR mandates a BI mirror.

D7 — schedule_policies catalogue vs code-only rules

Option Description Pros Cons
A Catalogue table + seed | schedule_policies holds (schedule_policy_id, algorithm_version) and rules JSON; services join for validation | Referential integrity; auditable defaults; reproducible transitions | Extra table and seed migrations
B Code-only | Policy IDs in rows but rules live only in application memory | Simple DDL | History depends on deploy version; harder to explain snapshots
C Unversioned JSON blob on schedule | Copy full rules onto each conspectus_schedules row | Self-contained rows | Large rows; policy drift across conspectuses

Rationale (D7): A catalogue row is the smallest structure that supports composite FKs from conspectus_schedules, keeps methodology-aligned defaults seedable, and matches how review logs store policy identifiers for audit.

Load and capacity

The figures below are illustrative—they support early sizing and storage discussions, not customer-facing SLAs. Replace them with product forecasts before publishing formal targets. Burst factors reflect bursty review sessions rather than uniform request rates.

Parameter | Example | Note
Active learners | 10 000 | MAU-style order of magnitude
Reviews / learner / day | 5 | SRS-shaped load
Mean review insert rate | ~0.6/s | 50k/day ÷ 86 400 s
Burst factor | 10–50× | Session peaks

At roughly 0.5 KB per review log row, 50k reviews/day ≈ 25 MB/day append-only (~9 GB/year before indexes)—validate in staging.
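The arithmetic behind these figures can be reproduced with a small helper, which is handy when the example inputs are swapped for product forecasts. The capacity function is a hypothetical name, and 0.5 KB is taken as 500 bytes with decimal MB and GB, which is what the 25 MB/day figure implies.

```python
SECONDS_PER_DAY = 86_400


def capacity(learners: int, reviews_per_learner_per_day: int, row_bytes: int) -> dict:
    """Derive mean insert rate and append-only growth from the sizing inputs."""
    reviews_per_day = learners * reviews_per_learner_per_day
    bytes_per_day = reviews_per_day * row_bytes
    return {
        "reviews_per_day": reviews_per_day,
        "mean_inserts_per_s": reviews_per_day / SECONDS_PER_DAY,
        "mb_per_day": bytes_per_day / 1_000_000,
        "gb_per_year": bytes_per_day * 365 / 1_000_000_000,
    }


estimate = capacity(learners=10_000, reviews_per_learner_per_day=5, row_bytes=500)
# Burst factors of 10x and 50x applied to the mean rate.
burst_range = (estimate["mean_inserts_per_s"] * 10, estimate["mean_inserts_per_s"] * 50)
```

With the example inputs the mean rate is about 0.58 inserts per second, so the 10–50× burst factor corresponds to roughly 6 to 29 inserts per second at session peaks.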

Target p95 < 300 ms on SQLite under light load; move to PostgreSQL and stateless API replicas when concurrency requires it.

Functional requirements

Behaviour the service must expose to integrators; concrete request shapes are defined in OpenAPI and the internal API hub.

ID | Requirement | Description
FR-1 | User management | Create and resolve users by system_user_id and system_uuid; preserve stable client_uuid for ownership checks.
FR-2 | Conspectus create | Accept ETR-shaped payloads (cue_sheet, dense_paragraph, bullets, optional title); persist snapshot and initial schedule per SchedulePolicy; append a CREATED domain event.
FR-3 | Retrieve and review | List due conspectuses; apply deterministic transitions from tags (easy/hard/forgot); append conspectus_review_logs with schedule snapshots.
FR-4 | Learning error log | Store and list weak-cue records; optional links to conspectus and (via review_log_id) to a review session; semantically distinct from review tags (see Domain model).
FR-5 | Schedule insight | Expose aggregate slot distribution and schedule guidance (e.g. summary endpoint) for clients and dashboards.
FR-6 | Due conspectuses (“what to review”) | List conspectuses due at or before a reference time, ordered per Review suggestion; respect optional user timezone for calendar-day filters when specified in OpenAPI.
FR-7 | Schedule policy resolution | On create and review, resolve (schedule_policy_id, algorithm_version) against schedule_policies; reject unknown or retired pairs with a stable error (see ADR error contract).

Non-functional requirements

Quality attributes for operators and integrators; security and observability ADRs apply in full.

ID | Requirement | Description
NFR-1 | Performance | Typical requests complete under 300 ms p95 with a local DB and light load; capacity figures remain illustrative (see Load and capacity).
NFR-2 | Reliability | Writes are transactional; failures roll back and return stable errors per ADR 0003.
NFR-3 | Maintainability | Strict layering (routers → services → repositories); automated contract and endpoint tests are mandatory.
NFR-4 | API governance | OpenAPI and error contracts evolve in an additive, backward-compatible manner per versioning policy.
NFR-5 | Observability | Structured logs; Prometheus on /metrics (config-gated); /health and /ready; optional local stack per ADR 0009 and SLO guidance in ADR 0011.
NFR-6 | Security by default | API-key auth, per-route rate limits, body size limit, CORS allowlist, security headers per ADR 0005.
NFR-7 | Idempotency | Critical writes support safe retries via Idempotency-Key and persisted deduplication (ADR 0006).
NFR-8 | Packaging | Production Dockerfile and OCI workflow per ADR 0015 (make docker-build).

API style and transactional boundaries

Use resource-oriented REST under /api/v1 (conspectuses, schedule summary, learning errors, users) with command-style sub-resources where appropriate (e.g. …/actions/review). This aligns with OpenAPI governance, resource ownership, and HTTP caching semantics. Read models are current snapshot rows; history is exposed only where product needs justify dedicated endpoints.

Critical writes (single transaction)

Operation | Transaction touches | Idempotency
Create conspectus | Insert conspectus + schedule + CREATED event | Required Idempotency-Key (POST collection)
Review | Update schedule (bump schedule_revision) + insert conspectus_review_logs (no duplicate schedule event in conspectus_events by default) | Required; scoped to resource; body carries expected schedule_revision
Review + learning error | Same as review + insert learning_errors with review_log_id pointing at the new log row | Prefer one request and one transaction; alternatively two calls with explicit correlation
PATCH conspectus | Update body + bump content_version + content event | Required per resource
Create learning error | Insert learning_errors (optional review_log_id) | Required
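The Review row can be sketched as one SQLite transaction in which the schedule update is guarded by the expected revision. This is an illustration under assumptions: the two tables are reduced to the columns needed here, snapshots are stored as JSON text, and open_demo_db, record_review, and StaleScheduleRevision are hypothetical names.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE conspectus_schedules (
    conspectus_uuid   TEXT PRIMARY KEY,
    slot              TEXT NOT NULL,
    next_review_at    TEXT NOT NULL,
    schedule_revision INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE conspectus_review_logs (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    conspectus_uuid TEXT NOT NULL,
    tag             TEXT NOT NULL,
    reviewed_at     TEXT NOT NULL,
    schedule_before TEXT NOT NULL,
    schedule_after  TEXT NOT NULL
);
"""


class StaleScheduleRevision(Exception):
    """Another review changed the schedule since the client read it."""


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_review(conn, conspectus_uuid, expected_revision, tag, reviewed_at,
                  new_slot, new_next_review_at) -> int:
    """Update the schedule and append the log row atomically; return the log id."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT slot, next_review_at, schedule_revision "
            "FROM conspectus_schedules WHERE conspectus_uuid = ?",
            (conspectus_uuid,)).fetchone()
        if row is None:
            raise LookupError(conspectus_uuid)
        updated = conn.execute(
            "UPDATE conspectus_schedules "
            "SET slot = ?, next_review_at = ?, schedule_revision = schedule_revision + 1 "
            "WHERE conspectus_uuid = ? AND schedule_revision = ?",
            (new_slot, new_next_review_at, conspectus_uuid, expected_revision))
        if updated.rowcount != 1:
            raise StaleScheduleRevision(conspectus_uuid)
        before = {"slot": row[0], "next_review_at": row[1], "schedule_revision": row[2]}
        after = {"slot": new_slot, "next_review_at": new_next_review_at,
                 "schedule_revision": expected_revision + 1}
        cur = conn.execute(
            "INSERT INTO conspectus_review_logs "
            "(conspectus_uuid, tag, reviewed_at, schedule_before, schedule_after) "
            "VALUES (?, ?, ?, ?, ?)",
            (conspectus_uuid, tag, reviewed_at, json.dumps(before), json.dumps(after)))
        return cur.lastrowid
```

The WHERE clause on schedule_revision is what rejects the second of two racing reviews: its UPDATE matches no row, the exception rolls the transaction back, and no log row is written.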

API contracts and security defaults

Interactive schema: Swagger UI (local). Governance: ADR 0005, ADR 0003.

Constraint | Default | Impact
Authentication | X-API-Key on /api/v1/* | Unauthorized calls return 401.
Rate limiting | 60 requests / 60 s per client and path | Overflow returns 429; clients must back off.
Request body size | 1 MB (API_BODY_MAX_BYTES) | 413 above limit; large assets require a dedicated flow.
CORS | Allowlist origins | Browser clients from non-allowed origins cannot call the API directly.
Idempotency | Required for critical writes | Safe retries; reusing a key with a different body yields 409; deduplication rows in the database.
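The idempotency row above implies three outcomes: first execution, safe replay, and conflict on a reused key with a different body. A minimal in-memory sketch follows; IdempotencyStore and IdempotencyConflict are hypothetical names, the dictionary stands in for the idempotency_keys table, and the body fingerprint (SHA-256 of canonical JSON) is an assumed way to detect a changed payload.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


class IdempotencyConflict(Exception):
    """Same key, different request body; an HTTP layer would map this to 409."""


class IdempotencyStore:
    def __init__(self) -> None:
        # (owner_client_uuid, endpoint_path, idempotency_key) -> (fingerprint, response)
        self._rows: Dict[Tuple[str, str, str], Tuple[str, Any]] = {}

    @staticmethod
    def _fingerprint(body: dict) -> str:
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def execute(self, owner_client_uuid: str, endpoint_path: str, idempotency_key: str,
                body: dict, handler: Callable[[dict], Any]) -> Any:
        """Run handler once per scoped key; replay the stored response afterwards."""
        scope = (owner_client_uuid, endpoint_path, idempotency_key)
        fingerprint = self._fingerprint(body)
        if scope in self._rows:
            stored_fingerprint, response = self._rows[scope]
            if stored_fingerprint != fingerprint:
                raise IdempotencyConflict(idempotency_key)
            return response  # safe retry: no second write
        response = handler(body)
        self._rows[scope] = (fingerprint, response)
        return response
```

Scoping the key by owner means two users may send the same Idempotency-Key to the same path without colliding, which matches the uniqueness rule on idempotency_keys.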

Cross-cutting specification notes

Level: engineering rules that should surface in OpenAPI or ADRs before implementation freeze; they complement the domain model.

  • SchedulePolicy. schedule_policies is the source of truth for immutable rules. Treat (schedule_policy_id, algorithm_version) on schedules and logs as a foreign key to that table. Policy changes add new rows; historical conspectus_review_logs stay interpretable via stored snapshots and denormalized policy ids.
  • Concurrency on the same conspectus. Use content_version (If-Match / conditional PATCH) for content. For reviews, require an expected schedule_revision (or equivalent) on conspectus_schedules to reject stale double-submits from two devices—distinct from idempotent replay of the same request.
  • Cue sheet JSON evolution. Persist cue_sheet_schema_version (column or embedded metadata) so row migrations can transform legacy JSON; validate at the API boundary.
  • Multi-device / offline sync. Out of scope for a minimal API; if introduced later, define conflict policies separately for content and schedule (e.g. server-wins on schedule unless an ADR specifies otherwise).

Risks and open questions

  • Schedule policy details — encode delays in schedule_policies.rules; seed the reference policy for methodology alignment, then validate in QA with replay tests from conspectus_review_logs.
  • Forgot vs hard — identical reset behaviour or not; must be fixed in rules JSON before UX commitments.
  • Timezone — users.timezone should drive calendar-day due filters; until populated, document UTC-only behaviour in list endpoints to avoid silent off-by-one “today” bugs.
  • Concurrency — ordering between PATCH and review; optimistic locking via content_version is recommended for content.
  • Retention / GDPR — policies for append-only logs, export, and deletion.

ADR roadmap (toward a canonical document)

Promote the items below into numbered ADRs when ready to lock behaviour; until then, Architectural decisions on this page is the working record.

  • Dual journals (D1) — conspectus_review_logs vs conspectus_events; no duplicate review rows in events unless a BI ADR requires it.
  • Body storage default (D2) — inline JSON/TEXT; externalise only when measured.
  • Schedule table shape (D3) — 1:1 conspectus_schedules.
  • RDBMS engine (D4) — SQLite for development; PostgreSQL for scale.
  • Distributed runtime store — Redis for shared rate limits, idempotency cache, and short-lived keys.
  • Learning errors (D5) — separate from review tags; optional review_log_id.
  • Manual schedule edits (D6) — via events, not synthetic reviews.
  • Optional external content — pointer columns, sync_status, outbox/GC, failure modes (see Hybrid storage).
  • Schedule policy catalogue (D7) — schedule_policies seeding, retirement, and invalidation of unknown policy ids at create/review time.
  • Schedule policy versioning — how ladder changes affect interpretation of historical conspectus_review_logs rows (snapshots + policy ids on each log).
  • Content pedagogy mode — when to turn soft cue-sheet checks into hard API validation.

Human workflow text stays in Methodology; engineering truth is this page plus OpenAPI and migrations. Remove or reconcile conflicting draft text elsewhere.

Sequences

Implementations append logs in the same transaction as snapshot updates. Sources: docs/uml/sequences/.

Sequence: Create conspectus
create_conspectus_sequence.puml

Sequence: Review
review_retrieve_sequence.puml

Sequence: Error log
error_log_sequence.puml

Page history

Date Change Author
Added Redis to system stack description and C4 container schema references. Ivan Boyarkin
Added Page history section (repository baseline). Ivan Boyarkin