Backlog: priorities and status

Overview

One ordered list of engineering work: most urgent first, nice-to-have last. Dates are rough calendar-day ranges (weekends off) for 3 / 2 / 1 hours of focused work per day. They are not deadlines, only a shared guess for planning. To change status, edit the pill class (see the legend). This page does not track the ADR lifecycle: ADRs use data-adr-weight and ratification (ADR 0018).

Status legend

Work item states
To do — not started
In progress — someone is working on it
Done — finished and accepted
Blocked — waiting on someone or a decision
Rejected — we decided not to do this (out of scope or obsolete)

How to change status (README.html)

  1. Colors live in docs/backlog/backlog.css (CSS variables).
  2. To change colors globally, edit --status-* there.
  3. In each item’s <h2>, find <span class="status-pill status-pill--..."> and set the part after status-pill-- to todo, in-progress, done, blocked, or rejected.
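For example, an item marked as in progress (illustrative item title) carries:

```html
<h2>P2 Example item <span class="status-pill status-pill--in-progress">In progress</span></h2>
```

Keep the visible label in sync with the modifier class so the pill text matches its color.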

Meta Backlog page and docs/backlog/ layout Done

1) Summary
Keep a simple backlog in docs/backlog/: HTML, shared pill styles, and the same fields per item so anyone reading the repo sees priorities without a separate tool.
2) Problem & value
Without one place to look, plans scatter across chat and tickets. This page is versioned with the code, which helps reviews, onboarding, and release planning if you do not use Jira/Linear or prefer docs next to the source.
3) Delivered (to date)
README.html, backlog.css (status color variables), navigation from the developer docs index, and the status model described in the legend above (see ADR 0001 for the README.html convention).
4) Rough estimate (calendar days)
3 h/day: 1 calendar day
2 h/day: 1 calendar day
1 h/day: 1 calendar day

P0 — critical CI pipeline: quality gates, coverage threshold, docs-check Done

1) Summary
Run the same checks in CI as locally (make verify or similar): format, types, tests with a coverage floor, OpenAPI/docs sync, and contract tests, so main stays shippable.
2) Problem & value
Problem: If CI is weak or missing, bad code and doc drift can merge to main.
Value: Bugs surface earlier; reviewers see a green pipeline; standard gates on each PR.
3) Scope & deliverables
GitHub Actions (or equivalent): run formatter/linter, mypy, pytest with coverage and fail-under, OpenAPI check, contract tests, docs-check; cache dependencies for speed; fail the build on violations. Optionally run the same hooks in CI that pre-commit uses locally. Document how to reproduce CI locally in contributor docs.
4) Rough estimate (calendar days)
3 h/day: ~4 days (~12 h)
2 h/day: ~6 days
1 h/day: ~12 days
5) Delivered (to date)
.github/workflows/ci.yml on push/PR to main or master: Python 3.11, pip cache, make verify then pre-commit run --all-files. Coverage enforced via pytest-cov and fail_under in pyproject.toml. Local pre-push drift check: make verify-ci. Contributor note in engineering practices (CI / reproduce locally).

P0 SLOs / SLAs, error budget, and monitoring alerts Done

1) Summary
Set SLIs/SLOs (and optional SLAs) for latency, uptime, and errors. Add dashboards and alerts in Prometheus (or your stack). Write what to do when the error budget runs out (freeze, incident flow).
2) Problem & value
Problem: Metrics without targets do not guide releases or on-call.
Value: Clear reliability goals; ops and product share the same numbers (error-budget style).
3) Scope & deliverables
ADR or runbook: e.g. p95 latency, /ready-based availability, max 5xx ratio; Grafana panels and Prometheus alert rules; runbook steps for budget exhaustion; link from existing observability docs.
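As a shape reference (metric and label names here are assumptions; the delivered rules live in ops/prometheus/rules/study_app_slo.yml), a max-5xx-ratio alert might look like:

```yaml
groups:
  - name: study-app-slo
    rules:
      # Page when the 5xx ratio stays above the budget for 10 minutes.
      # http_requests_total and its labels are illustrative names.
      - alert: HighErrorRatio
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "5xx ratio above 5% for 10m; follow the error-budget runbook"
```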
4) Rough estimate (calendar days)
3 h/day: ~4 days (~12 h)
2 h/day: ~6 days
1 h/day: ~12 days
5) Delivered (to date)
ADR 0011; Prometheus recording and alert rules in ops/prometheus/rules/study_app_slo.yml; Grafana dashboard import and Blackbox probe wiring via docker-compose.observability.yml and ops/prometheus/; links from README and developer docs to local observability URLs.

P1 DB uniqueness, race safety, and documented concurrency behavior Done

1) Summary
Put business rules in the DB (unique constraints), map conflicts to the API error shape, and document races (double submit, retries, idempotency) so behavior under load is clear.
2) Problem & value
Problem: “Check then insert” without unique indexes can race and duplicate rows.
Value: The database enforces truth; clients get stable errors; less guesswork for support.
3) Scope & deliverables
Add constraints on natural keys; consistent HTTP/status codes for conflicts; developer doc table: scenarios (duplicate create, idempotent retry) and expected outcomes.
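The check-then-insert race and its DB-enforced fix can be sketched with stdlib sqlite3 (table and column names are illustrative; the delivered constraints live in the Alembic migrations):

```python
import sqlite3

def create_user(conn: sqlite3.Connection, email: str) -> dict:
    """Insert a user, letting the UNIQUE index decide instead of check-then-insert."""
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.commit()
        return {"status": 201}
    except sqlite3.IntegrityError:
        # Map the DB-level conflict to the API error shape (409 Conflict).
        return {"status": 409, "error": "user_already_exists"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
first = create_user(conn, "a@example.com")
second = create_user(conn, "a@example.com")  # same natural key: the DB rejects it
```

Even if two requests race past an application-level existence check, only one insert can win; the other surfaces as a stable conflict error.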
4) Rough estimate (calendar days)
3 h/day: ~3 days (~10 h)
2 h/day: ~5 days
1 h/day: ~10 days
5) Delivered (to date)
Alembic/SQLAlchemy unique indexes and constraints on natural keys (e.g. systems, timezones, users, idempotency_keys composite uniqueness per ADR 0006); HTTP/error contract for conflicts and idempotency replay; ADR 0006 plus developer docs and error matrix for retry and idempotency semantics.

P1 Dependency security (pip-audit) and update policy Done

1) Summary
Scan Python deps for known CVEs, pin direct deps for reproducible builds, and write a short upgrade policy (how often, by severity).
2) Problem & value
Problem: Loose or old pins add CVE risk and surprise builds.
Value: Safer supply chain; fewer panic upgrades.
3) Scope & deliverables
make deps-audit (or equivalent) using pip-audit; integrate into CI; ADR: pinning rules, review cadence, exception process.
4) Rough estimate (calendar days)
3 h/day: ~1 day (~3 h)
2 h/day: ~2 days
1 h/day: ~3 days
5) Delivered (to date)
Policy ADR: ADR 0019 (pinning, pip-audit, cadence, exceptions). pip_audit pinned in requirements.txt; make deps-audit (OSV scan, .pip-audit-cache); CI quality job runs it before make verify (.github/workflows/ci.yml); make verify-ci includes deps-audit for local pre-push parity; engineering practices table documents the command.

P1 Integration tests against the database (CI) To do

1) Summary
Add integration tests that hit real PostgreSQL in CI (Compose), with migrations and repositories—alongside fast unit tests on SQLite or mocks.
2) Problem & value
Problem: SQLite in dev and Postgres in prod differ (types, locks, edge cases).
Value: Fast unit tests plus a smaller set of real-DB tests catch data-heavy bugs.
3) Scope & deliverables
Docker Compose service for PostgreSQL in CI; integration Pytest marker; database fixtures and teardown; migrations executed before tests; well-documented local command matching the CI process.
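A CI Postgres service of roughly this shape would back the integration marker (image tag, credentials, and database name are placeholders):

```yaml
services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app        # test-only credentials
      POSTGRES_DB: app_test
    ports:
      - "5432:5432"
    healthcheck:                    # gate migrations and tests on readiness
      test: ["CMD-SHELL", "pg_isready -U app -d app_test"]
      interval: 2s
      retries: 15
```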
4) Rough estimate (calendar days)
3 h/day: ~5 days (~16 h)
2 h/day: ~8 days
1 h/day: ~16 days

P2 Test pyramid: unit tests + property-based tests (Hypothesis) To do

1) Summary
Pull pure validation into small functions; test them with unit tests and property tests (Hypothesis) so many inputs are checked without relying on slow E2E tests alone.
2) Problem & value
Problem: Hand-picked examples miss edge mixes; line coverage is not proof.
Value: Cheaper than E2E for heavy validation; common pattern for parsers and rules.
3) Scope & deliverables
Identify critical pure functions; Hypothesis strategies; guidelines in the developer guide (when to use properties vs examples).
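Hypothesis generates and shrinks inputs automatically; the underlying property idea, sketched with stdlib random against a hypothetical pure normalizer:

```python
import random
import string

def normalize_code(raw: str) -> str:
    """Hypothetical pure validation helper: trim and uppercase a course code."""
    return raw.strip().upper()

# Property: normalizing twice equals normalizing once (idempotence),
# checked over many generated inputs, not just hand-picked examples.
rng = random.Random(0)
alphabet = string.ascii_letters + " \t"
for _ in range(500):
    s = "".join(rng.choice(alphabet) for _ in range(rng.randrange(0, 20)))
    assert normalize_code(normalize_code(s)) == normalize_code(s)
```

With Hypothesis, the loop collapses to a @given(st.text()) decorator and failing inputs are shrunk to minimal counterexamples for free.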
4) Rough estimate (calendar days)
3 h/day: ~4 days (~12 h)
2 h/day: ~6 days
1 h/day: ~12 days

P2 E2E: HTTP scenarios against a running application To do

1) Summary
Add a few E2E tests: real HTTP → API → DB on a running stack, with real status codes, headers, and bodies, not only mocks.
2) Problem & value
Problem: Lower-level tests can miss wiring, middleware, and contract gaps.
Value: Extra confidence before release; does not replace unit/integration tests.
3) Scope & deliverables
httpx + pytest (or similar); separate CI job or marker e2e; minimal happy path plus a few representative error paths; stable test data strategy.
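The real suite would use httpx + pytest against the running stack; the shape of one such check, kept self-contained here with a stdlib stand-in server:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class FakeAPI(BaseHTTPRequestHandler):
    """Stand-in for the running app; the real test targets the Compose stack."""
    def do_GET(self):
        if self.path == "/health":
            body = b'{"status":"ok"}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)
    def log_message(self, *args):  # keep test output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), FakeAPI)
threading.Thread(target=server.serve_forever, daemon=True).start()

# E2E-style assertions on real status, headers, and body -- not mocks.
url = f"http://127.0.0.1:{server.server_port}/health"
with urllib.request.urlopen(url) as resp:
    status = resp.status
    content_type = resp.headers["Content-Type"]
    payload = resp.read()
server.shutdown()
```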
4) Rough estimate (calendar days)
3 h/day: ~3 days (~8 h)
2 h/day: ~4 days
1 h/day: ~8 days

P2 Mutation testing pilot for critical rules To do

1) Summary
Try mutation testing (e.g. cosmic-ray, mutmut) on one or two sensitive modules to see whether tests really catch broken code, rather than relying on high line coverage alone.
2) Problem & value
Problem: Coverage can look good with weak asserts.
Value: Shows where tests actually fail bad mutations; points to stronger asserts.
3) Scope & deliverables
Scoped run in CI or manual report; threshold or qualitative review; ADR if adopted team-wide; avoid full-repo runs initially (cost).
4) Rough estimate (calendar days)
3 h/day: ~2 days (~6 h)
2 h/day: ~3 days
1 h/day: ~6 days

P3 Distributed rate limiting with Redis (multi-instance) To do

1) Summary
Move rate limits from per-process memory to Redis so limits match across many API instances, still returning HTTP 429 as today.
2) Problem & value
Problem: Each replica has its own counter, so abuse scales with instance count.
Value: Normal pattern behind a load balancer; fair limits when you scale out.
3) Scope & deliverables
Replace InMemoryRateLimiter in app/core/security.py with a Redis-backed adapter using shared counters + TTL windows, preserving existing 429 behavior and headers (X-RateLimit-*, Retry-After); ADR (algorithm: token bucket or sliding window); Compose profile for local dev/CI; clear fallback policy when Redis is unavailable (fail-open vs fail-closed) and integration tests for multi-worker consistency.
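A fixed-window sketch of the shared-counter idea (a plain dict stands in for Redis; in production the increment would be an atomic INCR plus EXPIRE, and the ADR may choose token bucket or sliding window instead):

```python
import time

class WindowRateLimiter:
    """Fixed-window counter keyed by (client, window); Redis would share this state."""
    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters: dict[tuple[str, int], int] = {}  # Redis stand-in

    def allow(self, client_id: str) -> bool:
        bucket = int(self.clock()) // self.window
        key = (client_id, bucket)
        # With Redis: INCR key, setting EXPIRE on first hit, atomically.
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= self.limit

limiter = WindowRateLimiter(limit=3, window_seconds=60, clock=lambda: 1000.0)
decisions = [limiter.allow("1.2.3.4") for _ in range(5)]  # fourth request exceeds the limit
```

Because the counter lives in shared state, every replica sees the same count, which is exactly what the in-memory limiter cannot guarantee.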
4) Rough estimate (calendar days)
3 h/day: ~5 days (~14 h)
2 h/day: ~7 days
1 h/day: ~14 days

P2 Redis adoption: idempotency cache and short-lived platform data To do

1) Summary
Add Redis as a shared fast-access layer for idempotency lookups and short-lived data, while keeping SQL as the source of truth and audit trail.
2) Problem & value
Problem: Idempotency currently checks SQL on each request, and temporary runtime data has no shared distributed store.
Value: Lower latency/load for retries and repeated reads; enables safe scale-out patterns for cache, one-time keys, and future background workers.
3) Scope & deliverables
Redis read-through cache for app/repositories/idempotency_repository.py (idempotency_key -> status/response/payload_hash with TTL); SQL remains canonical persistence; key schema + TTL policy doc; cache invalidation and observability metrics (hit ratio, errors, fallback rate); optional adapters for hot GET cache (user/course cards), short-lived tokens/OTP/one-time keys, and starter queue primitives for later worker adoption.
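The read-through idea in miniature (a dict stands in for Redis with TTLs, a plain lookup stands in for the SQL repository, and the field names are assumptions):

```python
class IdempotencyCache:
    """Read-through cache: Redis (dict stand-in) in front of canonical SQL."""
    def __init__(self, sql_lookup):
        self.redis: dict[str, dict] = {}   # real code: Redis keys with a TTL
        self.sql_lookup = sql_lookup       # canonical persistence stays in SQL
        self.hits = 0
        self.misses = 0                    # feed these into hit-ratio metrics

    def get(self, idempotency_key: str):
        cached = self.redis.get(idempotency_key)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        record = self.sql_lookup(idempotency_key)
        if record is not None:
            self.redis[idempotency_key] = record  # populate on miss
        return record

sql_rows = {"k1": {"status": "completed", "payload_hash": "abc"}}
cache = IdempotencyCache(sql_rows.get)
first = cache.get("k1")   # miss: falls through to SQL, then caches
second = cache.get("k1")  # hit: served without touching SQL
```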
4) Rough estimate (calendar days)
3 h/day: ~6 days (~18 h)
2 h/day: ~9 days
1 h/day: ~18 days

P3 PostgreSQL as primary database and migration path from SQLite To do

1) Summary
Use PostgreSQL for staging/prod (Docker-friendly), keep schema in Alembic, and plan a safe move from SQLite where it still runs—including data migration runbooks if needed.
2) Problem & value
Problem: SQLite is fine for dev but weak for concurrent prod ops, backups, and tooling.
Value: Postgres matches usual ops expectations (backup, HA, metrics).
3) Scope & deliverables
docker-compose for app + Postgres; environment profiles; SQLAlchemy compatibility checks; migration runbook; cutover or dual-write strategy as appropriate.
4) Rough estimate (calendar days)
3 h/day: ~8 days (~24 h)
2 h/day: ~12 days
1 h/day: ~24 days

P3 Load testing (k6 / Locust) and SLO validation Done

1) Summary
Add repeatable load scripts (e.g. k6 or Locust) for p95 and errors under set concurrency; compare to SLOs; optional CI or nightly job.
2) Problem & value
Problem: Many teams only load-test in production.
Value: Plan capacity from data; spot regressions before traffic jumps.
3) Scope & deliverables
Scripts under ops/load/ or tests/load/; baseline profile (e.g. user CRUD); report artifact; optional nightly job with thresholds; link to SLO doc.
4) Rough estimate (calendar days)
3 h/day: ~3 days (~8 h)
2 h/day: ~4 days
1 h/day: ~8 days
5) Delivered (to date)
Python runner and scenarios under tools/load_testing/; Makefile targets run-loadtest-api and run-loadtest-api-serve for local runs against a live API; scenario docs in-repo. Optional automated CI smoke/load remains tracked under item 23.

P4 Test data: seeds, fixtures, runbook To do

1) Summary
Ship reference data and fixtures for integration/E2E tests, plus a short runbook to reset or seed state without hidden tricks in conftest.py.
2) Problem & value
Problem: Opaque setup flakes and hides real bugs.
Value: Stable CI; new contributors get green tests faster.
3) Scope & deliverables
Single seed script or Alembic data revision; naming conventions; developer doc for when to extend seeds vs factories.
4) Rough estimate (calendar days)
3 h/day: ~2 days (~5 h)
2 h/day: ~3 days
1 h/day: ~5 days

P4 Feature flags (safe rollout) To do

1) Summary
Add simple feature flags (env and/or Redis) so risky behavior can flip without a full deploy—gradual rollout and fast rollback.
2) Problem & value
Problem: Every fix needs a deploy, which slows incidents.
Value: Turn behavior on/off quickly; canary-style rollouts.
3) Scope & deliverables
Small provider abstraction in config; ADR on evaluation rules and caching; one non-business demo flag to prove wiring; security note on who can change flags.
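A minimal env-backed provider sketch (the flag name is hypothetical; a Redis-backed provider would sit behind the same interface):

```python
import os

class EnvFlagProvider:
    """Minimal provider: flags read from environment variables (FEATURE_<NAME>)."""
    PREFIX = "FEATURE_"
    TRUTHY = {"1", "true", "on", "yes"}

    def is_enabled(self, name: str, default: bool = False) -> bool:
        raw = os.environ.get(self.PREFIX + name.upper())
        if raw is None:
            return default
        return raw.strip().lower() in self.TRUTHY

flags = EnvFlagProvider()
os.environ["FEATURE_NEW_GRADING"] = "on"   # demo flag, hypothetical name
enabled = flags.is_enabled("new_grading")
missing = flags.is_enabled("unset_flag")   # falls back to the default
```

Callers depend only on is_enabled, so swapping in a Redis or remote provider later is a config change, not a code change.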
4) Rough estimate (calendar days)
3 h/day: ~3 days (~10 h)
2 h/day: ~5 days
1 h/day: ~10 days

P4 Architecture fitness: enforced layer boundaries To do

1) Summary
Enforce layer rules in CI (e.g. routers do not import repositories directly), using import-linter or a graph check in make verify.
2) Problem & value
Problem: Loose imports cause cycles and muddy layers as the team grows.
Value: The code matches the documented architecture.
3) Scope & deliverables
Rule set matching the intended layers; documented exceptions; CI failure on violation; pointer in contributor guide.
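import-linter would express this as a layers contract; the same rule can be sketched directly over the AST (the layer and package names are assumptions about the layout):

```python
import ast

# Hypothetical layer rule: API routers must not import repositories directly.
FORBIDDEN = {"app.api": ("app.repositories",)}

def forbidden_imports(source: str, module: str) -> list[str]:
    """Return imports in *module*'s source that violate the layer rules."""
    banned = tuple(
        prefix
        for layer, prefixes in FORBIDDEN.items()
        if module.startswith(layer)
        for prefix in prefixes
    )
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        hits.extend(name for name in names if banned and name.startswith(banned))
    return hits
```

CI would run this (or import-linter) over every module and fail the build when the returned list is non-empty.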
4) Rough estimate (calendar days)
3 h/day: ~2 days (~5 h)
2 h/day: ~3 days
1 h/day: ~5 days

P4 Changelog and release-time verification Done

1) Summary
Use Keep a Changelog for user-visible and breaking changes; optional CI check on release tags.
2) Problem & value
Problem: Users and support lack one dated list of what shipped.
Value: Clear history for API consumers and audits.
3) Scope & deliverables
CHANGELOG.md structure; CONTRIBUTING rules; optional CI step on tag/release that fails if changelog section is missing for the version.
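A gate of this kind can reduce to a single heading check (sketch only; the real logic lives in scripts/changelog_gate.py and may differ):

```python
import re

def has_release_section(changelog: str, version: str) -> bool:
    """True if the Keep a Changelog text contains a '## [X.Y.Z]' heading."""
    pattern = rf"^## \[{re.escape(version)}\]"
    return re.search(pattern, changelog, flags=re.MULTILINE) is not None

# Illustrative changelog contents, not the project's real history.
changelog = """# Changelog

## [Unreleased]

## [1.2.0] - 2024-01-15
### Added
- Something user-visible.
"""
ok = has_release_section(changelog, "1.2.0")
missing = has_release_section(changelog, "1.3.0")
```

On a release tag, CI reads the tagged version and fails if the matching section is absent.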
4) Rough estimate (calendar days)
3 h/day: ~1 day (~3 h)
2 h/day: ~2 days
1 h/day: ~3 days
5) Delivered (to date)
CHANGELOG.md (Keep a Changelog); ADR 0013; scripts/changelog_gate.py and optional scripts/changelog_draft.py; CI workflow runs the gate on PR/push to main / master (see .github/workflows/ci.yml).

P4 — lower priority Repository cleanup and dead code removal Done

1) Summary
Remove unused modules, duplicate logic, and old hacks after contracts stabilize, using static analysis plus human review.
2) Problem & value
Problem: Dead code confuses readers and hides real flow.
Value: Smaller surface and faster reviews; best done after big refactors land.
3) Scope & deliverables
vulture / Ruff unused-import rules; manual triage; delete only with test confidence; optional periodic job or checklist in maintainers’ guide.
4) Rough estimate (calendar days)
3 h/day: ~2 days (~4 h)
2 h/day: ~2 days
1 h/day: ~4 days
5) Delivered (to date)
ADR 0014; Ruff F401 + RUF100 and per-file-ignores for tests/conftest.py (E402); [tool.vulture] in pyproject.toml; make dead-code-check; weekly .github/workflows/dead-code.yml; checklist in CONTRIBUTING.md. Ongoing removal of dead code remains manual with test-backed review.

P2 Distributed tracing (OpenTelemetry) and trace context To do

1) Summary
Add OpenTelemetry (W3C trace context): HTTP spans, outbound calls, and DB spans where useful; export via OTLP to Jaeger/Tempo or a hosted backend. Tie traces to logs/metrics with trace_id / span_id.
2) Problem & value
Problem: Metrics and logs alone do not show one request across services and replicas.
Value: Standard way to debug latency and dependencies; complements the SLO work (ADR 0011) rather than replacing it.
3) Scope & deliverables
ADR (sampler rules, PII in attributes, env vars); FastAPI/Starlette OTel integration; optional Compose service for local trace UI; developer doc: how to find a trace from a failing request.
4) Rough estimate (calendar days)
3 h/day: ~5 days (~16 h)
2 h/day: ~8 days
1 h/day: ~16 days

P3 Structured logging (JSON) and request correlation IDs Done

1) Summary
Ship JSON logs in prod (optional in dev) with stable fields: time, level, logger, message, and correlation (request_id; trace_id when OTel ships). Accept or set X-Request-Id per request.
2) Problem & value
Problem: Plain text is hard to search at scale and to join across services.
Value: Fits ELK/Datadog-style tools; faster support and postmortems.
3) Scope & deliverables
Config flag or APP_ENV switch; middleware for request ID; document field list; ensure PII policy (no secrets in log fields).
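A minimal JSON formatter sketch with the stable fields named above (the delivered NDJSON field set may differ slightly):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one NDJSON object per record with stable, searchable field names."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Correlation field; middleware sets request_id per request.
            "request_id": getattr(record, "request_id", None),
        })

record = logging.LogRecord("app", logging.INFO, "app/main.py", 1, "user created", None, None)
record.request_id = "req-123"
line = JsonFormatter().format(record)
parsed = json.loads(line)
```

Attaching the formatter to the root handler in prod (and keeping plain text in dev behind the config flag) keeps the switch a one-liner.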
4) Rough estimate (calendar days)
3 h/day: ~3 days (~8 h)
2 h/day: ~4 days
1 h/day: ~8 days
5) Delivered (to date)
LOG_FORMAT / LOG_SERVICE_NAME; NDJSON fields including request_id, trace_id, span_id (placeholders for OTel); X-Request-Id middleware; optional docker-compose.logging.yml (Elasticsearch, Kibana, Filebeat) and ADR 0023.

P3 CI: container image SBOM and vulnerability scan (published GHCR image) To do

1) Summary
After push to GHCR: build an SBOM (e.g. Syft), scan the image for CVEs (Trivy/Grype), upload SARIF or a summary, and warn or fail by severity; document the rules for engineers.
2) Problem & value
Problem: pip-audit does not cover OS packages inside the image.
Value: Meets common supply-chain checks; complements ADR 0019.
3) Scope & deliverables
GitHub Actions job after publish-image (or same workflow step with the built digest); artifact retention policy; exception process for accepted CVEs.
4) Rough estimate (calendar days)
3 h/day: ~2 days (~6 h)
2 h/day: ~3 days
1 h/day: ~6 days

P1 OpenAPI governance: lint, semantic baseline, and strict contract test Done

1) Summary
Treat OpenAPI like code: lint operation IDs, summaries, examples on writes and 422s, and block surprise breaks vs a checked-in baseline.
2) Problem & value
Problem: Drift or missing docs confuse clients and hide breaks.
Value: Visible contract and safer changes inside /api/v1 (ADR 0007).
3) Scope & deliverables
Lint rules; semantic compare vs docs/openapi/openapi-baseline.json; optional strict byte-for-byte contract test; documented acceptance workflow for intentional spec changes.
4) Rough estimate (calendar days)
3 h/day: ~3 days (~8 h)
2 h/day: ~4 days
1 h/day: ~8 days
5) Delivered (to date)
scripts/openapi_governance.py; make openapi-check and make contract-test in Makefile; baseline under docs/openapi/openapi-baseline.json; integrated into make verify / CI (.github/workflows/ci.yml); ADR 0007.

P1 Continuous delivery: publish container image to GHCR from main / tags Done

1) Summary
After CI passes, build and push a Docker image to GHCR so staging and docs can pull a known digest.
2) Problem & value
Problem: Without a standard image, container workflows are harder to share.
Value: Commit → tested image → registry is the usual path.
3) Scope & deliverables
Workflow job with Buildx cache; tags for SHA, semver, and latest on default branch; ADR describing scope (registry delivery vs full prod rollout).
4) Rough estimate (calendar days)
3 h/day: ~2 days (~6 h)
2 h/day: ~3 days
1 h/day: ~6 days
5) Delivered (to date)
publish-image job in .github/workflows/ci.yml (GHCR, metadata tags, GHA cache); ADR 0021; make docker-build for local parity.

P2 Single source of truth for application / OpenAPI version metadata To do

1) Summary
One version string (e.g. pyproject.toml or app/__version__.py) wired into FastAPI, OpenAPI, and release notes so tags and docs match the running app.
2) Problem & value
Problem: Hardcoded FastAPI(version=…) drifts from changelog and tags.
Value: Support and automation know which build is running.
3) Scope & deliverables
Single import or TOML field; optional /live or /version payload field; contributor note: bump process tied to release.
4) Rough estimate (calendar days)
3 h/day: ~1 day (~3 h)
2 h/day: ~2 days
1 h/day: ~3 days

P2 CI: optional HTTP smoke or load-regression job (post-main or nightly) To do

1) Summary
In GitHub Actions, run a small smoke test or short load run (health plus one protected route, or tools/load_testing/) so latency and 5xx regressions surface without manually running make run-loadtest-api. Builds on item 11.
2) Problem & value
Problem: Item 11’s tooling does not run on every merge by default.
Value: Catches perf/timeouts earlier; works with SLO alerts.
3) Scope & deliverables
Workflow that starts API + DB (or uses test stack), runs scripted checks with thresholds; document flakiness mitigations; keep optional or workflow_dispatch if cost is a concern.
4) Rough estimate (calendar days)
3 h/day: ~2 days (~6 h)
2 h/day: ~3 days
1 h/day: ~6 days

P2 Document versioning: Document Version field and change policy To do

1) Summary
Add explicit document versioning with a Document Version field so each docs page has a clear version value tied to updates and release cadence.
2) Problem & value
Problem: Without an explicit version marker, readers cannot quickly confirm whether they are looking at the latest or expected revision.
Value: Better traceability for reviews and audits; easier support communication when discussing specific document revisions.
3) Scope & deliverables
Define where Document Version is stored/rendered in docs templates; set increment rules (major / minor / patch or date-based); update contributor guidance for version bumps; optionally add a docs check that validates version presence and format.
4) Rough estimate (calendar days)
3 h/day: ~2 days (~6 h)
2 h/day: ~3 days
1 h/day: ~6 days

P0 — critical QA foundation: dedicated tester space, test process, and full testing pyramid (frontend + backend) To do

1) Summary
Create a dedicated QA/testing space and establish an end-to-end testing process for the project, covering both frontend and backend with a clear testing pyramid (unit, integration, API/contract, E2E, and smoke/regression).
2) Problem & value
Problem: Testing coverage is currently near zero for both frontend and backend, and there is no shared QA workflow, ownership model, or quality gate policy.
Value: Predictable release quality, earlier bug detection, lower regression risk, and a stable team process where engineers and testers work with one quality baseline.
3) Scope & deliverables
Define dedicated tester workspace and access model (test environment, test data, tooling); baseline quality strategy document with entry/exit criteria; target coverage and test pyramid ratios per layer; frontend and backend test suites with markers and ownership; CI gates for required checks; defect triage and bug lifecycle policy; release readiness checklist; onboarding guide for test practices and responsibilities.
4) Rough estimate (calendar days)
3 h/day: ~14 days (~42 h)
2 h/day: ~21 days
1 h/day: ~42 days

P1 Mobile docs UX: fix top navigation rendering on iPhone 17 Pro Max and adaptive layout by screen size Done

1) Summary
Fix the broken/awkward top navigation rendering on iPhone 17 Pro Max and implement responsive/platform-aware documentation layout behavior so reading and navigation stay comfortable across small, medium, and large screens.
2) Problem & value
Problem: The top of the docs navigation currently looks poor on iPhone 17 Pro Max, reducing usability and perceived quality on mobile devices.
Value: Better mobile first impression, faster page navigation, and more predictable UX with screen-size-specific behavior.
3) Scope & deliverables
Audit current docs shell/navigation on iOS Safari (including safe-area and notch behavior); define responsive breakpoints and layout rules for nav/sidebar/content; implement platform-friendly patterns (env(safe-area-inset-*), sticky/fixed header behavior, compact navigation states, touch target sizing, and readable typography scale); verify on key viewport buckets and document the rules in internal docs.
4) Rough estimate (calendar days)
3 h/day: ~4 days (~12 h)
2 h/day: ~6 days
1 h/day: ~12 days
5) Completion evidence / acceptance
  • Mobile title row uses safe-area aware top spacing via env(safe-area-inset-top).
  • Drawer panel uses safe-area top/bottom padding for notched devices.
  • Interactive controls in collapsed/compact states keep at least 44px touch target width.
  • Tablet/phone drawer mode hides desktop sidebar and keeps reliable open/close behavior.
  • Implemented behavior is documented in docs/internal/front/docs-frontend-menu-and-theme-controls.html.
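The delivered safe-area handling reduces to CSS of this shape (class names are illustrative; the real rules live in the docs stylesheet):

```css
/* Keep the mobile title row clear of the notch/status bar. */
.docs-header {
  padding-top: calc(0.5rem + env(safe-area-inset-top));
}

/* Drawer respects top and bottom insets on notched devices. */
.docs-drawer {
  padding-top: env(safe-area-inset-top);
  padding-bottom: env(safe-area-inset-bottom);
}

/* Compact controls keep the 44px minimum touch target. */
.docs-nav-toggle {
  min-width: 44px;
  min-height: 44px;
}
```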

Page history

Date Change Author
Added backlog item for iPhone 17 Pro Max navigation fix and adaptive docs layout by screen size. Ivan Boyarkin
Added critical QA foundation backlog item for tester space and full frontend/backend testing process. Ivan Boyarkin
Added backlog item for document versioning and Document Version field policy. Ivan Boyarkin
Expanded Redis rate-limiting backlog item and added Redis idempotency/cache adoption item. Ivan Boyarkin
Added Page history section (repository baseline). Ivan Boyarkin