RFC 0001: Documentation Search Implementation

Metadata

Goals and non-goals

Goals

Non-goals

Architecture overview

  1. The build writes docs/assets/search-index.json from docs/**/*.html.
  2. The browser loads the index once; ranking runs in JavaScript at query time.
  3. The UI shows the top results and supports keyboard navigation.
  4. The browser sends telemetry events to the API.
  5. The API appends rows to SQLite table docs_search_events.
  6. The API exposes KPI snapshots at /internal/telemetry/docs-search/metrics.

Build-time indexing

Implemented in scripts/build_docs_search_index.py.

Input corpus

Extraction and normalization

Index schema (version 2)

Query-time ranking

Implemented in docs/assets/docs-nav.js.

Core formula

score_base(d, T) = Σ over t in T:
        idf(t) * (
        8.0 * log(1 + tf_title(t, d))
        + 4.0 * log(1 + tf_url(t, d))
        + 2.0 * log(1 + tf_section(t, d))
        + 1.4 * log(1 + tf_content(t, d))
        )

        idf(t) = log(1 + (N + 1) / (df(t) + 0.5))

        norm(d) = 1 / (1 + 0.08 * max(0, content_len(d) / avg_content_len - 1))

        score_norm = score_base * norm

Precision boosts

Prefix expansion policy

Telemetry implementation

Client event types

Transport behavior

Server-side ingestion and storage

Canonical KPI definitions

In a rolling time window:

Metrics endpoint: GET /internal/telemetry/docs-search/metrics?window_minutes=<N>.

Local validation playbook

  1. Rebuild the search index:
    python3 scripts/build_docs_search_index.py
  2. Serve docs over HTTP (not file://):
    cd docs
                  python3 -m http.server 8765
  3. Run the API:
    make run
  4. Open http://127.0.0.1:8765/index.html, run searches, click a result.
  5. In the browser network tab, confirm GET /assets/search-index.json returns 200 or 304.
  6. Check the database:
    sqlite3 study_app.db "select id,event,session_id,query_id,datetime(emitted_at_ms/1000,'unixepoch','localtime') from docs_search_events order by id desc limit 20;"
  7. Check the metrics API:
    curl "http://127.0.0.1:8000/internal/telemetry/docs-search/metrics?window_minutes=60"

Troubleshooting

Could not load search index

Telemetry preflight returns 200 but the DB has no rows

Change management

Page history

Date Change Author
Added Page history section (repository baseline). Ivan Boyarkin