API
Search endpoint
The integration injects one on-demand route tree (default /api/ask). The base
route serves the overlay: keyword mode returns JSON; agentic mode streams
a grounded answer as Server-Sent Events (text/event-stream). Keyless
sub-routes expose the committed knowledge graph for CLIs, MCP servers, and
generated clients.
The OpenAPI 3.1 contract is published at /openapi.yaml.
Suggested questions (GET)
GET /api/ask returns the knowledge graph’s baked-in suggestions and the loop
model — no query, no model call:
{
  "suggestions": ["How does the knowledge graph stay fresh?"],
  "model": "claude-haiku-4-5"
}
The overlay fetches this once on first open (when AI is on) to populate its
suggested questions. An empty suggestions array — a graph without them, or no
graph at all — just means the overlay shows none.
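As a rough client-side sketch of the behaviour described above, the helper below normalises the GET payload so that a missing or malformed suggestions field is treated the same as an empty array. The names parseAskInfo and loadSuggestions are hypothetical, and the injectable fetchJson parameter is an assumption made so the logic can be exercised without a network.

```typescript
interface AskInfo {
  suggestions: string[];
  model: string;
}

// Hypothetical helper: coerce an arbitrary JSON payload into the documented
// { suggestions, model } shape. Anything unexpected becomes "no suggestions".
function parseAskInfo(payload: unknown): AskInfo {
  const obj = (typeof payload === "object" && payload !== null
    ? payload
    : {}) as Record<string, unknown>;
  const raw = obj["suggestions"];
  const suggestions = Array.isArray(raw)
    ? raw.filter((s): s is string => typeof s === "string")
    : [];
  const model = typeof obj["model"] === "string" ? (obj["model"] as string) : "";
  return { suggestions, model };
}

// Hypothetical wrapper: fetchJson is supplied by the caller, for example
// (url) => fetch(url).then((r) => r.json()) in a browser.
async function loadSuggestions(
  fetchJson: (url: string) => Promise<unknown>,
  base = "/api/ask", // default route from the text; adjust if remounted
): Promise<AskInfo> {
  return parseAskInfo(await fetchJson(base));
}
```

An overlay built this way can render whatever suggestions comes back and simply show nothing when the array is empty.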
Knowledge graph reads (GET)
These routes read virtual:hev-ask/kg, never call a model, and never require an
API key:
| Route | Response |
|---|---|
| GET /api/ask/glossary | { "terms": GlossaryEntry[] } |
| GET /api/ask/glossary/{term} | one GlossaryEntry, matched by term or alias |
| GET /api/ask/sections | { "sections": SectionSummary[] } |
| GET /api/ask/sections?group=API | section summaries filtered by group |
| GET /api/ask/sections/{id} | one full KnowledgeNode |
| GET /api/ask/overview | { "overview": string, "context": string } |
A SectionSummary is the lightweight shape { id, title, heading, group, url }.
For section IDs that contain / or #, URL-encode the ID when placing it in
the path, for example /api/ask/sections/api%2Fcli%23flags.
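The encoding rule can be sketched with the standard encodeURIComponent function. The helper names sectionPath and sectionsPath are hypothetical, and the base argument assumes the default /api/ask mount.

```typescript
// Hypothetical helper: build the path for one section, percent-encoding
// characters such as "/" and "#" that would otherwise split the URL.
function sectionPath(id: string, base = "/api/ask"): string {
  return `${base}/sections/${encodeURIComponent(id)}`;
}

// Hypothetical helper: build the list path, optionally filtered by group.
function sectionsPath(group?: string, base = "/api/ask"): string {
  return group === undefined
    ? `${base}/sections`
    : `${base}/sections?group=${encodeURIComponent(group)}`;
}

console.log(sectionPath("api/cli#flags")); // /api/ask/sections/api%2Fcli%23flags
console.log(sectionsPath("API")); // /api/ask/sections?group=API
```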
Missing glossary terms, section IDs, or unknown read routes return 404 with a
JSON error:
{ "error": "Not found." }
Request
POST to the base route (default /api/ask) with a JSON body:
{
  "query": "how does autoscaling work",
  "mode": "agentic"
}
| Field | Type | Description |
|---|---|---|
| query | string | The search query. An empty or whitespace-only query returns an empty result set. |
| mode | 'keyword' \| 'agentic' | Optional. keyword forces the instant path; agentic requests the loop. When omitted, the endpoint behaves like agentic if a key is present. |
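A request body matching the table above might be assembled as follows. This is a minimal sketch: buildAskRequest is a hypothetical helper, and sending a content-type: application/json header is an assumption rather than a documented requirement.

```typescript
type AskMode = "keyword" | "agentic";

interface AskRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: returns an init object suitable for
// fetch("/api/ask", init).
function buildAskRequest(query: string, mode?: AskMode): AskRequestInit {
  // Omit "mode" entirely when the caller has no preference, so the
  // endpoint applies its own default.
  const payload: { query: string; mode?: AskMode } = { query };
  if (mode !== undefined) payload.mode = mode;
  return {
    method: "POST",
    headers: { "content-type": "application/json" }, // assumed header
    body: JSON.stringify(payload),
  };
}
```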
Keyword response (JSON)
Keyword mode returns a 200 JSON envelope:
{
  "results": [
    {
      "title": "Concepts",
      "heading": "The agentic search loop",
      "url": "/docs/concepts#the-agentic-search-loop",
      "group": "Overview",
      "snippet": "When the reader presses Enter, the query goes to a bounded loop…"
    }
  ],
  "query": "how does agentic search work",
  "model": "claude-haiku-4-5",
  "mode": "keyword"
}
| Field | Type | Description |
|---|---|---|
| results | Result[] | Ranked keyword matches (title, heading?, url, group?, snippet). |
| query | string | Echoed back. |
| model | string | The configured loop model. |
| mode | 'keyword' | The mode that ran. |
| warning | string? | Present when agentic was requested but no key is configured (downgrade). |
The url field carries the deep link: the page URL with a #anchor appended for
a section. Only a document’s intro chunk has no anchor.
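To illustrate the deep-link shape, the sketch below types one result and splits its url into page and anchor. The Result interface mirrors the fields listed in the table; DeepLink and splitDeepLink are hypothetical names introduced for this example only.

```typescript
interface Result {
  title: string;
  heading?: string;
  url: string;
  group?: string;
  snippet: string;
}

interface DeepLink {
  page: string;
  anchor?: string; // undefined for a document's intro chunk
}

// Hypothetical helper: separate the page URL from its section anchor.
function splitDeepLink(url: string): DeepLink {
  const hash = url.indexOf("#");
  if (hash === -1) return { page: url };
  return { page: url.slice(0, hash), anchor: url.slice(hash + 1) };
}

const example: Result = {
  title: "Concepts",
  heading: "The agentic search loop",
  url: "/docs/concepts#the-agentic-search-loop",
  group: "Overview",
  snippet: "When the reader presses Enter…",
};
console.log(splitDeepLink(example.url));
```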
Agentic response (SSE)
When a key is present and mode is agentic, the endpoint responds with
content-type: text/event-stream and streams the answer as it is generated.
Each event is a named SSE frame:
event: search
data: {"query":"autoscaling"}

event: sources
data: {"sources":[{"title":"Core Concepts","heading":"Kubernetes autoscaling","url":"/docs/concepts#kubernetes-autoscaling","group":"Overview"}],"model":"claude-haiku-4-5","mode":"agentic"}

event: token
data: {"text":"Autoscaling scales workers based on "}

event: token
data: {"text":"lag signals. See [autoscaling](/docs/concepts#kubernetes-autoscaling)."}

event: done
data: {}
| Event | Data | Meaning |
|---|---|---|
| search | { query } | Context the model gathered: a search sub-query, or the heading of a section it opened. May fire several times. |
| sources | { sources: Source[], model, mode } | The grounding source set, sent once before any token. Clients validate answer links against it. |
| token | { text } | One delta of the streamed Markdown answer. |
| done | {} | The stream is complete. |
| error | { error } | A failure that occurred after streaming began (HTTP status is already 200). |
A Source is { title, heading?, url, group? } — note there is no snippet;
the answer prose carries the substance, and links point at url.
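A client consuming this stream needs to split the body into frames and react to each event name. The following is a minimal sketch of that step, assuming frames are separated by a blank line as in standard Server-Sent Events and that each data payload is JSON. The names SseFrame, parseFrames, and collectAnswer are hypothetical.

```typescript
interface SseFrame {
  event: string;
  data: unknown;
}

// Parse every complete frame in the buffer. Whatever follows the last blank
// line is returned as "rest" so the caller can prepend it to the next chunk.
function parseFrames(buffer: string): { frames: SseFrame[]; rest: string } {
  const parts = buffer.split(/\r?\n\r?\n/);
  const rest = parts.pop() ?? "";
  const frames: SseFrame[] = [];
  for (const part of parts) {
    let event = "message"; // SSE default when no event name is given
    const dataLines: string[] = [];
    for (const line of part.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
    }
    if (dataLines.length > 0) {
      frames.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return { frames, rest };
}

// Concatenate token deltas into the Markdown answer; surface a streamed
// error event as an exception, since the HTTP status is already 200.
function collectAnswer(frames: SseFrame[]): string {
  let answer = "";
  for (const frame of frames) {
    if (frame.event === "token") answer += (frame.data as { text: string }).text;
    if (frame.event === "error") throw new Error((frame.data as { error: string }).error);
  }
  return answer;
}
```

In a real client the buffer would be fed from the response body chunk by chunk, with rest carried over between reads.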
Mode selection
The endpoint decides what to run:
- Empty query → { results: [], query: "", model, mode: "keyword" } (JSON).
- mode: "keyword", or no API key → keyword JSON, mode: "keyword".
- mode: "agentic" but no key → keyword JSON plus a warning, and mode: "keyword".
- Otherwise → the agentic SSE stream.
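The selection rules can be written down as a small pure function. This is only a sketch of the documented decision table, not the endpoint's actual source; Decision and decide are hypothetical names, and treating a whitespace-only query as empty follows the request field description.

```typescript
type RequestedMode = "keyword" | "agentic" | undefined;

interface Decision {
  transport: "json" | "sse";
  mode: "keyword" | "agentic";
  empty: boolean; // true when the result set is empty by definition
  warning: boolean; // true when agentic was requested but downgraded
}

function decide(query: string, requested: RequestedMode, hasKey: boolean): Decision {
  // Rule 1: an empty query short-circuits to an empty keyword envelope.
  if (query.trim() === "") {
    return { transport: "json", mode: "keyword", empty: true, warning: false };
  }
  // Rules 2 and 3: explicit keyword mode, or no key, runs the keyword path.
  // The warning is set only when the caller explicitly asked for agentic.
  if (requested === "keyword" || !hasKey) {
    return { transport: "json", mode: "keyword", empty: false, warning: requested === "agentic" };
  }
  // Rule 4: everything else streams the agentic answer.
  return { transport: "sse", mode: "agentic", empty: false, warning: false };
}
```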
Errors
| Status | Body | Cause |
|---|---|---|
| 400 | { "error": "Invalid JSON body." } | The request body wasn’t valid JSON. |
| 404 | { "error": "…" } | A knowledge-graph read route, glossary term, or section ID wasn’t found. |
| 500 | { "error": "…" } | The chunk index failed to build (e.g. a misconfigured collection). |
| n/a | event: error | A failure during the agentic stream. The HTTP status is already 200, so errors arrive as a final SSE error event rather than a status code. |
The API key
The endpoint resolves ANTHROPIC_API_KEY from, in order: the adapter runtime
env (locals.runtime.env, e.g. Cloudflare), process.env, then
import.meta.env. Set it wherever your host injects server secrets; it is never
sent to the browser.
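The lookup order can be illustrated with a first-match scan over candidate environments. This is a sketch under assumptions: resolveKey is a hypothetical helper, and treating an empty string the same as a missing value is a guess that the text does not confirm.

```typescript
type Env = Record<string, string | undefined>;

// Return the first non-empty value of `name` across the given environments,
// checked in the order they are passed. Undefined sources are skipped.
function resolveKey(
  sources: Array<Env | undefined>,
  name = "ANTHROPIC_API_KEY",
): string | undefined {
  for (const env of sources) {
    const value = env?.[name];
    if (value) return value; // assumption: "" counts as unset
  }
  return undefined;
}

// On a server this would be called roughly as
//   resolveKey([locals.runtime?.env, process.env, import.meta.env])
// which mirrors the order described above.
const key = resolveKey([undefined, {}, { ANTHROPIC_API_KEY: "sk-example" }]);
console.log(key !== undefined); // true
```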
LLM tracing
Set POSTHOG_KEY (or POSTHOG_API_KEY) in the same environment and every
agentic answer emits a PostHog $ai_generation trace — model, tokens, latency,
and the loop’s tool calls. POSTHOG_HOST overrides the US-cloud ingestion host,
and POSTHOG_CAPTURE_CONTENT (off | redacted | full, default full)
controls how much prompt and answer text ships with each event. No key → no-op;
the answer path never depends on telemetry.
Index lifecycle
The chunk index is built once per server instance on the first request and
cached for the process lifetime. On the first request the endpoint also compares
the live content hash against the knowledge graph’s hash and logs a one-time
warning if they differ — your cue to run ask kg build.
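The build-once behaviour resembles a lazily initialised, memoised promise. The sketch below shows that general pattern and is not the integration's actual code: once and staleWarning are hypothetical names, and the warning text is invented for illustration.

```typescript
// Wrap an async builder so it runs at most once per process. Later callers
// receive the same promise. Note that a rejected build is cached too, which
// may or may not match the real endpoint's behaviour.
function once<T>(build: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) cached = build();
    return cached;
  };
}

// Compare the live content hash with the knowledge graph's hash and return
// a message when they differ, or undefined when the graph is fresh.
function staleWarning(liveHash: string, graphHash: string): string | undefined {
  return liveHash === graphHash
    ? undefined
    : `Content hash ${liveHash} differs from knowledge graph hash ${graphHash}; run "ask kg build".`;
}

let builds = 0;
const getIndex = once(async () => {
  builds += 1; // stands in for the expensive chunk-index build
  return ["chunk"];
});
```

Because the promise itself is cached, concurrent first requests share one build instead of each starting their own.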