When a reader types "how do I set up single sign-on" into your knowledge base search, they expect to find your SAML Configuration Guide, even though the words don't match. Traditional keyword search fails here. Semantic search doesn't.
This post explains exactly how FinalDoc's AI-powered search works, from embedding generation to query-time ranking.
The Embedding Pipeline
When you publish an article, FinalDoc runs a background process:
- Chunking: the article is split into overlapping chunks of 500-1000 tokens. Overlap ensures we don't lose context at chunk boundaries.
- Embedding: each chunk is sent to OpenAI's text-embedding-3-small model, which returns a 1536-dimensional vector
- Storage: vectors are stored in PostgreSQL using the pgvector extension, alongside the chunk text and article metadata
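The chunking step can be sketched in a few lines. This is a simplified illustration, not FinalDoc's actual implementation; the 750-token chunk size and 100-token overlap are example values within the 500-1000 range described above:

```python
def chunk_text(tokens, chunk_size=750, overlap=100):
    """Split a token list into overlapping chunks.

    chunk_size and overlap are illustrative values, not
    FinalDoc's actual configuration.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # final chunk reaches the end of the article
    return chunks

tokens = list(range(2000))  # stand-in for real token IDs
chunks = chunk_text(tokens)
```

Each chunk shares its first 100 tokens with the tail of the previous chunk, so a sentence that straddles a boundary still appears intact in at least one chunk.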
The embedding captures the meaning of the text, not just the words. "SSO setup" and "single sign-on configuration" produce nearly identical vectors because they mean the same thing.
Hybrid Search
Pure semantic search is powerful but not perfect. It can miss exact term matches: if someone searches for an error code like ERR_AUTH_FAILED, semantic similarity might not rank the right article first.
FinalDoc uses a hybrid approach that combines three search methods:
1. Semantic Search (pgvector)
Convert the query to an embedding, then find the nearest article chunks using cosine similarity:
SELECT * FROM article_chunks ORDER BY embedding <=> query_embedding LIMIT 10
This finds conceptually related content regardless of keyword overlap.
2. Full-Text Search (PostgreSQL tsvector)
Standard PostgreSQL full-text search with ranking. Handles exact matches, stemming, and phrase queries:
SELECT * FROM articles WHERE search_vector @@ plainto_tsquery('english', query)
3. Fuzzy Search (pg_trgm)
Trigram similarity for typo tolerance. When a user types "authenication" instead of "authentication," fuzzy search still finds the right articles.
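Trigram similarity is easy to model: each string is broken into three-character sequences, and the score is the overlap between the two trigram sets. The sketch below is a simplified version of pg_trgm's behavior (the real extension also normalizes non-alphanumeric characters):

```python
def trigrams(s):
    # pg_trgm pads with two leading spaces and one trailing space
    s = "  " + s.lower() + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    """Jaccard overlap of trigram sets, as pg_trgm computes it."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)
```

Even with the typo, "authenication" and "authentication" share most of their trigrams, so the score stays well above pg_trgm's default 0.3 match threshold.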
Scoring and Ranking
Results from all three methods are merged using a weighted scoring formula:
- Semantic similarity: 50% weight (highest, because intent matching is most valuable)
- Full-text relevance: 35% weight (exact matches should rank high)
- Fuzzy similarity: 15% weight (catches typos and near-matches)
Articles that appear in multiple result sets get boosted. An article that's both semantically similar AND contains the exact keywords is almost certainly the right result.
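The merge step can be sketched as a weighted sum over per-method scores. This is an illustrative implementation, assuming each method returns normalized scores in [0, 1] keyed by article ID:

```python
def merge_results(semantic, fulltext, fuzzy):
    """Combine per-method scores with the 50/35/15 weights above.

    Each argument is a dict of article_id -> score in [0, 1].
    Articles found by several methods accumulate score from each,
    which is what boosts multi-method matches to the top.
    """
    weights = {"semantic": 0.50, "fulltext": 0.35, "fuzzy": 0.15}
    combined = {}
    for name, results in (("semantic", semantic),
                          ("fulltext", fulltext),
                          ("fuzzy", fuzzy)):
        for article_id, score in results.items():
            combined[article_id] = combined.get(article_id, 0.0) + weights[name] * score
    # Highest combined score first
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

Note how an article with semantic score 0.9 plus full-text score 0.8 (combined 0.73) outranks one with a perfect semantic score but no keyword match (0.50).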
Performance
Search must be fast. Readers expect results as they type. Our targets:
- < 200ms for semantic search on 10,000 article chunks
- < 50ms for full-text search with GIN index
- < 100ms for fuzzy search with trigram index
pgvector uses HNSW (Hierarchical Navigable Small World) indexes for approximate nearest neighbor search. This gives us O(log n) query time instead of O(n) brute-force comparison.
We also use Redis caching for repeat queries. The same search query returns cached results for 60 seconds, eliminating database hits for common searches.
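The caching pattern looks roughly like this. The sketch uses an in-process dict as a stand-in for Redis so it stays self-contained, but the shape matches the Redis version: a SETEX-style write with a 60-second TTL on a key derived from the normalized query:

```python
import hashlib
import json
import time

class SearchCache:
    """In-process stand-in for the Redis search cache (illustrative)."""

    def __init__(self, ttl=60):
        self.ttl = ttl
        self.store = {}

    def _key(self, query):
        # Normalize so "SSO Setup" and "sso setup" share one cache entry
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get(self, query):
        entry = self.store.get(self._key(query))
        if entry is None:
            return None
        expires_at, value = entry
        if time.time() >= expires_at:
            return None  # expired; caller falls through to the database
        return json.loads(value)

    def set(self, query, results):
        # Equivalent of Redis SETEX key 60 <serialized results>
        self.store[self._key(query)] = (time.time() + self.ttl, json.dumps(results))
```

Normalizing the key means trivially different spellings of the same query hit the same cached result.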
Smart Suggestions
When all three search methods return zero results, FinalDoc doesn't show an empty page. Instead, the AI generates smart suggestions:
- The zero-result query is embedded and compared against all article chunks at a lower similarity threshold
- The top 3 loosely related articles are shown as "You might be looking for..."
- The AI chatbot offers to answer the question directly: "Can't find what you need? Ask me!"
This turns a dead-end into an engagement opportunity. Zero-result queries are also logged for your content team: they're signals that you need to write new articles.
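The fallback can be sketched as an embedding search with a relaxed cutoff. The 0.35 threshold below is an illustrative value, not FinalDoc's actual setting, and the chunk records are simplified to just an article ID and a vector:

```python
def smart_suggestions(query_embedding, chunks, threshold=0.35, top_k=3):
    """Return up to top_k loosely related article IDs for a
    zero-result query, ranked by cosine similarity."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)

    scored = [(cosine(query_embedding, c["embedding"]), c["article_id"])
              for c in chunks]
    scored.sort(reverse=True)

    seen, suggestions = set(), []
    for score, article_id in scored:
        # Dedupe chunks from the same article; drop anything below
        # the relaxed threshold
        if score >= threshold and article_id not in seen:
            seen.add(article_id)
            suggestions.append(article_id)
        if len(suggestions) == top_k:
            break
    return suggestions
```

Because the threshold is lower than the one used for normal search, the reader sees plausible neighbors rather than an empty page.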
Embedding Management
Embeddings need to stay in sync with content. FinalDoc handles this automatically:
- On publish: new article → generate embeddings
- On update: changed article → regenerate embeddings for modified chunks
- On delete: removed article → delete associated embeddings
- Bulk embed: admin command to regenerate all embeddings (useful after initial import)
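The lifecycle rules above reduce to a small dispatch function. This sketch keeps embeddings in an in-memory dict and elides chunking; `embed` is a stand-in for the OpenAI call, not a real API:

```python
def sync_embeddings(event, article, index, embed):
    """Keep an embedding index (article_id -> list of vectors) in
    sync with article lifecycle events. Illustrative only: chunking
    is elided and `embed` is a caller-supplied stand-in."""
    if event in ("publish", "update"):
        # Publish and update both (re)generate embeddings;
        # a real implementation would re-embed only changed chunks
        index[article["id"]] = [embed(article["body"])]
    elif event == "delete":
        index.pop(article["id"], None)
    return index
```

The same handler shape covers the bulk-embed admin command: iterate every article and fire a "publish" event for each.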
You can monitor embedding status in Settings → AI Configuration, which shows the total number of embedded chunks and the last sync timestamp.
Privacy
For teams using Private AI (BYOK), the embedding model also runs on your infrastructure. Deploy text-embedding-3-small on Azure OpenAI or use Amazon Titan Embeddings on AWS Bedrock. Your article content never leaves your cloud, even for search indexing.