Test Drive: Hybrid Full-text Search
Published on .
Today I test drove Hybrid full-text search with sqlite-vec. I’ve always been interested in information retrieval but I haven’t yet worked on hybrid search that combines vector similarity with keyword based. My task is to find similar products to a search query.
- Install sqlite-vec and sqlite-lembed
- Download a GGUF embedding model (I used huggingface-cli)
- Get some data (I used WANDS)
- Open the database connection and load the extensions
- Create tables & insert data
- Run queries!
The SQL is below. I hit several walls on this so my advice is: 1) don’t try to compile the extensions from scratch 2) don’t pass input longer than 512 tokens (see sqlite-lembed#6) and 3) plan to wait about 20 seconds per 500 embeddings to be generated. There’s tons more code but I’ve omitted it since it’s just munging data and getting the database set up.
with fts_matches as (
select
rowid as product_id,
row_number() over (order by rank) as rank_number,
rank as score
from fts_products
where fts_products match (:q)
limit 10
),
--- sqlite-vec KNN vector search results
vec_matches as (
select
product_id,
row_number() over (order by distance) as rank_number,
distance
from vec_products
where
product_embedding match lembed(:q)
and k = 10
order by distance
),
-- combining FTS5 + vector search results, FTS comes first
kwf as (
select 'fts' as match_type, * from fts_matches
union all
select 'vec' as match_type, * from vec_matches
),
-- JOIN back to the contents
kwf_final as (
select
products.product_id,
products.product_name,
kwf.*
from kwf
left join products on products.rowid = kwf.product_id
),
rrf as (select
products.product_id,
products.product_name,
vec_matches.rank_number as vec_rank,
fts_matches.rank_number as fts_rank,
-- RRF algorithm
(
coalesce(1.0 / (60 + fts_matches.rank_number), 0.0) * 1.0 +
coalesce(1.0 / (60 + vec_matches.rank_number), 0.0) * 1.0
) as combined_rank,
vec_matches.distance as vec_distance,
fts_matches.score as fts_score
from fts_matches
full outer join vec_matches on vec_matches.product_id = fts_matches.product_id
join products on products.rowid = coalesce(fts_matches.product_id, vec_matches.product_id)
order by combined_rank desc),
rerank as (
select
products.product_id,
products.product_name,
fts_matches.*
from fts_matches
left join products on products.rowid = fts_matches.product_id
order by vec_distance_cosine(lembed(:q), lembed(products.product_name))
)
select * from rerank;