Nat TaylorBlog, AI, Product Management & Tinkering

Test Drive: Hybrid Full-text Search

Published on .

Today I test drove Hybrid full-text search with sqlite-vec. I’ve always been interested in information retrieval but I haven’t yet worked on hybrid search that combines vector similarity with keyword based. My task is to find similar products to a search query.

  1. Install sqlite-vec and sqlite-lembed
  2. Download a GGUF embedding model (I used huggingface-cli)
  3. Get some data (I used WANDS)
  4. Open the database connection and load the extensions
  5. Create tables & insert data
  6. Run queries!

The SQL is below. I hit several walls on this so my advice is: 1) don’t try to compile the extensions from scratch 2) don’t pass input longer than 512 tokens (see sqlite-lembed#6) and 3) plan to wait about 20 seconds per 500 embeddings to be generated. There’s tons more code but I’ve omitted it since it’s just munging data and getting the database set up.

with fts_matches as (
  select
    rowid as product_id,
    row_number() over (order by rank) as rank_number,
    rank as score
  from fts_products
  where fts_products match (:q)
  limit 10
),
--- sqlite-vec KNN vector search results
vec_matches as (
  select
    product_id,
    row_number() over (order by distance) as rank_number,
    distance
  from vec_products
  where
    product_embedding match lembed(:q)
    and k = 10
  order by distance
),
-- combining FTS5 + vector search results, FTS comes first
kwf as (
  select 'fts' as match_type, * from fts_matches
  union all
  select 'vec' as match_type, * from vec_matches
),
-- JOIN back to the contents
kwf_final as (
  select
    products.product_id,
    products.product_name,
    kwf.*
  from kwf
  left join products on products.rowid = kwf.product_id
),
rrf as (select
    products.product_id,
    products.product_name,
    vec_matches.rank_number as vec_rank,
    fts_matches.rank_number as fts_rank,
    -- RRF algorithm
    (
      coalesce(1.0 / (60 + fts_matches.rank_number), 0.0) * 1.0 +
      coalesce(1.0 / (60 + vec_matches.rank_number), 0.0) * 1.0
    ) as combined_rank,
    vec_matches.distance as vec_distance,
    fts_matches.score as fts_score
  from fts_matches
  full outer join vec_matches on vec_matches.product_id = fts_matches.product_id
  join products on products.rowid = coalesce(fts_matches.product_id, vec_matches.product_id)
  order by combined_rank desc),
rerank as (
  select
    products.product_id,
    products.product_name,
    fts_matches.*
  from fts_matches
  left join products on products.rowid = fts_matches.product_id
  order by vec_distance_cosine(lembed(:q), lembed(products.product_name))
)
select * from rerank;

Post Navigation

«
»