Nat Taylor: Blog, AI, Product Management & Tinkering

Test Drive: tokenizers

Published on .

Today I’m test driving tokenizers, which split text into tokens for language models to process. Tokenization is full of quirks and there are several different approaches; my test drive is inspired by “You Should Probably Pay Attention to Tokenizers”. My task for today is to tokenize the text “the quick brown fox jumped over the fence” with two tokenizers: the one bundled with the all-MiniLM-L6-v2 sentence-transformers model and tiktoken’s encoding for gpt-4o-mini. Here’s the code:

import sentence_transformers
import tiktoken

text = "the quick brown fox jumped over the fence"

# WordPiece tokenizer bundled with the all-MiniLM-L6-v2 embedding model
model      = sentence_transformers.SentenceTransformer("all-MiniLM-L6-v2")
tokenized  = model.tokenize([text])
tokens     = model.tokenizer.convert_ids_to_tokens(tokenized["input_ids"][0])
print(tokens)
# ['[CLS]', 'the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'fence', '[SEP]']

# Byte-level BPE encoding that tiktoken maps to gpt-4o-mini
encoding   = tiktoken.encoding_for_model("gpt-4o-mini")
tokenized  = encoding.encode(text)
tokens     = [encoding.decode_single_token_bytes(token_id) for token_id in tokenized]
print(tokens)
# [b'the', b' quick', b' brown', b' fox', b' jumped', b' over', b' the', b' fence']

This particular example is boring, but if you add emoji or trailing whitespace it gets more interesting!
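To give a feel for why emoji and trailing whitespace are the interesting cases, here is a rough standard-library sketch. It does not call either tokenizer: `PRETOKENIZE`, `pretokenize` and `utf8_bytes` are hypothetical names of my own, and the regex is a simplified, ASCII-only stand-in for the kind of pre-tokenization pattern GPT-style byte-level BPE tokenizers use, not the actual pattern behind tiktoken's encodings.

```python
import re

# Hypothetical, simplified stand-in for a GPT-style pre-tokenization regex.
# Assumption: ASCII-only and much smaller than any real tiktoken pattern.
PRETOKENIZE = re.compile(r" ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+(?!\S)|\s+")


def pretokenize(text):
    """Split text into chunks; a single leading space sticks to the next word."""
    return PRETOKENIZE.findall(text)


def utf8_bytes(text):
    """Return the individual UTF-8 bytes a byte-level tokenizer starts from."""
    return [bytes([value]) for value in text.encode("utf-8")]


# Spaces attach to the front of the following word, so a trailing space
# has no word to attach to and is left over as its own chunk.
print(pretokenize("the fence"))
print(pretokenize("the fence "))

# A single emoji is several bytes in UTF-8, so a byte-level tokenizer
# may need more than one token to represent it.
print(len(utf8_bytes("🦊")))
```

This also hints at why the tiktoken output above is a list of `bytes` rather than strings: a token can cover only part of a multi-byte character, and such a fragment cannot be decoded as text on its own.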
