Aggregating Uniques in Snowflake
Published on .
WordPress database error: [<div style="clear:both"> </div><div class="queries" style="clear:both; margin_bottom:2px; border: red dotted thin;">Queries made or created this session were<br/>
<ol>
<li>Raw query:
SELECT * FROM wp_options WHERE </li>
<li>Rewritten:
SELECT * FROM wp_options WHERE </li>
<li>With Placeholders:
SELECT * FROM wp_options WHERE </li>
<li>Prepare:
SELECT * FROM wp_options WHERE </li>
</ol>
</div><div style="clear:both; margin_bottom:2px; border: red dotted thin;" class="error_message" style="border-bottom:dotted blue thin;">Error occurred at line 1644 in Function prepare_query. <br/> Error message was: Problem preparing the PDO SQL Statement. Error was: SQLSTATE[HY000]: General error: 1 incomplete input </div><pre>#0 /home/nattaylor/public_html/wordpress/wp-content/db.php(2746): WP_SQLite_DB\PDOEngine->get_error_message()
#1 /home/nattaylor/public_html/wordpress/wp-content/db.php(3484): WP_SQLite_DB\wpsqlitedb->query('...')
#2 /home/nattaylor/public_html/wordpress/wp-content/db.php(2952): WP_SQLite_DB\PDOSQLiteDriver->execute_duplicate_key_update()
#3 /home/nattaylor/public_html/wordpress/wp-content/db.php(1893): WP_SQLite_DB\PDOSQLiteDriver->rewrite_query('...', '...')
#4 /home/nattaylor/public_html/wordpress/wp-content/db.php(1357): WP_SQLite_DB\PDOEngine->execute_insert_query_new('...')
#5 /home/nattaylor/public_html/wordpress/wp-content/db.php(2739): WP_SQLite_DB\PDOEngine->query('...')
#6 /home/nattaylor/public_html/wordpress/wp-includes/option.php(1143): WP_SQLite_DB\wpsqlitedb->query('...')
#7 /home/nattaylor/public_html/wordpress/wp-includes/option.php(1552): add_option('...', 1758134601, '', '...')
#8 /home/nattaylor/public_html/wordpress/wp-content/plugins/syntax-highlighting-code-block/inc/functions.php(671): set_transient('...', Array, 2592000)
#9 /home/nattaylor/public_html/wordpress/wp-includes/class-wp-block.php(586): Syntax_Highlighting_Code_Block\render_block(Array, '...', Object(WP_Block))
#10 /home/nattaylor/public_html/wordpress/wp-includes/blocks.php(2359): WP_Block->render()
#11 /home/nattaylor/public_html/wordpress/wp-includes/blocks.php(2431): render_block(Array)
#12 /home/nattaylor/public_html/wordpress/wp-includes/class-wp-hook.php(324): do_blocks('...')
#13 /home/nattaylor/public_html/wordpress/wp-includes/plugin.php(205): WP_Hook->apply_filters('...', Array)
#14 /home/nattaylor/public_html/wordpress/wp-includes/post-template.php(256): apply_filters('...', '...')
#15 /home/nattaylor/public_html/wordpress/wp-content/themes/ntdc/index.php(70): the_content()
#16 /home/nattaylor/public_html/wordpress/wp-includes/template-loader.php(106): include('...')
#17 /home/nattaylor/public_html/wordpress/wp-blog-header.php(19): require_once('...')
#18 /home/nattaylor/public_html/wordpress/index.php(17): require('...')
</pre>]
SELECT * FROM wp_options WHERE
WordPress database error: [<div style="clear:both"> </div><div class="queries" style="clear:both; margin_bottom:2px; border: red dotted thin;">Queries made or created this session were<br/>
<ol>
<li>Raw query:
SELECT * FROM wp_options WHERE </li>
<li>Rewritten:
SELECT * FROM wp_options WHERE </li>
<li>With Placeholders:
SELECT * FROM wp_options WHERE </li>
<li>Prepare:
SELECT * FROM wp_options WHERE </li>
</ol>
</div><div style="clear:both; margin_bottom:2px; border: red dotted thin;" class="error_message" style="border-bottom:dotted blue thin;">Error occurred at line 1644 in Function prepare_query. <br/> Error message was: Problem preparing the PDO SQL Statement. Error was: SQLSTATE[HY000]: General error: 1 incomplete input </div><pre>#0 /home/nattaylor/public_html/wordpress/wp-content/db.php(2746): WP_SQLite_DB\PDOEngine->get_error_message()
#1 /home/nattaylor/public_html/wordpress/wp-content/db.php(3484): WP_SQLite_DB\wpsqlitedb->query('...')
#2 /home/nattaylor/public_html/wordpress/wp-content/db.php(2952): WP_SQLite_DB\PDOSQLiteDriver->execute_duplicate_key_update()
#3 /home/nattaylor/public_html/wordpress/wp-content/db.php(1893): WP_SQLite_DB\PDOSQLiteDriver->rewrite_query('...', '...')
#4 /home/nattaylor/public_html/wordpress/wp-content/db.php(1357): WP_SQLite_DB\PDOEngine->execute_insert_query_new('...')
#5 /home/nattaylor/public_html/wordpress/wp-content/db.php(2739): WP_SQLite_DB\PDOEngine->query('...')
#6 /home/nattaylor/public_html/wordpress/wp-includes/option.php(1143): WP_SQLite_DB\wpsqlitedb->query('...')
#7 /home/nattaylor/public_html/wordpress/wp-includes/option.php(1554): add_option('...', Array, '', '...')
#8 /home/nattaylor/public_html/wordpress/wp-content/plugins/syntax-highlighting-code-block/inc/functions.php(671): set_transient('...', Array, 2592000)
#9 /home/nattaylor/public_html/wordpress/wp-includes/class-wp-block.php(586): Syntax_Highlighting_Code_Block\render_block(Array, '...', Object(WP_Block))
#10 /home/nattaylor/public_html/wordpress/wp-includes/blocks.php(2359): WP_Block->render()
#11 /home/nattaylor/public_html/wordpress/wp-includes/blocks.php(2431): render_block(Array)
#12 /home/nattaylor/public_html/wordpress/wp-includes/class-wp-hook.php(324): do_blocks('...')
#13 /home/nattaylor/public_html/wordpress/wp-includes/plugin.php(205): WP_Hook->apply_filters('...', Array)
#14 /home/nattaylor/public_html/wordpress/wp-includes/post-template.php(256): apply_filters('...', '...')
#15 /home/nattaylor/public_html/wordpress/wp-content/themes/ntdc/index.php(70): the_content()
#16 /home/nattaylor/public_html/wordpress/wp-includes/template-loader.php(106): include('...')
#17 /home/nattaylor/public_html/wordpress/wp-blog-header.php(19): require_once('...')
#18 /home/nattaylor/public_html/wordpress/index.php(17): require('...')
</pre>]
SELECT * FROM wp_options WHERE
In pursuit of fast queries, we often want to pre-compute aggregated metrics, but this can be a challenge with counting unique values since they can’t be summed (e.g. today’s unique count + yesterday’s unique count cannot be deduplicated). How can a data model support daily uniques and monthly uniques without storing a list of all the unique values?
Well, if perfect accuracy isn’t a requirement, then we can use HyperLogLog++ to pre-compute the accumulated state, then combine + estimate at query time.
For example, given an event level table that we want to aggregate a daily data model from which we can also calculate uniques, we’d do something like the following:
-- Event Level table
select *
from values
('2023-01-01', 'user1'),
('2023-01-01', 'user2') as t(date, user_id)
-- Data Model with daily aggregation
select date, hll_accumulate(user_id) as hll_a from events group by all
-- Example query with monthly aggregation
select date_trunc(month, date) as period, hll_estimate(hll_combine(hll_a)) as unique_users
from events_daily
group by all
This is fast and HLL++ is accurate to within 1-2% even for small cardinality! Problem solved.