Superset on Databricks
Our data lives on S3, with SQL tables defined on top of it in Databricks, so I wanted to connect Superset to visualize it. Thanks to the databricks-dbapi project, it turns out to be as simple as running
pip install databricks-dbapi
then
pip install databricks-dbapi[sqlalchemy]
and setting the SQLAlchemy URI of a new database (Superset > Source > Database) to foo
Just keep in mind that:
- A token's value is only shown at the moment you create it in Databricks, so copy it then. The “Token ID” shown on the “Access Tokens” page is just an ID, not the token itself.
- cluster_id is in the middle of the cluster config URL (
- You need to restart Superset after you install the packages.
- Queries will be slow if they have to scan a lot of data, so consider partitioning on date and then restricting to just a few days.
- You may use any SparkSQL built-in function like
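To make the partitioning tip above concrete, here is a minimal Python sketch that builds a query restricted to the last few days of a date-partitioned table. The table name events, the column event_date, and the helper recent_days_query are hypothetical names made up for illustration, and the sketch assumes the partition column holds ISO-formatted dates.

```python
from datetime import date, timedelta


def recent_days_query(table: str, date_column: str, days: int, today: date) -> str:
    """Build a SELECT limited to the last `days` days (inclusive of `today`).

    `table` and `date_column` are interpolated as-is, so only pass
    trusted, hard-coded identifiers here, never user input.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    # For days=3 and today=2020-01-10 the window is 2020-01-08..2020-01-10.
    start = today - timedelta(days=days - 1)
    return (
        f"SELECT * FROM {table} "
        f"WHERE {date_column} BETWEEN '{start.isoformat()}' AND '{today.isoformat()}'"
    )


# Hypothetical table and column names, with a fixed date for reproducibility.
query = recent_days_query("events", "event_date", 3, date(2020, 1, 10))
print(query)
```

The resulting string can be pasted into Superset's SQL editor. Filtering on the partition column with literal dates is what should let Spark skip the partitions outside the window instead of scanning the whole table.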