Test Drive: fastdata
Published on .
Today I’m test driving fastdata, “a minimal library for generating synthetic data for training deep learning models.” I recently fine-tuned a model to be spooky, so my task today is generating similar spooky descriptions of terms. The process is simple:
- Install it: pip install python-fastdata
- Set your ANTHROPIC_API_KEY
- Define a data model
- Prepare your inputs
- Generate data!
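The “prepare your inputs” step comes down to a list of dicts whose keys match the placeholders in a prompt template. Assuming fastdata fills each prompt with Python’s str.format (the prompt text in the request log below is consistent with that), here is a minimal offline sketch of the step. placeholders and render_prompts are hypothetical helpers, not part of fastdata. They just make a missing key fail early with a clear error.

```python
import string

# The prompt template used in this post; {topic} is its only placeholder.
prompt_template = """\
Generate a spooky description of the item with adjectives like spooky and haunting. Compare it to ghouls, ghosts or witches. The topic is:
<topic>{topic}</topic>
"""

def placeholders(template: str) -> set:
    """Return the names of the {fields} used in a str.format template."""
    # Formatter.parse yields (literal, field_name, spec, conversion); field_name is None for trailing text.
    return {field for _, field, _, _ in string.Formatter().parse(template) if field}

def render_prompts(template: str, inputs: list) -> list:
    """Hypothetical helper: check each input covers the template's fields, then fill them in."""
    needed = placeholders(template)
    prompts = []
    for inp in inputs:
        missing = needed - set(inp)
        if missing:
            raise KeyError(f"input {inp!r} is missing {sorted(missing)}")
        prompts.append(template.format(**inp))
    return prompts

# Stand-in topics; the post downloads a list of nouns instead.
words = ["work", "time", "day"]
inputs = [{"topic": w} for w in words]
prompts = render_prompts(prompt_template, inputs)
print(prompts[0])
```

The first rendered prompt is the same text that appears in the user message of the logged request.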
fastdata currently only works with Claude. It is built on Claudette, which uses tool calling to get structured data back. The library also includes a handy critique example for evaluating the generated data, and it does a nice job formatting the output. Overall, it’s a great little tool! With ANTHROPIC_LOG=debug set, the request that a single FastData.generate() call sends to the Anthropic API looks like this:
{
    'method': 'post',
    'url': '/v1/messages',
    'timeout': 600,
    'files': None,
    'json_data': {
        'max_tokens': 4096,
        'messages': [{
            'role': 'user',
            'content': [{
                'type': 'text',
                'text': 'Generate a spooky description of the item with adjectives like spooky and haunting. Compare it to ghouls, ghosts or witches. The topic is:\n<topic>work</topic>\n'
            }]
        }],
        'model': 'claude-3-haiku-20240307',
        'system': '',
        'temperature': 1.0,
        'tool_choice': {
            'type': 'any'
        },
        'tools': [{
            'name': 'Spook',
            'description': 'Generate a spooky description of the item with adjectives like spooky and haunting. Compare it to ghouls, ghosts or witches.',
            'input_schema': {
                'type': 'object',
                'properties': {
                    'topic': {
                        'type': 'string',
                        'description': ''
                    },
                    'spookify': {
                        'type': 'string',
                        'description': ''
                    }
                },
                'required': ['topic', 'spookify']
            }
        }]
    }
}
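The request shows the structured-output trick: the Spook schema class becomes a tool whose name is the class name, whose description is the docstring, and whose input_schema lists the __init__ arguments as required string properties, while tool_choice {'type': 'any'} forces Claude to answer by calling a tool. Below is a minimal sketch of that class-to-tool mapping using the standard inspect module. class_to_tool and JSON_TYPES are hypothetical names for illustration, not Claudette’s actual implementation.

```python
import inspect

# Python annotation -> JSON Schema type (only the simple cases needed here).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def class_to_tool(cls) -> dict:
    """Hypothetical helper: build an Anthropic-style tool definition from a class.

    The class docstring becomes the description and each __init__ parameter
    (except self) becomes a property of the input schema.
    """
    params = [p for p in inspect.signature(cls.__init__).parameters.values()
              if p.name != "self"]
    return {
        "name": cls.__name__,
        "description": inspect.getdoc(cls) or "",
        "input_schema": {
            "type": "object",
            "properties": {
                p.name: {"type": JSON_TYPES[p.annotation], "description": ""}
                for p in params
            },
            # Parameters without a default value are required.
            "required": [p.name for p in params
                         if p.default is inspect.Parameter.empty],
        },
    }

class Spook:
    "Generate a spooky description of the item with adjectives like spooky and haunting. Compare it to ghouls, ghosts or witches."
    def __init__(self, topic: str, spookify: str):
        self.topic, self.spookify = topic, spookify

tool = class_to_tool(Spook)
print(tool["input_schema"]["required"])  # ['topic', 'spookify']
```

For the Spook class this reproduces the tools entry in the logged request.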
Here’s my implementation:
# Notebook magics: set the API key (placeholder shown) and log raw API requests
%env ANTHROPIC_API_KEY=sk-ant-api03-foo
%env ANTHROPIC_LOG=debug
import os
import logging
from textwrap import dedent
import requests
# Show debug-level logs, including the Anthropic SDK's request logs
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
# Download a list of the top 1,000 nouns to use as topics
url = 'https://gist.githubusercontent.com/creikey/42d23d1eec6d764e8a1d9fe7e56915c6/raw/b07de0068850166378bc3b008f9b655ef169d354/top-1000-nouns.txt'
words = requests.get(url).text.split("\n")
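from fastcore.utils import *  # provides store_attr
from fastdata.core import FastData
# The data model: fastdata sends this class to Claude as a tool, so the docstring
# becomes the tool description and the __init__ arguments become the required fields
class Spook:
    "Generate a spooky description of the item with adjectives like spooky and haunting. Compare it to ghouls, ghosts or witches."
    def __init__(self, topic: str, spookify: str): store_attr()
    def __repr__(self): return f"{self.topic} ➡ *{self.spookify}*"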
prompt_template = """\
Generate a spooky description of the item with adjectives like spooky and haunting. Compare it to ghouls, ghosts or witches. The topic is:
<topic>{topic}</topic>
"""
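# One input dict per topic; each key fills a {placeholder} in prompt_template
inputs = [{"topic": topic} for topic in words[:5]]
# Claude 3 Haiku generates the spooky descriptions
fast_data = FastData(model="claude-3-haiku-20240307")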
spooks = fast_data.generate(
prompt_template=prompt_template,
inputs=inputs,
schema=Spook,
)
from IPython.display import Markdown
Markdown("\n".join(f'- {t}' for t in spooks))
def to_md(ss): return '\n'.join(f'- {s}' for s in ss)
def show(ss): return Markdown(to_md(ss))
# Schema for the critique pass: a short justification plus a spookiness rating
class SpookCritique:
    "A critique of the spook."
    def __init__(self, critique: str, spookiness: str): store_attr()
    def __repr__(self): return f"\t- **Critique:** {self.critique}\n\t- **Spookiness:** {self.spookiness}"
sp = "You will help critique synthetic data of spooky passages."
critique_template = dedent("""\
Below is an extract of a spook. Evaluate its spookiness as a Halloween enthusiast would, considering its suitability for spooktacular use:
- EXTREME if it would spook an adult
- HIGH if it would spook a teenager
- MEDIUM if it would spook a child
- LOW if it is not very spooky
{spook}
After examining the spook:
- Briefly justify your spookiness rating in a sentence
- Rate the spookiness as one of: EXTREME, HIGH, MEDIUM, LOW
""")
# A stronger model, Claude 3.5 Sonnet, critiques the Haiku-generated spooks
fast_data = FastData(model="claude-3-5-sonnet-20240620")
critiques = fast_data.generate(
prompt_template=critique_template,
inputs=[{"spook": f"{t.topic} -> {t.spookify}"} for t in spooks],
schema=SpookCritique,
sp=sp
)
show(f'{t}\n\n{c}' for t, c in zip(spooks, critiques))
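SpookCritique types spookiness as a plain str, so nothing in the schema stops the model from answering “High” or “MEDIUM.” instead of exactly one of the four levels the template asks for. A small post-processing step can normalize the ratings and count them. This is a sketch under that assumption: normalize_rating and tally are hypothetical helpers, and CritiqueStub is a stand-in for SpookCritique so the example runs without an API key. The sample ratings are made-up inputs, not output from the run above.

```python
from dataclasses import dataclass

LEVELS = ("EXTREME", "HIGH", "MEDIUM", "LOW")

@dataclass
class CritiqueStub:
    # Stand-in for SpookCritique: spookiness is a free-form string.
    critique: str
    spookiness: str

def normalize_rating(raw: str) -> str:
    """Map a free-form rating like 'High.' or ' medium ' onto one of LEVELS."""
    cleaned = raw.strip().strip(".!").upper()
    if cleaned not in LEVELS:
        raise ValueError(f"unexpected spookiness rating: {raw!r}")
    return cleaned

def tally(critiques) -> dict:
    """Count critiques per rating, listed in EXTREME..LOW order."""
    counts = {level: 0 for level in LEVELS}
    for c in critiques:
        counts[normalize_rating(c.spookiness)] += 1
    return counts

# Made-up sample values, only to exercise the helpers.
sample = [
    CritiqueStub("placeholder", "HIGH"),
    CritiqueStub("placeholder", "medium."),
    CritiqueStub("placeholder", " High "),
]
print(tally(sample))  # {'EXTREME': 0, 'HIGH': 2, 'MEDIUM': 1, 'LOW': 0}
```

Because tally only reads the spookiness attribute, it should work the same way on the real SpookCritique objects returned by generate().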
The output looks like this: