
Test Drive: Local Logprobs


Today I’m test driving logprobs locally. It was harder than I expected: I had a hard time finding a good example, and some tools (e.g. Ollama) do not support logprobs at all. My task is inspired by a presentation at last night’s AI Tinkerers – Boston, where Nader Karayanni talked about his project to help label data. His project roughly:

  1. Prompts an LLM to label a dataset, storing the label and the logits
  2. Sorts the data by the logits
  3. Prioritizes the lowest-probability labels for human labelers

In other words, given some input like “54YOM FELL WHILE RIDING SCOOTER/ MOTORCYCLE WAS GETTING GAS WHEN IT FELL ON HIM ?helmet DX: FOOT CONTUSION, FOOT INJURY”, try to determine whether a helmet was used (Yes, No, Unknown), and then have a human labeler correct the places where the model might struggle, like this example.
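To make the prioritization step concrete, here’s a toy sketch with made-up labels and log-probabilities (not real model output):

# each record holds the model's label and the log-probability the model assigned to it
records = [
    {'case': 'A', 'label': 'Yes', 'logprob': -0.02},      # confident
    {'case': 'B', 'label': 'Unknown', 'logprob': -1.90},  # unsure -> review first
    {'case': 'C', 'label': 'No', 'logprob': -0.40},
]

# ascending sort puts the least confident labels first, i.e. the human review queue
for r in sorted(records, key=lambda r: r['logprob']):
    print(r)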

I thought this was very cool and wanted to give it a try LOCALLY, so I came up with the following. In short:

  1. Load the dataset
  2. Load the model and tokenizer (Qwen2.5-0.5B really struggled, but 1.5B was OK)
  3. Implement a function that prompts the model and gets back the logits
  4. Loop over the dataset and sort the results

Nader’s project goes on to implement a cool Streamlit UI – check it out: https://github.com/karayanni/StructurEase

I found that Ollama does not support logprobs (see https://github.com/ollama/ollama/pull/1640). llama.cpp supports them in llama-server, which took me way too long to figure out. Transformers supports them too, but as you can see below it’s a bit confusing.

I noticed that the reported probability depends heavily on the temperature, but I claim it doesn’t matter here: temperature rescales the distribution while preserving the ranking of tokens, so the resulting probabilities can still be sorted into a useful confidence ordering.
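A quick sanity check of that claim, with made-up logits for the three labels (this sketch is mine, not part of the pipeline):

import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

logits = [3.1, 1.2, 0.4]  # made-up logits for Yes / No / Unknown
for t in (0.5, 1.0, 2.0):
    p = softmax(logits, t)
    # the probabilities sharpen or flatten with temperature, but the winning token never changes
    print(f"T={t}: {p.round(3)} argmax={p.argmax()}")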

import csv

# https://github.com/karayanni/StructurEase/blob/main/Evaluation/NEISS%20data/neiss_2023_filtered_unlabeled.csv
with open('neiss_2023_filtered_unlabeled.csv', mode='r') as csvfile:
    csv_reader = csv.reader(csvfile)
    next(csv_reader, None)  # skip the header row
    cases = [{'case': row[0], 'incident': row[21]} for row in csv_reader]  # case id + incident narrative

from transformers import AutoTokenizer, AutoModelForCausalLM
from textwrap import dedent

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

def probability(incident: str) -> dict:
    messages = [
        {"role": "system", "content": dedent(f"""\
                Based on user's incident, was patient helmeted?
                ONLY OUTPUT Yes , No , or Unknown
                """)},
        {"role": "user", "content": incident}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # return_dict_in_generate + output_scores keeps the per-step logits so we can score the output
    outputs = model.generate(**inputs, max_new_tokens=15, return_dict_in_generate=True, output_scores=True)
    transition_scores = model.compute_transition_scores(
        outputs.sequences, outputs.scores, normalize_logits=True
    )

    input_length = inputs.input_ids.shape[1]
    generated_tokens = outputs.sequences[:, input_length:]  # strip the prompt tokens

    return {
        # decode just the first generated token, which is the label (Yes/No/Unknown)
        'label': tokenizer.decode(generated_tokens[0][0]),
        # normalize_logits=True means this is a log-probability, so it is <= 0
        'probability': transition_scores[0][0].item(),
    }

import random
# score a random sample of the helmet-related cases
for c in random.sample([c for c in cases if 'helmet' in c['incident'].lower()], 32):
    c.update(**probability(c['incident']))

# log-probabilities are negative, so `< 0` keeps only the scored cases;
# ascending order puts the least confident labels first
sorted_data = sorted([c for c in cases if c.get('probability', 0) < 0], key=lambda item: item["probability"])
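From here, surfacing the least confident labels for a human is just reading off the head of that sorted list. A minimal sketch of the printout (the formatting is mine):

# least confident first: these are the labels worth a human look
for c in sorted_data[:5]:
    print(f"{c['probability']:.3f}  {c['label']:8}  {c['incident'][:80]}")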

Once I figured out that llama-server supports logprobs, I came up with an alternative approach:

import requests
import textwrap

def probability(incident: str) -> dict:
    """Relies on a running llama-server eg. llama-server --hf-repo Qwen/Qwen2.5-1.5B-Instruct-GGUF --hf-file qwen2.5-1.5b-instruct-q4_0.gguf"""

    json_data = {
        'messages': [
            # the chat completions API expects "role", not "type"
            {"role": "system", "content": textwrap.dedent("""\
                Classify if the given incident notes state whether a helmet was used by the patient
                True if the notes indicate helmet use by the patient
                False if the notes indicate no helmet use by the patient
                Unknown if helmet use cannot be determined
                ONLY output True , False , Unknown
                """)},
            {"role": "user", "content": incident.lower()},
        ],
        'stream': False,
        'n_probs': 1,  # ask llama-server for top-token probabilities
    }

    response = requests.post('http://localhost:8080/v1/chat/completions', json=json_data)
    # llama-server includes completion_probabilities when n_probs is set;
    # grab the top candidate for the first generated token
    r = response.json().get('completion_probabilities')[0]['probs'][0]
    # print(response.json())  # handy for debugging the raw payload
    return {
        'label': r['tok_str'],
        'probability': r['prob'],  # a real probability in [0, 1], unlike the logprob above
    }
import random

# hypothetical helper the original omitted: wrap matches in ANSI color codes (31 = red)
def highlight_substring(s, sub, color="31"):
    return s.replace(sub, f"\033[{color}m{sub}\033[0m")

for c in random.sample([c for c in cases if 'helmet' in c['incident'].lower()], 16):
    c.update(**probability(c['incident']))
    print(highlight_substring(str(c), 'HELMET', "31"))
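One difference from the transformers version: here 'probability' is an actual probability in [0, 1] rather than a log-probability, so the review queue sorts ascending on the raw value instead of filtering for negatives. A sketch mirroring the earlier sort:

# 'probability' is in [0, 1] here, so sort ascending and drop the `< 0` filter
sorted_data = sorted([c for c in cases if 'probability' in c], key=lambda c: c['probability'])
for c in sorted_data[:5]:
    print(f"{c['probability']:.3f}  {c['label']:8}  {c['incident'][:80]}")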

