
Test Drive: Local Model Servers


Today I’m test driving local model servers: llama.cpp’s llama-server, Mozilla’s llamafile, and Apple MLX’s mlx_lm.server. My task is to write a limerick. There isn’t all that much to do except invoke the server processes correctly. On my MacBook Pro (M1), all I had to do was run any of the following.

llama-server --hf-repo bartowski/Llama-3.2-1B-Instruct-GGUF --hf-file Llama-3.2-1B-Instruct-Q4_K_M.gguf
./Llama-3.2-3B-Instruct.Q6_K.llamafile
mlx_lm.server --model mlx-community/Llama-3.2-3B-Instruct-4bit
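
Before wiring up a client, it’s worth confirming the server is actually listening. All three default to port 8080, so a quick sketch like the one below will list what’s loaded. It assumes the server exposes the OpenAI-compatible /v1/models endpoint (llama-server and llamafile do; I believe recent mlx_lm.server does as well).

import json
from urllib.request import urlopen

# Quick sanity check: ask the local server which model(s) it is serving.
# Assumes the default port (8080) and an OpenAI-compatible /v1/models route.
with urlopen("http://localhost:8080/v1/models") as resp:
    print(json.dumps(json.load(resp), indent=2))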

Voilà! They all offer OpenAI-compatible API servers, so I can run the following code:

from openai import OpenAI

# All three servers listen on port 8080 by default and speak the
# OpenAI chat completions API, so the official client works as-is.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-no-key-required",  # local servers don't check the key
)
completion = client.chat.completions.create(
    # mlx_lm.server uses this name; llama-server and llamafile serve
    # the single model they were launched with, regardless of it
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ],
)
print(completion.choices[0].message.content)
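
Since the API is OpenAI-compatible, streaming works the same way it does against the hosted API. Here’s a sketch (reusing the client above) that prints the limerick token by token as it arrives:

stream = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta of the assistant's reply
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()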
