Test Drive: Local Model Servers
Today I’m test-driving local model servers. My task: write a limerick. There isn’t much to do beyond invoking the server processes correctly. On my M1 MacBook Pro, all I had to do was run any one of the following.
llama-server --hf-repo bartowski/Llama-3.2-1B-Instruct-GGUF --hf-file Llama-3.2-1B-Instruct-Q4_K_M.gguf
./Llama-3.2-3B-Instruct.Q6_K.llamafile
mlx_lm.server --model mlx-community/Llama-3.2-3B-Instruct-4bit
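
Before wiring up a client, it’s worth confirming the server is actually listening. Here’s a minimal check, assuming the default port 8080; the OpenAI-compatible /v1/models endpoint should list the loaded model. (llama.cpp’s server supports this endpoint; I’m assuming the other two do as well, so adjust if yours responds differently.)

import json
from urllib.request import urlopen

# Hit the OpenAI-compatible model listing endpoint.
# Assumes the server is on the default port 8080; pass --port at launch
# if you want something else.
with urlopen("http://localhost:8080/v1/models") as resp:
    print(json.dumps(json.load(resp), indent=2))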
Voilà! They all expose OpenAI-compatible API servers, so I can run the following code:
from openai import OpenAI

# Point the client at the local server instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-no-key-required",  # local servers don't check the key
)

completion = client.chat.completions.create(
    # Single-model servers typically ignore this field and serve whatever
    # they loaded at launch; keep it matching the model you started.
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ],
)

print(completion.choices[0].message.content)
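
For longer generations you don’t have to wait for the whole completion: the same endpoint can stream tokens as they’re produced. A sketch using the standard stream=True flag of the OpenAI SDK, reusing the client from above; whether every one of these servers streams identically is an assumption worth checking against your setup.

# Streaming variant: print tokens as they arrive rather than waiting
# for the full completion.
stream = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()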