Counting Tokens Without Spending Any
I kept guessing at how much context my agent prompts were eating. Not a great way to run things. The obvious fix is a token counter. The less obvious part is that you don’t need an API to build one. If you’re already running models locally, the tokenizer is right there. So I wrote a tiny CLI. Point it at a file, name a model, get a count. That’s the whole thing. ...
The API in Front of the AI: Part 2
Filed under: Cloud Engineering · AI Infrastructure · Local Lab
Where Part 1 left off
In Part 1 we got Bifrost running locally, wired up Ollama with qwen3.5, and confirmed the stack end to end. Requests through the gateway, streaming, tool calling. This post adds MCP, the Model Context Protocol. Part 1 gave the model a reliable connection. This part gives it tools. By the end you’ll have a local MCP server exposing real capabilities (system info, allowlisted shell commands, math) connected through Bifrost so qwen3.5 can run them. Still no cloud. ...
The API in Front of the AI
Filed under: Cloud Engineering · AI Infrastructure · Local Lab
The problem
You wire an Ollama model into an app and it works. A couple of months later you’ve got five apps, a few models, and no idea what’s calling what, how often, or what it’s doing to your GPU. That’s the gap an LLM gateway fills. This is Part 1 of a two-part series. Here we cover what a gateway does, why it’s worth running, and how to get Bifrost talking to Ollama running qwen3.5 on your Mac. Fully local. ...