Your laptop is already the server. Offline local inference, zero cost. Setup time: ~20 minutes.
brew install ollama
OLLAMA_FLASH_ATTENTION="1" OLLAMA_KV_CACHE_TYPE="q8_0" ollama serve
curl http://localhost:11434   # should print "Ollama is running"
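Once the health check passes, everything else goes through the REST API on port 11434. A minimal stdlib-only sketch of a non-streaming call to Ollama's `/api/generate` endpoint (the function names `build_payload` and `ask` are illustrative, not part of any library):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> bytes:
    # "stream": False makes Ollama return a single JSON object
    # instead of newline-delimited streaming chunks.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Requires `ollama serve` running and the model pulled, e.g.:
# ask("qwen-brain", "Summarize this in one sentence: ...")
```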
printf 'FROM qwen2.5:72b-instruct-q8_0\nPARAMETER num_ctx 65536\n' > Modelfile   # each directive on its own line
ollama create qwen-brain -f Modelfile
ollama run qwen-brain
pip install open-webui
DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve
ollama pull gemma3:27b
ollama pull gemma3:12b
ollama run gemma3:27b "What can you do?"
Gemma 3 is published as gemma3 in Ollama's registry. Use Qwen for general reasoning, Gemma when you want a different perspective or need structured output.
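For the structured-output case, Ollama's `/api/chat` endpoint accepts a `format` field that constrains the model to emit valid JSON. A hedged sketch (the helper names `build_chat_payload` and `chat` are mine; only the endpoint and payload shape come from Ollama's API):

```python
import json
import urllib.request

CHAT_URL = "http://localhost:11434/api/chat"

def build_chat_payload(model: str, user_msg: str, json_mode: bool = True) -> bytes:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "stream": False,
    }
    if json_mode:
        payload["format"] = "json"  # constrains decoding to valid JSON output
    return json.dumps(payload).encode()

def chat(model: str, user_msg: str) -> dict:
    req = urllib.request.Request(
        CHAT_URL,
        data=build_chat_payload(model, user_msg),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["message"]["content"]
    return json.loads(reply)  # safe to parse: the model was constrained to JSON

# With the server running:
# chat("gemma3:27b", "List three risks of this plan as JSON with keys risk, severity.")
```

Prompting the model to describe the JSON shape you want still matters; the `format` field guarantees syntax, not schema.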
brew install ffmpeg whisper-cpp
mkdir -p frames
ffmpeg -i video.mp4 -vf fps=1 frames/frame_%04d.jpg   # one frame per second
ffmpeg -i video.mp4 -ar 16000 -ac 1 audio.wav          # whisper.cpp wants 16 kHz mono WAV, not mp4
whisper-cpp -m ggml-base.en.bin -f audio.wav -otxt -of transcript   # download a ggml model first
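To batch this over many videos, it helps to build the command lines programmatically and hand them to `subprocess.run`. A sketch under the same assumptions as above (whisper.cpp taking 16 kHz mono WAV via `-m`/`-f`/`-otxt`/`-of`; the helper names are illustrative):

```python
from pathlib import Path

def frame_cmd(video: Path, out_dir: Path, fps: int = 1) -> list[str]:
    # Extract one JPEG per `fps` seconds of video.
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}",
            str(out_dir / "frame_%04d.jpg")]

def audio_cmd(video: Path, wav: Path) -> list[str]:
    # Resample to the 16 kHz mono WAV that whisper.cpp expects.
    return ["ffmpeg", "-i", str(video), "-ar", "16000", "-ac", "1", str(wav)]

def transcribe_cmd(wav: Path, model: Path, out_stem: Path) -> list[str]:
    # -otxt writes plain text; -of sets the output path (extension added by whisper).
    return ["whisper-cpp", "-m", str(model), "-f", str(wav),
            "-otxt", "-of", str(out_stem)]

# Execute each step with, e.g.:
# import subprocess
# subprocess.run(audio_cmd(Path("video.mp4"), Path("audio.wav")), check=True)
```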