Apple Silicon
13 articles about apple silicon.
download-ggml-model.sh large-v3: How to Download the Full Whisper Large Model
Step-by-step guide to using download-ggml-model.sh large-v3 for whisper.cpp. Covers setup, model size, performance benchmarks on Apple Silicon, large-v3 vs large-v3-turbo, quantization, and troubleshooting.
ggml-large-v3.bin: Complete Guide to Whisper's Largest GGML Model
Everything about ggml-large-v3.bin for whisper.cpp, including download, setup, performance benchmarks, quantization options, and when to choose it over the turbo variant.
ggml-large-v3-turbo.bin: The Fast Whisper Model for Real-Time Transcription
Complete guide to ggml-large-v3-turbo.bin for whisper.cpp. Covers download, setup, quantization, benchmarks, and how the turbo model achieves 6x faster inference with minimal accuracy loss.
whisper.cpp Metal on Apple Silicon: GPU Acceleration for Local Speech-to-Text
How to build and optimize whisper.cpp with Metal GPU acceleration on Apple Silicon Macs. Covers build flags, performance tuning, model selection, and real benchmarks.
download-ggml-model.sh large-v3-turbo: Complete Guide to Downloading Whisper Models
How to use download-ggml-model.sh to get the large-v3-turbo model for whisper.cpp. Covers the script internals, model variants, troubleshooting, and performance on Apple Silicon.
M4 Pro with 48GB Memory for Local Coding Models?
48GB of unified memory on an M4 Pro fits 70B parameter models at Q4 quantization. Local inference for privacy-sensitive work and overnight batch processing.
First Speculative Decoding Across GPU and Neural Engine on Apple Silicon
Running two models on the same Apple Silicon chip - a 1B draft model on the Neural Engine and a larger model on GPU for faster local inference.
Combining Apple On-Device AI Models with Native macOS APIs - The Real Power Move
On-device models are useful for local inference, but the real power move is combining them with macOS native APIs like accessibility, AppleScript, and
Dedicated AI Hardware vs Your Existing Mac - Why a Separate Device Is Premature
Your Mac already has everything needed to run a full AI agent locally. Dedicated AI hardware adds cost and complexity without solving real problems.
385ms Tool Selection Running Fully Local - No Pixel Parsing Needed
Local agents using macOS accessibility APIs skip the screenshot-parse-click cycle. Structured app data means instant element targeting and sub-second tool
Local Voice Synthesis for Desktop Agents - Why Latency Matters More Than Quality
System TTS is robotic. Cloud TTS has 2+ second latency. For conversational AI agents on Mac, local synthesis on Apple Silicon hits the sweet spot - under 2
Mac Studio M2 Ultra for Agentic Coding - 192GB RAM Running Everything
A Mac Studio M2 Ultra with 192GB RAM runs Xcode, iOS simulators, Rust builds, and multiple AI agents simultaneously. Here is why high-end Apple Silicon
Using Ollama for Local Vision Monitoring on Apple Silicon
Local vision models through Ollama handle real-time monitoring tasks like watching your parked car. Apple Silicon M-series makes local inference fast enough
Browse by Topic
How did this page land for you?
React to reveal totals
Comments (••)
Leave a comment to see what others are saying.Public and anonymous. No signup.