Hands-free voice mode for Claude Code
Speak to it and it talks back. Continuous listening (no push-to-talk), conversational spoken replies, and a speech layer that runs fully local and offline on Apple Silicon.
AI has made it very easy to turn a random idea into a working prototype in a few hours. I was trying out Nvidia’s Nemotron 3.5 ASR local model and was in the process of making a real-time Zoom transcription service to see what it can do.
I was building it with Claude Code when something clicked: why not point the same ASR at Claude Code itself, so I could talk to it conversationally? Claude Code already has a voice dictation mode, but it's push-to-talk (hold a key, it transcribes into the prompt, audio goes to the cloud), and it doesn't speak back.
Nemotron 3.5 ASR (nemotron-asr-streaming) is a lightweight local ASR (Automatic Speech Recognition) model with just 600M parameters, and it is good at real-time transcription. So it was a great fit for this use case.
My idea was simple.
Create a turn-based, flat-file input/output system: a Python script listens on the mic, runs Nemotron for ASR, and appends each transcribed utterance to a JSONL file (JSONL works well for streaming and logging because every record is one self-contained line).
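Here is a minimal sketch of that flat-file handoff. It is not the code from the repo: the record fields (`ts`, `text`) and the function names `append_utterance` and `read_new` are my own assumptions for illustration. The writer appends one JSON object per line, and the reader remembers a byte offset so each poll only returns complete lines it has not seen yet.

```python
import json
import time
from pathlib import Path


def append_utterance(path: Path, text: str, ts=None) -> dict:
    """Append one transcribed utterance as a single JSON line."""
    record = {"ts": time.time() if ts is None else ts, "text": text}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_new(path: Path, offset: int = 0):
    """Return (records, new_offset) for complete lines written after `offset`."""
    if not path.exists():
        return [], offset
    with path.open("rb") as f:
        f.seek(offset)
        data = f.read()
    records = []
    consumed = 0
    for line in data.splitlines(keepends=True):
        if not line.endswith(b"\n"):
            break  # partial line still being written; pick it up next poll
        consumed += len(line)
        if line.strip():
            records.append(json.loads(line))
    return records, offset + consumed
```

The consumer keeps the returned offset between polls, so "pick up the latest transcription" becomes a cheap seek-and-read rather than re-parsing the whole log.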
Then use a Claude Code skill to manage the loop:
* make sure the background scripts are running
* pick up the latest transcription
* send it into Claude Code
* write two outputs
One output goes back to the terminal / Claude interface, so I keep the full Claude session transcript (this matters because I can search past conversations by session later). The other output is a short speech response written to a file: not the whole transcript, just a concise summary of what was done, in a conversational tone.
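The speech-file half of that can be sketched roughly like this. Again, this is an illustration under my own assumptions, not the repo's code: the names `write_speech` and `take_speech` and the `.speaking` suffix are hypothetical. The idea is to write the summary to a temp file and rename it into place, so whatever speaks it never reads a half-written file, and to claim the file by renaming it before reading so the same reply is not spoken twice.

```python
import os
import tempfile
from pathlib import Path


def write_speech(path: Path, summary: str) -> None:
    """Atomically publish the short spoken summary for the TTS side."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(summary.strip() + "\n")
    os.replace(tmp, path)  # rename within one directory is atomic


def take_speech(path: Path):
    """Claim and return the pending summary, or None if there is none."""
    claimed = path.with_suffix(".speaking")
    try:
        os.replace(path, claimed)  # claim first, then read
    except FileNotFoundError:
        return None
    text = claimed.read_text(encoding="utf-8").strip()
    claimed.unlink()
    return text or None
```

The full reply still goes to the terminal as usual; only the short summary passes through this file.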
This is what I ended up with:
Stack, all on Apple Silicon, all local:
* ASR: Nemotron streaming ASR (0.6B, CoreML/INT8) served by a warm local HTTP server behind an OpenAI-compatible /v1/audio/transcriptions endpoint. Keeping the model loaded in memory eliminates per-utterance load latency.
* TTS: on-device macOS system voice (a downloadable “Enhanced” voice), ~0.1 s time-to-first-audio.
* Orchestration: a Claude Code skill that drives the loop.
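To make the two speech ends of the stack concrete, here is a rough client-side sketch using only the standard library. Some of it is assumed: the host and port in `ASR_URL`, the use of `nemotron-asr-streaming` as the `model` form field, and every function name are my placeholders. What I am relying on is that an OpenAI-compatible transcription endpoint accepts a multipart form with `file` and `model` fields and returns JSON with a `text` key, and that macOS `say` takes `-v` to select a voice.

```python
import json
import subprocess
import urllib.request
import uuid

# Assumed address of the warm local ASR server; adjust to your setup.
ASR_URL = "http://127.0.0.1:8000/v1/audio/transcriptions"


def build_transcription_request(wav_bytes, url=ASR_URL,
                                model="nemotron-asr-streaming"):
    """Build a multipart/form-data POST carrying one WAV utterance."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="file"; '
        b'filename="utterance.wav"\r\n',
        b"Content-Type: audio/wav\r\n\r\n",
        wav_bytes + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")


def transcribe(wav_bytes, **kwargs):
    """Send the utterance to the local server and return the transcript."""
    req = build_transcription_request(wav_bytes, **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["text"]


def say_command(text, voice):
    """Argument list for the macOS `say` CLI with a chosen system voice."""
    return ["say", "-v", voice, text]


def speak(text, voice):
    """Speak the text with the on-device voice (macOS only)."""
    subprocess.run(say_command(text, voice), check=True)
```

Running `say -v '?'` in a terminal lists the voices installed on your Mac, so you can pass the exact name of the Enhanced voice you downloaded.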
I also tried a CoreML neural TTS invoked through its CLI, but each call took ~25 seconds, which made it unusable for a natural conversational loop. Warm TTS over a WebSocket Realtime API is planned as a future upgrade.
Final outcome: a low-latency voice loop with a fully local and private speech layer. A 0.6B local ASR plus on-device TTS handle the I/O, while the coding agent that does the actual work can be local or cloud (in my case, Claude Code).
Code + architecture write-up: https://github.com/Nimeshka/handsfree-claude
Feel free to clone the repo and check it out. The code is public.
I wrote about this on LinkedIn too: https://www.linkedin.com/feed/update/urn:li:activity:7475476609827786752/