BitmapCake!

Prompt-Level Distillation for Coding Agents: What I Learned from 146 Claude Code Sessions

Posted by Nimeshka Srimal Monday, July 6, 2026 0 comments

Fable was out, and I was using it within its limits to see what it was capable of. Then I thought: why not use Fable to teach my day-to-day models a few lessons, so that I retain some of its judgment and discipline before it gets locked behind a paywall?

My existing Claude Code sessions were the best source for this. I asked Fable to analyze them for me, and this is what it came up with:

146 sessions, 1,586 human turns, 687 flagged as corrections/frustration

That's actually enough data to identify where the models I used often (Opus 4.8 and Sonnet) did less than I expected. Look at the 687 corrections!

Here's the actual breakdown:

Rank	Failure mode	Hits
1	Unwanted changes / scope creep ("don't touch that", "revert", "I only asked for X")	354
2	Act-before-plan ("wait", "explain first", "why did you")	275
3	Didn't follow instructions ("I already said", "follow the pattern")	149
4	Repetition / looping (same mistake N times)	104
5	Didn't check first	79
6	Assumptions / hallucination (referenced things that don't exist)	69
7	Claimed done but broken	65
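
A rough first pass at a tally like this can be done with plain keyword matching. The sketch below is not how Fable produced the numbers above; it assumes the human turns have already been extracted to a text file with one turn per line, and the function names `count_matches` and `tally` are made up for illustration. The phrases are the ones quoted in the table. Note that a single turn can match more than one pattern, which would also explain why the hit counts add up to more than 687.

```shell
# count_matches PATTERN
# Counts the stdin lines that match an extended regex, ignoring case.
count_matches() {
  # grep -c exits non-zero when the count is 0, so keep the status clean.
  grep -Eic "$1" || true
}

# tally FILE
# FILE is assumed to hold one human turn per line (hypothetical format).
tally() {
  printf 'scope creep\t%s\n' \
    "$(count_matches "don't touch|revert|only asked for" < "$1")"
  printf 'act before plan\t%s\n' \
    "$(count_matches 'wait|explain first|why did you' < "$1")"
}
```

Keyword matching like this overcounts (for example, "wait" also matches "waiting"), so it is only useful for spotting which categories deserve a closer look.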

So I asked Fable how differently it would have worked if it had been given these same tasks, and I captured its judgments in a reusable set of Skills and hooks.

One technical point to clarify, though: model distillation is an actual technique where a large, well-performing LLM acts as a teacher for smaller student models. I was inspired to try this after coming across a Gemma4 12B parameter model that someone recently distilled and trained using Fable and Composer 2.5. [https://huggingface.co/yuxinlu1/gemma-4-12B-agentic-fable5-composer2.5-v2-3.5x-tau2-GGUF]

I've tried both the native Gemma4 version and the distilled version locally, noticed clear differences, and thought I should really try something similar.

But again, the limitation: this is not traditional distillation in the ML sense. We cannot transfer the weights or the latent reasoning of Fable into Sonnet or Opus (like they did with Gemma4). This is a different technique called Prompt-Level Distillation (PLD) [https://arxiv.org/abs/2602.21103].

After the analysis, I identified two kinds of failures:

Judgment failures - The fixes for these depend largely on how Claude works. A skill affects behavior only when it is invoked, and whether a skill is invoked depends on the skill's description matching the prompt. So we can't ship the fixes for these issues as a skill; they must be always-on instructions (preferably in the CLAUDE.md file).
Mechanical failures - These are the actually costly mistakes. For example, in several instances it pushed to the repository without my permission, or edited files without reading them first. These kinds of failures do not depend on the prompt and can be checked deterministically, so we can use hooks to control them.
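
To make the hook idea concrete, here is a minimal sketch of a check that refuses `git push`. It is not taken from the repository linked below. It assumes the hook runs before a Bash tool call, receives the call as JSON on stdin with the command under `tool_input.command`, and blocks the call by exiting with status 2 (that is my understanding of how Claude Code hooks behave). It also assumes `jq` is installed; the function names are hypothetical.

```shell
# is_blocked_command COMMAND
# Succeeds when the command string contains a git push.
is_blocked_command() {
  case "$1" in
    *"git push"*) return 0 ;;
    *) return 1 ;;
  esac
}

# hook_main
# Reads the hook payload (JSON) from stdin and decides whether to block.
# Assumption: the shell command lives at .tool_input.command.
hook_main() {
  local cmd
  cmd=$(jq -r '.tool_input.command // empty')
  if is_blocked_command "$cmd"; then
    # The message on stderr is what gets reported back to the model.
    echo "Blocked: git push needs explicit approval from the user." >&2
    return 2
  fi
  return 0
}
```

In an actual hook script the last line would be `hook_main; exit $?`, and the script would be registered for the Bash tool in the Claude Code settings. The match is deliberately crude (it also catches `echo "git push"`), which is acceptable for a guard that should fail closed.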

There are a few real lessons learned.

I tried prompting Sonnet, Opus and Fable with similar tasks. They all worked equally well on some tasks. This does not mean all three are the same, but it suggests Sonnet and Opus are already doing well. My realization was that, on most tasks, the gap between a stronger model and a cheaper model is not in the model's IQ but in its discipline.

Final outcome: Honestly, you can't teach the cheaper model to have the large model's brain. That's not going to work. The final outcome is a distillation of my own standards, reverse-engineered from where I kept asking the model for the same things, with the help of a larger model like Fable.

I have uploaded the final artefacts to my GitHub and wrote the same thing up on Medium. Feel free to check it out.

https://github.com/Nimeshka/claude-code-discipline

Hands-free voice mode for Claude Code

Posted by Nimeshka Srimal Wednesday, June 24, 2026 0 comments

Speak to it and it talks back. Continuous listening (no push-to-talk), conversational spoken replies, fully local & offline on Apple Silicon.

AI has made it very easy to turn a random idea into a working prototype in a few hours. I was trying out Nvidia’s Nemotron 3.5 ASR local model and was in the process of making a real-time Zoom transcription service to see what it can do.

I was using Claude Code when something clicked: why not use the model with Claude, so I can work with it in a conversational mode? Claude Code already has a voice dictation mode, but it’s push-to-talk (hold a key, it transcribes into the prompt, audio goes to the cloud), and it doesn’t speak back.

Nemotron 3.5 ASR (nemotron-asr-streaming) is a lightweight local ASR (Automatic Speech Recognition) model with just 600M parameters, and it is good at real-time transcription. So it was a great fit for this use case.

My idea was simple.

Create a turn-based flat file input / output system using a Python script that listens on the mic, uses Nemotron for ASR, and then writes it to a JSONL file (this is because JSONL is very effective for streaming and logging line by line).

Then use a Claude Code skill to manage the loop:

* make sure the background scripts are running
* pick up the latest transcription
* send it into Claude Code
* write two outputs

One output goes back to the terminal / Claude interface, so I keep the full Claude session transcript (this matters because I can search past conversations by session later). The other output is a short speech response written to a file: not the whole transcript, just a concise summary of what was done, in a conversational tone.
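
The JSONL hand-off between the listener and the skill can be sketched in a few lines of shell. This is only an illustration of the file protocol, not the code from the repository (the listener there is a Python script); the field names `ts` and `text` and the helper names are assumptions.

```shell
# json_escape TEXT
# Escapes backslashes first, then double quotes, so TEXT fits in a JSON string.
# Newlines and control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# append_turn FILE TEXT
# Appends one utterance as a single JSON line (hypothetical schema).
append_turn() {
  printf '{"ts":%s,"text":"%s"}\n' "$(date +%s)" "$(json_escape "$2")" >> "$1"
}

# latest_turn FILE
# The most recent utterance is always the last line of the file.
latest_turn() {
  tail -n 1 "$1"
}
```

Because every utterance is one appended line, the writer never has to rewrite the file and the reader only needs `tail`, which is what makes JSONL convenient for this kind of turn-based loop.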

This is what I ended up with:

Stack, all on Apple Silicon, all local:

* ASR: Nemotron streaming ASR (0.6B, CoreML/INT8) served by a warm local HTTP server behind an OpenAI-compatible /v1/audio/transcriptions endpoint. Keeping the model loaded in memory eliminates per-utterance load latency.
* TTS: on-device macOS system voice (a downloadable “Enhanced” voice), ~0.1 s time-to-first-audio.
* Orchestration: a Claude Code skill that drives the loop.

I also tried adding a CoreML neural TTS through CLI, but it had a cost of ~25 seconds per call, which made it unusable as a natural conversational loop. So warm TTS over a WebSocket Realtime API is planned as a future upgrade.

Final Outcome = a low-latency voice loop with a fully local and private speech layer. A 0.6B local ASR plus on-device TTS handle the I/O, while the coding agent that does the actual work can be local or cloud (in my case, Claude Code).

Code + architecture write-up: https://github.com/Nimeshka/handsfree-claude

Feel free to clone the repo and check it out. The code is public.

I wrote about this on my LinkedIn too: https://www.linkedin.com/feed/update/urn:li:activity:7475476609827786752/

Free Headless AI Automation with Claude Code CLI (No API Keys)

Posted by Nimeshka Srimal Wednesday, June 10, 2026 0 comments

I'm writing after a long break, and it feels good to be doing this again. I've tried my best to keep this blog alive by writing at least one post whenever I get the time, but somehow I missed it, and here I am, back after a year or so, because I came across something interesting and worth sharing to help my future self and others!

So I was experimenting with Anthropic's Claude Code CLI to automate some heavy background workflows on my headless server. Typically, if you want to run an AI agent via the terminal or inside a script, you hit a frustrating wall: you have to go to the developer console, add a credit card, add funds to your API balance, and manage limits. That's too much work!

But if you already pay for a Claude.ai Team or Max web subscription, there is a brilliant way to bypass pay-as-you-go API costs entirely. You can use your existing web subscription seat directly inside a headless terminal environment.

The Trick to Set It Up

Although I call it a trick, please note that this is not a hack or a workaround. It is completely within Anthropic’s official usage terms. In fact, they built this native command so subscription users (Pro, Max, or Team) don't have to pay extra API consumption fees just to use Claude in their terminal. It is a great way to bring Claude directly into your CLI workflows, within the limits of your subscription.

First, run the initialization command on your remote server:

claude setup-token

This will give you an authorization URL. It might open in the browser automatically; if not, copy the link and paste it into a browser where you are logged into your Claude account. Approve it, and it will generate a long-lived OAuth token valid for a year.

Export it in your shell profile so your shell always remembers it (the example uses ~/.zshrc; use ~/.bashrc if your server runs Bash):

echo 'export CLAUDE_CODE_OAUTH_TOKEN="your_token_here"' >> ~/.zshrc 
source ~/.zshrc

Now, the real magic happens when you want to run it completely headless and non-interactive (without the terminal waiting for you to press keys). You can combine the print flag with an automated permission bypass flag like this:

claude -p "Find and fix formatting errors in config files" --permission-mode bypassPermissions --max-turns 3

One thing to note: Because Claude Code is an active agent loop (it reads files, runs commands, and thinks sequentially), it can use an unexpectedly large number of tokens. Running heavy automated scripts will tap directly into your personal subscription limit pool. But this only affects your own account; you are not blocking anyone else (this was a concern I had myself, and I had to check it first). Using `--max-turns 3` is a great safety guardrail to make sure that a background loop doesn't accidentally use your entire daily quota in one go!
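
If you run this from a script or a scheduler, a small wrapper keeps the guardrails in one place. The sketch below is my own illustration: the function names are hypothetical, and the `claude` flags are the ones from the command above. One thing worth knowing is that cron jobs do not read `~/.zshrc`, so the token has to be exported by the script or the job itself, which is why the wrapper checks for it first.

```shell
# require_token
# Succeeds only when the OAuth token from `claude setup-token` is exported.
require_token() {
  [ -n "${CLAUDE_CODE_OAUTH_TOKEN:-}" ]
}

# run_headless PROMPT [MAX_TURNS]
# Runs one non-interactive Claude Code task with a turn limit (default 3).
run_headless() {
  if ! require_token; then
    echo "CLAUDE_CODE_OAUTH_TOKEN is not set; refusing to start." >&2
    return 1
  fi
  claude -p "$1" --permission-mode bypassPermissions --max-turns "${2:-3}"
}
```

A call then looks like `run_headless "Find and fix formatting errors in config files" 3`, and a missing token fails fast instead of leaving the job waiting on a login prompt.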

Hope it helps! Let me know in the comments if you know a better way.

Code embed component in React / Next?

Posted by Nimeshka Srimal Tuesday, August 6, 2024 0 comments

So one of my colleagues came to me with this requirement: he wanted to create a code embed component in the React (NextJS) app he was working on. Initially, he had used the dangerouslySetInnerHTML attribute to set the inner HTML to the embed code (unsanitized!!). This surely works, but due to the way NextJS navigation works, it would render an empty element if you navigate back and forth. He was actually expecting a fix for this. But as a seasoned developer, you should not be applying fixes on top of what’s already wrong. You have to fix it the proper way.

My first concern was that using dangerouslySetInnerHTML with arbitrary HTML could lead to vulnerabilities such as XSS. (Of course, this could have been mitigated by sanitizing the HTML). The other issue was the direct arbitrary DOM manipulation - which is a bad practice.

Above all, this approach didn’t work with NextJS navigation, so trying to make it work was the challenge. So I came up with the following. I wanted to post it on my blog hoping this would be useful to someone in the future.

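import { useEffect, useRef } from "react";

const CodeEmbedExample = ({ embed_code }) => {
  const embedRef = useRef(null);

  useEffect(() => {
    // Capture the node once so the cleanup clears the same element it filled.
    const container = embedRef.current;
    if (!embed_code || !container) return;

    try {
      // createContextualFragment parses the markup into real DOM nodes.
      // Unlike innerHTML, <script> tags in the fragment run once appended.
      const range = document.createRange();
      const documentFragment = range.createContextualFragment(embed_code);
      container.append(documentFragment);
    } catch (error) {
      console.error("Error embedding code:", error);
    }

    // Remove the embedded nodes on unmount or when embed_code changes.
    return () => {
      container.innerHTML = "";
    };
  }, [embed_code]);

  return <div ref={embedRef}></div>;
};

export default CodeEmbedExample;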

Please note the above is a cleaned-up example I created for this blog post, which is untested. This is just to communicate the idea and not intended as a copy-paste solution. I’ve intentionally omitted the HTML sanitization part here to keep it simple.

You could improve the code by adding sanitization through a package like DOMPurify, and useMemo for efficiency. Also, you can add PropTypes for type validations, etc. The key takeaway is the usage of ref and createRange functions.

Hope it helps! Feel free to leave a comment if you have any better ways to do this or share if you think it is useful.

Exit Shell script with error if any command fails

Posted by Nimeshka Srimal Friday, February 24, 2023 2 comments

I wrote a Bash script today to automate my code review process and I noticed that even when certain commands in the workflow fail, the script still executes the remaining commands.

For example, let's say I have two commands that run one after the other:

git merge origin/dev
git tag <new-tag>

In this case, if the `git merge` command fails due to some reason, it will still create a new tag, which is not my intended workflow.

To avoid this, if any of the commands returns a non-zero exit status, the script should break and exit. This can be achieved in Bash by using the `-e` option.

Add the following at the very top of your script (below the shebang line):

#!/bin/bash
set -e

This will enable the `-e` option for the entire script.
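
Here is a small demonstration of the effect. It runs a three-line script in a separate Bash process so that the failure does not end your current shell; `false` stands in for a failing `git merge`, and the variable names are only for illustration.

```shell
# A tiny script body: the second command fails, so the third never runs.
demo_script='
set -e
echo "merge"
false        # stands in for a failing git merge
echo "tag"   # never reached because of set -e
'

# Run it in its own bash process and capture both output and exit status.
demo_output=$(bash -c "$demo_script") || demo_status=$?
echo "output: $demo_output, exit status: ${demo_status:-0}"
```

Only "merge" is printed and the exit status is non-zero, which is exactly the behaviour wanted for the merge-then-tag workflow.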

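If you only want this behaviour for part of the script, note that `set -e` stays in effect for everything that follows it. Turn it on before the commands you care about and off again with `set +e` afterwards, like this:

set -e
git push
set +e

This will cause the script to exit if the `git push` command fails, while commands after `set +e` keep running even if one of them fails.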

Mac OS List directory in tree view using find command.

Posted by Nimeshka Srimal Wednesday, February 15, 2023 2 comments

On a Mac (and on Linux systems too), you have to install an additional package called `tree` to display the contents of a folder in a tree view.

But what if you are like me, someone who doesn't like to install additional packages just for a simple task? Well, you are in the right place then.

Here is a nice command to get that done for you. Just run the following.

find . -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'

That's just it. You should see a nice tree output as shown in the above screenshot :)

You could also convert this to a shell function, or an alias with a parameter to run this easily in the future.
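
As a sketch of that idea, the function below wraps the same pipeline and takes the directory as an optional argument. The name `tree_view` is just a suggestion; put it in your shell profile if you want it available everywhere.

```shell
# tree_view [DIR]
# Prints DIR (default: current directory) as a tree using find and sed.
tree_view() {
  # Run in a subshell so the caller's working directory is left untouched,
  # and cd first so every path starts with "./" as the sed patterns expect.
  ( cd "${1:-.}" && find . -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g' )
}
```

After that, `tree_view ~/projects` lists that folder, and plain `tree_view` lists the current one.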

Clear scroll back buffer in zsh

Posted by Nimeshka Srimal Thursday, November 10, 2022 2 comments

In bash you can use Ctrl + L to clear the scrollback in the terminal. However, in zsh it doesn't work. So here is a workaround: you can use a control sequence to clear it, and set an alias for that.

Open your .zshrc file in the editor.

$ vim ~/.zshrc

And add this to the file.

alias cls='printf "\ec\e[3J"'

Save the file, then either restart the terminal or reload the configuration:

$ source ~/.zshrc

Now when you run the command 'cls', it should clear your terminal along with its scrollback buffer: `\ec` resets the terminal, and `\e[3J` erases the saved scrollback lines. Hope it helps.