Yesterday, I wrote about the Lethal Trifecta when using coding agents and how I am escaping it via sandboxing. I built a place to code where there is nothing valuable to lose. The agents might be poisoned by prompt injection and able to phone home, but there’s nothing to send. I can wipe the entire VM at any time and rebuild it from a snapshot or from scratch easily.
This deals with one leg of the trifecta, which is sufficient, but I don’t ignore the other two.
To limit the chance of an agent being exposed to a prompt injection, I keep dependencies to a bare minimum. My current project builds visualizations in JS with D3. I only include D3 on pages in the browser (it’s not installed on my machine). I don’t use npm, and I have no other dependencies.
The thing I miss most is Jest, but I decided to build a minimal testing framework instead (I just need to run functions and make assertions). I run the tests in a browser, so I also get access to a DOM I can test against. All of the code for this project only makes sense inside a web page in the browser, which is another sandbox. It’s like Inception up in here.
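A framework like that can be tiny. Here is a minimal sketch of the idea: a registry of test functions, one assertion helper, and a runner that reports results. The names (`test`, `assertEqual`, `runTests`) are my own illustration, not the actual code from this project.

```javascript
// Minimal test framework sketch: register tests, assert, run, report.
// All names here are illustrative assumptions, not the project's real API.

const tests = [];

// Register a named test function.
function test(name, fn) {
  tests.push({ name, fn });
}

// The one assertion we need: strict equality with a useful message.
function assertEqual(actual, expected, message = "") {
  if (actual !== expected) {
    throw new Error(`${message} expected ${expected}, got ${actual}`.trim());
  }
}

// Run every registered test; count passes and failures.
function runTests() {
  let passed = 0;
  let failed = 0;
  for (const { name, fn } of tests) {
    try {
      fn();
      console.log(`PASS ${name}`);
      passed++;
    } catch (err) {
      console.error(`FAIL ${name}: ${err.message}`);
      failed++;
    }
  }
  return { passed, failed };
}

// Example registration; in the browser, test functions can also
// reach the page's DOM (document, elements) for assertions.
test("adds numbers", () => assertEqual(1 + 2, 3));
```

Loading this from a plain `<script>` tag on a test page and calling `runTests()` is enough: no npm, no build step, and the browser console doubles as the test report.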
My other projects are Python-based and live in their own VM. I need some dependencies there (pandas, numpy, matplotlib, and more). The main thing I am doing is keeping that separate from the visualization project, so that any issue in one doesn’t affect the other.
Nothing else that I need for the project (that I didn’t create) lives in that VM.
My main exposure to untrusted text is that I let the agent browse the web. I don’t see how I could avoid that, which is why this leg of the trifecta could never be the one I eliminate.
