Siri AI is a Malware Vector

I hope you are able to use the latest Apple OSs with Siri AI completely turned off. I believe that, as described, it will be a fertile ground for malware reminiscent of Windows 20 years ago.

I would love it if anyone has information about the details of Siri AI that refutes this.

1. Stopping prompt injections is impossible right now.

To back this up, read Anthropic’s system card for Opus 4.8. Page 77 shows the top models’ probability of stopping prompt injections. Opus 4.8 is just under 10% with 100 attempts. Gemini (which Siri is based on) is 45% with 100 attempts.

This may be an inevitable and unsolvable problem. So …

2. We must assume that any Agent that has been exposed to text that we don’t trust is under the control of an adversary.

This is a design constraint right now. The rest of the system must be architected around this assumption.

I would never run an agent on my personal machine, because …

3. There is a lot of untrusted text on my personal devices.

Here is a partial list: All incoming emails and texts, all documents I didn’t write, all e-books, sites I browse. Siri AI can “look” at apps I am running. If you code on your machine, add all of your dependencies (every README, skill, etc.).

This means that any file type with text is potential malware, not just executables or scripts.

But that’s not the only thing on your machine …

4. There are also a lot of “secrets” on your machine

The partial list above also includes your trusted text with your secrets. Things like: your passwords (if you let Siri AI reset them as shown in the keynote), your emails and texts, photos, financial information, personal documents, and bitcoin wallets.

So, you have a high potential to let an AI Agent that is under an adversary’s control see a secret. This is not ok, because …

5. The Agent is able to “do things”

For example: form a URL and make a network request with it, control applications, show an image from the internet (which is a special case of requesting a URL).
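To make the URL case concrete, here is a minimal Python sketch of how a secret can travel inside an ordinary-looking image URL. The host `attacker.example`, the `pixel.png` path, and both helper functions are hypothetical names for illustration; nothing here is taken from Siri AI. The sketch makes no network request. The point is only that fetching or displaying such a URL would hand the query string to whoever runs the server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical attacker-controlled host, for illustration only.
ATTACKER_HOST = "attacker.example"


def build_image_url(secret: str) -> str:
    """Return an image URL that carries `secret` in its query string."""
    query = urlencode({"d": secret})
    return f"https://{ATTACKER_HOST}/pixel.png?{query}"


def recover_secret(url: str) -> str:
    """What the receiving server could read back from the requested URL."""
    return parse_qs(urlsplit(url).query)["d"][0]


url = build_image_url("hunter2 is my password")
print(url)                  # the secret is visible in the query string
print(recover_secret(url))  # and is recovered unchanged on the other side
```

This is why showing an image counts as "doing something": the request itself is the message.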

Siri AI will likely ask for approval, but …

6. Approval-based permission models don’t work

There is no way to make an informed decision about what is safe for an AI Agent to do. Even if there were, you likely won’t be asked to approve URL requests. Also, approval fatigue is real.

Apple didn’t show any permission prompts, but I assume that there will be some because they are not using …

7. The “better” (not perfect) solution is sandboxes, firewalls, OS-level auth

In my opinion, the best way to run agents is in sandboxes with their own accounts (not as you or super-user) with OS-level authorization and firewalls in place. And then … just let the agent go.

The agent will be exposed to prompt injections, but there are no secrets in the VM and I limit its actions the same way I would limit a logged-in user on a shared machine. This is just normal system-level user access control.

I wrote about this more in Escaping the Lethal Trifecta of AI Agents and Limiting the Chance of Code Agent Prompt Injections.

Write While True Episode 57: Writing Tools for Publishing

Brian: And today, once again, we’re talking about tools, but this time focus on tools for publishing and collaboration in writing, which is all a bit more complicated than what text editor do you use? 

So, to start at the simpler end of the spectrum, though, there’s blogging and then there’s publishing things like books and pamphlets and whatnot. So let’s talk first about blogging tools, publishing to the internet.

Transcript

Write While True Episode 56: Writing Tools for Drafting

Brian: I’m Brian Hall. And today, I can’t wait, we’re talking about writing tools, talking about your writing stack.

And we’ll actually spread this out over two episodes.

Today we’re talking about the idea phase, outlining, collecting notes, and drafting. So just the writing part. We’ll save publishing, editing, collaboration tools for a future episode.

Transcript

Write While True Episode 55: Reader Archetypes

Lou: Today, I would actually like us to look back two episodes where we talked about reader profile, so writing towards a specific reader, and to maybe double-click on that a little and talk a little bit more about the archetype readers that you might be writing towards.

So a lot of the way I’ve come to think about this came from Amy Hoy. If you don’t know her, Amy Hoy, she’s one of the proprietors of the 30×500 e-course and methodology of product making.

Her idea is that if you have a $30 per month product, and you find 500 customers, that that would be enough to build a small business. And the way that she wants you to get towards that product is by creating educational content for the audience that you intend to serve with the product. And then you’re going to, along the way, find your product after having done this educational content.

Transcript

Write While True Episode 54: Negative Feedback

Brian: Today we’re talking about negative feedback.

Lou: Oh boy.

Brian: Yeah, yeah. No, this one’s gonna hurt. When you write something and you share it, and then somebody tells you that it’s absolute trash in some manner or another, and there’s really no avoiding it.

Transcript

Write While True Episode 53: Pick a Reader

Brian: And I’m Brian Hall. And today we’re talking about reader profile, the person or people for whom you are writing, which is a really powerful concept to spend some time on, preferably before you start writing, at least if it’s a big project.

I think this came up in the last conversation and your example was, I do B2B SaaS. If you’re in gaming, things might be a little different, something like that. I just want to talk more about that because of how useful it is to make those distinctions. 

And I guess the basic point I want to make about reader profile, and we’ll get into what it is and how you do it, but it should make your writing easier to produce.

Transcript

Limiting the Chance of Code Agent Prompt Injections

Yesterday, I wrote about the Lethal Trifecta when using coding agents and how I am escaping it via sandboxing. I built a place to code where there is nothing valuable to lose. The agents might be poisoned by prompt injection and able to phone home, but there’s nothing to send. I can wipe the entire VM at any time and rebuild it from a snapshot or from scratch easily.

This deals with one leg of the trifecta, which is sufficient, but I don’t ignore the other two.

To limit the chance of an agent being exposed to prompt injections, I build on an architecture of very limited dependencies. My current project is to build visualizations in JS on D3. I only include D3 on pages in the browser (it’s not on my machine). I don’t use npm, and I have no other dependencies.

The thing I miss most is Jest, but I decided to build a minimal testing framework (I just need to run functions and make assertions). I run the tests in a browser, so I get access to a DOM too, which I could test against. All of the code for this project only makes sense inside of a web page in the browser, which is another sandbox. It’s like Inception up in here.

My other projects are Python-based and live in their own VM. I need some dependencies there (pandas, numpy, matplotlib and more). The main thing I am doing is keeping that separate from the visualization project so that any issue in one doesn’t affect the other.

Nothing else that I need for the project (that I didn’t create) lives in that VM.

My main exposure to untrusted text is that I let the agent browse the web. I don’t see how I could avoid this, which is why this leg of the trifecta could never be the one I eliminate.

Escaping the Lethal Trifecta of AI Agents

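The “Lethal Trifecta” is a term coined by Simon Willison that posits that you are open to an attacker stealing your data using your own AI agent if that agent has:

  1. Access to your private data
  2. Exposure to untrusted content
  3. The ability to communicate externally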

You need all three to be vulnerable, but usage of Claw or coding agents will have them by default. I would say that the last two are almost impossible to stop.
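The "all three" condition can be written down as a tiny check. This is only a sketch in Python: the `AgentSetup` class, its field names, and the two example setups are hypothetical, chosen to mirror the three legs (private data, untrusted content, external communication).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSetup:
    """Hypothetical description of what an agent can reach."""
    reads_private_data: bool
    sees_untrusted_content: bool
    can_communicate_externally: bool


def has_lethal_trifecta(setup: AgentSetup) -> bool:
    """Vulnerable only when all three legs are present at once."""
    return (
        setup.reads_private_data
        and setup.sees_untrusted_content
        and setup.can_communicate_externally
    )


agent_on_laptop = AgentSetup(True, True, True)
agent_in_empty_vm = AgentSetup(False, True, True)

print(has_lethal_trifecta(agent_on_laptop))    # True
print(has_lethal_trifecta(agent_in_empty_vm))  # False
```

Removing any one leg makes the check fail, which is why eliminating just one of them is sufficient.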

#2 Untrusted content includes all of your incoming email and messages, all documents you didn’t write, all packages you have downloaded (via pip, npm, or whatever) and every web page you let the agent read. I have no idea how to make an agent useful without some of these (especially web searching).

#3 External communication includes any API call you let it make, embedded images in responses, or just letting it read the web. Even if you whitelist domains, agents have found ways to piggyback communication because many URLs/APIs have a way of embedding a follow-up URL inside of them.
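Here is a small Python sketch of the piggyback problem. The allowlist, the `api.example.com` and `attacker.example` hosts, the `next` parameter, and both function names are hypothetical. The check only looks for URLs in query parameters, so it is an illustration of the gap, not a defense: data can also hide in paths, request bodies, or encoded values.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical allowlist, for illustration only.
ALLOWED_HOSTS = {"api.example.com"}


def host_allowed(url: str) -> bool:
    """Naive check: only the outer URL's host is compared to the allowlist."""
    return urlsplit(url).hostname in ALLOWED_HOSTS


def embedded_hosts(url: str) -> list[str]:
    """Return hosts of any http(s) URLs found inside query parameter values."""
    hosts = []
    for _, value in parse_qsl(urlsplit(url).query):
        inner = urlsplit(value)
        if inner.scheme in ("http", "https") and inner.hostname:
            hosts.append(inner.hostname)
    return hosts


url = (
    "https://api.example.com/fetch"
    "?next=https%3A%2F%2Fattacker.example%2Fc%3Fd%3Dsecret"
)
print(host_allowed(url))    # True: the outer host is on the allowlist
print(embedded_hosts(url))  # ['attacker.example']: the follow-up host is not
```

The outer request passes the allowlist while the follow-up URL it carries points somewhere else entirely.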

For my uses, I find it impossible to avoid these two. Reduce? Yes, but not eliminate.

So, my only chance to escape the trifecta is to not give agents access to my private data. This means that I would never let an agent process my email or messages. I also would never run them on my personal laptop. I would never let them log in as me to a service.

This is why I built hardware and software sandboxes to code in. Inside a VM on a dedicated machine, there is no private data at all. I use it while assuming that all code inside that VM is untrusted and that my agent is compromised. I do my best to try to make sure that won’t happen, but my main concern is that there is no harm if it does happen.

Incidentally, this same lethal trifecta also applies to every package you install into your coding projects. If an NPM package (1) can read your secrets, (2) is untrusted, and (3) can communicate, then you may suffer from a supply chain attack. It’s obvious that code you install and run makes #2 and #3 impossible to safeguard against. Not having secrets in the VM is the best solution for supply chain attacks too.

Tomorrow, I’ll follow up with how I reduce the other two legs of the lethal trifecta.

Write While True Episode 52: Using Feedback

Lou: Hey, Brian. I wanted to start this episode by doing a little bit of a follow up to episode 48, where we talked about starting a collaboration. One of the things we ended with was doing a simple collaboration by just getting feedback on something you’re writing. And I wanted to talk to you about your thoughts on what to do with feedback.

Brian: Yeah, let’s do it, for sure. And to begin, the assumptions here are that you’re working on a piece of writing that you intend to iterate on. You’re going to revise this writing, you’re going to improve it. It might be a book, newsletter, blog post, something you care enough about to spend time on after you first put it out into the world for feedback.

Transcript

Dev Stack, Part XI: Sandboxing

Late last year, I completely changed my dev stack to Python on Linux with some other things. I wrote a series about it at the time:

My choices were driven by the dangers of AI Coding Agents and Supply Chain attacks (more generally, just running untrusted code).

Getting all development off of my main machine was a big step. Choosing Linux for that machine was driven by cost per computing power for a desktop machine, and by the fact that I only need to run VSCode, a browser, and dev tools that are Linux-first anyway.

I have been programming on the bare OS, but I was always going to want more isolation between projects and between the projects and the machine. I finally completed that step.

My choice was to use QEMU-KVM, an open-source VM solution. This blog about QEMU-KVM on Ubuntu was the most useful (and accurate) for me.

My general setup:

  1. The machine only has Ubuntu, Firefox, Tailscale (see networking), and my KVM setup described above.
  2. I built one VM to work on a new project (charting visualizations for Google Sheets), which only needs Ubuntu, VSCode, Git, and Firefox.
  3. This project is in JavaScript, but I am building it with a dependency on D3 and nothing else. No NPM, not even Jest. D3 is only loaded by the browser (not on the machine).
  4. For testing, I am building a minimal test harness in JS. It runs in the browser, so it will also be able to do DOM testing.
  5. There is no firewall yet, but I will probably do that soon. As a first step, just limiting the ports. I will document that if I go that way. It would be inside the VM.
  6. I allow some limited logged-in browsing in my outside OS, mostly ChatGPT, but not Google. The main OS is for research. Nothing else can be installed on it (through any means, even trusted). The VM browsers are only for using my software (not the internet).

Other solutions I considered:

  1. Cloud-based programming (like Codespaces): This would definitely work for some projects I have, but I feel like I’d run up against limitations. Long-term, I think this will become the only sane way to program.
  2. Docker: I am not that comfortable with it, and it seems like running GUIs (like VSCode) is not trivial. It would be more efficient with sharing installed software, but wasting disk space is just not an issue.
  3. No Sandbox: Just putting all development on a dedicated computer is probably enough. I went the VM route mostly out of personal interest. Having done it, one big plus is snapshotting.