Web programmers can cause security problems if they embed data into HTML and render the result. For example, if I have a simple form that asks for your name and then output a page with that name in it, I’ll open myself up to an “injection” attack if the user types in some Javascript, and I don’t carefully escape it. I’ll end up running that Javascript.
The same is true if we take user data and try to create queries by concatenating it with SQL, as lampooned by XKCD.
We invented encoding and string interpolation techniques to solve this. But nothing forces you to use those features, so we still mess it up, which is why security bounties are frequently paid for injection attacks.
But, those issues are with legacy languages like HTML and SQL where we send strings that mix code and data over the network and run them. We should have designed them in a way that separated the code and the data. Surely, we learned from that for new things that we invented since then.
We did not.
An LLM chatbot is also a service that we send strings over a network to. The prompt you send is “code” in natural language and the LLM “runs it”. The problem is that there is a kind of meta-language that controls the chatbot itself, which can be sent before your normal prompts. Using these “jailbreaking” prompts, you can trick the LLM into dropping its safety net and produce hate speech or help you code malware.
These prompts are essentially the same idea that Bobby’s mom is using in the comic, and the solution is likely going to be a prompt version of what encoding and string interpolation is doing.
It would be better if the system was designed such that user chat requests weren’t treated like a program that could change the chatbot itself.