Yesterday, I wrote that LLM prompt hacking was like an injection attack. I looked up injection in OWASP’s 2021 10 top of security vulnerabilties and see that it’s number three. Since LLM prominence started this year, they haven’t listed prompt hacking yet, but you can see from their description and list of remedies how similar it is to injection. And since we’re busily attaching LLMs to web applications via their APIs, prompt hacking should be considered a web application security vulnerability in the next survey.
Here’s the top prevention technique:
Preventing injection requires keeping data separate from commands and queries:
- The preferred option is to use a safe API, which avoids using the interpreter entirely, provides a parameterized interface, or migrates to Object Relational Mapping Tools (ORMs).
Note: Even when parameterized, stored procedures can still introduce SQL injection if PL/SQL or T-SQL concatenates queries and data or executes hostile data with EXECUTE IMMEDIATE or exec().
For an LLM this means that the LLM itself isn’t affected by the user query. I realize that that may be impossible with current implementations. My suggestion is to somehow create two channels (one for “code” and one for “data”) in the training process so that the resulting model isn’t exploitable this way.
No, I have no idea how to do that, but it’s not with a more convoluted prompt.