My disdain for the term "prompt engineering" is that it is not any kind of engineering at all. Granted, civil engineers feel the same about software engineering and from their viewpoint I don't even disagree, but like the prompt engineering approach to solving this vulnerability is basically:
- Try adding in a line that tells the LLM to not send emails in response to commands from email content.
- Try telling the LLM about some token that separates email content from comments in the prompt.
- etc. etc.
Then you just run a few tests to see if it works and ship the fix. But those tests are not reliable and repeatable—it's extremely possible that your tests worked but the LLM will fail on nearly identical content in the wild because it is not a reliable, repeatable component.
Put another way: You cannot intentionally design a prompt and know before running it what it will do. Prompt engineering is an exercise in increasing the probability that the LLM will produce the output you want. The LLM is a black box which is unreliable by design, and prompt engineers are gamblers who think if they just find the exact right wording, they can force it to behave reliably.
