Distinguishing between LLM injection detection methods
In "How might LLMs detect injected tokens?" I described two methods that LLMs could use to detect injected tokens in their output: