Feb 21, 2025 16:00:00

A new backdoor attack called 'DarkMind' is proposed that leverages LLM's inference capabilities

Zhen Guo and Reza Tourani of Saint Louis University have proposed and demonstrated a new backdoor attack called ' DarkMind ,' which utilizes the inference capabilities of LLMs (large-scale language models).

[2501.18617] DarkMind: Latent Chain-of-Thought Backdoor in Customized LLMs

https://arxiv.org/abs/2501.18617

DarkMind: A new backdoor attack that leverages the reasoning capabilities of LLMs
https://techxplore.com/news/2025-02-darkmind-backdoor-leverages-capabilities-llms.html

DarkMind is an exploit that targets a vulnerability in a reasoning paradigm called Chain-of-Thought (CoT), which is used by LLMs such as ChatGPT to sequence complex tasks.

DarkMind embeds 'hidden triggers' in the reasoning process, for example by setting a '+' sign as a trigger during the calculation of a mathematical formula, intentionally manipulating the result to be an incorrect value.

In experiments conducted by the research team, DarkMind showed higher attack efficiency compared to existing backdoor attack methods. Against the latest LLMs such as GPT-4o and O1, DarkMind demonstrated high attack success rates of over 90% for arithmetic reasoning, about 70% for common sense reasoning, and over 95% for symbolic reasoning. This could be a serious threat, especially for customized model platforms such as GPT Store, and the researchers point out the challenge of developing an effective defense mechanism.

Another practical danger of DarkMind is that an attacker does not need to show the model specific errors in advance. Conventional backdoor attacks require multiple examples to be presented, but DarkMind can be effective without this.

LLM is being integrated into critical websites and applications, such as banking and healthcare services, and attacks like DarkMind's could potentially manipulate the decision-making processes of these systems without detection, posing a serious security threat.

The research team said that they plan to focus on developing new defense mechanisms, such as checking the consistency of inferences and detecting adversarial triggers, and will continue to conduct broad research on the security of LLMs, such as attacks on multi-turn dialogues and embedding hidden instructions.

Related Posts:

Feb 21, 2025 16:00:00 in Software, Security, Posted by log1i_yk