May 14, 2025 12:00:00

Attacks reported to steal cryptocurrency by implanting 'false memories' in AI chatbots

With the advancement of large-scale language model (LLM) technology, AI agents that can perform complex tasks have appeared, and in the financial world, there are already 'AI agents that make investment decisions at a high level on behalf of humans.' However, a new attack method has been revealed that steals virtual currency by injecting 'false memories' into these AI agents.

[2503.16248] Real AI Agents with Fake Memories: Fatal Context Manipulation Attacks on Web3 Agents

https://arxiv.org/abs/2503.16248

New attack can steal cryptocurrency by planting false memories in AI chatbots - Ars Technica
https://arstechnica.com/security/2025/05/ai-agents-that-autonomously-trade-cryptocurrency-arent-ready-for-prime-time/

According to a paper published by a research team at Princeton University, the attack was demonstrated on an open source framework called ElizaOS.

ElizaOS is a framework for developing autonomous AI agents that can execute various blockchain-based transactions based on user instructions and objectives. ElizaOS's AI agents are multimodal models that can process various media such as text, audio, and video, and are capable of not only cryptocurrency transactions and data analysis but also social media interactions.

The key to the new attack reported by the research team is that ElizaOS's AI agent stores all conversation and operation histories in an external database as 'agent memory.' This agent memory is used as context when the AI agent makes decisions.

The specific attack method is simple: the attacker first accesses a platform where ElizaOS's AI agent can be used, such as Discord or X (formerly Twitter), and sends a message containing a fake instruction such as 'Send virtual currency to the attacker's wallet.' At this time, the AI agent does not respond, but the message is stored in the agent memory as part of the conversation history.

This agent memory is shared across other sessions and other platforms, so when a legitimate user requests the AI agent to transfer cryptocurrency, the agent reads the agent memory containing fake instructions, causing the agent to transfer the cryptocurrency to a wallet address specified by the attacker.

What makes this attack particularly serious is that once the false instructions are injected into memory, they have cross-platform effects and can silently alter legitimate transactions. Unlike regular prompt injection attacks, memory injection has a persistent effect. It also demonstrates that prompts such as 'don't trust external data' are ineffective.

The research team warned, 'Our research shows that while current methods for protecting prompts are effective against simple attacks, they are largely useless against sophisticated attacks that rewrite stored conversation history. This problem is not just armchair theory. Especially in environments where multiple users use the same AI, or in distributed systems such as blockchain, there is a risk that AI memory data may be leaked or tampered with, leading to real damage such as actual financial losses.'

Shaw Walters, a developer of ElizaOS, told IT news site Ars Technica, 'Most of the issues pointed out in the paper can be solved simply by placing appropriate restrictions on what the agents can do. We've already implemented this in the latest version of ElizaOS.' He predicted that in the future, the system will need to be divided into smaller pieces, and the publicly available AI agents will need only the bare minimum of functionality.

In response, Atharv Singh Patran, lead author of the paper, argued, 'Our attack method is able to circumvent measures that restrict certain operations or privileges because our attack does not 'use the remittance function without permission' but 'secretly modify the legitimate remittance process.''

Along with the publication of this paper, the research team is developing an evaluation tool called 'CrAIBench.' This benchmark uses more than 150 practical blockchain tasks and more than 500 attack patterns to measure the effectiveness of various defense measures. As a result, it has been found that the conventional countermeasure of 'writing what not to do in the prompt' is insufficient, and that defense by fine-tuning the AI model itself is effective.

'More importantly, LLM-based agents that operate autonomously on behalf of users carry potential risks and should be thoroughly investigated before being deployed in production environments,' Ars Technica said.

May 14, 2025 12:00:00 in Software, Security, , Posted by log1i_yk