A security researcher from Meta AI reported that an OpenClaw agent went haywire in her inbox

The now-famous X post by Meta AI security researcher Summer Yue reads, at first, like a joke. She told her OpenClaw AI agent to review her overflowing email inbox and suggest what to delete or archive.

The agent ran wild. It began deleting all of her email in a “speed run” while ignoring the commands she sent from her phone telling it to stop.

“I had to DASH to my Mac mini as if I were disarming a bomb,” she wrote, posting images of the ignored stop messages as proof.

The Mac mini, a budget-friendly Apple computer that sits flat on a desk and fits in the palm of your hand, has become the preferred device for running OpenClaw. (The Mini is selling “like hotcakes,” one “confused” Apple employee reportedly told renowned AI researcher Andrej Karpathy when he bought one to run an OpenClaw alternative called NanoClaw.)

OpenClaw is, of course, the open-source AI agent that gained notoriety through Moltbook, an AI-only social network. OpenClaw agents were at the center of the now mostly discredited episode on Moltbook in which the AIs appeared to be conspiring against humans.

However, OpenClaw’s mission, according to its GitHub page, is not focused on social media. It aims to be a personal AI assistant that runs on your own devices.

The Silicon Valley elite have become so enamored with OpenClaw that “claw” and “claws” have turned into the preferred terms for agents that run on personal hardware. Other agents of this kind include ZeroClaw, IronClaw, and PicoClaw. Y Combinator’s podcast crew even appeared in their latest episode dressed in lobster outfits.

Yet Yue’s post serves as a cautionary tale. As others on X pointed out, if an AI security researcher can run into this problem, what chance do regular users have?

“Were you purposely testing its limits or did you make an inexperienced error?” a software developer inquired on X.  

“Inexperienced error tbh,” she replied. She had been testing her agent on a smaller “toy” inbox, as she called it, where it had handled less important email well. It had earned her trust, so she decided to let it tackle the real inbox.

Yue believes the sheer volume of data in her real inbox “triggered compaction.” Compaction occurs when the context window (the running record of everything the AI has been told and has done in a session) grows too large, prompting the agent to start summarizing, compressing, and managing the conversation.

At that point, the AI may skip over instructions that the user considers highly important.

In this case, it may have skipped her final instruction, in which she told it not to act, and reverted to its instructions from the “toy” inbox.
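To make that failure mode concrete, here is a minimal Python sketch of lossy compaction. It is a toy model, not OpenClaw's actual implementation: the `Message` class, the `pinned` flag, the word-count budget, and the `compact` function are all hypothetical names invented for illustration, and a real agent would ask the model for a summary rather than emit a stub.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Message:
    role: str
    text: str
    pinned: bool = False  # hypothetical flag: exempt this message from summarizing


def size(messages):
    # Crude stand-in for a token count: total whitespace-separated words.
    return sum(len(m.text.split()) for m in messages)


def compact(messages, budget, keep_recent=1):
    """Toy compaction: once over budget, collapse older messages into a stub."""
    if size(messages) <= budget:
        return list(messages)
    split = max(len(messages) - keep_recent, 0)
    old, recent = messages[:split], messages[split:]
    pinned = [m for m in old if m.pinned]
    dropped = [m for m in old if not m.pinned]
    # A real agent would generate a summary here; this stub only counts.
    summary = Message("system", f"[summary of {len(dropped)} earlier messages]")
    return pinned + [summary] + recent


history = [
    Message("user", "Suggest deletions, but do not act without my approval."),
    Message("tool", "email body " * 50),
    Message("tool", "email body " * 50),
    Message("user", "Next batch."),
]

# Naive compaction: the approval rule is folded into the stub summary and lost.
naive = compact(history, budget=40)

# Same history with the rule pinned: it survives compaction verbatim.
pinned_history = [replace(history[0], pinned=True)] + history[1:]
safe = compact(pinned_history, budget=40)

print([m.text for m in naive])
print([m.text for m in safe])
```

In the naive run, the "do not act" rule no longer appears anywhere in the context the agent would see next. Pinning is only one possible mitigation, and it is shown here as an assumption rather than as something any particular agent supports.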

As several others on X pointed out, prompts cannot be trusted to act as security guardrails. Models may misinterpret or ignore them.

Various users offered suggestions ranging from the exact syntax Yue should have used to stop the agent to techniques for making it adhere to guardrails more reliably, such as writing instructions to dedicated files or using other open-source tools.
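One way to read that advice is to move the safeguard out of the prompt and into ordinary code that runs regardless of what the model remembers. The Python sketch below illustrates the idea under stated assumptions; it is not a feature of OpenClaw or of any tool suggested on X. The names `guard`, `run_action`, `ActionBlocked`, and the stop-file convention are hypothetical.

```python
from pathlib import Path

# Hypothetical set of actions that must never run without human approval.
DESTRUCTIVE = {"delete", "archive"}


class ActionBlocked(Exception):
    """Raised when a check outside the model refuses an action."""


def guard(action, stop_file, approved):
    # Kill switch: a human creates this file to halt all actions.
    if Path(stop_file).exists():
        raise ActionBlocked("stop file present")
    # Destructive actions need explicit approval passed in by the caller,
    # not inferred from anything the model wrote.
    if action in DESTRUCTIVE and not approved:
        raise ActionBlocked(f"{action} requires approval")


def run_action(action, email_id, stop_file, approved=False):
    guard(action, stop_file, approved)
    # Placeholder for the real side effect (for example, a mail API call).
    return f"{action}:{email_id}"
```

Because the check runs on every call, a stop request or a missing approval blocks the action even if the model has lost or ignored the original instruction.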

In the interest of full disclosure, TechCrunch could not independently verify what happened to Yue’s inbox. (She did not respond to our request for comment, although she answered many questions and remarks directed at her on X.)

But it doesn’t really matter.

The point of the story is that agents aimed at knowledge workers are, at their current stage of development, risky. People who say they use them successfully are cobbling together their own methods to protect themselves.

One day, perhaps soon (by 2027? 2028?), they might be ready for widespread use. Many of us would love help with email, grocery lists, and scheduling dental appointments. But that day has not yet come.
