OpenAI and crypto investment firm Paradigm have launched EVMbench, a benchmark that tests AI systems against real smart contract vulnerabilities, and early results show AI agents are already better at exploiting contracts than defending them, a finding that has immediate implications for the $100 billion locked in DeFi protocols.
Announced on Feb. 18, 2026, the framework evaluates AI agents operating within Ethereum-based environments where billions of dollars in crypto assets are secured by code.
The initiative arrives amid growing concerns that AI tools may simultaneously strengthen and endanger blockchain security.
Smart contracts currently safeguard more than $100 billion in open-source crypto assets, making vulnerabilities a systemic risk for investors and protocols alike.
OpenAI says the new benchmark is intended to measure real-world AI capabilities in economically meaningful environments rather than theoretical testing scenarios.
Why smart contract security has become urgent
Smart contracts, self-executing programs that run on blockchains such as Ethereum, power decentralized exchanges, lending platforms, and token ecosystems.
Because deployed contracts are often immutable, even minor coding flaws can lead to catastrophic financial losses.
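A classic example of such a flaw is reentrancy, the bug behind several of the largest historical exploits. The sketch below models it in Python rather than Solidity (a hypothetical toy ledger, not any real contract): the vulnerable withdraw hands control to the caller before zeroing the caller's balance, so a malicious payout callback can withdraw again and again.

```python
# Hypothetical sketch: a reentrancy-style flaw modeled in Python.
# The ordering mistake — pay out before updating the ledger — is the
# same class of bug that has drained real Solidity contracts.

class VulnerableVault:
    """Simplified ledger that pays out BEFORE updating the balance."""

    def __init__(self):
        self.balances = {}
        self.total = 0

    def deposit(self, user, amount):
        self.balances[user] = self.balances.get(user, 0) + amount
        self.total += amount

    def withdraw(self, user, on_receive):
        amount = self.balances.get(user, 0)
        if amount > 0 and self.total >= amount:
            self.total -= amount     # funds leave the vault...
            on_receive(amount)       # ...and control passes to the caller
            self.balances[user] = 0  # balance zeroed too late

vault = VulnerableVault()
vault.deposit("victim", 90)
vault.deposit("attacker", 10)

def reenter(amount):
    # The attacker's payout callback withdraws again before the
    # balance is zeroed, draining other depositors' funds too.
    if vault.total >= 10:
        vault.withdraw("attacker", reenter)

vault.withdraw("attacker", reenter)
print(vault.total)  # 0: a 10-token balance drained all 100 tokens
```

One mis-ordered line, and an attacker holding 10 tokens walks away with the entire vault. On-chain, where the flawed code cannot be hot-fixed after deployment, this is exactly why small mistakes become catastrophic.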
Recent exploits across DeFi platforms have intensified scrutiny around automated coding tools and AI-assisted development.
Security incidents involving vulnerable smart contracts have resulted in multi-million-dollar losses, highlighting how quickly flaws can be weaponized once discovered.
OpenAI says improving AI coding ability introduces a dual-use dilemma: the same systems capable of auditing contracts defensively could also help attackers discover exploits faster.
“Smart contracts routinely secure $100B+ in open-source crypto assets,” OpenAI said in its announcement, adding that measuring AI capability in realistic environments is essential as models become more powerful.
Inside EVMbench: how the AI testing system works
EVMbench focuses on software built for the Ethereum Virtual Machine (EVM), the execution environment underlying many blockchain applications.
The benchmark evaluates AI systems across three core operational modes:
- Detect: identifying vulnerabilities during contract audits
- Patch: fixing flawed code while maintaining functionality
- Exploit: executing simulated attacks in sandboxed blockchain environments
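The Patch mode can be illustrated with the same kind of toy example (a Python model for illustration, not EVMbench's actual interface): a reentrancy-prone withdraw is fixed by updating the ledger before any external call, the "checks-effects-interactions" ordering that human auditors look for and that a patching agent would need to preserve without breaking functionality.

```python
# Hypothetical sketch of a patched contract, modeled in Python:
# the fix reorders withdraw so the balance is zeroed (effect)
# before control passes to the caller (interaction).

class PatchedVault:
    """Toy ledger whose withdraw follows checks-effects-interactions."""

    def __init__(self):
        self.balances = {}
        self.total = 0

    def deposit(self, user, amount):
        self.balances[user] = self.balances.get(user, 0) + amount
        self.total += amount

    def withdraw(self, user, on_receive):
        amount = self.balances.get(user, 0)
        if amount > 0 and self.total >= amount:
            self.balances[user] = 0  # effect first: zero the balance
            self.total -= amount
            on_receive(amount)       # interaction last

vault = PatchedVault()
vault.deposit("victim", 90)
vault.deposit("attacker", 10)

def reenter(amount):
    # A re-entrant call now sees a zeroed balance and withdraws nothing.
    vault.withdraw("attacker", reenter)

vault.withdraw("attacker", reenter)
print(vault.total)  # 90: only the attacker's own 10 tokens left the vault
```

Normal deposits and withdrawals still work, which is the point of the Patch criterion: a fix that removes the vulnerability but breaks legitimate behavior would not count as a successful patch.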
The system draws on 120 real vulnerabilities sourced from 40 past security audits, many originating from open auditing competitions and real financial applications.
Additional scenarios were derived from reviews of the Tempo blockchain, a payments-focused network built for stablecoin transactions.
To avoid real-world harm, all testing occurs in isolated environments using previously disclosed vulnerabilities rather than live networks.
According to OpenAI, early testing revealed uneven AI performance. Advanced models demonstrated stronger capabilities in exploit scenarios than in defensive auditing tasks.
“Agents perform best when the objective is explicit, such as draining funds, while detection and patching remain more challenging,” OpenAI noted in its technical explanation.
That imbalance may concern investors and developers: it suggests attackers could gain an edge before defensive tools fully mature.
Industry implications for crypto investors and developers
The launch of EVMbench signals a broader shift: AI is becoming a central factor in blockchain security strategy rather than merely a productivity tool.
Industry analysts say benchmarks like EVMbench could standardize how AI security performance is measured across Web3.
“EVMbench evaluates the ability of AI agents to detect, patch, and exploit high-severity smart contract vulnerabilities,” OpenAI and Paradigm said in a joint description of the framework, positioning it as both a measurement tool and a research platform for improving defenses.
The timing is significant. As AI coding assistants gain adoption among developers, poorly reviewed AI-generated smart contracts could increase systemic risk if deployed at scale.
Conversely, effective AI auditing could dramatically reduce exploit frequency and lower insurance or auditing costs for DeFi projects.
At the same time, OpenAI acknowledged cybersecurity’s inherent dual-use nature and said it is applying safeguards, including monitoring systems and controlled access to advanced capabilities.
A new phase in the AI–crypto security race
EVMbench’s release marks one of the clearest signals yet that AI and blockchain are converging at the infrastructure level.
Rather than focusing solely on trading or analytics, AI is now being tested directly against the core mechanisms that secure decentralized finance.
The benchmark could influence how protocols are audited, how risks are priced, and how regulators assess AI’s role in financial infrastructure.
As AI systems continue to improve at writing and analyzing code, the question is no longer whether they will reshape crypto security, but how quickly, and in whose favor.
And with billions of dollars locked in smart contracts, the outcome may determine the next era of trust in decentralized finance.