AI Agent Security Checklist (2026): Best Practices for Building Safe AI Agents
Learn how to secure AI agents against prompt injection, unauthorized actions, wallet compromise, and unsafe execution using practical security principles and implementation-agnostic best practices.
AI agents introduce a different security model from traditional software. They consume untrusted information, make decisions, interact with external systems, and are capable of moving funds and performing sensitive actions.
The objective is not to make the model perfect. The objective is to ensure that any action taken by the agent remains within defined security boundaries, even when the model encounters malicious inputs, makes mistakes, or behaves unexpectedly.
In this guide, we'll cover practical security principles and best practices for building AI agents that interact with wallets, APIs, external tools, and production infrastructure. The recommendations are implementation-agnostic and intended to help developers build safer, more resilient agent systems.
See the Key Concepts section for glossary definitions of technical terms used throughout this checklist.
Common Pitfalls
- Indirect prompt injection: Malicious instructions hidden in web pages, NFT metadata, documents, token URIs, or social content.
- Approval drains: Granting excessive permissions that can later be abused.
- Overly trusting outputs: Treating model responses as executable instructions instead of data.
- Permission creep: Automatically expanding an agent's capabilities after new assets or integrations are introduced.
Always assume both external data and the model itself can be wrong.
1. Input Security
Every external input should be treated as untrusted.
Validate External Inputs
- Treat websites, emails, documents, messages, tool outputs, and social media as untrusted.
- Validate and sanitize external content before it reaches execution workflows.
- Restrict which sources the agent is allowed to consume.
- Review new integrations before granting production access.
Prompt injection often originates from content that appears harmless but contains hidden instructions.
Detect Obfuscation
Detect common obfuscation techniques before allowing content to influence downstream workflows, and escalate suspicious inputs for additional validation when appropriate. Organizations should establish review procedures for content that cannot be confidently interpreted.
Separate Data From Instructions
- Treat retrieved content as data, not commands.
- Prevent external content from introducing new execution instructions.
- Avoid architectures where untrusted content directly influences control flow.
2. Identity and Authorization
Agents should only perform actions that have been explicitly authorized.
Verify Action Sources
Only accept sensitive actions from trusted sources such as:
- Authenticated users
- Signed requests
- Approved APIs
- Internal services
Do not accept sensitive actions from:
- Public social media replies
- Unverified messages
- External content feeds
- NFT metadata
- Arbitrary web content
Use Cryptographic Authorization
- Verify signatures where possible.
- Use scoped API credentials.
- Use short-lived session tokens.
- Require explicit approval for permission changes.
The system should always know who authorized an action and under what conditions.
Apply Least Privilege
- Grant only the permissions required for each task.
- Separate read-only permissions from execution permissions.
- Limit access to wallets, APIs, databases, and infrastructure.
- Periodically review granted permissions.
Least privilege significantly reduces the impact of a compromise.
For example, an agent that only needs to read data should not be able to execute transactions, modify infrastructure, or grant new permissions.
3. Action Generation
Agent reasoning alone should never directly trigger execution.
Security Principle: Reasoning Is Not Execution
Language models are useful for interpreting intent, planning actions, and selecting tools. They should not have unilateral authority to execute transactions, modify permissions, or access signing infrastructure.
Execution should occur only after validation, authorization, policy enforcement, and any required approvals have completed.
Use Structured Actions
Convert natural-language requests into structured actions before execution so downstream systems can independently validate parameters.
Validate Parameters
Verify:
- Recipient addresses
- Token contracts
- Chain identifiers
- Transaction values
- Approval amounts
The execution layer should never rely solely on model-generated text.
Separate Reasoning From Execution
Keep reasoning isolated from execution. The model may recommend actions, but execution should occur through independently validated and authorized systems.
4. Policy Enforcement
Every action should pass through policy controls before execution.
Enforce Spending Limits
- Set maximum transaction amounts.
- Set daily or weekly transfer limits.
- Limit approval sizes.
Where supported, enforce these controls at the wallet layer using mechanisms such as EIP-4337 rather than relying solely on the AI agent.
Restrict Recipients
- Maintain allowlists where appropriate.
- Require additional review for new recipients.
- Restrict interactions with unknown contracts.
Restrict Permission Changes
- Require explicit approval for new permissions.
- Require approval before increasing limits.
- Require approval before granting new capabilities.
Receiving an asset should never automatically expand an agent's permissions.
Fail Closed
If validation, authorization, simulation, or policy checks fail, stop execution. When uncertainty exists, doing nothing is generally safer than attempting to guess user intent.
5. Transaction Simulation
Blockchain transactions should be simulated before execution whenever practical.
Simulate Transactions
Before signing, simulate the transaction and verify that the expected outcome matches the user's intent.
Compare Expected Outcomes
Review for:
- Asset movements
- Approval changes
- Contract interactions
Stop execution if unexpected approvals, transfers, or contract interactions are detected.
Simulation provides one final opportunity to detect unintended or malicious behavior before funds move. See this guide on simulating transactions for implementation details.
6. Execution Security
Execution systems should remain tightly controlled.
Require Human Approval For High-Risk Actions
Require additional approval for:
- Large transfers
- New recipients
- Permission changes
- Contract upgrades
- Administrative actions
Isolate Sensitive Components
Separate:
- Model inference
- Wallet infrastructure
- Secret management
- Transaction signing
A compromise in one component should not automatically compromise the others.
Protect Secrets
- Store secrets in dedicated vaults.
- Rotate credentials regularly.
- Never expose secrets to the model context.
- Limit secret access to required services only.
For high-value systems, consider MPC (Multi-Party Computation) or hardware-backed signing to reduce the risk associated with a single exposed private key.
7. Monitoring and Response
Security controls are incomplete without visibility.
Monitor Agent Activity
Track:
- Transaction frequency and size
- New recipients
- Failed policy checks
- Permission changes
- Authentication events
Generate Alerts
Alert on:
- Unusual transaction patterns
- New destination addresses
- Policy violations
- Excessive execution attempts
- Unexpected permission requests
Maintain Audit Logs
Record:
- User requests
- Agent decisions
- Policy evaluations
- Simulation results
- Executed actions
Comprehensive audit logs simplify incident response and postmortem analysis.
Emergency Controls
Maintain tested emergency procedures that can rapidly contain or disable compromised agents.
Quick Builder Checklist
MVP
- ☐ Treat all external inputs as untrusted
- ☐ Verify action sources
- ☐ Use structured actions
- ☐ Separate reasoning from execution
- ☐ Enforce policy controls with fail-closed behavior
- ☐ Configure spending limits and recipient restrictions
- ☐ Simulate transactions before execution
- ☐ Isolate secrets from model context
Recommended
- ☐ Detect obfuscation attempts
- ☐ Apply least privilege
- ☐ Define human approval thresholds
- ☐ Enable monitoring and alerting
- ☐ Maintain audit logs
Advanced
- ☐ Test emergency shutdown procedures
- ☐ Use hardware-backed or distributed key management
- ☐ Enforce wallet-level policy controls
Appendix: Key Concepts
The following terms are referenced throughout this checklist and provide additional context for common AI agent security concepts.
Account Abstraction (EIP-4337)
EIP-4337 enables smart contract wallets with programmable security policies. Organizations can enforce spending limits, session keys, recipient restrictions, and other controls at the wallet layer rather than relying solely on the AI agent.
Multi-Party Computation (MPC)
MPC splits signing authority across multiple parties instead of relying on a single private key. Organizations can require multiple approvals for sensitive actions while keeping key material distributed.
Fail Closed
If validation fails or uncertainty exists, stop execution rather than attempting to continue.
Least Privilege
Every user, process, application, or agent should have only the minimum permissions necessary to perform its intended task, and nothing more.
Structured Actions
Convert natural-language requests into structured representations so validation and policy enforcement can occur before execution.
Transaction Simulation
Execute transactions in a safe environment before signing to verify expected behavior and detect unintended state changes.
Prompt Injection
Prompt injection occurs when malicious content attempts to manipulate an AI system into ignoring its intended instructions or security boundaries.
Conclusion
AI agents are becoming increasingly capable, but greater autonomy also brings greater responsibility. Security should not be treated as something added after deployment. It should be part of the design process from the beginning.
No checklist can eliminate every risk, and no AI agent system is ever going to be 100% secure. New attack techniques will continue to emerge, models will evolve, and best practices will change over time. The goal is not perfection. The goal is to build systems that fail safely, limit the impact of mistakes, and make risky actions difficult to execute.
Most of the practices in this checklist are not unique to AI. They are well-established security principles applied to a new class of software. As agents become more capable and trusted with increasingly sensitive tasks, applying those principles consistently will become even more important.
Innovation and security do not have to compete with each other. Building useful AI agents and building secure AI agents should go hand in hand. The earlier security is considered, the easier it becomes to build systems that users can trust.
Acknowledgements
This research was supported in part by funding from TheDAO Security Fund & Giveth. All views and conclusions expressed are my own.