AI Agent Security Checklist (2026): Best Practices for Building Safe AI Agents

Learn how to secure AI agents against prompt injection, unauthorized actions, wallet compromise, and unsafe execution using practical security principles and implementation-agnostic best practices.

AI agents introduce a different security model from traditional software. They consume untrusted information, make decisions, interact with external systems, and can move funds or perform other sensitive actions.

The objective is not to make the model perfect. The objective is to ensure that any action taken by the agent remains within defined security boundaries, even when the model encounters malicious inputs, makes mistakes, or behaves unexpectedly.

In this guide, we'll cover practical security principles and best practices for building AI agents that interact with wallets, APIs, external tools, and production infrastructure. The recommendations are implementation-agnostic and intended to help developers build safer, more resilient agent systems.

See the Appendix: Key Concepts section at the end of this checklist for definitions of the technical terms used throughout.

Common Pitfalls

  • Indirect prompt injection: Malicious instructions hidden in web pages, NFT metadata, documents, token URIs, or social content.
  • Approval drains: Granting excessive token approvals or permissions that an attacker can later abuse to move funds.
  • Overly trusting outputs: Treating model responses as executable instructions instead of data.
  • Permission creep: Automatically expanding an agent's capabilities after new assets or integrations are introduced.

Always assume both external data and the model itself can be wrong.

1. Input Security

Every external input should be treated as untrusted.

Validate External Inputs

  • Treat websites, emails, documents, messages, tool outputs, and social media as untrusted.
  • Validate and sanitize external content before it reaches execution workflows.
  • Restrict which sources the agent is allowed to consume.
  • Review new integrations before granting production access.

Prompt injection often originates from content that appears harmless but contains hidden instructions.

Detect Obfuscation

Detect common obfuscation techniques, such as invisible Unicode characters, look-alike characters, encoded payloads, and text hidden in markup, before allowing content to influence downstream workflows. Escalate suspicious inputs for additional validation when appropriate, and establish review procedures for content that cannot be confidently interpreted.
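As a rough illustration, the sketch below flags a few obfuscation signals using only the Python standard library. The heuristics, thresholds, and function names are assumptions chosen for this example, not a complete detector, and they will produce false positives on legitimate content such as long hashes or typographic symbols.

```python
import re
import unicodedata

# Hypothetical heuristics for illustration; tune and extend for real traffic.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def obfuscation_flags(text: str) -> list[str]:
    """Return the names of the heuristics that the text trips."""
    flags = []
    # Unicode category "Cf" covers invisible format characters such as
    # zero-width spaces and bidirectional overrides.
    if any(unicodedata.category(ch) == "Cf" for ch in text):
        flags.append("invisible_characters")
    # Long unbroken Base64-looking runs may carry an encoded payload.
    if BASE64_RUN.search(text):
        flags.append("long_base64_run")
    # HTML comments are invisible to a human reader but visible to a model.
    if HTML_COMMENT.search(text):
        flags.append("html_comment")
    # NFKC folds compatibility forms (for example fullwidth letters) into
    # plain equivalents, so a change can indicate disguised text.
    if unicodedata.normalize("NFKC", text) != text:
        flags.append("compatibility_characters")
    return flags


def needs_review(text: str) -> bool:
    """Escalate anything that trips at least one heuristic."""
    return bool(obfuscation_flags(text))
```

A flag here is a reason to escalate, not proof of an attack. Content that trips a heuristic can be routed to the review procedure instead of being passed to the agent.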

Separate Data From Instructions

  • Treat retrieved content as data, not commands.
  • Prevent external content from introducing new execution instructions.
  • Avoid architectures where untrusted content directly influences control flow.
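One way to keep this separation explicit is to tag every piece of text with its origin and let only trusted origins request actions. The sketch below is a minimal illustration; the `Origin` and `Message` types and both function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    AUTHENTICATED_USER = "authenticated_user"
    RETRIEVED_CONTENT = "retrieved_content"  # web pages, metadata, tool output


@dataclass(frozen=True)
class Message:
    origin: Origin
    text: str


def actionable_requests(messages: list[Message]) -> list[str]:
    """Only text from the authenticated user may request actions."""
    return [m.text for m in messages if m.origin is Origin.AUTHENTICATED_USER]


def reference_material(messages: list[Message]) -> list[str]:
    """Retrieved content stays available as data, never as an instruction."""
    return [m.text for m in messages if m.origin is Origin.RETRIEVED_CONTENT]
```

Because the origin travels with the text, an instruction embedded in a web page or token URI stays in the reference channel no matter how it is phrased.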

2. Identity and Authorization

Agents should only perform actions that have been explicitly authorized.

Verify Action Sources

Only accept sensitive actions from trusted sources such as:

  • Authenticated users
  • Signed requests
  • Approved APIs
  • Internal services

Do not accept sensitive actions from:

  • Public social media replies
  • Unverified messages
  • External content feeds
  • NFT metadata
  • Arbitrary web content

Use Cryptographic Authorization

  • Verify signatures where possible.
  • Use scoped API credentials.
  • Use short-lived session tokens.
  • Require explicit approval for permission changes.

The system should always know who authorized an action and under what conditions.
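As one possible illustration of signed, short-lived requests, the sketch below uses an HMAC over the payload and an expiry timestamp. HMAC with a shared secret is an assumption made for brevity; systems that verify wallet or user signatures would use asymmetric signatures instead. The function names and the message layout are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_request(secret: bytes, payload: bytes, expires_at: int) -> str:
    """Bind the payload to an expiry time so the signature is short-lived."""
    message = str(expires_at).encode() + b"." + payload
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(
    secret: bytes,
    payload: bytes,
    expires_at: int,
    signature: str,
    now: Optional[float] = None,
) -> bool:
    """Accept only unexpired requests whose signature matches."""
    current = time.time() if now is None else now
    if current > expires_at:
        return False
    expected = sign_request(secret, payload, expires_at)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Because the expiry is part of the signed message, an attacker cannot extend a captured request by editing the timestamp.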

Apply Least Privilege

  • Grant only the permissions required for each task.
  • Separate read-only permissions from execution permissions.
  • Limit access to wallets, APIs, databases, and infrastructure.
  • Periodically review granted permissions.

Least privilege significantly reduces the impact of a compromise.

For example, an agent that only needs to read data should not be able to execute transactions, modify infrastructure, or grant new permissions.
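The read-only example above could be expressed as explicit permission flags per task. This is a minimal sketch; the task names and the permission set are hypothetical, and unknown tasks receive no permissions at all.

```python
from enum import Flag, auto


class Permission(Flag):
    NONE = 0
    READ = auto()
    EXECUTE_TX = auto()
    MODIFY_INFRA = auto()
    GRANT = auto()


# Hypothetical task-to-permission mapping; each task gets only what it needs.
TASK_PERMISSIONS = {
    "portfolio_report": Permission.READ,
    "scheduled_payment": Permission.READ | Permission.EXECUTE_TX,
}


def is_allowed(task: str, needed: Permission) -> bool:
    """True only if every needed permission was granted to this task."""
    granted = TASK_PERMISSIONS.get(task, Permission.NONE)
    return (granted & needed) == needed
```

Keeping the mapping in one place also makes the periodic permission review a matter of reading a single table.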

3. Action Generation

Agent reasoning alone should never directly trigger execution.

Security Principle: Reasoning Is Not Execution

Language models are useful for interpreting intent, planning actions, and selecting tools. They should not have unilateral authority to execute transactions, modify permissions, or access signing infrastructure.

Execution should occur only after validation, authorization, policy enforcement, and any required approvals have completed.

Use Structured Actions

Convert natural-language requests into structured actions before execution so downstream systems can independently validate parameters.
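A minimal sketch of this step, assuming the model is asked to emit a JSON object: the parser accepts exactly one known shape and rejects everything else. The `TransferAction` type and its field names are hypothetical.

```python
import json
from dataclasses import dataclass

EXPECTED_FIELDS = {"kind", "chain_id", "token", "recipient", "amount"}


@dataclass(frozen=True)
class TransferAction:
    chain_id: int
    token: str
    recipient: str
    amount: int


def parse_action(model_output: str) -> TransferAction:
    """Turn model output into a typed action, or raise ValueError."""
    data = json.loads(model_output)  # JSONDecodeError is a ValueError
    if not isinstance(data, dict) or set(data) != EXPECTED_FIELDS:
        raise ValueError("action must contain exactly the expected fields")
    if data["kind"] != "transfer":
        raise ValueError("unsupported action kind")
    # type(...) is int rejects booleans, which are a subclass of int.
    if type(data["chain_id"]) is not int or type(data["amount"]) is not int:
        raise ValueError("chain_id and amount must be integers")
    if not isinstance(data["token"], str) or not isinstance(data["recipient"], str):
        raise ValueError("token and recipient must be strings")
    return TransferAction(
        chain_id=data["chain_id"],
        token=data["token"],
        recipient=data["recipient"],
        amount=data["amount"],
    )
```

Free-form prose, extra fields, and wrong types all fail at this boundary, so downstream validation only ever sees a known structure.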

Validate Parameters

Verify:

  • Recipient addresses
  • Token contracts
  • Chain identifiers
  • Transaction values
  • Approval amounts

The execution layer should never rely solely on model-generated text.
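The checks listed above might look like the following sketch. All allowlists and limits are placeholder values invented for this example, and the address check covers format only; it does not verify an EIP-55 checksum.

```python
import re

ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")
ZERO_ADDRESS = "0x" + "0" * 40

# Hypothetical policy values for illustration only.
ALLOWED_CHAIN_IDS = {1}
ALLOWED_TOKENS = {"0x" + "aa" * 20}
MAX_TRANSFER = 5_000_000
MAX_APPROVAL = 1_000_000


def validation_errors(action: dict) -> list[str]:
    """Return every problem found; an empty list means the action passed."""
    errors = []
    chain_id = action.get("chain_id")
    if type(chain_id) is not int or chain_id not in ALLOWED_CHAIN_IDS:
        errors.append("chain_id is not allowlisted")
    token = action.get("token")
    if not isinstance(token, str) or token.lower() not in ALLOWED_TOKENS:
        errors.append("token contract is not allowlisted")
    recipient = action.get("recipient")
    if (
        not isinstance(recipient, str)
        or not ADDRESS_RE.fullmatch(recipient)
        or recipient.lower() == ZERO_ADDRESS
    ):
        errors.append("recipient is not a valid non-zero address")
    amount = action.get("amount")
    # Approvals get a tighter cap than transfers in this sketch.
    limit = MAX_APPROVAL if action.get("kind") == "approve" else MAX_TRANSFER
    if type(amount) is not int or not 0 < amount <= limit:
        errors.append("amount is outside the permitted range")
    return errors
```

Returning all errors instead of stopping at the first one gives the audit trail a complete picture of why an action was rejected.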

Separate Reasoning From Execution

Keep reasoning isolated from execution. The model may recommend actions, but execution should occur through independently validated and authorized systems.

4. Policy Enforcement

Every action should pass through policy controls before execution.

Enforce Spending Limits

  • Set maximum transaction amounts.
  • Set daily or weekly transfer limits.
  • Limit approval sizes.

Where supported, enforce these controls at the wallet layer, for example with a smart contract wallet built on EIP-4337 account abstraction, rather than relying solely on the AI agent.
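For the agent-side portion of these limits, a minimal sketch could track a per-transaction cap and a daily cap as shown below. The class and its field names are hypothetical, amounts are plain integers in the token's smallest unit, and an off-chain check like this complements wallet-level enforcement instead of replacing it.

```python
from dataclasses import dataclass, field


@dataclass
class SpendingPolicy:
    max_per_tx: int
    max_per_day: int
    spent_by_day: dict = field(default_factory=dict)

    def authorize(self, amount: int, day: str) -> bool:
        """Record and allow the spend only if both limits still hold."""
        if amount <= 0 or amount > self.max_per_tx:
            return False
        spent = self.spent_by_day.get(day, 0)
        if spent + amount > self.max_per_day:
            return False
        self.spent_by_day[day] = spent + amount
        return True
```

Rejected spends are not recorded, so a burst of oversized requests cannot exhaust the daily budget.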

Restrict Recipients

  • Maintain allowlists where appropriate.
  • Require additional review for new recipients.
  • Restrict interactions with unknown contracts.

Restrict Permission Changes

  • Require explicit approval for new permissions.
  • Require approval before increasing limits.
  • Require approval before granting new capabilities.

Receiving an asset should never automatically expand an agent's permissions.

Fail Closed

If validation, authorization, simulation, or policy checks fail, stop execution. When uncertainty exists, doing nothing is generally safer than attempting to guess user intent.
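A fail-closed evaluator can be sketched as a function that returns True only when every check explicitly passes. The two example checks and their values are hypothetical; the important property is that an exception, a missing field, an ambiguous result, or an empty policy all lead to denial.

```python
from typing import Callable

Check = Callable[[dict], bool]

# Hypothetical example checks.
ALLOWED_RECIPIENTS = {"0x" + "ab" * 20}


def recipient_allowlisted(action: dict) -> bool:
    return action["recipient"] in ALLOWED_RECIPIENTS


def under_limit(action: dict) -> bool:
    return action["amount"] <= 100


def evaluate(action: dict, checks: list[Check]) -> bool:
    """Allow execution only if every check returns exactly True."""
    if not checks:
        return False  # no configured policy means no permission
    for check in checks:
        try:
            if check(action) is not True:
                return False
        except Exception:
            # A check that crashes is treated as a failed check.
            return False
    return True
```

The default path through the function is denial, so a bug in a check or a malformed action cannot accidentally open the gate.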

5. Transaction Simulation

Blockchain transactions should be simulated before execution whenever practical.

Simulate Transactions

Before signing, simulate the transaction and verify that the expected outcome matches the user's intent.

Compare Expected Outcomes

Review for:

  • Asset movements
  • Approval changes
  • Contract interactions

Stop execution if unexpected approvals, transfers, or contract interactions are detected.

Simulation provides one final opportunity to detect unintended or malicious behavior before funds move. See this guide on simulating transactions for implementation details.
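Assuming a simulator has already produced a list of state changes, the comparison step might be sketched as follows. The `Effect` type and its fields are hypothetical and do not correspond to any particular simulation API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effect:
    kind: str    # "transfer", "approval", or "call"
    target: str  # recipient, spender, or contract address
    asset: str
    amount: int


def unexpected_effects(expected: set[Effect], simulated: set[Effect]) -> set[Effect]:
    """Effects the simulation produced that the user never asked for."""
    return simulated - expected


def missing_effects(expected: set[Effect], simulated: set[Effect]) -> set[Effect]:
    """Effects the user asked for that the simulation did not produce."""
    return expected - simulated


def safe_to_sign(expected: set[Effect], simulated: set[Effect]) -> bool:
    """Sign only when the simulated outcome matches the intent exactly."""
    return simulated == expected
```

An exact match is deliberately strict. A transfer that also grants an approval, or that sends a different amount, stops before signing.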

6. Execution Security

Execution systems should remain tightly controlled.

Require Human Approval For High-Risk Actions

Require additional approval for:

  • Large transfers
  • New recipients
  • Permission changes
  • Contract upgrades
  • Administrative actions
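The categories above can be turned into an explicit routing rule. In this sketch the action kinds, field names, and threshold are hypothetical, and any unrecognized action kind is also sent for review.

```python
# Hypothetical action kinds that always need a human decision.
ALWAYS_REVIEW = {"permission_change", "contract_upgrade", "admin"}


def approval_reasons(
    action: dict,
    known_recipients: set[str],
    large_transfer_threshold: int,
) -> list[str]:
    """Return the reasons a human must approve; empty means none apply."""
    reasons = []
    kind = action.get("kind")
    if kind in ALWAYS_REVIEW:
        reasons.append(f"{kind} always requires approval")
    elif kind == "transfer":
        amount = action.get("amount")
        if type(amount) is not int or amount >= large_transfer_threshold:
            reasons.append("large or invalid transfer amount")
        if action.get("recipient") not in known_recipients:
            reasons.append("new recipient")
    else:
        reasons.append("unrecognized action kind")
    return reasons
```

Returning the reasons, not just a boolean, lets the approval prompt tell the reviewer exactly what triggered the request.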

Isolate Sensitive Components

Separate:

  • Model inference
  • Wallet infrastructure
  • Secret management
  • Transaction signing

A compromise in one component should not automatically compromise the others.

Protect Secrets

  • Store secrets in dedicated vaults.
  • Rotate credentials regularly.
  • Never expose secrets to the model context.
  • Limit secret access to required services only.

For high-value systems, consider MPC (Multi-Party Computation) or hardware-backed signing to reduce the risk associated with a single exposed private key.

7. Monitoring and Response

Security controls are incomplete without visibility.

Monitor Agent Activity

Track:

  • Transaction frequency and size
  • New recipients
  • Failed policy checks
  • Permission changes
  • Authentication events

Generate Alerts

Alert on:

  • Unusual transaction patterns
  • New destination addresses
  • Policy violations
  • Excessive execution attempts
  • Unexpected permission requests
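As an example of one such alert, excessive execution attempts can be detected with a sliding time window. The class name, the threshold semantics, and the use of caller-supplied timestamps are assumptions made for this sketch.

```python
from collections import deque


class AttemptRateAlert:
    """Fire when more than max_attempts occur within window_seconds."""

    def __init__(self, max_attempts: int, window_seconds: float) -> None:
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._timestamps: deque = deque()

    def record(self, timestamp: float) -> bool:
        """Record one attempt; return True if an alert should fire."""
        self._timestamps.append(timestamp)
        # Drop attempts that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_attempts
```

Passing the timestamp in, instead of reading the clock inside the class, keeps the logic easy to test and replay against historical logs.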

Maintain Audit Logs

Record:

  • User requests
  • Agent decisions
  • Policy evaluations
  • Simulation results
  • Executed actions

Comprehensive audit logs simplify incident response and postmortem analysis.
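One optional way to make such a log tamper-evident is to chain entries by hash, so that editing an earlier record invalidates everything after it. Hash chaining is an addition suggested here for illustration, not a requirement of this checklist, and the entry layout below is hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def _entry_hash(event: dict, prev_hash: str) -> str:
    body = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: list[dict], event: dict) -> dict:
    """Append an event that commits to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    stored = dict(event)  # shallow copy so later caller edits do not leak in
    entry = {
        "event": stored,
        "prev_hash": prev_hash,
        "hash": _entry_hash(stored, prev_hash),
    }
    log.append(entry)
    return entry


def verify_log(log: list[dict]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True
```

Events must be JSON-serializable for this to work. In practice the log would also be written to append-only storage outside the agent's control.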

Emergency Controls

Maintain tested emergency procedures that can rapidly contain or disable compromised agents.
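A simple form of such a control is a kill switch that the execution path consults before every action. The names below are hypothetical, and a production system would also need the switch to be reachable by operators independently of the agent itself.

```python
import threading
from typing import Callable, Optional


class AgentHalted(RuntimeError):
    """Raised when execution is attempted after the switch was tripped."""


class KillSwitch:
    def __init__(self) -> None:
        self._halted = threading.Event()
        self.reason: Optional[str] = None

    def trip(self, reason: str) -> None:
        """Halt all future execution and remember why."""
        self.reason = reason
        self._halted.set()

    def is_tripped(self) -> bool:
        return self._halted.is_set()


def guarded_execute(action: dict, switch: KillSwitch, executor: Callable[[dict], object]):
    """Run the executor only while the kill switch is not tripped."""
    if switch.is_tripped():
        raise AgentHalted(f"agent halted: {switch.reason}")
    return executor(action)
```

Tripping the switch in a staging environment on a schedule is one way to confirm that the procedure still works before it is needed.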

Quick Builder Checklist

MVP

  • ☐ Treat all external inputs as untrusted
  • ☐ Verify action sources
  • ☐ Use structured actions
  • ☐ Separate reasoning from execution
  • ☐ Enforce policy controls with fail-closed behavior
  • ☐ Configure spending limits and recipient restrictions
  • ☐ Simulate transactions before execution
  • ☐ Isolate secrets from model context

Recommended

  • ☐ Detect obfuscation attempts
  • ☐ Apply least privilege
  • ☐ Define human approval thresholds
  • ☐ Enable monitoring and alerting
  • ☐ Maintain audit logs

Advanced

  • ☐ Test emergency shutdown procedures
  • ☐ Use hardware-backed or distributed key management
  • ☐ Enforce wallet-level policy controls

Appendix: Key Concepts

The following terms are referenced throughout this checklist and provide additional context for common AI agent security concepts.

Account Abstraction (EIP-4337)

EIP-4337 (formally ERC-4337) is an account abstraction standard that lets a wallet be a smart contract with programmable security policies. Organizations can enforce spending limits, session keys, recipient restrictions, and other controls at the wallet layer rather than relying solely on the AI agent.

Multi-Party Computation (MPC)

MPC splits signing authority across multiple parties so that no single party holds the complete private key. Organizations can require several parties to cooperate on sensitive actions while the key material stays distributed.

Fail Closed

If validation fails or uncertainty exists, stop execution rather than attempting to continue.

Least Privilege

Every user, process, application, or agent should have only the minimum permissions necessary to perform its intended task, and nothing more.

Structured Actions

Convert natural-language requests into structured representations so validation and policy enforcement can occur before execution.

Transaction Simulation

Execute transactions in a safe environment before signing to verify expected behavior and detect unintended state changes.

Prompt Injection

Prompt injection occurs when malicious content attempts to manipulate an AI system into ignoring its intended instructions or security boundaries.

Conclusion

AI agents are becoming increasingly capable, but greater autonomy also brings greater responsibility. Security should not be treated as something added after deployment. It should be part of the design process from the beginning.

No checklist can eliminate every risk, and no AI agent system will ever be 100% secure. New attack techniques will continue to emerge, models will evolve, and best practices will change over time. The goal is not perfection. The goal is to build systems that fail safely, limit the impact of mistakes, and make risky actions difficult to execute.

Most of the practices in this checklist are not unique to AI. They are well-established security principles applied to a new class of software. As agents become more capable and trusted with increasingly sensitive tasks, applying those principles consistently will become even more important.

Innovation and security do not have to compete with each other. Building useful AI agents and building secure AI agents should go hand in hand. The earlier security is considered, the easier it becomes to build systems that users can trust.

Acknowledgements

This research was supported in part by funding from TheDAO Security Fund & Giveth. All views and conclusions expressed are my own.