Guide Contents
The 7 Layers of AI Agent Technology Stack
A production AI agent stack is multi-layered. Starting from the bottom (infrastructure) and moving up to the user-facing layer, here are the essential components:
Agent Frameworks: The Core of Your Stack
The orchestration framework is your most critical technology choice. Here's the 2026 landscape:
Recommendation for 2026: If building custom agents on Claude, start with Anthropic's Agent SDK. If you need graph-based state management and debuggability, choose LangGraph. For team-based multi-agent systems, CrewAI has momentum. Most enterprises choose LangGraph or custom implementations because governance requirements are non-negotiable.
Language Models: The Brain Layer
Your LLM choice defines agent capability, cost, and latency. 2026 options:
Closed-Source (API Access)
Open-Source (Self-Hosted)
- Llama 3.1 (70B/405B): Best open-source reasoning. Deploy on your infrastructure. Zero API costs at scale.
- Mixtral 8x22B: Mixture of Experts. Lower latency than Llama, strong tool use. Great for customer service agents.
- Phi-3/3.5: Lightweight, runs on CPUs. Best for resource-constrained environments (edge, devices).
Cost-Benefit Analysis (annual, 10M requests): GPT-4o: ~$750K, Claude Sonnet: ~$180K, Gemini: ~$75K, Self-hosted Llama 405B: ~$400K in infrastructure but zero API costs. Most enterprises start with an API (managed, reliable), then move to self-hosting as volume grows and requirements become clear.
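As a rough sanity check, the per-request economics implied by these figures can be worked out directly. The break-even helper below is illustrative and derived only from the estimates above; real pricing depends on token counts per request, not flat per-request rates.

```python
# Rough annual cost comparison at 10M requests/year, using the
# estimates above. API pricing scales roughly linearly with volume;
# self-hosting carries a largely fixed infrastructure cost.
REQUESTS = 10_000_000

api_cost_per_request = {          # derived from the annual figures above
    "gpt-4o": 750_000 / REQUESTS,
    "claude-sonnet": 180_000 / REQUESTS,
    "gemini": 75_000 / REQUESTS,
}
self_hosted_fixed = 400_000       # Llama 405B infrastructure, ~flat

def annual_api_cost(model: str, requests: int) -> float:
    return api_cost_per_request[model] * requests

def break_even_requests(model: str) -> float:
    """Volume at which self-hosting beats the given API."""
    return self_hosted_fixed / api_cost_per_request[model]

print(f"GPT-4o per request:   ${api_cost_per_request['gpt-4o']:.3f}")
print(f"Break-even vs GPT-4o: {break_even_requests('gpt-4o'):,.0f} requests/yr")
```

Note that against the cheapest API in the table, the break-even volume is far higher, which is why teams usually stay on APIs until volume and requirements stabilize.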
Orchestration: Agent State & Flow Control
This is where your agent "thinks." Key concepts:
State Management
Agents maintain state across turns: conversation history, user context, task progress, tool results. Options:
- In-Memory: Fast, but state is lost on restart. Suited to stateless APIs.
- Database-Backed: PostgreSQL with JSON storage. Durable, queryable.
- Checkpointing (LangGraph): Snapshot state at each step. Allows resume, rollback, debugging.
Agent Loop Patterns
Key optimization: Define clear termination conditions. Agents should know when they've succeeded or when to escalate. Infinite loops = wasted compute.
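A loop with explicit termination conditions might look like the sketch below. `call_llm` and `run_tool` are placeholder stubs for your model and tool layer; the point is the three exits: success, escalation, and a hard step cap.

```python
# Agent loop with explicit termination conditions: succeed, escalate,
# or hit an iteration cap -- never loop forever.
MAX_STEPS = 8

def call_llm(state: dict) -> dict:   # placeholder: decides the next action
    return {"action": "finish", "answer": "done"}

def run_tool(decision: dict) -> dict:  # placeholder: executes a tool call
    return {"result": "ok"}

def agent_loop(task: str) -> dict:
    state = {"task": task, "history": []}
    for step in range(MAX_STEPS):
        decision = call_llm(state)
        if decision["action"] == "finish":            # success condition
            return {"status": "done", "answer": decision["answer"]}
        if decision["action"] == "escalate":          # hand off to a human
            return {"status": "escalated"}
        state["history"].append(run_tool(decision))   # otherwise act, observe
    return {"status": "escalated", "reason": "step limit reached"}

print(agent_loop("summarize the open ticket"))
```

The step cap doubles as a cost control: a stuck agent burns at most `MAX_STEPS` LLM calls before escalating.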
Memory & Context: Vector Databases & RAG
Agents need access to knowledge. Two memory patterns:
Short-Term Memory (Conversation History)
Stored in agent state. Latest 10–50 messages included in prompt. Lightweight, sufficient for single-session interactions.
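The sliding window above is a one-function concern. This sketch assumes OpenAI-style message dicts with a `role` field and always preserves the system message, which is a common convention rather than a requirement.

```python
# Sliding-window short-term memory: keep only the most recent N turns
# in the prompt (N is typically 10-50, per the text above), while
# always retaining the system message.
WINDOW = 10

def window_messages(history: list[dict], window: int = WINDOW) -> list[dict]:
    system = [m for m in history if m["role"] == "system"]
    turns = [m for m in history if m["role"] != "system"]
    return system + turns[-window:]

history = [{"role": "system", "content": "You are a support agent."}]
history += [{"role": "user", "content": f"msg {i}"} for i in range(30)]
prompt = window_messages(history)
print(len(prompt))   # 11: the system message plus the last 10 turns
```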
Long-Term Memory (Knowledge Bases)
Semantic search across documents. Flow: text → embeddings → vector DB → retrieval → context injection.
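The whole flow fits in a short script. To keep it self-contained, bag-of-words vectors stand in for a real embedding model and a plain list stands in for a vector database; in production you would swap in an embedding API and a store like pgvector or Weaviate, but the pipeline shape is the same.

```python
import math
from collections import Counter

# Toy end-to-end RAG flow: text -> embeddings -> index -> retrieval ->
# context injection. Counter-based vectors are a stand-in for a real
# embedding model; `index` is a stand-in for a vector DB.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = [
    "Refunds are processed within 5 business days.",
    "Shipping is free on orders over $50.",
]
index = [(doc, embed(doc)) for doc in docs]          # "vector DB"

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda p: cosine(q, p[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Context injection: retrieved passages are prepended to the prompt.
question = "how long do refunds take?"
context = "\n".join(retrieve(question))
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(context)
```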
Pro tip: Don't over-engineer embeddings. Start with OpenAI's text-embedding-3-small ($0.02 per 1M tokens). Switch to specialized models only if accuracy drops.
Tools & Integration Layer: Making Agents Useful
An agent without tools is just a chatbot. Tools enable action: API calls, database queries, file operations, external service integration.
Protocol: Model Context Protocol (MCP)
MCP is the emerging standard for tool definition and discovery in 2026. Created by Anthropic, adopted by OpenAI and Microsoft, and now governed by the Linux Foundation's Agentic AI Foundation (Dec 2025). MCP enables:
- Standard way for agents to discover available tools
- Safe execution sandboxing
- Standardized error handling
- Tool marketplace / shared libraries
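To make the discovery idea concrete, here is the shape of an MCP-style tool definition: a name, a description, and a JSON Schema for inputs, so any agent can list the tool and validate a call before making it. The `name`/`description`/`inputSchema` fields follow the MCP tool listing; the dispatcher itself is a simplification (the real protocol wraps these exchanges in JSON-RPC).

```python
# MCP-style tool registry: discovery via list_tools(), validated
# execution via call_tool(). Registration acts as an allowlist.
TOOLS = {
    "get_ticket": {
        "name": "get_ticket",
        "description": "Fetch a support ticket by id",
        "inputSchema": {
            "type": "object",
            "properties": {"ticket_id": {"type": "string"}},
            "required": ["ticket_id"],
        },
    }
}

def list_tools() -> list[dict]:
    """What an agent sees when it asks the server for available tools."""
    return list(TOOLS.values())

def call_tool(name: str, arguments: dict) -> dict:
    if name not in TOOLS:                        # allowlist by construction
        return {"isError": True, "content": f"unknown tool: {name}"}
    for field in TOOLS[name]["inputSchema"]["required"]:
        if field not in arguments:               # standardized error handling
            return {"isError": True, "content": f"missing field: {field}"}
    # Stubbed execution; a real server would run the tool in a sandbox.
    return {"isError": False, "content": f"ticket {arguments['ticket_id']}: open"}

print(list_tools()[0]["name"])
print(call_tool("get_ticket", {"ticket_id": "T-42"}))
```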
Common Tool Categories
Observability & Governance: Safety & Compliance
Production agents need guardrails. Key components:
Observability (Tracing & Debugging)
- LangSmith / Langfuse: Agent execution tracing, debugging UI. Catch failures before customers do.
- OpenTelemetry: Standard tracing format. Export to your observability stack (Datadog, New Relic).
- Logging: Structured logs (JSON) with timestamps, decision points, tool calls, errors.
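A minimal structured-logging setup using only the standard library is sketched below. The `agent` payload field is an illustrative convention, not part of the `logging` API; in practice you would route these records to your tracing backend.

```python
import json
import logging
import time

# Structured (JSON) agent logging: every record carries a timestamp,
# level, event name, and an arbitrary payload (decision point, tool
# call, error details) passed via logging's `extra` mechanism.
class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "event": record.getMessage(),
            **getattr(record, "agent", {}),   # merged structured payload
        })

logger = logging.getLogger("agent")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("tool_call", extra={"agent": {"tool": "search_kb", "args": {"q": "refunds"}}})
```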
Guardrails & Safety
- Input Validation: Sanitize user input, block prompt injection attempts.
- Tool Allowlists: Agents can only call approved tools for their role.
- Output Filtering: Detect and block sensitive data leakage (PII, secrets).
- Rate Limiting: Prevent abuse: max N requests/minute per user/organization.
- Cost Controls: Cap LLM spend per agent/month to avoid runaway API costs.
Compliance & Audit
For regulated industries (healthcare, finance, legal):
- Audit Trails: Every agent decision logged with timestamp, user, decision path, tool calls.
- Data Governance: Encryption in transit/rest, GDPR right-to-deletion, data residency controls.
- SOC 2 / HIPAA / GDPR: Compliance framework built into infrastructure and logging.
- Human-in-the-Loop: Critical decisions (financial, medical) require human approval.
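An audit-trail entry combining the first and last requirements might look like this. The field names and the critical-action list are illustrative; in production the records would go to an append-only store and the approval flag would feed a real review queue.

```python
import json
import time

# Audit-trail sketch: every agent decision is recorded with timestamp,
# user, decision path, and tool calls. Actions on the critical list are
# flagged for human approval before execution.
AUDIT_LOG: list[dict] = []
CRITICAL_ACTIONS = {"issue_refund", "update_diagnosis"}

def audit(user: str, action: str,
          decision_path: list[str], tool_calls: list[str]) -> dict:
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "user": user,
        "action": action,
        "decision_path": decision_path,
        "tool_calls": tool_calls,
        "requires_human_approval": action in CRITICAL_ACTIONS,
    }
    AUDIT_LOG.append(entry)
    return entry

entry = audit("agent-7", "issue_refund", ["classify", "verify_policy"], ["refund_api"])
print(json.dumps(entry, indent=2))
```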
Stack Recommendations by Use Case
Customer Service AI Agent
Framework: LangGraph (clear flow control, state management)
Memory: pgvector (FAQ embeddings) + Zendesk API for ticket context
Tools: Zendesk ticket API, knowledge base search, escalation trigger
Observability: LangSmith + structured logs to data warehouse
Estimated Cost (annual, 1M interactions): $120K–200K
Sales Lead Qualification Agent
Framework: Anthropic Agent SDK or LangGraph
Memory: Weaviate (company research data, lead profiles)
Tools: Salesforce API, LinkedIn API, company enrichment APIs
Integration: Send qualified leads to the HubSpot pipeline
Estimated Cost (annual, 500K leads scored): $80K–150K
Content Generation Agent
Framework: Simple LLM API calls + orchestration layer
Memory: Brand guidelines in prompt, previous content in context
Tools: Google Docs API, image generation (DALL-E), Grammarly API
Storage: Content versioned in Git with semantic versioning
Estimated Cost (annual, 10K pieces): $40K–80K
Data Analysis Agent
Framework: LangGraph with Python execution sandbox
Memory: Table schemas, previous queries, column documentation
Tools: SQL execution (read-only), BigQuery/Snowflake API, visualization APIs
Governance: Query approval layer, cost controls on expensive queries
Estimated Cost (annual, 500K queries): $150K–300K + compute
Ready to Build Your AI Agent Stack?
Compare AI agents with built-in stack guidance, integration capabilities, and supported frameworks.