Reference — Plain-English AI Terminology

AI Agent Glossary

Clear definitions of the key terms you'll encounter when evaluating, buying, and deploying AI agents — written for IT buyers and procurement professionals, not AI researchers.

A
AI Agent
Software that uses artificial intelligence to autonomously complete tasks, make decisions, and interact with users or systems without requiring step-by-step human instruction. Unlike basic AI tools, agents can plan multi-step workflows, use external tools, and take real actions in connected software systems.
Usage: "We deployed an AI agent to handle tier-1 support tickets without human involvement."
Core · Buyer essential
Agentic AI
A class of AI systems designed to act autonomously over extended periods, chaining together multiple reasoning and action steps to accomplish complex goals. Distinguished from chatbots by their ability to plan, use tools, and take actions without per-step human prompting.
Architecture
API (Application Programming Interface)
A set of rules and protocols that allow software applications to communicate with each other. AI agents use APIs to connect to external tools — CRMs, databases, calendars, ticketing systems. The breadth and quality of an agent's API integrations is a key differentiator in enterprise evaluations.
Technical · Integration
Autonomous Execution
The ability of an AI system to complete tasks from start to finish without human input at each step. Fully autonomous agents (like Devin for software engineering) can take a task description, plan a series of actions, execute them, handle errors, and deliver a result — all without human intervention.
Capability
B
BAA (Business Associate Agreement)
A legal contract required under HIPAA between healthcare organisations and vendors that handle protected health information (PHI). Any AI agent processing patient data or healthcare records in the US requires a signed BAA. Always verify BAA availability before deploying AI in healthcare contexts.
Compliance · Healthcare
Benchmark
A standardised test used to evaluate and compare AI model performance. Common AI benchmarks include MMLU (general reasoning), HumanEval (coding), and SWE-Bench (software engineering tasks). While useful for comparison, benchmark scores don't always correlate with real-world performance on your specific tasks — always test on your actual use cases.
Evaluation
C
Context Window
The maximum amount of text (measured in tokens) that an AI model can process at once. A larger context window allows an agent to consider more information — for example, an entire codebase, a long document, or an extended conversation history. GPT-4o has a 128K token context window; Claude 3.5 Sonnet supports up to 200K tokens.
Why it matters: Coding agents with larger context windows can reason about an entire codebase, not just individual files.
Technical · Buyer essential
Copilot
A class of AI assistant designed to augment human work rather than replace it — the AI works alongside the user, suggesting completions, generating drafts, or highlighting relevant information, but the human remains in control of decisions. GitHub Copilot is the archetypal example. Contrasts with fully autonomous agents that act independently.
Category
Chain-of-Thought (CoT)
A prompting technique that instructs an AI to reason through a problem step by step before producing a final answer. Chain-of-thought reasoning significantly improves performance on complex tasks requiring multi-step logic. Many agentic AI systems use CoT internally even when not visible to the user.
Technical
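A chain-of-thought prompt can be as simple as a wrapper around the user's question. The sketch below is illustrative only — the wording of the instruction and the sample question are assumptions, not a vendor's actual prompt template.

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question with an instruction to reason step by step."""
    return (
        "Answer the following question. Think through the problem "
        "step by step, then state your final answer on the last line.\n\n"
        f"Question: {question}\n"
        "Reasoning:"
    )

prompt = build_cot_prompt(
    "A subscription costs $40/seat/month for 25 seats. "
    "What is the annual cost?"
)
```

The resulting string would be sent to the model as-is; the "Reasoning:" suffix nudges the model to show its intermediate steps before the final answer.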
CSAT (Customer Satisfaction Score)
A metric used to measure customer satisfaction with a specific interaction, typically scored 1–5 or 1–10. Key metric for evaluating customer service AI agents — a well-deployed AI agent should maintain or improve CSAT scores relative to human agents, particularly for routine query resolution.
Metric · Customer Service
D
DPA (Data Processing Agreement)
A legally binding contract between a data controller (your organisation) and a data processor (the AI vendor) that defines how personal data will be handled, protected, and deleted. Required under GDPR for any vendor processing EU personal data. Always request and review a DPA before deploying AI tools with access to personal data.
Legal · GDPR
Deep Research
A capability offered by several AI research agents (Perplexity, OpenAI) that autonomously gathers, synthesises, and cites information from multiple sources to produce comprehensive research reports. The agent plans its research approach, queries multiple sources, and compiles findings — tasks that would take a human researcher several hours.
Feature · Research
E
Embedding
A numerical representation of text (or other data) that captures its semantic meaning. AI systems use embeddings to understand relationships between concepts. Embeddings power similarity search — for example, allowing a customer service agent to find the most relevant support article for a given query, even if the exact words don't match.
Technical
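The "similarity" between two embeddings is usually measured with cosine similarity. A minimal sketch, using toy 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions, and the example texts are assumptions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for real model output.
query   = [0.9, 0.1, 0.0]   # e.g. "reset my password"
article = [0.8, 0.2, 0.1]   # e.g. "How to reset your password"
other   = [0.0, 0.1, 0.9]   # e.g. "Billing and invoices"

# The password-reset article scores closer to the query than billing does.
assert cosine_similarity(query, article) > cosine_similarity(query, other)
```

This is exactly the comparison a similarity search runs at scale — score every stored document against the query embedding and return the top matches.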
Enterprise Tier
The highest pricing tier for AI software products, typically requiring annual contract negotiation. Enterprise tiers usually unlock: SSO/SAML, advanced audit logging, data processing agreements, SLA guarantees, dedicated customer success, on-premise or private cloud options, and volume pricing. Often priced 2–5x the team/business tier.
Pricing · Buyer essential
F
Fine-tuning
The process of further training a pre-trained AI model on a specific dataset to improve its performance on a particular task or domain. An enterprise might fine-tune a general LLM on their internal knowledge base, product documentation, or historical support tickets to create a more accurate, on-brand AI agent. Fine-tuning is distinct from RAG — it changes the model's weights; RAG retrieves external information at inference time.
Technical · Customisation
Function Calling
A capability that allows an LLM to identify when an external function (tool, API, database query) should be called, specify the required parameters, and integrate the result into its response. Function calling is the mechanism that enables AI agents to take real actions — searching the web, querying a database, updating a CRM record, or executing code.
Technical · Core
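The mechanics can be sketched in a few lines: the developer declares a tool schema, the model emits a structured call, and the application routes it to real code. Everything here — the tool name `update_crm_record`, its parameters, and the simulated model output — is hypothetical, in the JSON-schema style most LLM APIs use rather than any one vendor's exact format.

```python
import json

# Hypothetical tool definition the model is told about.
TOOLS = [{
    "name": "update_crm_record",
    "description": "Update a field on a CRM contact record.",
    "parameters": {
        "type": "object",
        "properties": {
            "contact_id": {"type": "string"},
            "field": {"type": "string"},
            "value": {"type": "string"},
        },
        "required": ["contact_id", "field", "value"],
    },
}]

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to the matching local function."""
    if tool_call["name"] == "update_crm_record":
        args = json.loads(tool_call["arguments"])
        # A real agent would call the CRM API here; we just echo.
        return f"Updated {args['field']} on contact {args['contact_id']}"
    raise ValueError(f"Unknown tool: {tool_call['name']}")

# Simulated model output: the LLM decided a tool should be called.
call = {
    "name": "update_crm_record",
    "arguments": '{"contact_id": "C-104", "field": "status", "value": "churn-risk"}',
}
print(dispatch(call))  # Updated status on contact C-104
```

The key point for buyers: the model never executes anything itself — it only proposes calls, and the application decides whether and how to run them, which is where permissioning and audit controls live.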
H
Hallucination
When an AI model generates content that sounds plausible but is factually incorrect or fabricated — for example, citing a non-existent study, inventing a product specification, or providing wrong code that compiles but doesn't work correctly. Hallucination is a key risk to evaluate in any AI agent, particularly for high-stakes outputs (legal, medical, financial, technical documentation). Mitigation strategies include RAG, grounding on verified sources, and human review workflows.
Risk · Buyer essential
HIPAA (Health Insurance Portability and Accountability Act)
US federal legislation that establishes standards for protecting sensitive patient health information. Any AI agent that processes, stores, or transmits protected health information (PHI) must comply with HIPAA. Requires a signed Business Associate Agreement (BAA) with the vendor. Always verify HIPAA compliance explicitly — "SOC 2 certified" does not automatically imply HIPAA compliance.
Compliance · Healthcare
I
Inference
The process of running a trained AI model to generate a response or prediction. When you send a prompt to an AI agent, inference is what happens — the model processes your input and generates output. Inference speed and cost are key operational factors in usage-based AI agent pricing.
Technical
K
Knowledge Base
A curated collection of structured or unstructured information that an AI agent can search and reference. Customer service agents are typically connected to a knowledge base of support articles, product documentation, and FAQs — they search this base to find relevant answers before generating responses. Knowledge base quality directly impacts agent accuracy.
Feature · Customer Service
L
LLM (Large Language Model)
The AI model at the core of most modern AI agents. An LLM is a neural network trained on vast amounts of text data that can understand and generate human language. Common LLMs include GPT-4o (OpenAI), Claude 3.5 (Anthropic), Gemini 1.5 (Google), and Llama 3 (Meta). The LLM provides the reasoning and language capability; the agent framework adds memory, tools, and action capabilities on top.
Core · Buyer essential
Latency
The time between sending a request to an AI agent and receiving a response. Critical for real-time use cases like customer service chat and voice agents where users expect near-instant responses. Measured in milliseconds (ms) or seconds. Agentic tasks involving multiple tool calls can take 5–60 seconds — acceptable for asynchronous tasks, problematic for conversational interfaces.
Performance · Metric
M
MCP (Model Context Protocol)
An open standard developed by Anthropic for connecting AI models to external tools, data sources, and services. MCP provides a standardised way for AI agents to interface with APIs, databases, and software systems — similar to how USB standardised hardware connections. Increasingly adopted across the AI agent ecosystem in 2025.
Protocol · Integration
Multimodal AI
AI systems that can process and generate multiple types of data — text, images, audio, video, and code. Multimodal agents can, for example, analyse a screenshot and generate code to reproduce a UI, or review a chart image and produce a written analysis. GPT-4o and Claude 3.5 Sonnet are both multimodal models.
Capability
N
NLP (Natural Language Processing)
The branch of AI focused on enabling computers to understand, interpret, and generate human language. Modern AI agents are built on advanced NLP — specifically large language models. Earlier generation "NLP bots" used rule-based intent classification; modern agents use transformer-based LLMs that understand nuance, context, and ambiguity.
Technical
O
Orchestration
The coordination of multiple AI agents or AI tools to complete a complex workflow. An orchestration layer manages which agent handles which task, passes information between agents, and assembles the final output. Multi-agent orchestration frameworks (like LangGraph, CrewAI, or AutoGen) allow developers to build sophisticated AI workflows by chaining specialised agents.
Architecture · Enterprise
Outcome-Based Pricing
A pricing model where you pay per successful outcome — per resolved ticket, per booked meeting, or per completed task — rather than per seat or per usage. Common in customer service AI. Aligns vendor incentives with your success but requires precise outcome definition to avoid disputes.
Pricing · Buyer essential
P
POC (Proof of Concept)
A structured pilot of an AI agent in your environment to validate that it works as promised before full deployment or annual contract commitment. A well-designed POC includes a representative task sample, baseline measurements, defined success criteria, and a representative user group. Typically runs 30–90 days.
Procurement · Buyer essential
Prompt
The input given to an AI model to instruct it on what to do. A prompt can be a simple question, a detailed set of instructions, or a structured template. "Prompt engineering" is the practice of crafting prompts that elicit better AI responses. Many enterprise AI agents include a "system prompt" that sets the agent's behaviour, persona, and constraints for all users.
Core
R
RAG (Retrieval-Augmented Generation)
A technique that combines information retrieval with text generation. Rather than relying solely on what the model learned during training, a RAG system searches an external knowledge base for relevant information and includes it in the prompt before generating a response. RAG significantly reduces hallucination and allows AI agents to answer questions about current events, proprietary data, or documents not in the training set.
Example: A customer service AI using RAG retrieves the relevant help article before composing its response, grounding the answer in verified documentation.
Architecture · Buyer essential
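The retrieve-then-generate loop can be sketched in a few lines. This toy version uses keyword overlap in place of real embedding search, and the two knowledge-base articles are invented for illustration — production RAG systems retrieve via a vector database.

```python
import re

# Toy knowledge base; real systems index thousands of documents.
KNOWLEDGE_BASE = {
    "refund-policy": "Refunds are available within 30 days of purchase.",
    "password-reset": "Reset your password from Settings > Security.",
}

def tokenize(text: str) -> set:
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str) -> str:
    """Return the article sharing the most words with the query."""
    q_words = tokenize(query)
    return max(KNOWLEDGE_BASE.values(),
               key=lambda text: len(q_words & tokenize(text)))

def build_prompt(query: str) -> str:
    """Inject the retrieved article into the prompt before generation."""
    context = retrieve(query)
    return (f"Answer using only the context below.\n"
            f"Context: {context}\n"
            f"Question: {query}")

prompt = build_prompt("How do I reset my password?")
```

The assembled prompt now contains the verified article text, so the model's answer is grounded in documentation rather than in whatever it memorised during training.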
Resolution Rate
In customer service AI, the percentage of customer inquiries that are fully resolved by the AI agent without human escalation. A key metric for evaluating customer service agents. Industry benchmarks in 2025 range from 30–70% depending on query complexity and knowledge base quality. Higher resolution rates directly reduce support costs.
Metric · Customer Service
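The calculation itself is simple; the hard part is agreeing what counts as "resolved". A quick sketch with assumed ticket volumes:

```python
def resolution_rate(resolved_by_ai: int, total_inquiries: int) -> float:
    """Share of inquiries fully resolved without human escalation."""
    return resolved_by_ai / total_inquiries

# Hypothetical month: 1,240 of 2,000 tickets closed with no escalation.
rate = resolution_rate(1240, 2000)
print(f"{rate:.0%}")  # 62%
```

Before a POC, pin down with the vendor whether "resolved" means the customer confirmed resolution, the ticket auto-closed, or the AI merely responded — the same raw numbers can produce very different rates.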
S
SOC 2 (Service Organization Control 2)
A security certification standard developed by the AICPA that verifies a service organisation's controls around security, availability, processing integrity, confidentiality, and privacy. SOC 2 Type II (the stronger version) confirms these controls have been in place and working over a period of time (typically 6–12 months). A baseline requirement for enterprise AI agent deployment.
Security · Buyer essential
SSO / SAML (Single Sign-On / Security Assertion Markup Language)
Authentication standards that allow users to log into AI tools using their existing corporate identity provider (Okta, Azure Active Directory, Google Workspace) rather than creating separate passwords. SSO is a standard enterprise security requirement — it enables centralised access management, session control, and immediate deprovisioning when an employee leaves. Often gated behind Enterprise pricing tiers.
Security · Enterprise
System Prompt
A set of instructions provided to an AI agent before the user interaction begins, which shapes the agent's behaviour, persona, capabilities, and constraints. Enterprise deployments use system prompts to set tone and brand guidelines, restrict topics the agent will discuss, define its role, and inject relevant business context. System prompts are typically not visible to end users.
Technical · Configuration
T
Token
The unit of text that AI models process. A token is approximately 4 characters or 0.75 words in English. "Hello world" is 2 tokens. Most AI APIs price by token — both input (your prompt) and output (the AI's response). Context window sizes, pricing, and model limits are all expressed in tokens. 1,000 tokens ≈ 750 words.
Technical · Pricing
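The 4-characters-per-token rule of thumb is enough for rough budgeting. A sketch of a back-of-envelope cost estimate — the per-million-token prices are assumed figures, not any vendor's actual rates:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token in English."""
    return max(1, round(len(text) / 4))

def estimate_cost(input_text: str, output_tokens: int,
                  in_price: float, out_price: float) -> float:
    """Cost in dollars, given prices per million tokens."""
    in_tokens = estimate_tokens(input_text)
    return (in_tokens * in_price + output_tokens * out_price) / 1_000_000

prompt = "Summarise the attached quarterly report in three bullet points."
# Assumed prices: $3 per million input tokens, $15 per million output.
cost = estimate_cost(prompt, output_tokens=500, in_price=3.0, out_price=15.0)
```

Note that output tokens are usually several times more expensive than input tokens, so verbose agents cost disproportionately more.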
TCO (Total Cost of Ownership)
The full cost of deploying and operating an AI agent over its lifetime, including subscription fees, integration and setup costs, training and onboarding, ongoing maintenance, compliance add-ons, premium support, and renewal price escalations. TCO analysis typically reveals that the headline subscription price represents only 40–70% of actual total cost.
Pricing · Procurement
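A first-year TCO model is just a sum of line items — the discipline is listing them all. Every figure below is an assumed example, not a benchmark:

```python
def total_cost_of_ownership(items: dict) -> float:
    """Sum all cost line items for the period being modelled."""
    return sum(items.values())

# Hypothetical first-year line items for one deployment.
year_one = {
    "subscription": 60_000,        # the headline annual price
    "integration_setup": 18_000,
    "training_onboarding": 8_000,
    "compliance_addons": 6_000,
    "premium_support": 9_000,
}

tco = total_cost_of_ownership(year_one)
subscription_share = year_one["subscription"] / tco
print(f"TCO: ${tco:,.0f}; subscription is {subscription_share:.0%} of it")
```

In this example the headline subscription is roughly 59% of true first-year cost — inside the 40–70% range noted above.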
Tool Use
The ability of an AI agent to invoke external tools — web search, code execution, database queries, API calls, file reading — to gather information or take actions beyond its training data. Tool use is what transforms an AI language model into a true agent. The range and reliability of available tools is a key differentiator between agent platforms.
Core · Architecture
U
Uptime SLA
A contractual guarantee of minimum service availability, expressed as a percentage (e.g., 99.9% = maximum ~8.7 hours downtime per year; 99.99% = ~52 minutes). Critical for any AI agent integrated into production workflows. Enterprise tiers typically offer stronger SLAs. Always clarify what "uptime" covers — some vendors exclude planned maintenance windows or only count API availability, not full product functionality.
Contract · Enterprise
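Converting an uptime percentage into allowed downtime is simple arithmetic, and worth doing for every "nine" a vendor quotes:

```python
def annual_downtime_hours(uptime_pct: float) -> float:
    """Maximum downtime per year permitted by an uptime percentage."""
    hours_per_year = 365 * 24  # 8,760 hours
    return hours_per_year * (1 - uptime_pct / 100)

print(f"99.9%  -> {annual_downtime_hours(99.9):.2f} h/year")       # 8.76
print(f"99.99% -> {annual_downtime_hours(99.99) * 60:.1f} min/year")  # 52.6
```

Each extra "nine" cuts permitted downtime by a factor of ten — which is why 99.99% commitments usually only appear on enterprise tiers.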
Usage-Based Pricing
A pricing model where you pay based on consumption — typically per API call, per token, per message, or per task. Common in AI APIs and platforms. Advantageous for unpredictable or low-volume use; can become expensive at high volume. Always model costs at P95 (95th percentile) usage, not average, to avoid budget surprises.
Pricing · Buyer essential
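The gap between average and P95 budgeting can be shown with a toy month of daily task counts — the volumes and the $0.50 per-task price below are invented for illustration:

```python
import math

def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile: value at rank ceil(pct/100 * n)."""
    ordered = sorted(values)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]

# Hypothetical 30-day month: mostly quiet, with a few spiky days.
daily_tasks = [200] * 24 + [400, 450, 500, 900, 1100, 1200]
price_per_task = 0.50

avg_daily_cost = sum(daily_tasks) / len(daily_tasks) * price_per_task
p95_daily_cost = percentile(daily_tasks, 95) * price_per_task
```

Here the average daily cost is about $156, but the P95 day costs $550 — budgeting at the average would leave the spiky days unfunded.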
V
Vector Database
A database optimised for storing and searching embeddings (vector representations of text and other data). RAG systems use vector databases to find the most semantically relevant documents for a given query. Common vector databases include Pinecone, Weaviate, and pgvector. Enterprise AI deployments often use vector databases to ground agents in company-specific knowledge.
Technical · Infrastructure
Z
Zero-Shot
The ability of an AI model to perform a task without being given any examples of how to do it — relying solely on its training knowledge and the instructions in the prompt. Contrasts with "few-shot" prompting, where 2–5 examples are provided. Modern LLMs like GPT-4o and Claude 3.5 Sonnet demonstrate strong zero-shot performance on many enterprise tasks.
Technical