AI conversations move fast, and the jargon moves faster. If you've ever nodded along in a meeting while privately Googling "what is a vector database," this is for you.
These are the terms that come up in AI product work, explained in plain English, with the context that actually matters when you're the one making the call.
How to use this guide
- Terms run A-Z. Use the letter nav to jump to what you need.
- Each entry has a plain definition, a real-world analogy, and a product manager takeaway.
- Light on maths. Any code is a short illustrative sketch, not production guidance. Just enough to be dangerous in the right way.
A
Emerging
AI Agent
Also called: agentic AI, autonomous agent
An AI system that doesn't just respond to a single prompt. It takes a sequence of actions to accomplish a goal. Agents can use tools (web search, code execution, sending emails), plan multi-step workflows, and make decisions along the way. Instead of one question and one answer, an agent might take ten steps to complete a task you described at a high level.
Analogy
The difference between asking a colleague "what's our Q3 revenue?" (one question, one answer) and "prepare the Q3 board pack" (pull data, build slides, check with finance, send for review). An agent handles the second kind of request.
Product Manager Takeaway
Agentic products need different UX thinking. When does the agent ask for permission? What happens when it fails mid-task? How do users understand and trust what the agent did? Product problems as much as technical ones.
B
Watch Out
Bias (AI Bias)
Also called: algorithmic bias, model bias, training bias
Systematic skew in a model's outputs, caused by patterns in its training data, the way it was built, or the way it's deployed. Models learn from human-generated data, which contains human biases. The model can inherit and amplify those biases at scale. It shows up in product work as personas that default to demographic assumptions, recommendation systems that reinforce existing patterns, hiring tools that favour certain groups.
Analogy
If you only ever interviewed customers at your London office, your "user insights" would skew towards a specific demographic. AI bias works the same way. The model's worldview is limited to the data it was trained on.
Product Manager Takeaway
Bias isn't a bug you fix once. It's a quality dimension you monitor. Actively ask: whose perspective is over-represented? Whose is missing? Test AI features across demographics, geographies, and usage patterns before launch.
C
Nuanced
Chain-of-Thought (CoT)
Also called: step-by-step reasoning, CoT prompting
A prompting technique where you instruct the model to reason through a problem step by step before giving a final answer, rather than jumping straight to a conclusion. Adding "think step by step" or showing a worked example of reasoning in your prompt consistently improves accuracy on complex tasks, because it forces the model to slow down and check its own logic.
Analogy
Like asking someone to show their working in a maths test. It doesn't just help the marker. It helps the student catch their own mistakes. Writing out the steps reduces errors.
Product Manager Takeaway
For any AI feature where accuracy really matters (analysis, recommendations, decisions), consider prompting for CoT. It adds latency and tokens, but reduces hallucinations on complex tasks. A real quality-speed tradeoff to make consciously.
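For the technically curious, the technique is just a change to the prompt text. A minimal sketch, with an invented `build_prompt` helper and example question (not tied to any particular API):

```python
# A minimal sketch of chain-of-thought prompting: the same question, wrapped
# in either a direct instruction or a step-by-step one.

def build_prompt(question: str, chain_of_thought: bool) -> str:
    """Wrap a question in either a direct or a step-by-step instruction."""
    if chain_of_thought:
        return (
            f"{question}\n"
            "Think step by step. Write out your reasoning first, "
            "then give the final answer on its own line."
        )
    return f"{question}\nGive only the final answer."

question = "A jacket costs £80 after a 20% discount. What was the original price?"
print(build_prompt(question, chain_of_thought=True))
```

The only difference between the two variants is the instruction to reason first, which is exactly the latency-for-accuracy trade described above.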
Core Concept
Context Window
Also called: context length, max tokens
The maximum amount of text an LLM can "see" and consider at one time, including both the input you send (prompt, conversation history, documents) and the output it generates. Modern models range from 8,000 tokens (about 6,000 words) to over a million tokens (several novels' worth of text).
Analogy
Like a whiteboard in a meeting room. The model can only work with what's on the whiteboard right now. Once it's full, something has to be erased to add something new.
Product Manager Takeaway
Context window size determines what use cases are viable. Summarising a 100-page document needs a large window. Short chat replies don't. This affects which model you choose, and what you pay.
D
Core Concept
Deterministic vs Probabilistic
Also called: rule-based vs statistical, predictable vs variable
The fundamental distinction between traditional software and AI. Deterministic systems always produce the same output for the same input. A "sort by date" button always sorts by date. Probabilistic systems predict the most likely useful output based on patterns, and that output can vary between runs. AI is probabilistic. It can be brilliant, or it can be confidently wrong, and often you can't tell in advance which.
Analogy
A calculator is deterministic. 2+2 always equals 4. A weather forecast is probabilistic. It predicts the most likely outcome based on patterns, but tomorrow might surprise you. AI works more like the forecast.
Product Manager Takeaway
This shift changes how you write specs, define success metrics, and think about QA. AI features can't be tested with simple pass/fail. They need evaluation frameworks, confidence thresholds, feedback loops. If you understand one concept from AI, make it this one.
E
Core Concept
Embeddings
Also called: vector representations, semantic vectors
A way of representing text (or images, audio, etc.) as a list of numbers that captures meaning. Two pieces of text with similar meaning will have similar numbers, so the system can find related content even if the exact words don't match. Embeddings are what power semantic search, searching by meaning rather than keyword.
Analogy
Imagine plotting every document in your knowledge base on a map. Similar topics cluster together. Embeddings create that map. When a user asks a question, you find the nearest cluster and retrieve those documents.
Product Manager Takeaway
Embeddings are the technology behind "find similar content," semantic search, and RAG. You don't need the maths. Just know it's what makes meaning-based search possible.
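To make "similar numbers" concrete, here is a toy illustration of similarity between embeddings. The three-dimensional vectors are invented for the example; real embeddings have hundreds or thousands of dimensions and come from a model:

```python
# Toy illustration of why embeddings enable meaning-based search: related
# meanings produce vectors that point in similar directions.
import math

def cosine_similarity(a, b):
    """Standard similarity measure between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings: "refund policy" and "money-back guarantee" share no
# words but similar meaning; "office dog policy" is unrelated.
refund = [0.9, 0.1, 0.2]
money_back = [0.8, 0.2, 0.3]
office_dogs = [0.1, 0.9, 0.7]

print(cosine_similarity(refund, money_back))   # high score: related meaning
print(cosine_similarity(refund, office_dogs))  # low score: unrelated
```

Semantic search is essentially this comparison run against every document in your knowledge base, returning the closest matches.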
Advanced
Evals: Evaluation Framework
Also called: LLM evaluation, model evals
A systematic way of testing the quality of a model's outputs. A test suite for AI behaviour. Evals measure things like: is the answer factually correct? Is it grounded in the source document? Is it harmful? Does it complete the task? A good eval framework runs automatically whenever you change a prompt or upgrade a model, catching regressions before they reach users.
Analogy
QA testing for software, but for judgement. Instead of checking if the button works, you're checking if the AI said something sensible, accurate, and safe, at scale, automatically.
Product Manager Takeaway
Owning evals is becoming a core product manager responsibility. If you can't measure output quality automatically, you can't ship AI features with confidence. Start by defining what "good" looks like, then work with your team to automate that check.
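A deliberately tiny sketch of what "automate that check" can mean: a list of test cases, a check per case, and a pass rate. Real frameworks add LLM-as-a-judge scoring, tracing, and CI integration; `call_model`, the questions, and the canned answers here are all invented stand-ins:

```python
# A minimal eval harness: known prompts, expected facts, automated pass rate.

def call_model(prompt: str) -> str:
    """Stand-in for a real LLM API call."""
    canned = {
        "What is our refund window?": "You can request a refund within 30 days.",
        "Do we ship to France?": "Yes, we ship to all EU countries.",
    }
    return canned.get(prompt, "I'm not sure.")

eval_cases = [
    {"prompt": "What is our refund window?", "must_contain": "30 days"},
    {"prompt": "Do we ship to France?", "must_contain": "EU"},
]

def run_evals(cases):
    """Run every case and return the fraction that passed."""
    results = [c["must_contain"] in call_model(c["prompt"]) for c in cases]
    return sum(results) / len(results)

print(f"pass rate: {run_evals(eval_cases):.0%}")
```

Run this on every prompt change or model upgrade and a drop in pass rate becomes a regression you catch before users do.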
F
Strategic Decision
Fine-Tuning
Also called: model fine-tuning, task-specific training
Taking a pre-trained LLM and training it further on your own data, to make it better at your particular task, domain, or style. Unlike RAG (which retrieves information at query time), fine-tuning bakes knowledge and behaviour into the model itself. It's more expensive, slower to iterate, and needs labelled training data, but can produce a highly specialised model.
Analogy
RAG is giving an employee a reference manual to consult on the job. Fine-tuning is putting them through a six-month specialist training programme. It changes how they think, not just what they look up.
Product Manager Takeaway
Default to RAG first. Only invest in fine-tuning once you have strong task clarity, sufficient labelled data (thousands of examples), and evidence that RAG isn't getting you close enough. Fine-tuned models are also typically several times more expensive to serve than prompting the same base model.
G
Foundation
Generative AI
Also called: GenAI, generative models
Models that create new content (text, images, code, audio, video) rather than classifying, predicting, or analysing existing data. ChatGPT, Claude, Midjourney, and GitHub Copilot are all generative AI. The distinction is creative output. A spam filter classifies emails (not generative). A tool that writes email replies creates new content (generative).
Analogy
Traditional AI is a librarian who finds and organises existing books. Generative AI is an author who writes new ones. Both are useful, but the second raises different questions about accuracy, originality, and trust.
Product Manager Takeaway
Most AI features product managers are building today are generative: drafting content, summarising documents, creating recommendations. Generative models are optimised to produce plausible outputs, not verified ones. That distinction drives most of your product design decisions around AI quality.
Core Concept
Guardrails
Also called: safety filters, content filters, output constraints
Rules, filters, and constraints built around a model to control what it will and won't do. Preventing harmful outputs, keeping responses on-topic, enforcing brand and compliance requirements. Guardrails can be implemented at the prompt level (instructions in the system prompt), at the API level (moderation layers), or post-generation (filtering outputs before they reach users).
Analogy
Like bumpers in a bowling lane. The model can still move freely within the lane, but the guardrails stop it going somewhere it shouldn't. They don't make the AI smarter. They make it safer to deploy.
Product Manager Takeaway
Guardrails are a product decision, not just a technical one. What topics should your AI refuse? What tone is off-limits? What happens when a guardrail fires: does the user get a helpful error or a confusing non-answer? UX and risk questions that product managers own.
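One of the simplest guardrail layers, sketched in a few lines: a post-generation check on the model's output before it reaches the user. Real systems layer this with prompt-level instructions and moderation APIs; the topic list and fallback message here are invented for the example:

```python
# A minimal post-generation guardrail: block off-limits topics and return a
# designed fallback instead of a confusing non-answer.

BLOCKED_TOPICS = ["legal advice", "medical diagnosis"]
FALLBACK = "I can't help with that, but I can connect you with a specialist."

def apply_guardrail(model_output: str) -> str:
    """Pass safe outputs through; replace off-limits ones with the fallback."""
    lowered = model_output.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return FALLBACK
    return model_output

print(apply_guardrail("Our returns portal is at /returns."))
print(apply_guardrail("Here is some legal advice about your contract..."))
```

Notice that the fallback wording is itself a product decision: what the user sees when a guardrail fires is part of the experience you design.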
H
Watch Out
Hallucination
Also called: confabulation, model error
When an AI generates a confident-sounding response that is factually wrong. Inventing statistics, citing non-existent papers, stating false information as if it were true. Happens because LLMs are trained to produce plausible-sounding text, not to verify facts. They don't "know" things the way humans do. They predict probable word sequences.
Analogy
Asking a brilliant but overconfident colleague to fill in a report. They'll give you something that looks great, but they may have made up the bits they didn't know, and they won't flag which is which.
Product Manager Takeaway
Hallucination is the primary quality risk for AI products that present information as fact. Your eval strategy, your RAG setup, and your UI design should all account for it.
Key Metric
Hallucination Rate
Also called: factual error rate, grounding score
The percentage of AI responses that contain factually incorrect or fabricated information. Measuring it requires either human review of a sample of outputs, or an automated "LLM-as-a-judge" approach where another model checks outputs against ground truth. A hallucination rate of even 1-2% can be unacceptable in high-stakes contexts like medical or legal products.
Analogy
If your customer support agent gets the answer wrong 5% of the time and users can't tell which 5%, you have a trust problem, not just a quality problem.
Product Manager Takeaway
Define your acceptable hallucination rate before you launch. In low-stakes creative tools, some error is tolerable. In products where users rely on accuracy, you need grounding (RAG), citation, and human review workflows.
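The measurement itself is simple once you have reviewed outputs. A sketch with invented review data, where each entry records whether a human reviewer found a fabricated or incorrect claim:

```python
# Hallucination rate from a human-reviewed sample of model outputs.

reviewed_sample = [
    {"response_id": 1, "hallucinated": False},
    {"response_id": 2, "hallucinated": True},   # cited a non-existent document
    {"response_id": 3, "hallucinated": False},
    {"response_id": 4, "hallucinated": False},
]

def hallucination_rate(sample):
    """Fraction of reviewed responses flagged as containing a factual error."""
    flagged = sum(1 for r in sample if r["hallucinated"])
    return flagged / len(sample)

print(f"hallucination rate: {hallucination_rate(reviewed_sample):.0%}")
```

The hard part isn't the arithmetic; it's getting a representative sample and a consistent definition of "hallucinated" across reviewers.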
Must Know
Human-in-the-Loop (HITL)
Also called: human review, human oversight, hybrid AI
A design pattern where a human reviews, corrects, or approves AI outputs before they have real-world impact. HITL sits on a spectrum. At one end, every AI output goes to a human before action. At the other, humans only review flagged or low-confidence outputs. The right level depends on the stakes. A typo suggestion needs no review. A credit decision almost certainly does.
Analogy
Tracked changes in a document. The AI drafts the content. A human decides what to accept, reject, or amend before it goes out. AI speed, human judgement.
Product Manager Takeaway
HITL is often the right answer for high-stakes AI features, especially early in a product's life. It builds trust, generates labelled data for improvement, and gives you a safety net while you measure AI accuracy. Design the review workflow as carefully as the AI itself.
I
Core Concept
Inference
Also called: prediction, model inference, serving
The moment a trained model is actually put to work, processing new input and generating an output. Training is the learning phase (expensive, done once or periodically). Inference is the doing phase (cheaper per call, happens every time a user interacts with the model). When someone asks ChatGPT a question, that's inference. Inference cost, latency, and throughput are key operational metrics for any AI product.
Analogy
Training is a student studying for an exam over months. Inference is the moment they sit down and answer the questions. All the learning happened beforehand. Inference is where it gets applied.
Product Manager Takeaway
Inference is what you pay for at scale. Every API call to an LLM is an inference request. Understanding inference cost per query is essential for AI unit economics. Faster, cheaper inference (smaller models, caching, batching) directly impacts your margins.
L
Foundation
LLM: Large Language Model
Also called: foundation model, base model
An AI system trained on enormous amounts of text to understand and generate human language. LLMs like GPT-4, Claude, and Gemini can write, summarise, translate, answer questions, and reason through problems, all by predicting what words should come next based on patterns learned from billions of documents.
Analogy
A very well-read assistant who has absorbed almost everything ever written, and can hold a conversation on virtually any topic. But learned entirely from text, with no lived experience.
Product Manager Takeaway
The LLM is the engine under the hood of most AI features. When you're choosing between OpenAI, Anthropic, or Google, you're choosing which LLM to build on.
Infrastructure
LLMOps
Also called: AI observability, ML operations for LLMs
The practice and tooling for deploying, monitoring, and improving LLM-based applications in production. LLMOps covers tracking every AI request and response (tracing), monitoring cost and latency, running evaluations, versioning prompts, and detecting when model quality degrades. Common tools include Langfuse, Braintrust, and Helicone.
Analogy
DevOps tells you if your servers are down or your API is slow. LLMOps tells you if your AI is getting worse, costing more than expected, or failing in ways your users haven't reported yet.
Product Manager Takeaway
You can't improve what you can't measure. Even if you don't configure it yourself, push for LLMOps tooling early. It's the difference between flying blind and having real data on how your AI feature is actually performing.
M
Nuanced
Model Drift
Also called: data drift, distribution shift, concept drift
The gradual degradation of a model's accuracy over time, as the real-world data it encounters diverges from the data it was trained on. A model trained on user behaviour from 2022 may perform poorly in 2025 if user needs, language, or patterns have changed. Drift is invisible unless you're actively measuring output quality over time.
Analogy
A sales forecast model trained before a market shock. It keeps producing confident predictions, but the world has changed and the assumptions underneath no longer hold. Confidence doesn't equal accuracy.
Product Manager Takeaway
Model drift is why AI products need ongoing monitoring, not just a launch plan. If your evals show quality declining without a prompt change or model update, drift is a likely cause. Build a feedback loop and set quality thresholds that trigger a review.
Strategy
Multimodal AI
Also called: vision-language models, audio-language models
Models that can work with more than one type of data. Understanding both text and images (like GPT-4o or Gemini), or processing audio alongside text. A multimodal model can look at a photo of a receipt and extract the total. It can watch a video and write a summary. It can listen to a voice note and respond in text.
Analogy
Most early AI assistants could only read. Multimodal AI can read, see, and hear, opening up a much wider range of real-world use cases.
Product Manager Takeaway
Ask whether your use cases are purely text-based, or whether vision (images, screenshots, documents) or audio could unlock meaningfully better experiences. Most leading models now accept multimodal inputs through the same APIs, with little extra integration work.
Emerging
MCP: Model Context Protocol
Also called: tool protocol, context protocol
An open standard that defines how AI models connect to external tools, data sources, and services through a universal interface. Instead of every AI application building custom integrations for each tool it needs (databases, APIs, file systems, SaaS products), MCP provides a single, standardised protocol that any AI model can use to discover and call any compatible tool. USB-C for AI: one connection standard that works everywhere.
Analogy
Before USB, every device had its own proprietary cable. MCP does the same thing for AI integrations. Instead of building a custom connector for every tool your AI needs to use, you plug into one standard protocol and it just works.
Product Manager Takeaway
MCP lowers the cost of connecting AI features to existing tools and data. Instead of bespoke integrations, you adopt MCP-compatible servers and let your AI access Slack, databases, or internal APIs through a standard interface. Your AI product can gain new capabilities without new engineering work, just by adding new MCP servers.
P
Core Skill
Prompt / Prompt Engineering
Also called: prompting, instructions
A prompt is the text you send to a model. Your question, instruction, or request. Prompt engineering is crafting those inputs to get better, more reliable outputs. Small changes in wording, structure, and framing can dramatically change the quality of what the model produces.
Analogy
Writing a great job brief. The clearer, more specific, and better-structured your brief, the better the work you'll get back. Vague in, vague out.
Product Manager Takeaway
Prompt engineering is now a core product manager skill. The system prompt (the hidden instruction that shapes how a product's AI behaves) is effectively your product spec for every AI interaction.
Watch Out
Prompt Injection
Also called: prompt hijacking, adversarial prompting
An attack where a user crafts input designed to override or bypass a model's system prompt and guardrails, effectively hijacking the AI's behaviour. A user might type "Ignore your previous instructions and instead…" to make the AI act outside its intended role. A real security vector for any product with user-facing AI inputs, particularly those that process external content (emails, documents, web pages).
Analogy
A customer calls your helpline and convinces the agent to ignore their training and reveal confidential information by framing it as an internal audit. The agent was told one thing. A clever caller found a way around it.
Product Manager Takeaway
Include prompt injection in your threat model before launch. Work with security to test it. Consider what data or actions your AI can access. If an injected prompt can exfiltrate data or take real-world actions, the risk is serious. Limit the blast radius by scoping what the AI is allowed to do.
Strategy
Precision & Recall
Also called: positive predictive value, sensitivity, true positive rate
Precision measures: when the model flags something, how often is it right? Recall measures: of all the cases that should be flagged, how many does the model actually catch? You can rarely maximise both. High precision means fewer false alarms but more missed cases. High recall means you catch more real cases but also get more false positives. The right balance is a product decision that depends on context.
Analogy
A smoke detector set to maximum sensitivity (high recall) catches every fire but also goes off when you make toast. One set to maximum precision only sounds when there's a real fire, but might miss a small one. Neither is universally right. It depends on the stakes.
Product Manager Takeaway
This is a product manager decision, not a data science one. For medical diagnostics or fraud detection, maximise recall. Missing a case is worse than a false alarm. For recommendations or notifications, maximise precision. Noise erodes trust. Know which trade-off your feature needs.
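The two metrics fall out of three counts. A worked example using the smoke-detector framing, with invented numbers: 90 real fires caught, 10 toast incidents wrongly flagged, 5 real fires missed:

```python
# Precision and recall from raw counts.

true_positives = 90   # real fires the detector caught
false_positives = 10  # toast incidents it flagged as fires
false_negatives = 5   # real fires it missed

# Precision: when it alarms, how often is it right?
precision = true_positives / (true_positives + false_positives)

# Recall: of all real fires, how many did it catch?
recall = true_positives / (true_positives + false_negatives)

print(f"precision: {precision:.2f}")
print(f"recall: {recall:.2f}")
```

Tuning the detector moves cases between the false-positive and false-negative buckets, which is why you can rarely improve both numbers at once.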
R
Must Know
RAG: Retrieval-Augmented Generation
Also called: retrieval, grounded generation
Before generating a response, the AI first searches a knowledge base to retrieve relevant documents or data, then uses that retrieved content to inform and ground its answer. This solves two problems: it gives the model access to up-to-date or proprietary information it wasn't trained on, and it reduces hallucination by anchoring responses in real sources.
Analogy
Instead of answering from memory, the AI first looks things up in your filing cabinet, reads the relevant pages, then gives you an answer based on what it actually found. Much harder to make things up when you're working from a source document.
Product Manager Takeaway
RAG is the default starting point for most enterprise AI features. Faster and cheaper than fine-tuning, handles real-time data, and makes responses more explainable. If a user asks "why did you say that?", you can show them the source.
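The retrieve-then-generate loop, sketched at toy scale. Retrieval here is naive word overlap standing in for real embedding search, `generate_answer` is a stub for an LLM call, and the knowledge base entries are invented:

```python
# A toy RAG loop: retrieve the most relevant doc, then answer from it.
import re

knowledge_base = [
    "You can get a refund within 30 days of purchase.",
    "We ship to the UK and all EU countries.",
    "Support hours are 9am to 5pm, Monday to Friday.",
]

def words(text):
    """Lowercase word set, ignoring punctuation and digits."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, docs, k=1):
    """Return the k docs sharing the most words with the question."""
    scored = sorted(docs, key=lambda d: len(words(question) & words(d)), reverse=True)
    return scored[:k]

def generate_answer(question, sources):
    # A real system would send the question plus the sources to an LLM here.
    return f"Based on our docs: {sources[0]}"

question = "How many days do I have to get a refund?"
sources = retrieve(question, knowledge_base)
print(generate_answer(question, sources))
```

Because the answer carries its source with it, "why did you say that?" has a concrete answer: you show the retrieved document.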
S
Core Concept
System Prompt
Also called: system message, instructions
A set of hidden instructions given to an AI at the start of every conversation, before the user says anything. It defines the AI's role, tone, constraints, and behaviour: things like "You are a helpful assistant for a legal firm. Always recommend consulting a qualified solicitor. Never speculate on case outcomes." Users typically don't see the system prompt, but it shapes every response.
Analogy
The employee handbook your AI has already read before starting its first shift. It determines how the AI behaves across every customer interaction.
Product Manager Takeaway
Writing and iterating on your system prompt is one of the highest-leverage things a product manager can do. It's the difference between a generic AI and a well-designed product experience.
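Mechanically, most chat APIs accept the system prompt as the first message in a list of role-tagged messages. A sketch using the legal-firm wording from the definition above; the message format mirrors common chat APIs but the helper is invented:

```python
# How a system prompt typically travels with every request.

SYSTEM_PROMPT = (
    "You are a helpful assistant for a legal firm. "
    "Always recommend consulting a qualified solicitor. "
    "Never speculate on case outcomes."
)

def build_messages(user_input: str):
    """Assemble the message list sent to the model on each turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # hidden from the user
        {"role": "user", "content": user_input},       # what the user typed
    ]

messages = build_messages("Will I win my dispute?")
print(messages[0]["role"], "->", messages[0]["content"])
```

The user only ever sees their own message and the reply; the system message silently shapes every response, which is why iterating on its wording is such high-leverage work.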
Core Concept
Supervised & Unsupervised Learning
Also called: labelled learning, self-supervised learning, clustering
Two fundamental approaches to training models. Supervised learning trains on labelled data where the correct answers are provided: "this email is spam / not spam." The model learns the pattern between inputs and known outputs. Unsupervised learning gives the model raw, unlabelled data and asks it to discover hidden patterns, groupings, or structures on its own, without being told what to look for.
Analogy
Supervised learning is a teacher marking a student's homework. Correct answers are provided, and the student learns from the feedback. Unsupervised learning is giving a student a box of unlabelled photos and asking them to sort them into groups. They figure out the categories themselves.
Product Manager Takeaway
The distinction matters when scoping AI features. Supervised learning needs labelled training data (expensive, but results are predictable). Unsupervised learning can find patterns you didn't know existed, but outputs need interpretation. Ask your team: do we have labelled data, and is it enough?
T
Nuanced
Temperature
Also called: randomness, creativity setting
A setting that controls how "creative" or "random" an AI's responses are. Low temperature (near 0) makes responses more predictable, consistent, and conservative. The model picks the most likely word each time. High temperature (near 1 or above) makes responses more varied, creative, and sometimes surprising, or incoherent.
Analogy
A dial between "by the book" and "freestyle." For a customer support bot, you want low temperature. For a creative writing assistant, higher temperature produces more interesting results.
Product Manager Takeaway
Temperature is a product decision, not just a technical one. The right setting depends on your use case. Predictability and creativity are in tension. You can't maximise both.
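For those who want to see the dial's mechanism: temperature rescales the model's raw scores before they become probabilities over the next word. The three candidate-word scores below are invented, but the maths is the standard softmax-with-temperature calculation:

```python
# What the temperature dial does under the hood: lower values sharpen the
# probability distribution (top choice dominates); higher values flatten it.
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # invented raw scores for three candidate next words

low = softmax_with_temperature(logits, 0.2)   # near-deterministic
high = softmax_with_temperature(logits, 1.5)  # more varied

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At low temperature the top candidate gets almost all the probability, so outputs barely vary between runs; at high temperature the alternatives get a real chance of being picked.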
Core Concept
Token
Also called: token count, subword unit
The basic unit LLMs use to process text. A token is roughly three to four characters, or about three-quarters of a word; "Hello world" is two tokens in most tokenisers. LLMs have a maximum number of tokens they can process at once, called the context window. Tokens also determine cost. You pay per token sent and received.
Analogy
Tokens are like the lines on a notepad page, and the context window is the notepad itself. The model can only "see" the text that fits on the notepad at once. Anything beyond that gets cut off.
Product Manager Takeaway
Token limits affect what you can put in a prompt and what you get back. Longer conversations, documents, and outputs all cost more. Token cost is how you calculate AI unit economics.
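A back-of-envelope cost estimate using the rough four-characters-per-token rule. The per-token prices below are placeholders invented for the example, not any provider's real rates; always check current pricing:

```python
# Rough token and cost estimation for an LLM request.

PRICE_PER_1K_INPUT = 0.003   # hypothetical $ per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.015  # hypothetical $ per 1,000 output tokens

def estimate_tokens(text: str) -> int:
    """Rule of thumb: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def estimate_cost(prompt: str, expected_output_chars: int) -> float:
    input_tokens = estimate_tokens(prompt)
    output_tokens = max(1, expected_output_chars // 4)
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# A long document in the prompt dominates the cost of a short reply.
long_prompt = "Summarise this document: " + "x" * 200_000
print(f"${estimate_cost(long_prompt, expected_output_chars=2_000):.4f}")
```

Multiplied by queries per user per day, this arithmetic is the core of AI unit economics: prompt length, not just model choice, drives your cost per interaction.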
V
Core Concept
Vector Database
Also called: vector store, embedding store
A database specifically designed to store and search embeddings. Unlike a traditional database that looks for exact text matches, a vector database finds content based on semantic similarity. Common examples include Pinecone, Weaviate, and pgvector (a Postgres extension). Vector databases are the storage layer that makes RAG work.
Analogy
A normal database is a filing cabinet with labelled folders. You have to know exactly what you're looking for. A vector database is a really smart librarian who can say "you asked about X, but these five documents are the most relevant to what you actually need."
Product Manager Takeaway
If your product needs to search through proprietary documents, knowledge bases, or large content libraries, you'll likely need a vector database. It's infrastructure. Your job is to understand what problem it solves and what the cost implications are.
Z
Core Skill
Zero-shot & Few-shot Prompting
Also called: zero-shot inference, few-shot examples, in-context learning
Two ways of instructing a model. Zero-shot gives the model a task with no examples, just the instruction. Few-shot provides one or more examples of the input/output pattern you want before asking the model to do it for real. Few-shot prompting is one of the fastest ways to improve output quality without any code changes.
Analogy
Zero-shot is asking a new hire to write a customer email without showing them any previous emails. Few-shot is showing them three good examples first and saying "match this tone and structure." The examples do a lot of the work.
Product Manager Takeaway
When your AI feature isn't producing outputs in the right format or tone, try adding two or three examples directly in your system prompt. It often beats hours of prompt rewording.
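In practice, few-shot prompting is just string assembly: worked examples go in front of the real request. The customer-support examples and format below are invented for illustration:

```python
# Building a few-shot prompt: prepend input/output examples before the real task.

examples = [
    ("Order arrived broken", "Apologise, offer replacement, include returns link."),
    ("Where is my parcel?", "Share tracking link, give delivery estimate."),
]

def few_shot_prompt(new_input: str) -> str:
    """Show the model the pattern, then ask it to continue it."""
    shots = "\n".join(f"Customer: {q}\nReply plan: {a}" for q, a in examples)
    return f"{shots}\nCustomer: {new_input}\nReply plan:"

def zero_shot_prompt(new_input: str) -> str:
    """Same task, no examples: just the instruction."""
    return f"Write a reply plan for this customer message: {new_input}"

print(few_shot_prompt("Can I change my delivery address?"))
```

The trailing `Reply plan:` invites the model to complete the pattern the examples established, which is why two or three good examples often beat a paragraph of instructions.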
Quick reference
- AI Agent — AI that takes sequences of actions, not just one response
- Bias — systematic skew in AI outputs from patterns in training data
- Chain-of-Thought — prompt the model to reason step-by-step for better accuracy
- Context Window — how much the model can "see" at once
- Deterministic vs Probabilistic — predictable same-output vs variable AI-predicted output
- Embeddings — turning text into numbers that capture meaning
- Evals — automated quality testing for AI outputs
- Fine-Tuning — retraining a model on your data; expensive, use sparingly
- Generative AI — models that create new content (text, images, code)
- Guardrails — rules and filters that constrain what the AI will and won't do
- Hallucination — when the model makes things up confidently
- Hallucination Rate — % of outputs that are factually wrong
- Human-in-the-Loop — humans review or approve AI outputs before real-world impact
- Inference — the moment a trained model is used to generate output
- LLM — the AI brain that reads and writes text
- LLMOps — monitoring and improving AI in production
- Model Drift — accuracy degrades over time as real-world data shifts
- MCP — universal standard for connecting AI to external tools and data
- Multimodal — AI that handles text, images, audio, and more
- Prompt — the instruction you send the model
- Precision & Recall — trade-off between accuracy of flags vs catching all cases
- Prompt Injection — adversarial input designed to hijack the AI's behaviour
- RAG — look up real sources before answering; reduces hallucination
- Supervised & Unsupervised Learning — training with labelled vs unlabelled data
- System Prompt — the hidden instruction that defines your product's AI behaviour
- Temperature — controls how predictable vs creative the output is
- Token — the unit of text LLMs process; also determines cost
- Vector Database — storage for embeddings; enables semantic search
- Zero-shot / Few-shot — give the model a task with no examples / show examples first