AI conversations move fast, and the jargon moves faster. If you've ever nodded along in a meeting while privately Googling "what is a vector database," this is for you.
These are the terms that come up in AI product work, explained in plain English, with the context that actually matters when you're the one making the call.
How to use this guide
- Terms run A-Z. Use the letter nav to jump to what you need.
- Each entry has a plain definition, a real-world analogy, and a product manager takeaway.
- Light on maths. Any code is a short illustrative sketch, not production guidance. Just enough to be dangerous in the right way.
A
Emerging
AI Agent
Also called: agentic AI, autonomous agent
An AI system that doesn't just respond to a single prompt. It takes a sequence of actions to accomplish a goal. Agents can use tools (web search, code execution, sending emails), plan multi-step workflows, and make decisions along the way. Instead of one question and one answer, an agent might take ten steps to complete a task you described at a high level.
Analogy
The difference between asking a colleague "what's our Q3 revenue?" (one question, one answer) and "prepare the Q3 board pack" (pull data, build slides, check with finance, send for review). An agent handles the second kind of request.
Product Manager Takeaway
Agentic products need different UX thinking. When does the agent ask for permission? What happens when it fails mid-task? How do users understand and trust what the agent did? Product problems as much as technical ones.
B
Watch Out
Bias (AI Bias)
Also called: algorithmic bias, model bias, training bias
Systematic skew in a model's outputs, caused by patterns in its training data, the way it was built, or the way it's deployed. Models learn from human-generated data, which contains human biases. The model can inherit and amplify those biases at scale. It shows up in product work as personas that default to demographic assumptions, recommendation systems that reinforce existing patterns, hiring tools that favour certain groups.
Analogy
If you only ever interviewed customers at your London office, your "user insights" would skew towards a specific demographic. AI bias works the same way. The model's worldview is limited to the data it was trained on.
Product Manager Takeaway
Bias isn't a bug you fix once. It's a quality dimension you monitor. Actively ask: whose perspective is over-represented? Whose is missing? Test AI features across demographics, geographies, and usage patterns before launch.
C
Nuanced
Chain-of-Thought (CoT)
Also called: step-by-step reasoning, CoT prompting
A prompting technique where you instruct the model to reason through a problem step by step before giving a final answer, rather than jumping straight to a conclusion. Adding "think step by step" or showing a worked example of reasoning in your prompt consistently improves accuracy on complex tasks, because it forces the model to slow down and check its own logic.
Analogy
Like asking someone to show their working in a maths test. It doesn't just help the marker. It helps the student catch their own mistakes. Writing out the steps reduces errors.
Product Manager Takeaway
For any AI feature where accuracy really matters (analysis, recommendations, decisions), consider prompting for CoT. It adds latency and tokens, but reduces hallucinations on complex tasks. A real quality-speed tradeoff to make consciously.
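For the technically curious, the technique is just a change to the prompt text. A minimal sketch, with an invented `build_prompt` helper and example question (not tied to any particular API):

```python
# A minimal sketch of chain-of-thought prompting: the same question, wrapped
# in either a direct instruction or a step-by-step one.

def build_prompt(question: str, chain_of_thought: bool) -> str:
    """Wrap a question in either a direct or a step-by-step instruction."""
    if chain_of_thought:
        return (
            f"{question}\n"
            "Think step by step. Write out your reasoning first, "
            "then give the final answer on its own line."
        )
    return f"{question}\nGive only the final answer."

question = "A jacket costs £80 after a 20% discount. What was the original price?"
print(build_prompt(question, chain_of_thought=True))
```

The only difference between the two variants is the instruction to reason first, which is exactly the latency-for-accuracy trade described above.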
Core Concept
Context Window
Also called: context length, max tokens
The maximum amount of text an LLM can "see" and consider at one time, including both the input you send (prompt, conversation history, documents) and the output it generates. Modern models range from 8,000 tokens (about 6,000 words) to over a million tokens (several novels' worth of text).
Analogy
Like a whiteboard in a meeting room. The model can only work with what's on the whiteboard right now. Once it's full, something has to be erased to add something new.
Product Manager Takeaway
Context window size determines what use cases are viable. Summarising a 100-page document needs a large window. Short chat replies don't. This affects which model you choose, and what you pay.
D
Core Concept
Deterministic vs Probabilistic
Also called: rule-based vs statistical, predictable vs variable
The fundamental distinction between traditional software and AI. Deterministic systems always produce the same output for the same input. A "sort by date" button always sorts by date. Probabilistic systems predict the most likely useful output based on patterns, and that output can vary between runs. AI is probabilistic. It can be brilliant, or it can be confidently wrong, and often you can't tell in advance which.
Analogy
A calculator is deterministic. 2+2 always equals 4. A weather forecast is probabilistic. It predicts the most likely outcome based on patterns, but tomorrow might surprise you. AI works more like the forecast.
Product Manager Takeaway
This shift changes how you write specs, define success metrics, and think about QA. AI features can't be tested with simple pass/fail. They need evaluation frameworks, confidence thresholds, feedback loops. If you understand one concept from AI, make it this one.
E
Core Concept
Embeddings
Also called: vector representations, semantic vectors
A way of representing text (or images, audio, etc.) as a list of numbers that captures meaning. Two pieces of text with similar meaning will have similar numbers, so the system can find related content even if the exact words don't match. Embeddings are what power semantic search, searching by meaning rather than keyword.
Analogy
Imagine plotting every document in your knowledge base on a map. Similar topics cluster together. Embeddings create that map. When a user asks a question, you find the nearest cluster and retrieve those documents.
Product Manager Takeaway
Embeddings are the technology behind "find similar content," semantic search, and RAG. You don't need the maths. Just know it's what makes meaning-based search possible.
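To make "similar numbers" concrete, here is a toy illustration of similarity between embeddings. The three-dimensional vectors are invented for the example; real embeddings have hundreds or thousands of dimensions and come from a model:

```python
# Toy illustration of why embeddings enable meaning-based search: related
# meanings produce vectors that point in similar directions.
import math

def cosine_similarity(a, b):
    """Standard similarity measure between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings: "refund policy" and "money-back guarantee" share no
# words but similar meaning; "office dog policy" is unrelated.
refund = [0.9, 0.1, 0.2]
money_back = [0.8, 0.2, 0.3]
office_dogs = [0.1, 0.9, 0.7]

print(cosine_similarity(refund, money_back))   # high score: related meaning
print(cosine_similarity(refund, office_dogs))  # low score: unrelated
```

Semantic search is essentially this comparison run against every document in your knowledge base, returning the closest matches.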
Advanced
Evals: Evaluation Framework
Also called: LLM evaluation, model evals
A systematic way of testing the quality of a model's outputs. A test suite for AI behaviour. Evals measure things like: is the answer factually correct? Is it grounded in the source document? Is it harmful? Does it complete the task? A good eval framework runs automatically whenever you change a prompt or upgrade a model, catching regressions before they reach users.
Analogy
QA testing for software, but for judgement. Instead of checking if the button works, you're checking if the AI said something sensible, accurate, and safe, at scale, automatically.
Product Manager Takeaway
Owning evals is becoming a core product manager responsibility. If you can't measure output quality automatically, you can't ship AI features with confidence. Start by defining what "good" looks like, then work with your team to automate that check.
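A deliberately tiny sketch of what "automate that check" can mean: a list of test cases, a check per case, and a pass rate. Real frameworks add LLM-as-a-judge scoring, tracing, and CI integration; `call_model`, the questions, and the canned answers here are all invented stand-ins:

```python
# A minimal eval harness: known prompts, expected facts, automated pass rate.

def call_model(prompt: str) -> str:
    """Stand-in for a real LLM API call."""
    canned = {
        "What is our refund window?": "You can request a refund within 30 days.",
        "Do we ship to France?": "Yes, we ship to all EU countries.",
    }
    return canned.get(prompt, "I'm not sure.")

eval_cases = [
    {"prompt": "What is our refund window?", "must_contain": "30 days"},
    {"prompt": "Do we ship to France?", "must_contain": "EU"},
]

def run_evals(cases):
    """Run every case and return the fraction that passed."""
    results = [c["must_contain"] in call_model(c["prompt"]) for c in cases]
    return sum(results) / len(results)

print(f"pass rate: {run_evals(eval_cases):.0%}")
```

Run this on every prompt change or model upgrade and a drop in pass rate becomes a regression you catch before users do.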
F
Strategic Decision
Fine-Tuning
Also called: model fine-tuning, task-specific training
Taking a pre-trained LLM and training it further on your own data, to make it better at your particular task, domain, or style. Unlike RAG (which retrieves information at query time), fine-tuning bakes knowledge and behaviour into the model itself. It's more expensive, slower to iterate, and needs labelled training data, but can produce a highly specialised model.
Analogy
RAG is giving an employee a reference manual to consult on the job. Fine-tuning is putting them through a six-month specialist training programme. It changes how they think, not just what they look up.
Product Manager Takeaway
Default to RAG first. Only invest in fine-tuning once you have strong task clarity, sufficient labelled data (thousands of examples), and evidence that RAG isn't getting you close enough. Fine-tuned models are also typically several times more expensive to serve than prompting the same base model.
G
Foundation
Generative AI
Also called: GenAI, generative models
Models that create new content (text, images, code, audio, video) rather than classifying, predicting, or analysing existing data. ChatGPT, Claude, Midjourney, and GitHub Copilot are all generative AI. The distinction is creative output. A spam filter classifies emails (not generative). A tool that writes email replies creates new content (generative).
Analogy
Traditional AI is a librarian who finds and organises existing books. Generative AI is an author who writes new ones. Both are useful, but the second raises different questions about accuracy, originality, and trust.
Product Manager Takeaway
Most AI features product managers are building today are generative: drafting content, summarising documents, creating recommendations. Generative models are optimised to produce plausible outputs, not verified ones. That distinction drives most of your product design decisions around AI quality.
Core Concept
Guardrails
Also called: safety filters, content filters, output constraints
Rules, filters, and constraints built around a model to control what it will and won't do. Preventing harmful outputs, keeping responses on-topic, enforcing brand and compliance requirements. Guardrails can be implemented at the prompt level (instructions in the system prompt), at the API level (moderation layers), or post-generation (filtering outputs before they reach users).
Analogy
Like bumpers in a bowling lane. The model can still move freely within the lane, but the guardrails stop it going somewhere it shouldn't. They don't make the AI smarter. They make it safer to deploy.
Product Manager Takeaway
Guardrails are a product decision, not just a technical one. What topics should your AI refuse? What tone is off-limits? What happens when a guardrail fires: does the user get a helpful error or a confusing non-answer? UX and risk questions that product managers own.
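One of the simplest guardrail layers, sketched in a few lines: a post-generation check on the model's output before it reaches the user. Real systems layer this with prompt-level instructions and moderation APIs; the topic list and fallback message here are invented for the example:

```python
# A minimal post-generation guardrail: block off-limits topics and return a
# designed fallback instead of a confusing non-answer.

BLOCKED_TOPICS = ["legal advice", "medical diagnosis"]
FALLBACK = "I can't help with that, but I can connect you with a specialist."

def apply_guardrail(model_output: str) -> str:
    """Pass safe outputs through; replace off-limits ones with the fallback."""
    lowered = model_output.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return FALLBACK
    return model_output

print(apply_guardrail("Our returns portal is at /returns."))
print(apply_guardrail("Here is some legal advice about your contract..."))
```

Notice that the fallback wording is itself a product decision: what the user sees when a guardrail fires is part of the experience you design.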
H
Watch Out
Hallucination
Also called: confabulation, model error
When an AI generates a confident-sounding response that is factually wrong. Inventing statistics, citing non-existent papers, stating false information as if it were true. Happens because LLMs are trained to produce plausible-sounding text, not to verify facts. They don't "know" things the way humans do. They predict probable word sequences.
Analogy
Asking a brilliant but overconfident colleague to fill in a report. They'll give you something that looks great, but they may have made up the bits they didn't know, and they won't flag which is which.
Product Manager Takeaway
Hallucination is the primary quality risk for AI products that present information as fact. Your eval strategy, your RAG setup, and your UI design should all account for it.
Key Metric
Hallucination Rate
Also called: factual error rate, grounding score
The percentage of AI responses that contain factually incorrect or fabricated information. Measuring it requires either human review of a sample of outputs, or an automated "LLM-as-a-judge" approach where another model checks outputs against ground truth. A hallucination rate of even 1-2% can be unacceptable in high-stakes contexts like medical or legal products.
Analogy
If your customer support agent gets the answer wrong 5% of the time and users can't tell which 5%, you have a trust problem, not just a quality problem.
Product Manager Takeaway
Define your acceptable hallucination rate before you launch. In low-stakes creative tools, some error is tolerable. In products where users rely on accuracy, you need grounding (RAG), citation, and human review workflows.
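The measurement itself is simple once you have reviewed outputs. A sketch with invented review data, where each entry records whether a human reviewer found a fabricated or incorrect claim:

```python
# Hallucination rate from a human-reviewed sample of model outputs.

reviewed_sample = [
    {"response_id": 1, "hallucinated": False},
    {"response_id": 2, "hallucinated": True},   # cited a non-existent document
    {"response_id": 3, "hallucinated": False},
    {"response_id": 4, "hallucinated": False},
]

def hallucination_rate(sample):
    """Fraction of reviewed responses flagged as containing a factual error."""
    flagged = sum(1 for r in sample if r["hallucinated"])
    return flagged / len(sample)

print(f"hallucination rate: {hallucination_rate(reviewed_sample):.0%}")
```

The hard part isn't the arithmetic; it's getting a representative sample and a consistent definition of "hallucinated" across reviewers.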
Must Know
Human-in-the-Loop (HITL)
Also called: human review, human oversight, hybrid AI
A design pattern where a human reviews, corrects, or approves AI outputs before they have real-world impact. HITL sits on a spectrum. At one end, every AI output goes to a human before action. At the other, humans only review flagged or low-confidence outputs. The right level depends on the stakes. A typo suggestion needs no review. A credit decision almost certainly does.
Analogy
Tracked changes in a document. The AI drafts the content. A human decides what to accept, reject, or amend before it goes out. AI speed, human judgement.
Product Manager Takeaway
HITL is often the right answer for high-stakes AI features, especially early in a product's life. It builds trust, generates labelled data for improvement, and gives you a safety net while you measure AI accuracy. Design the review workflow as carefully as the AI itself.
I
Core Concept
Inference
Also called: prediction, model inference, serving
The moment a trained model is actually put to work, processing new input and generating an output. Training is the learning phase (expensive, done once or periodically). Inference is the doing phase (cheaper per call, happens every time a user interacts with the model). When someone asks ChatGPT a question, that's inference. Inference cost, latency, and throughput are key operational metrics for any AI product.
Analogy
Training is a student studying for an exam over months. Inference is the moment they sit down and answer the questions. All the learning happened beforehand. Inference is where it gets applied.
Product Manager Takeaway
Inference is what you pay for at scale. Every API call to an LLM is an inference request. Understanding inference cost per query is essential for AI unit economics. Faster, cheaper inference (smaller models, caching, batching) directly impacts your margins.
L
Foundation
LLM: Large Language Model
Also called: foundation model, base model
An AI system trained on enormous amounts of text to understand and generate human language. LLMs like GPT-4, Claude, and Gemini can write, summarise, translate, answer questions, and reason through problems, all by predicting what words should come next based on patterns learned from billions of documents.
Analogy
A very well-read assistant who has absorbed almost everything ever written, and can hold a conversation on virtually any topic. But learned entirely from text, with no lived experience.
Product Manager Takeaway
The LLM is the engine under the hood of most AI features. When you're choosing between OpenAI, Anthropic, or Google, you're choosing which LLM to build on.
Infrastructure
LLMOps
Also called: AI observability, ML operations for LLMs
The practice and tooling for deploying, monitoring, and improving LLM-based applications in production. LLMOps covers tracking every AI request and response (tracing), monitoring cost and latency, running evaluations, versioning prompts, and detecting when model quality degrades. Common tools include Langfuse, Braintrust, and Helicone.
Analogy
DevOps tells you if your servers are down or your API is slow. LLMOps tells you if your AI is getting worse, costing more than expected, or failing in ways your users haven't reported yet.
Product Manager Takeaway
You can't improve what you can't measure. Even if you don't configure it yourself, push for LLMOps tooling early. It's the difference between flying blind and having real data on how your AI feature is actually performing.
M
Nuanced
Model Drift
Also called: data drift, distribution shift, concept drift
The gradual degradation of a model's accuracy over time, as the real-world data it encounters diverges from the data it was trained on. A model trained on user behaviour from 2022 may perform poorly in 2025 if user needs, language, or patterns have changed. Drift is invisible unless you're actively measuring output quality over time.
Analogy
A sales forecast model trained before a market shock. It keeps producing confident predictions, but the world has changed and the assumptions underneath no longer hold. Confidence doesn't equal accuracy.
Product Manager Takeaway
Model drift is why AI products need ongoing monitoring, not just a launch plan. If your evals show quality declining without a prompt change or model update, drift is a likely cause. Build a feedback loop and set quality thresholds that trigger a review.
Strategy
Multimodal AI
Also called: vision-language models, audio-language models
Models that can work with more than one type of data. Understanding both text and images (like GPT-4o or Gemini), or processing audio alongside text. A multimodal model can look at a photo of a receipt and extract the total. It can watch a video and write a summary. It can listen to a voice note and respond in text.
Analogy
Most early AI assistants could only read. Multimodal AI can read, see, and hear, opening up a much wider range of real-world use cases.
Product Manager Takeaway
Ask whether your use cases are purely text-based, or whether vision (images, screenshots, documents) or audio could unlock meaningfully better experiences. Most leading models now accept multimodal inputs through the same APIs, with little extra integration work.
Emerging
MCP: Model Context Protocol
Also called: tool protocol, context protocol
An open standard that defines how AI models connect to external tools, data sources, and services through a universal interface. Instead of every AI application building custom integrations for each tool it needs (databases, APIs, file systems, SaaS products), MCP provides a single, standardised protocol that any AI model can use to discover and call any compatible tool. USB-C for AI: one connection standard that works everywhere.
Analogy
Before USB, every device had its own proprietary cable. MCP does the same thing for AI integrations. Instead of building a custom connector for every tool your AI needs to use, you plug into one standard protocol and it just works.
Product Manager Takeaway
MCP lowers the cost of connecting AI features to existing tools and data. Instead of bespoke integrations, you adopt MCP-compatible servers and let your AI access Slack, databases, or internal APIs through a standard interface. Your AI product can gain new capabilities without new engineering work, just by adding new MCP servers.
P
Core Skill
Prompt / Prompt Engineering
Also called: prompting, instructions
A prompt is the text you send to a model. Your question, instruction, or request. Prompt engineering is crafting those inputs to get better, more reliable outputs. Small changes in wording, structure, and framing can dramatically change the quality of what the model produces.
Analogy
Writing a great job brief. The clearer, more specific, and better-structured your brief, the better the work you'll get back. Vague in, vague out.
Product Manager Takeaway
Prompt engineering is now a core product manager skill. The system prompt (the hidden instruction that shapes how a product's AI behaves) is effectively your product spec for every AI interaction.
Watch Out
Prompt Injection
Also called: prompt hijacking, adversarial prompting
An attack where a user crafts input designed to override or bypass a model's system prompt and guardrails, effectively hijacking the AI's behaviour. A user might type "Ignore your previous instructions and instead…" to make the AI act outside its intended role. A real security vector for any product with user-facing AI inputs, particularly those that process external content (emails, documents, web pages).
Analogy
A customer calls your helpline and convinces the agent to ignore their training and reveal confidential information by framing it as an internal audit. The agent was told one thing. A clever caller found a way around it.
Product Manager Takeaway
Include prompt injection in your threat model before launch. Work with security to test it. Consider what data or actions your AI can access. If an injected prompt can exfiltrate data or take real-world actions, the risk is serious. Limit the blast radius by scoping what the AI is allowed to do.
Strategy
Precision & Recall
Also called: positive predictive value, sensitivity, true positive rate
Precision measures: when the model flags something, how often is it right? Recall measures: of all the cases that should be flagged, how many does the model actually catch? You can rarely maximise both. High precision means fewer false alarms but more missed cases. High recall means you catch more real cases but also get more false positives. The right balance is a product decision that depends on context.
Analogy
A smoke detector set to maximum sensitivity (high recall) catches every fire but also goes off when you make toast. One set to maximum precision only sounds when there's a real fire, but might miss a small one. Neither is universally right. It depends on the stakes.
Product Manager Takeaway
This is a product manager decision, not a data science one. For medical diagnostics or fraud detection, maximise recall. Missing a case is worse than a false alarm. For recommendations or notifications, maximise precision. Noise erodes trust. Know which trade-off your feature needs.
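The two metrics fall out of three counts. A worked example using the smoke-detector framing, with invented numbers: 90 real fires caught, 10 toast incidents wrongly flagged, 5 real fires missed:

```python
# Precision and recall from raw counts.

true_positives = 90   # real fires the detector caught
false_positives = 10  # toast incidents it flagged as fires
false_negatives = 5   # real fires it missed

# Precision: when it alarms, how often is it right?
precision = true_positives / (true_positives + false_positives)

# Recall: of all real fires, how many did it catch?
recall = true_positives / (true_positives + false_negatives)

print(f"precision: {precision:.2f}")
print(f"recall: {recall:.2f}")
```

Tuning the detector moves cases between the false-positive and false-negative buckets, which is why you can rarely improve both numbers at once.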
R
Must Know
RAG: Retrieval-Augmented Generation
Also called: retrieval, grounded generation
Before generating a response, the AI first searches a knowledge base to retrieve relevant documents or data, then uses that retrieved content to inform and ground its answer. This solves two problems: it gives the model access to up-to-date or proprietary information it wasn't trained on, and it reduces hallucination by anchoring responses in real sources.
Analogy
Instead of answering from memory, the AI first looks things up in your filing cabinet, reads the relevant pages, then gives you an answer based on what it actually found. Much harder to make things up when you're working from a source document.
Product Manager Takeaway
RAG is the default starting point for most enterprise AI features. Faster and cheaper than fine-tuning, handles real-time data, and makes responses more explainable. If a user asks "why did you say that?", you can show them the source.
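The retrieve-then-generate loop, sketched at toy scale. Retrieval here is naive word overlap standing in for real embedding search, `generate_answer` is a stub for an LLM call, and the knowledge base entries are invented:

```python
# A toy RAG loop: retrieve the most relevant doc, then answer from it.
import re

knowledge_base = [
    "You can get a refund within 30 days of purchase.",
    "We ship to the UK and all EU countries.",
    "Support hours are 9am to 5pm, Monday to Friday.",
]

def words(text):
    """Lowercase word set, ignoring punctuation and digits."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, docs, k=1):
    """Return the k docs sharing the most words with the question."""
    scored = sorted(docs, key=lambda d: len(words(question) & words(d)), reverse=True)
    return scored[:k]

def generate_answer(question, sources):
    # A real system would send the question plus the sources to an LLM here.
    return f"Based on our docs: {sources[0]}"

question = "How many days do I have to get a refund?"
sources = retrieve(question, knowledge_base)
print(generate_answer(question, sources))
```

Because the answer carries its source with it, "why did you say that?" has a concrete answer: you show the retrieved document.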
S
Core Concept
System Prompt
Also called: system message, instructions
A set of hidden instructions given to an AI at the start of every conversation, before the user says anything. It defines the AI's role, tone, constraints, and behaviour: things like "You are a helpful assistant for a legal firm. Always recommend consulting a qualified solicitor. Never speculate on case outcomes." Users typically don't see the system prompt, but it shapes every response.
Analogy
The employee handbook your AI has already read before starting its first shift. It determines how the AI behaves across every customer interaction.
Product Manager Takeaway
Writing and iterating on your system prompt is one of the highest-leverage things a product manager can do. It's the difference between a generic AI and a well-designed product experience.
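Mechanically, most chat APIs accept the system prompt as the first message in a list of role-tagged messages. A sketch using the legal-firm wording from the definition above; the message format mirrors common chat APIs but the helper is invented:

```python
# How a system prompt typically travels with every request.

SYSTEM_PROMPT = (
    "You are a helpful assistant for a legal firm. "
    "Always recommend consulting a qualified solicitor. "
    "Never speculate on case outcomes."
)

def build_messages(user_input: str):
    """Assemble the message list sent to the model on each turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # hidden from the user
        {"role": "user", "content": user_input},       # what the user typed
    ]

messages = build_messages("Will I win my dispute?")
print(messages[0]["role"], "->", messages[0]["content"])
```

The user only ever sees their own message and the reply; the system message silently shapes every response, which is why iterating on its wording is such high-leverage work.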
Core Concept
Supervised & Unsupervised Learning
Also called: labelled learning, self-supervised learning, clustering
Two fundamental approaches to training models. Supervised learning trains on labelled data where the correct answers are provided: "this email is spam / not spam." The model learns the pattern between inputs and known outputs. Unsupervised learning gives the model raw, unlabelled data and asks it to discover hidden patterns, groupings, or structures on its own, without being told what to look for.
Analogy
Supervised learning is a teacher marking a student's homework. Correct answers are provided, and the student learns from the feedback. Unsupervised learning is giving a student a box of unlabelled photos and asking them to sort them into groups. They figure out the categories themselves.
Product Manager Takeaway
The distinction matters when scoping AI features. Supervised learning needs labelled training data (expensive, but results are predictable). Unsupervised learning can find patterns you didn't know existed, but outputs need interpretation. Ask your team: do we have labelled data, and is it enough?
T
Nuanced
Temperature
Also called: randomness, creativity setting
A setting that controls how "creative" or "random" an AI's responses are. Low temperature (near 0) makes responses more predictable, consistent, and conservative. The model picks the most likely word each time. High temperature (near 1 or above) makes responses more varied, creative, and sometimes surprising, or incoherent.
Analogy
A dial between "by the book" and "freestyle." For a customer support bot, you want low temperature. For a creative writing assistant, higher temperature produces more interesting results.
Product Manager Takeaway
Temperature is a product decision, not just a technical one. The right setting depends on your use case. Predictability and creativity are in tension. You can't maximise both.
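For those who want to see the dial's mechanism: temperature rescales the model's raw scores before they become probabilities over the next word. The three candidate-word scores below are invented, but the maths is the standard softmax-with-temperature calculation:

```python
# What the temperature dial does under the hood: lower values sharpen the
# probability distribution (top choice dominates); higher values flatten it.
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # invented raw scores for three candidate next words

low = softmax_with_temperature(logits, 0.2)   # near-deterministic
high = softmax_with_temperature(logits, 1.5)  # more varied

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At low temperature the top candidate gets almost all the probability, so outputs barely vary between runs; at high temperature the alternatives get a real chance of being picked.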
Core Concept
Token
Also called: token count, subword unit
The basic unit LLMs use to process text. A token is roughly three to four characters, or about three-quarters of a word; "Hello world" is two tokens in most tokenisers. LLMs have a maximum number of tokens they can process at once, called the context window. Tokens also determine cost. You pay per token sent and received.
Analogy
Tokens are like the lines on a notepad page, and the context window is the notepad itself. The model can only "see" the text that fits on the notepad at once. Anything beyond that gets cut off.
Product Manager Takeaway
Token limits affect what you can put in a prompt and what you get back. Longer conversations, documents, and outputs all cost more. Token cost is how you calculate AI unit economics.
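A back-of-envelope cost estimate using the rough four-characters-per-token rule. The per-token prices below are placeholders invented for the example, not any provider's real rates; always check current pricing:

```python
# Rough token and cost estimation for an LLM request.

PRICE_PER_1K_INPUT = 0.003   # hypothetical $ per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.015  # hypothetical $ per 1,000 output tokens

def estimate_tokens(text: str) -> int:
    """Rule of thumb: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def estimate_cost(prompt: str, expected_output_chars: int) -> float:
    input_tokens = estimate_tokens(prompt)
    output_tokens = max(1, expected_output_chars // 4)
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# A long document in the prompt dominates the cost of a short reply.
long_prompt = "Summarise this document: " + "x" * 200_000
print(f"${estimate_cost(long_prompt, expected_output_chars=2_000):.4f}")
```

Multiplied by queries per user per day, this arithmetic is the core of AI unit economics: prompt length, not just model choice, drives your cost per interaction.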
V
Core Concept
Vector Database
Also called: vector store, embedding store
A database specifically designed to store and search embeddings. Unlike a traditional database that looks for exact text matches, a vector database finds content based on semantic similarity. Common examples include Pinecone, Weaviate, and pgvector (a Postgres extension). Vector databases are the storage layer that makes RAG work.
Analogy
A normal database is a filing cabinet with labelled folders. You have to know exactly what you're looking for. A vector database is a really smart librarian who can say "you asked about X, but these five documents are the most relevant to what you actually need."
Product Manager Takeaway
If your product needs to search through proprietary documents, knowledge bases, or large content libraries, you'll likely need a vector database. It's infrastructure. Your job is to understand what problem it solves and what the cost implications are.
Z
Core Skill
Zero-shot & Few-shot Prompting
Also called: zero-shot inference, few-shot examples, in-context learning
Two ways of instructing a model. Zero-shot gives the model a task with no examples, just the instruction. Few-shot provides one or more examples of the input/output pattern you want before asking the model to do it for real. Few-shot prompting is one of the fastest ways to improve output quality without any code changes.
Analogy
Zero-shot is asking a new hire to write a customer email without showing them any previous emails. Few-shot is showing them three good examples first and saying "match this tone and structure." The examples do a lot of the work.
Product Manager Takeaway
When your AI feature isn't producing outputs in the right format or tone, try adding two or three examples directly in your system prompt. It often beats hours of prompt rewording.
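In practice, few-shot prompting is just string assembly: worked examples go in front of the real request. The customer-support examples and format below are invented for illustration:

```python
# Building a few-shot prompt: prepend input/output examples before the real task.

examples = [
    ("Order arrived broken", "Apologise, offer replacement, include returns link."),
    ("Where is my parcel?", "Share tracking link, give delivery estimate."),
]

def few_shot_prompt(new_input: str) -> str:
    """Show the model the pattern, then ask it to continue it."""
    shots = "\n".join(f"Customer: {q}\nReply plan: {a}" for q, a in examples)
    return f"{shots}\nCustomer: {new_input}\nReply plan:"

def zero_shot_prompt(new_input: str) -> str:
    """Same task, no examples: just the instruction."""
    return f"Write a reply plan for this customer message: {new_input}"

print(few_shot_prompt("Can I change my delivery address?"))
```

The trailing `Reply plan:` invites the model to complete the pattern the examples established, which is why two or three good examples often beat a paragraph of instructions.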
Quick reference
- AI Agent — AI that takes sequences of actions, not just one response
- Bias — systematic skew in AI outputs from patterns in training data
- Chain-of-Thought — prompt the model to reason step-by-step for better accuracy
- Context Window — how much the model can "see" at once
- Deterministic vs Probabilistic — predictable same-output vs variable AI-predicted output
- Embeddings — turning text into numbers that capture meaning
- Evals — automated quality testing for AI outputs
- Fine-Tuning — retraining a model on your data; expensive, use sparingly
- Generative AI — models that create new content (text, images, code)
- Guardrails — rules and filters that constrain what the AI will and won't do
- Hallucination — when the model makes things up confidently
- Hallucination Rate — % of outputs that are factually wrong
- Human-in-the-Loop — humans review or approve AI outputs before real-world impact
- Inference — the moment a trained model is used to generate output
- LLM — the AI brain that reads and writes text
- LLMOps — monitoring and improving AI in production
- Model Drift — accuracy degrades over time as real-world data shifts
- MCP — universal standard for connecting AI to external tools and data
- Multimodal — AI that handles text, images, audio, and more
- Prompt — the instruction you send the model
- Precision & Recall — trade-off between accuracy of flags vs catching all cases
- Prompt Injection — adversarial input designed to hijack the AI's behaviour
- RAG — look up real sources before answering; reduces hallucination
- Supervised & Unsupervised Learning — training with labelled vs unlabelled data
- System Prompt — the hidden instruction that defines your product's AI behaviour
- Temperature — controls how predictable vs creative the output is
- Token — the unit of text LLMs process; also determines cost
- Vector Database — storage for embeddings; enables semantic search
- Zero-shot / Few-shot — give the model a task with no examples / show examples first