
LLM IT definition

Large Language Model: an AI model trained on massive text corpora that can understand and generate natural language.

An LLM (Large Language Model) is an AI model trained on massive text corpora to understand and generate natural language. It is the foundational layer of the GenAI wave triggered by ChatGPT in November 2022, and the underlying engine of most enterprise AI agents.

Modern LLMs — GPT-4, Claude 4, Gemini 2, Llama 3, Mistral Large — have hundreds of billions to trillions of parameters and are trained on hundreds of billions to trillions of tokens (words or word pieces). Their capabilities follow scaling laws: more data, more parameters, and more compute yield qualitatively new abilities (reasoning, code, translation).

How an LLM works

An LLM uses the Transformer architecture (introduced by Google in 2017 in Attention Is All You Need). At a high level:

  • Tokenization: input text is split into tokens (~3-4 characters each).
  • Embeddings: each token becomes a numerical vector.
  • Attention: the core mechanism that lets the model weigh each token against the others.
  • Layers: stacks of Transformer blocks (often 80-120 in large models).
  • Prediction: at each step the model predicts the next-token distribution.
  • Sampling: a token is drawn from that distribution (temperature parameter).

This next-token prediction, at the scale of hundreds of billions of parameters, gives rise to emergent reasoning, translation, code, and synthesis capabilities.
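The prediction and sampling steps above can be sketched in a few lines of Python. This is a toy illustration, not a real model: the vocabulary and logits are hard-coded here, whereas an actual LLM computes the logits by running the token embeddings through its stacked Transformer layers.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution over the vocabulary.
    Lower temperature sharpens the distribution; higher temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0):
    """Draw one token from the next-token distribution (the sampling step)."""
    probs = softmax(logits, temperature)
    return random.choices(vocab, weights=probs, k=1)[0]

# Toy vocabulary and logits for a context like "The capital of France is"
vocab = ["Paris", "London", "banana", "the"]
logits = [5.0, 2.0, -1.0, 0.5]

probs = softmax(logits)
# "Paris" dominates at temperature 1.0; as temperature approaches 0,
# sampling becomes nearly deterministic (greedy decoding).
```

Generation is simply this loop repeated: the sampled token is appended to the context and the model predicts the next distribution.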

LLM families

  • Proprietary: GPT-4/GPT-5 (OpenAI), Claude (Anthropic), Gemini (Google).
  • Open-weight: Llama (Meta), Mistral, DeepSeek, Qwen. Downloadable and deployable locally.
  • Reasoning models: o1, o3 (OpenAI), Claude Sonnet/Opus with extended thinking, Gemini thinking. Optimized for complex tasks via chain-of-thought.
  • Multimodal: ingest text + image + audio + video (GPT-4o, Gemini 2 Flash, Claude 4).
  • Specialized: medicine (Med-PaLM), code (Codex, Claude Code), legal.

LLM lifecycle

  • Pre-training: on a massive corpus, weeks of compute on thousands of GPUs. Cost: $50M to $1B for the largest models.
  • Fine-tuning: adaptation to a domain or response format.
  • RLHF / RLAIF: Reinforcement Learning from Human/AI Feedback to align outputs with human preferences.
  • Inference: production use, billed per token by vendors.

Costs and limits

  • Inference cost: billed per token, from a few cents to tens of dollars per million tokens depending on the model.
  • Latency: hundreds of milliseconds to several seconds per response.
  • Context limit: 100k to 2M tokens depending on the model; beyond that, use RAG.
  • Knowledge cutoff: the model knows nothing about events after its training data cutoff.
  • [Hallucinations](/en/glossary/hallucination-ia): confidently presented false information.
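As an order-of-magnitude illustration, the per-token billing and context limit above can be turned into a back-of-the-envelope calculator. The prices and the ~4-characters-per-token ratio below are rough assumptions for the sketch, not any vendor's actual rates.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: English text averages ~4 characters per token."""
    return max(1, round(len(text) / chars_per_token))

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """API cost in dollars, given per-million-token prices for input and output."""
    return (input_tokens * input_price_per_m +
            output_tokens * output_price_per_m) / 1_000_000

def fits_in_context(input_tokens: int, context_limit: int = 100_000) -> bool:
    """Beyond the context limit, chunk the input and use RAG instead."""
    return input_tokens <= context_limit

# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens
cost = estimate_cost(50_000, 2_000, 3.0, 15.0)  # → $0.18 for this call
```

At scale these fractions of a dollar add up: the same call made a million times a month is a six-figure line item, which is why token budgets belong in LLM governance.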

Enterprise usage patterns

  • Chat and copilot: conversational assistant for employees.
  • [RAG](/en/glossary/rag): grounding LLMs on internal documents to reduce hallucinations.
  • [AI agents](/en/glossary/agent-ia): LLM + tools + execution loop to automate tasks.
  • Code generation: Copilot, Cursor, Claude Code for engineering teams.
  • Extraction and structuring: parse documents, extract entities, classify.
  • Summary and synthesis: condense documents, meetings, conversations.
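The RAG pattern above boils down to injecting retrieved passages into the prompt before calling the model. Below is a minimal sketch of the prompt-assembly step only; the retrieval itself (vector search over internal documents) and the actual LLM call are out of scope and depend on your stack.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a RAG prompt: retrieved passages first, then the question,
    with an instruction to answer only from the provided context."""
    context = "\n\n".join(
        f"[Document {i + 1}]\n{p}" for i, p in enumerate(passages)
    )
    return (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Illustrative passages, as returned by a retrieval step
passages = [
    "The 2024 travel policy caps hotel expenses at 180 EUR per night.",
    "Expense reports must be filed within 30 days.",
]
prompt = build_grounded_prompt("What is the hotel expense cap?", passages)
# The assembled prompt is then sent to the LLM of your choice.
```

Grounding the answer in retrieved text, plus the explicit "say you don't know" instruction, is what reduces hallucinations compared to asking the bare model.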

Local LLM vs cloud LLM

Three deployment options:

  • Public API (OpenAI, Anthropic, Google): simple, performant, but data leaves the company.
  • Private cloud: models hosted at a hyperscaler (AWS Bedrock, Azure OpenAI, Vertex AI) or sovereign cloud.
  • Local / on-premise: open-weight models (Llama, Mistral) deployed on internal infrastructure. Maximal sovereignty, lower peak performance than the best closed models.

The choice depends on data sensitivity, regulatory requirements (GDPR, professional secrecy, defense), and budget.

Governing LLMs in the enterprise

Without a frame, LLM usage slips into Shadow AI. Best practices:

  • Enterprise licenses with a DPA (no prompt reuse for training).
  • [SSO](/en/glossary/sso) and conversation logging.
  • A documented usage policy and training.
  • A catalogue of approved LLMs and access management.
  • [ISO 42001](/en/glossary/iso-42001) and AI Act compliance.

Kabeen automatically detects LLMs and GenAI services used across the IT estate, giving the CIO immediate visibility on the real governance perimeter.

Frequently asked questions

What is an LLM?


An LLM (Large Language Model) is an AI model trained on massive text corpora to understand and generate natural language. It is the foundational layer of the GenAI wave triggered by ChatGPT in November 2022. Modern LLMs (GPT-4, Claude, Gemini, Llama, Mistral) count hundreds of billions to trillions of parameters.

How does an LLM work?


An LLM uses the Transformer architecture and predicts, at each step, the most probable next token given the context. Text is tokenized, converted to vectors (embeddings), then processed through stacks of attention layers. At the scale of hundreds of billions of parameters, this simple next-token prediction yields emergent reasoning, translation, code, and synthesis capabilities.

What is the difference between an LLM, GenAI, and an AI agent?


GenAI is the general family of generative models (text, image, audio, video). An LLM is a specific kind of GenAI focused on text. An AI agent is a software system that uses an LLM as its reasoning engine, coupling it with tools and an execution loop to automate tasks. The three nest: an LLM is a subset of GenAI, and an AI agent is built around an LLM.

Should you deploy LLMs locally or in the cloud?


Three options: public API (OpenAI, Anthropic, Google) — simple and performant but data leaves the company; private cloud (Azure OpenAI, Bedrock, Vertex AI, sovereign cloud) — a good isolation/performance balance; local on-premise deployment of open-weight models (Llama, Mistral, DeepSeek) — maximal sovereignty but lower peak performance. The choice depends on data sensitivity, applicable regulation (GDPR, professional secrecy), and budget.

Need help mapping your IT landscape?

Kabeen helps you inventory, analyze and optimize your application portfolio.

Try for free