LLM Models Directory

Complete version history of every major large language model — past, present, and frontier

12 Companies · 100+ Models · Updated Mar 2026

OpenAI — GPT & o-Series

2026
GPT-5.4 Feb 2026

Latest flagship. Supersedes GPT-5.2. Enhanced reasoning, coding, and multi-step planning.

Closed 400K context Reasoning
2025
GPT-5.2 Dec 2025

GPT-5 series point release. 400K context, 100% on AIME 2025, 6.2% hallucination rate (~40% reduction).

Closed 400K context
GPT-5 / GPT-5 Pro Aug 2025

New default in ChatGPT, replacing GPT-4o, o3, o4-mini, GPT-4.1, and GPT-4.5. Pro variant uses scaled parallel test-time compute (sketched below).

Closed 400K context Reasoning
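
Scaled parallel test-time compute is essentially best-of-N sampling: draw several candidate answers concurrently and keep the one a verifier scores highest. OpenAI has not published GPT-5 Pro's exact mechanism, so this is only a sketch of the generic pattern, with hypothetical generate() and score() stand-ins:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def generate(prompt: str) -> str:
    # Hypothetical stand-in for one sampled completion; a real system
    # would call the model here with temperature > 0.
    return f"candidate-{random.randrange(1_000_000)}"

def score(prompt: str, answer: str) -> float:
    # Hypothetical verifier / reward model; higher is better.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Sample n candidate answers in parallel, keep the top-scoring one.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda _: generate(prompt), range(n)))
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("Prove that 2^61 - 1 is prime."))
```

Quality rises with n at the cost of roughly n times the inference compute, which is why this pattern shows up in premium tiers.
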
GPT-oss-120B / GPT-oss-20B Aug 2025

OpenAI's first open-weight models. 120B and 20B parameter variants.

Open Weight 120B / 20B
GPT-4.1 Apr 2025

Major advances in coding, instruction following, and long-context understanding. Supports up to 1M tokens. Retired from ChatGPT Feb 2026.

Closed 1M context Coding
o4-mini Apr 2025

Small reasoning model optimized for science, math, and coding. Part of the o-series reasoning family.

Closed Reasoning
o3 / o3-mini Jan–Feb 2025

Next-gen reasoning models. o3-mini optimized for STEM tasks. o3 full model for complex multi-step reasoning.

Closed Reasoning
GPT-4.5 Feb 2025

General-purpose LLM with improved EQ and reduced hallucinations. Released as a research preview.

Closed
2024
o1 / o1-mini Sep 2024

First reasoning models trained with RL for complex multi-step thinking. Preview release.

Closed Reasoning
GPT-4o / GPT-4o mini May–Jul 2024

"Omni" model with native multimodal capabilities (text, vision, audio). 128K context. GPT-4o mini is a smaller, faster variant.

Closed 128K context Multimodal
2023
GPT-4 / GPT-4 Turbo Mar–Nov 2023

Flagship model with vision capabilities. GPT-4 Turbo expanded context to 128K and reduced cost.

Closed 128K context Vision
GPT-3.5 Turbo 2023 (updated)

Cost-effective model that powered the original ChatGPT. 16K context variant released mid-2023.

Closed 16K context

Anthropic — Claude

2026
Claude Opus 4.6 Feb 5, 2026

Most capable Claude model. 1M context window, Agent Teams for native multi-agent collaboration. API: claude-opus-4-6

Closed 1M context Reasoning
Claude Sonnet 4.6 Feb 17, 2026

Best balance of speed and quality. Same pricing as Sonnet 4.5. API: claude-sonnet-4-6

Closed Coding Reasoning
2025
Claude Opus 4.5 Nov 24, 2025

Major upgrade to Opus tier with improved agentic capabilities and long-form reasoning.

Closed Reasoning
Claude Haiku 4.5 Oct 15, 2025

Fastest and most cost-effective Claude model. API: claude-haiku-4-5-20251001

Closed
Claude Opus 4 / Sonnet 4 May 22, 2025

Claude 4 generation. Opus 4 excels at long-running tasks and agentic workflows. Sonnet 4 improved coding, reasoning, and instruction-following.

Closed 200K context Reasoning
Claude 3.7 Sonnet Feb 24, 2025

Introduced extended thinking — hybrid reasoning that lets Claude pause and think step-by-step. Major quality jump on complex math, science, and code. API example below.

Closed 200K context Extended Thinking
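
Extended thinking is a first-class switch in the Anthropic Messages API: it is enabled per request with a token budget for the reasoning phase. A minimal sketch with the official Python SDK (the prompt is illustrative; max_tokens must exceed budget_tokens):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},  # reasoning budget
    messages=[{"role": "user", "content": "What is 27 * 453?"}],
)

# The reply interleaves "thinking" blocks (step-by-step reasoning)
# with ordinary "text" blocks (the final answer).
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print("[answer]", block.text)
```
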
2024
Claude 3.5 Sonnet (v2) / Claude 3.5 Haiku Oct 22, 2024

Upgraded Sonnet with computer use capability. Haiku 3.5 introduced as fast, affordable tier.

Closed 200K context Computer Use
Claude 3.5 Sonnet Jun 20, 2024

Outperformed the larger Claude 3 Opus on benchmarks. Set a new standard for the Sonnet tier.

Closed 200K context
Claude 3 Opus / Sonnet / Haiku Mar 4, 2024

Introduced the three-tier model structure. Opus (most capable), Sonnet (balanced), Haiku (fastest).

Closed 200K context Vision

Google DeepMind — Gemini

2026
Gemini 3.1 Pro Feb 2026

Most advanced Pro-tier model. 1M context, 77.1% on ARC-AGI-2. Deep Think reasoning. Multimodal (text, image, audio, video, code).

Closed 1M context Multimodal Deep Think

Lightweight, fast model for high-throughput tasks. Preview release.

Closed
2025
Gemini 3 Pro Nov 2025

Gemini 3 generation. Pro replaces the Ultra tier. Powerful agentic and coding capabilities.

Closed 1M context Multimodal
Gemini 2.5 Pro / Flash Mar–Jun 2025

Improved reasoning and coding. Flash variants optimized for speed and cost.

Closed 1M context
Gemini 2.0 Flash / Pro Jan–Feb 2025

2.0 Flash became the default model; the Pro variant followed in Feb 2025.

Closed Multimodal
2024
Gemini 1.5 Pro / Flash Feb 2024

MoE architecture. First to offer 1M token context. Flash variant for speed-sensitive tasks.

Closed 1M context MoE
2023
Gemini 1.0 Ultra / Pro / Nano Dec 2023

Original Gemini family. Ultra (complex tasks), Pro (general), Nano (on-device).

Closed Multimodal

Meta AI — Llama

2025
Llama 4 Scout / Maverick Apr 2025

First MoE architecture in the Llama family. Scout: 109B total / 17B active, 10M context. Maverick: 400B total / 17B active, optimized for quality. Multimodal (text + image + video).

Open Weight 109B–400B MoE 10M context Multimodal
2024
Llama 3.2 Sep 2024

First multimodal Llama. Text-only: 1B, 3B. Vision-enabled: 11B, 90B.

Open Weight 1B–90B Vision
Llama 3.1 Jul 2024

8B, 70B, and 405B parameters. 128K context. Multilingual. Strong tool use and reasoning.

Open Weight 8B / 70B / 405B 128K context
Llama 3 Apr 2024

8B and 70B parameters. Significant quality improvements over Llama 2.

Open Weight 8B / 70B
2023
Llama 2 Jul 2023

7B, 13B, 70B. Commercial use license. Partnership with Microsoft.

Open Weight 7B / 13B / 70B
Llama 1 Feb 2023

7B to 65B parameters. Research-only license. Sparked the open-source LLM movement.

Open Weight 7B–65B

Mistral AI — Mistral & Mixtral

2025
Mistral Large 3 Dec 2025

Sparse MoE. 675B total / 41B active parameters. Frontier-level performance.

Open Weight 675B (41B active) MoE

Three small dense models for edge and cost-sensitive deployments.

Open Weight 3B / 7B / 14B
Codestral / Devstral 2025

Specialized code models for software development workflows.

Open Weight Coding
Magistral Small / Medium Jun 2025

First Mistral reasoning models with chain-of-thought capabilities. Small is open-source.

Open Weight (Small) Reasoning
Mistral Medium 3 May 2025

Mid-tier model balancing quality and cost.

Closed
Mistral Small 3.1 Mar 2025

Efficient small model for edge and embedded use cases.

Open Weight
2024
Mistral Large 2 / Pixtral Large Jul–Nov 2024

Dense 123B parameter model. 128K context, 80+ languages. Pixtral Large adds multimodal (124B).

Open Weight 123B–124B 128K context Multimodal (Pixtral)
2023
Mixtral 8x7B Dec 2023

Sparse MoE: 8 experts per layer with top-2 routing (~47B total, ~13B active; see the arithmetic below). Outperformed Llama 2 70B.

Open Weight 47B (13B active) MoE
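
The ~47B total / ~13B active accounting works because only the per-layer feed-forward experts are replicated eight times; attention, embeddings, and the router are shared, and each token is routed to the top 2 of 8 experts. A back-of-the-envelope check in Python, using dimensions from the released Mixtral config:

```python
# Mixtral 8x7B dimensions (from the released config)
d_model, d_ffn, n_layers = 4096, 14336, 32
n_experts, top_k = 8, 2
vocab = 32000

expert = 3 * d_model * d_ffn                       # gate, up, down projections
attn = 2 * d_model * d_model + 2 * d_model * 1024  # q/o full-rank, k/v grouped (8 KV heads)
embed = 2 * vocab * d_model                        # input embeddings + LM head

total = n_layers * (n_experts * expert + attn) + embed
active = n_layers * (top_k * expert + attn) + embed

print(f"total:  {total / 1e9:.1f}B")   # ~46.7B, not 8 x 7B = 56B
print(f"active: {active / 1e9:.1f}B")  # ~12.9B touched per token
```

The same arithmetic explains every "total (active)" tag in this directory: active parameters scale with the experts actually routed to, not the expert count.
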
Mistral 7B Sep 2023

First Mistral release. 7B dense model that punched far above its weight class.

Open Weight 7B

DeepSeek — V-Series & R-Series

2025
DeepSeek-V3.2 / V3.2-Exp 2025

V3.2-Exp uses Sparse Attention. The Speciale variant surpasses GPT-5 on AIME/HMMT benchmarks.

Open Weight 671B (37B active) MoE Reasoning
DeepSeek-V3.1 Aug 2025

Hybrid model combining V3 + R1 strengths. 671B (37B active). 128K context. Switchable thinking/non-thinking modes.

Open Weight 671B (37B active) MoE 128K context
DeepSeek-R1-0528 May 2025

Major R1 upgrade pushing reasoning and inference capabilities further. Built on V3 Base.

Open Weight (MIT) Reasoning
DeepSeek-V3-0324 Mar 2025

Improved post-training drawing on lessons from R1. Better reasoning, coding, and tool use.

Open Weight MoE
DeepSeek-R1 Jan 2025

The "DeepSeek moment." ChatGPT-level reasoning at fraction of training cost. MIT License. Includes distilled variants (1.5B–70B).

Open Weight (MIT) 671B (37B active) MoE Reasoning
2024
DeepSeek-V3 Dec 2024

671B total / 37B active MoE. Trained on 14.8T tokens. Competitive with GPT-4o at a fraction of the cost.

Open Weight 671B (37B active) MoE
DeepSeek-V2 / V2.5 / Coder V2 May–Sep 2024

V2 introduced Multi-head Latent Attention (MLA). V2.5 merged chat and coder capabilities. Coder V2 specialized for code.

Open Weight MoE Coding (Coder)
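
MLA's core move is to cache one low-rank latent per token instead of full keys and values, reconstructing K and V from the latent at attention time. A conceptual numpy sketch (single head, random weights; the real design adds a decoupled RoPE key path and query-side compression):

```python
import numpy as np

d_model, d_latent = 5120, 512  # illustrative sizes; the latent is ~10x smaller
rng = np.random.default_rng(0)
W_dkv = rng.standard_normal((d_model, d_latent)) * 0.02  # shared down-projection
W_uk = rng.standard_normal((d_latent, d_model)) * 0.02   # up-projection to keys
W_uv = rng.standard_normal((d_latent, d_model)) * 0.02   # up-projection to values

cache = []  # one d_latent vector per past token -- this is all that is stored

def decode_step(h_t):
    cache.append(h_t @ W_dkv)   # cache only the compressed latent
    latents = np.stack(cache)   # (t, d_latent)
    K = latents @ W_uk          # keys rebuilt on the fly
    V = latents @ W_uv          # values rebuilt on the fly
    return K, V

for _ in range(4):
    K, V = decode_step(rng.standard_normal(d_model))

print(K.shape, V.shape)  # (4, 5120) (4, 5120)
print(f"cache per token: {d_latent} floats vs {2 * d_model} for vanilla MHA")
```
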

xAI — Grok

Grok 4.1 Nov 2025

#1 on LMArena Elo (1483) and EQ-Bench. Hallucination rate ~4% (65% reduction from Grok 4).

Closed Reasoning
Grok Code Fast 1 2025

Specialized for agentic coding — automating dev workflows, debugging, and code generation.

Closed Coding
Grok-1 Mar 2024

314B parameter MoE model. Open-sourced under Apache 2.0 license.

Open Weight 314B MoE

Alibaba Cloud — Qwen

2026

Latest flagship MoE. 8.6×–19× higher decoding throughput than Qwen3-Max. Ultra-long context. Multimodal reasoning.

Open Weight 397B (17B active) MoE Multimodal
2025
Qwen3 Apr 2025

Dense: 0.6B, 1.7B, 4B, 8B, 14B, 32B. MoE: 30B-A3B, 235B-A22B. 100+ open-weight models in total.

Open Weight 0.6B–235B MoE variants
Qwen2.5-VL Jan–Mar 2025

Vision-language models in 3B, 7B, 32B, and 72B parameter sizes.

Open Weight Vision-Language
2024
QwQ-32B-Preview Nov 2024

Reasoning model similar to OpenAI's o1. 32B parameters.

Open Weight 32B Reasoning
Qwen2.5 (0.5B–72B) / Qwen2 Jun–Sep 2024

Qwen2 released Jun 2024. Qwen2.5 refresh in Sep 2024 with improved quality across all sizes.

Open Weight 0.5B–72B

Cohere — Command

Command A (Vision / Reasoning / Translate) 2025

Specialized enterprise variants: Vision for multimodal, Reasoning for complex tasks, Translate for 200+ languages. On-premises deployment available.

Closed Multimodal Reasoning
Command R+ / Command R 2024

RAG-optimized models. Command R+ (104B) for complex tasks, Command R (35B) for efficiency. Multilingual, grounded generation.

Open Weight (R/R+) 35B / 104B
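
Grounded generation means the model answers from documents supplied with the request and returns citations pointing back into them. A minimal sketch with Cohere's Python SDK (v1 Client; the snippets are illustrative stand-ins for retrieved chunks):

```python
import cohere

co = cohere.Client()  # reads the CO_API_KEY environment variable

response = co.chat(
    model="command-r-plus",
    message="What does the handbook say about parental leave?",
    documents=[  # illustrative snippets; real callers pass retrieval results
        {"title": "HR Handbook, ch. 4", "snippet": "Parental leave is 18 weeks, paid."},
        {"title": "HR Handbook, ch. 7", "snippet": "Leave requests are filed in Workday."},
    ],
)

print(response.text)       # answer grounded in the documents
print(response.citations)  # character spans tying claims to source documents
```
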

NVIDIA — Nemotron

Llama Nemotron Super (49B) 2025

Compact model achieving results that previously required 600B+ parameter models.

Open Weight 49B
Nemotron-4 340B 2024

340B dense model for enterprise applications. Instruct and reward model variants.

Open Weight 340B

Zhipu AI — GLM

GLM-5 2026

Highest Chatbot Arena rating (1451), making it the top-ranked model by human preference.

Open Weight Reasoning
GLM-4.7 Late 2025

94.2 HumanEval, 95.7 AIME 2025, 85.7 GPQA Diamond, 84.9 LiveCodeBench. 200K context. Arguably the most well-rounded open-source model.

Open Weight 200K context Coding Reasoning
GLM-4 / GLM-4V 2024

GLM-4 general model. GLM-4V adds vision capabilities. Multiple size variants.

Open Weight Vision (4V)

Other Notable Models

80.2 on SWE-bench Verified (highest). 230B parameters, 205K context. S-tier for software engineering.

Closed 230B 205K context SWE-bench Leader

StepFun's 196B model achieving frontier results at reduced compute. Part of the "small model revolution."

Open Weight 196B
Baidu ERNIE 4.5 / ERNIE X1 2025

Baidu's MoE-based models. ERNIE 4.5 series open-sourced in 2025.

Open Weight (4.5) MoE
Microsoft Phi-3 / Phi-4 2024

Small language models (SLMs). Phi-4: 14B. Phi-3 family: 3.8B mini, 7B small, 14B medium. The family punches above its weight class.

Open Weight 3.8B–14B
AI21 Jamba Mar 2024

Hybrid SSM-Transformer architecture combining Mamba and attention layers. 52B (12B active). 256K context.

Open Weight 52B (12B active) SSM+MoE 256K context
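
Jamba interleaves the two mixer types rather than fusing them: the Jamba paper describes blocks of eight layers with a 1:7 attention-to-Mamba ratio and MoE replacing the dense MLP on every other layer. A structural sketch of that schedule (the attention layer's position within a block is illustrative):

```python
def jamba_layer_schedule(n_blocks: int = 4) -> list[str]:
    # Per the Jamba paper: 8 layers per block, one attention layer per
    # block (1:7 ratio vs Mamba), MoE on every other layer.
    layers = []
    for _ in range(n_blocks):
        for i in range(8):
            mixer = "attention" if i == 4 else "mamba"  # position illustrative
            ffn = "moe" if i % 2 == 1 else "dense-mlp"
            layers.append(f"{mixer} + {ffn}")
    return layers

for idx, layer in enumerate(jamba_layer_schedule()):
    print(idx, layer)
```

Attention layers give precise recall over long context while Mamba layers carry constant-size state, which is how a 52B model sustains a 256K window without a proportionally huge KV cache.
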
Databricks DBRX Mar 2024

132B MoE (36B active). 16 experts, top-4 routing. Strong on code and enterprise tasks.

Open Weight 132B (36B active) MoE
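
DBRX's router is fine-grained: 16 smaller experts with the top 4 mixed per token, giving far more expert combinations than an 8-expert top-2 design. A generic numpy sketch of one top-k routing step (not DBRX's actual code; the expert FFNs are stubbed as random matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 64, 16, 4

W_router = rng.standard_normal((d, n_experts)) * 0.02
experts = [rng.standard_normal((d, d)) * 0.02 for _ in range(n_experts)]  # stub FFNs

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ W_router              # router score per expert
    idx = np.argsort(logits)[-top_k:]  # indices of the top-4 experts
    w = np.exp(logits[idx] - logits[idx].max())
    w /= w.sum()                       # softmax over the selected experts only
    # Weighted mix of the chosen experts; the other 12 are never computed.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, idx))

print(moe_layer(rng.standard_normal(d)).shape)  # (64,)
```
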

TII Falcon 180B / Falcon 2 2023–2024

Falcon 180B was the largest open model at release. Falcon 2 (11B) added vision. From the Technology Innovation Institute (UAE).

Open Weight 11B–180B