OpenAI — GPT & o-Series
Latest flagship. Supersedes GPT-5.2. Enhanced reasoning, coding, and multi-step planning.
New default in ChatGPT, replacing GPT-4o, o3, o4-mini, GPT-4.1, and GPT-4.5. Pro variant uses scaled parallel test-time compute (best-of-N sketch below).
First GPT-5 series release. 400K context, 100% on AIME 2025, 6.2% hallucination rate (~40% reduction).
OpenAI's first open-weight models. 120B and 20B parameter variants.
Major advances in coding, instruction following, and long-context understanding. Supports up to 1M tokens. Retired from ChatGPT Feb 2026.
Small reasoning model optimized for science, math, and coding. Part of the o-series reasoning family.
Next-gen reasoning models. o3-mini optimized for STEM tasks. o3 full model for complex multi-step reasoning.
General-purpose LLM with improved EQ and reduced hallucinations. Released as research preview.
First reasoning models trained with RL for complex multi-step thinking. Preview release.
"Omni" model with native multimodal capabilities (text, vision, audio). 128K context. GPT-4o mini is a smaller, faster variant.
Flagship model with vision capabilities. GPT-4 Turbo expanded context to 128K and reduced cost.
Cost-effective model that powered the original ChatGPT. 16K context variant released mid-2023.
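
OpenAI has not published how the Pro tier's scaled parallel test-time compute works. As a generic illustration of the best-of-N idea behind such schemes, here is a minimal sketch in which `generate` and `score` are hypothetical stand-ins for a model sampler and a verifier:

```python
# Hypothetical best-of-N sketch; generate() and score() stand in for a
# real sampler and a reward model / verifier.
import random

def generate(prompt: str, seed: int) -> str:
    rng = random.Random(seed)
    return f"candidate #{rng.randint(0, 999)} for: {prompt}"

def score(answer: str) -> float:
    return random.random()  # a real system would use a learned verifier

def best_of_n(prompt: str, n: int = 8) -> str:
    # Sample n independent candidates (run in parallel in practice),
    # then keep the one the verifier scores highest.
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=score)

print(best_of_n("Prove that sqrt(2) is irrational."))
```
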
Anthropic — Claude
Most capable Claude model. 1M context window, Agent Teams for native multi-agent collaboration. API: claude-opus-4-6
Best balance of speed and quality. Same pricing as Sonnet 4.5. API: claude-sonnet-4-6
Major upgrade to Opus tier with improved agentic capabilities and long-form reasoning.
Fastest and most cost-effective Claude model. API: claude-haiku-4-5-20251001
Claude 4 generation. Opus 4 excels at long-running tasks and agentic workflows. Sonnet 4 improved coding, reasoning, and instruction-following.
Introduced extended thinking — hybrid reasoning that lets Claude pause and think step-by-step (API sketch below). Major quality jump on complex math, science, and code.
Upgraded Sonnet with computer use capability. Haiku 3.5 introduced as fast, affordable tier.
Outperformed the larger Claude 3 Opus on benchmarks. Set a new standard for the Sonnet tier.
Introduced the three-tier model structure. Opus (most capable), Sonnet (balanced), Haiku (fastest).
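
A rough sketch of how extended thinking is exposed through the Anthropic Python SDK, using a model id from the entries above; the token budget and prompt are illustrative, and parameter shapes may differ across SDK versions:

```python
# Illustrative extended-thinking request via the Anthropic Python SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
)

# The reply interleaves "thinking" blocks with the final "text" blocks.
for block in response.content:
    print(block.type)
```
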
Google DeepMind — Gemini
Most advanced Pro-tier model. 1M context, 77.1% on ARC-AGI-2. Deep Think reasoning. Multimodal (text, image, audio, video, code).
Lightweight, fast model for high-throughput tasks. Preview release.
Gemini 3 generation. Pro replaces Ultra tier. Powerful agentic and coding capabilities.
Improved reasoning and coding. Flash variants optimized for speed and cost.
2.0 Flash became default model. Pro variant released Feb 2025.
MoE architecture. First to offer 1M token context. Flash variant for speed-sensitive tasks.
Original Gemini family. Ultra (complex tasks), Pro (general), Nano (on-device).
Meta AI — Llama
First MoE architecture in Llama family. Scout: 109B total / 17B active, 10M context. Maverick: 400B total / 17B active, optimized for quality. Multimodal (text + image + video).
First multimodal Llama. Text-only: 1B, 3B. Vision-enabled: 11B, 90B.
8B, 70B, and 405B parameters. 128K context. Multilingual. Strong tool use and reasoning.
8B and 70B parameters. Significant quality improvements over Llama 2.
7B, 13B, 70B. Commercial use license. Partnership with Microsoft.
7B to 65B parameters. Research-only license. Sparked the open-source LLM movement.
Mistral AI — Mistral & Mixtral
Sparse MoE. 675B total / 41B active parameters. Frontier-level performance.
Three small dense models for edge and cost-sensitive deployments.
Specialized code models for software development workflows.
First Mistral reasoning models with chain-of-thought capabilities. The Small variant is open-source.
Mid-tier model balancing quality and cost.
Efficient small model for edge and embedded use cases.
Dense 123B parameter model. 128K context, 80+ languages. Pixtral Large adds multimodal (124B).
Sparse MoE: 8 expert networks of 7B each (~47B total, ~13B active; toy routing sketch below). Outperformed Llama 2 70B.
First Mistral release. 7B dense model that punched far above its weight class.
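
The expert arithmetic in the Mixtral entry (8 experts of ~7B, ~13B active per token) falls out of top-k routing: a small router picks 2 of the 8 experts for each token and mixes their outputs, so most parameters sit idle on any given token. A toy PyTorch sketch of that routing step, with arbitrary sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Toy top-k MoE layer: route each token to its top_k experts and
    combine their outputs, weighted by renormalized router scores."""
    def __init__(self, d_model: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):               # x: (n_tokens, d_model)
        scores = self.router(x)         # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):     # loops for clarity, not speed
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e   # tokens whose k-th choice is e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

tokens = torch.randn(10, 32)
print(SparseMoE(32)(tokens).shape)      # torch.Size([10, 32])
```

The same skeleton covers the 16-expert, top-4 configuration listed under Other Notable Models; only `n_experts` and `top_k` change.
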
DeepSeek — V-Series & R-Series
V3.2-Exp uses DeepSeek Sparse Attention (generic sparse-attention sketch below). Speciale variant surpasses GPT-5 on AIME/HMMT benchmarks.
Hybrid model combining V3 + R1 strengths. 671B (37B active). 128K context. Switchable thinking/non-thinking modes.
Major R1 upgrade pushing reasoning and inference capabilities further. Built on V3 Base.
Improved post-training drawing lessons from R1. Better reasoning, coding, and tool use.
The "DeepSeek moment." ChatGPT-level reasoning at fraction of training cost. MIT License. Includes distilled variants (1.5B–70B).
671B total / 37B active MoE. Trained on 14.8T tokens. Competitive with GPT-4o at a fraction of the cost.
V2 introduced Multi-head Latent Attention (MLA). V2.5 merged chat and coder capabilities. Coder V2 specialized for code.
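
DeepSeek's Sparse Attention uses a learned token-selection scheme of its own; as a generic illustration of why sparse attention cuts cost, the simplest fixed pattern is a causal sliding window, sketched here in PyTorch:

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int = 4):
    """Causal attention where each token sees only the last `window`
    tokens -- a generic sparse pattern, not DeepSeek's exact mechanism."""
    seq = q.size(-2)
    i = torch.arange(seq)[:, None]
    j = torch.arange(seq)[None, :]
    mask = (j <= i) & (j > i - window)          # causal + local window
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    # This toy version materializes full scores and masks them; a real
    # sparse kernel would skip the masked blocks entirely.
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 16)               # (batch, seq, head_dim)
print(sliding_window_attention(q, k, v).shape)  # torch.Size([1, 8, 16])
```
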
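
The R1 distilled variants were reportedly produced by fine-tuning smaller open models on R1-generated reasoning traces. For contrast, the classic logit-matching form of distillation captures the same teacher-to-student idea in a single loss:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Hinton-style soft-label distillation: pull the student's output
    distribution toward the teacher's, softened by a temperature."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL(teacher || student); the t^2 factor keeps gradient magnitudes
    # comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * t * t

teacher = torch.randn(4, 32_000)    # e.g. logits from a large teacher
student = torch.randn(4, 32_000, requires_grad=True)
print(distillation_loss(student, teacher).item())
```
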
xAI — Grok
#1 on LMArena Elo (1483) and EQ-Bench. Hallucination rate ~4% (65% reduction from Grok 4).
Specialized for agentic coding — automating dev workflows, debugging, and code generation.
314B parameter MoE model. Open-sourced under Apache 2.0 license.
Alibaba Cloud — Qwen
Latest flagship MoE. 8.6×–19× higher decoding throughput than Qwen3-Max. Ultra-long context. Multimodal reasoning.
Dense: 0.6B, 1.7B, 4B, 8B, 14B, 32B. MoE: 30B-A3B, 235B-A22B. 100+ open-weight models total.
Vision-language models in 3B, 7B, 32B, and 72B parameter sizes.
Reasoning model similar to OpenAI's o1. 32B parameters.
Qwen2 released Jun 2024. Qwen2.5 refresh in Sep 2024 with improved quality across all sizes.
Cohere — Command
Specialized enterprise variants: Vision for multimodal, Reasoning for complex tasks, Translate for 200+ languages. On-premises deployment available.
RAG-optimized models. Command R+ (104B) for complex tasks, Command R (35B) for efficiency. Multilingual, grounded generation (minimal RAG sketch below).
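
In practice, "RAG-optimized, grounded generation" means retrieving passages first and instructing the model to answer only from them, with citations. A minimal end-to-end sketch; the keyword retriever is a toy and `call_model` is a hypothetical stand-in for a real API call:

```python
# Toy RAG pipeline: retrieve, build a grounded prompt, call the model.
docs = [
    "Command R has 35B parameters and targets efficient RAG workloads.",
    "Command R+ has 104B parameters for more complex tasks.",
    "Both models are multilingual and produce grounded, citable answers.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    words = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(words & set(d.lower().split())))[:k]

def call_model(prompt: str) -> str:
    # Hypothetical stand-in; a real system would call an LLM API here.
    return f"(grounded answer)\n---\n{prompt}"

query = "How many parameters does Command R+ have?"
context = "\n".join(f"[{i}] {d}" for i, d in enumerate(retrieve(query)))
prompt = (f"Answer using ONLY the sources below and cite them as [n].\n"
          f"{context}\n\nQuestion: {query}")
print(call_model(prompt))
```

The docs list reuses parameter figures from the entry above; everything else is illustrative.
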
NVIDIA — Nemotron
Compact model achieving results that previously required 600B+ parameter models.
340B dense model for enterprise applications. Instruct and reward model variants.
Zhipu AI — GLM
94.2 HumanEval, 95.7 AIME 2025, 85.7 GPQA Diamond, 84.9 LiveCodeBench. 200K context. Arguably the most well-rounded open-source model.
GLM-4 general model. GLM-4V adds vision capabilities. Multiple size variants.
Other Notable Models
80.2 on SWE-bench Verified (highest). 230B parameters, 205K context. S-tier for software engineering.
StepFun's 196B model achieving frontier results at reduced compute. Part of the "small model revolution."
Baidu's MoE-based models. ERNIE 4.5 series open-sourced in 2025.
Small language models (SLMs). Phi-4: 14B. Phi-3 family: 3.8B mini, 7B small, 14B medium. Punches above its weight class.
Hybrid SSM-Transformer architecture combining Mamba and attention layers (toy interleaving sketch below). 52B (12B active). 256K context.
132B MoE (36B active). 16 experts, top-4 routing. Strong on code and enterprise tasks.
Falcon 180B was largest open model at release. Falcon 2 (11B) added vision. From Technology Innovation Institute (UAE).
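
The hybrid SSM-Transformer entry above interleaves a few attention layers among many SSM layers. A toy PyTorch sketch of that interleaving, with a simple diagonal linear recurrence standing in for a real Mamba block:

```python
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """Minimal diagonal linear recurrence (stand-in for a Mamba block)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.log_decay = nn.Parameter(torch.zeros(d_model))
        self.in_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                        # x: (batch, seq, d_model)
        u = self.in_proj(x)
        a = torch.sigmoid(self.log_decay)        # per-channel decay in (0, 1)
        h, outs = torch.zeros_like(u[:, 0]), []
        for t in range(u.size(1)):               # sequential scan over time
            h = a * h + (1 - a) * u[:, t]
            outs.append(h)
        return self.out_proj(torch.stack(outs, dim=1))

class ToyAttentionBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

class HybridStack(nn.Module):
    """One attention block per `ratio` layers; the rest are SSM blocks."""
    def __init__(self, d_model: int, n_layers: int, ratio: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            ToyAttentionBlock(d_model) if i % ratio == ratio - 1
            else ToySSMBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = x + layer(x)                     # residual connection
        return x

x = torch.randn(2, 16, 64)
print(HybridStack(64, n_layers=8)(x).shape)      # torch.Size([2, 16, 64])
```

The real architecture's exact layer ratio, normalization, and MoE components are omitted; the point is only the alternating layer types.
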