OpenAI Goes Open Source: GPT OSS Changes Everything for AI Builders

Discover how OpenAI’s GPT OSS open-source models empower AI builders with free, customizable, high-performance tools for coding, healthcare, and more.

Aug 07, 2025

On August 5, 2025, OpenAI did something nobody expected - they released GPT OSS, their first free AI models in over 5 years. This is huge news for anyone building AI products.

Here's the backstory. When OpenAI started, they promised to make AI free for everyone. They kept that promise with their first models. But then they changed direction - starting with GPT-3.5, everything became paid and locked behind expensive APIs.

Now they're going back to free models. Why the change?

Chinese AI companies like DeepSeek, Moonshot, Qwen and other firms started releasing models that worked almost as well as OpenAI's expensive ones - but completely free. OpenAI realized developers were leaving them for these free alternatives.

Their solution? Keep the most advanced models paid, but release really good free models to win developers back.

No Time to Read? Here's the Scoop

✅ GPT OSS comes in 2 sizes: gpt-oss-20b (21B params) and gpt-oss-120b (117B params). It’s not Multi-modal i.e. it can’t read or understand images, video, audio nor can generate them
✅ Performance reality: 120B matches OpenAI o4-mini, 20B matches o3-mini on key benchmarks. gpt-oss-120b activates 5.1B parameters per token, while gpt-oss-20b activates 3.6B
✅ Hardware requirements: 20B needs just 16GB RAM, 120B runs on single 80GB GPU
✅ Mixture-of-Experts design: Only 3.6B-5.1B active parameters per token for efficiency
✅ Apache 2.0 license: Build and sell products with zero restrictions
✅ Full reasoning capabilities: CoT (Chain of Thoughts), tool use, adjustable reasoning effort levels’

What Are These Models Actually Like?

Both models use something called Mixture-of-Experts. Think of it like having 128 different AI experts, but only using 4 of them for each question. This makes them much faster and cheaper to run.

GPT OSS-20B: The Laptop Model

21 billion parameters with 3.6B active per token - sounds huge, but it's actually the "small" one
Runs on 16GB of RAM - your MacBook Pro or gaming PC can handle it
Perfect for: Prototyping, local development, privacy-sensitive apps

GPT OSS-120B: The Beast

117 billion parameters with 5.1B active per token - comparable to GPT-4 level capabilities
Needs 80GB GPU memory - think professional workstation or cloud
Perfect for: Production apps, complex reasoning tasks, when you need maximum performance

Key Features and Performance Benchmarks of GPT - OSS 120B and 20B

AIME 2024 & 2025 (Competition Math, With Tools):
The open-source GPT OSS models (both 20B and 120B) perform at or above the level of OpenAI’s o3 and o4-mini models, with accuracy rates up to 98.7%. The largest GPT OSS model (120B) matches or slightly outperforms proprietary models on these math tasks.
GPQA Diamond (PhD Science, Without Tools):
All models see a drop in accuracy, but GPT OSS-120B remains competitive with OpenAI’s o3 and o4-mini, scoring just above 80%.
HLE (Expert-Level Questions):
Accuracy drops significantly for all models, with the best scores around 25%. GPT OSS models perform similarly to OpenAI’s smaller models, but all models struggle with these very challenging questions.
MMLU (College-Level Exams):
GPT OSS-120B and OpenAI’s o3/o4-mini models all score above 90%, showing strong performance on broad academic knowledge.

Is GPT - OSS good at coding?

As someone who is very much into coding, I think the performance of the coding benchmarks is very much important. Below is a benchmark performance on SWE Bench Verified and Tau Bench Retail.

SWE-Bench Verified (Software Engineering, Accuracy %):

Evaluates models on real-world software engineering tasks.
“o3” leads with 69.1% accuracy, closely followed by “o4-mini” at 68.1%.
GPT OSS-120B achieves 62.4% and GPT OSS-20B achieves 60.7% , outperforming “o3-mini” (49.3%) and “gpt-oss-20b” (54.8%).

Tau-Bench Retail (Function Calling, Accuracy %):

Tests models’ ability to use tools and call functions correctly.
“o3” is highest at 70.4%, “gpt-oss-120b” is close at 67.8%, and “o4-mini” at 65.6%.
GPT OSS-20B again trails at 54.8%.

OpenAI’s proprietary models still lead in coding and tool use, but the open-source GPT OSS-120B model is closing the gap, especially in software engineering and function-calling tasks. This makes GPT OSS-120B a strong, accessible alternative for developers, even if it doesn’t quite match the very top proprietary models.

GPT OSS: A Strong Contender for Healthcare Applications

What is interesting is the performance of GPT OSS models on the healthcare benchmarks.

If you are building in the HealthCare space, then this is a model that is worth considering. As it’s open source, you can be rest assured about it’s safety and privacy.

Both models also perform strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o).

Healthbench is an open-source benchmark measuring the performance and safety of large language models in healthcare. HealthBench consists of 5,000 multi-turn conversations between a model and an individual user or healthcare professional.

Want to know more about HealthBench - Read this - https://arxiv.org/abs/2505.08775

Benefits of Open Source for Developers and Emterprises – Why GPT OSS Stands Out

1. Freedom to Customize
With GPT OSS, you’re not locked into a one-size-fits-all model. Fine-tune it with your own data—using LoRA, QLoRA, PEFT, or full-parameter methods—to build anything from a hyper-personalized chatbot to a domain-specific assistant. The Apache 2.0 license means you can use, modify, or sell your version with zero legal hassle.

2. Transparent Reasoning
No more black box. GPT OSS models provide chain-of-thought (CoT) outputs, letting you see the step-by-step logic behind every answer. Since OpenAI leaves CoT unsupervised, you can audit, debug, and build your own safety layers—perfect for compliance and peace of mind.

3. Community-Driven Innovation
Open source means rapid progress. Developers worldwide can contribute improvements, share fine-tuned models, and build new tools—making the ecosystem stronger for everyone.

4. Lower Barriers, Greater Control
Run GPT OSS locally, on your own hardware, or in the cloud—no vendor lock-in, no surprise bills. You control your data and deployments, which is essential for privacy-sensitive or regulated industries.

5. Enhanced Privacy
With open-source GPT OSS, your data stays where you want it—on your servers or devices. This gives you full control over sensitive information, making it easier to meet strict privacy requirements and keep user data secure.

Know the Limits: What GPT OSS Doesn’t Do

GPT OSS only works with text—not images, audio, or video.
It’s not a substitute for medical professionals or legal advice.
While strong on reasoning, it doesn’t always match OpenAI’s top proprietary models, especially on tough coding or fact-checking tasks.
Like all language models, it can still make mistakes or hallucinate—so keep a human in the loop for important decisions.

Hallucation in GPT OSS- A matter of Concern

gpt-oss-120b and gpt-oss-20b underperform OpenAI o4-mini on both our SimpleQA and PersonQA evaluations. This is expected, as smaller models have less world knowledge than larger frontier models and tend to hallucinate more. Additionally, browsing or gathering external information tends to reduce instances of hallucination as models are able to look up information they do not have internal knowledge of.

SimpleQA: A diverse dataset of four thousand fact-seeking questions with short answers that measures model accuracy for attempted answers.
PersonQA: A dataset of questions and publicly available facts about people that measures the model’s accuracy on attempted answers.

How to Get Started with GPT OSS

🚀 Test Online (No Setup Required)

Groq: Fast inference at console.groq.com - GPT OSS models available day zero
Together AI: api.together.xyz - Both 20B and 120B models
Fireworks AI: fireworks.ai - Optimized for speed
Openrouter: https://openrouter.ai/

📦 Download Models

Hugging Face Hub:
- GPT OSS-20B
- GPT OSS-120B
Official OpenAI Page: openai.com/index/introducing-gpt-oss

🛠 Development Platforms

Ollama: Easy local deployment - ollama pull openai/gpt-oss-20b
LM Studio: User-friendly GUI for local models
vLLM: Production-grade inference server

In short, the AI landscape just got a lot more interesting. With GPT OSS now in the wild, the doors are wide open for builders, tinkerers, and dreamers everywhere. Personally, I can’t wait to see the wild, weird, and wonderful things people will create next. Buckle up—exciting times are ahead!

Check out my previous articles