As ChatGPT launched the world into the AI and LLM race, enormous amounts of effort and capital went into improving the technology, not only in quality, but also in speed and price. As I wrote earlier, LLMs saw an 86.5% cost reduction in 2023 alone (Mar/2023 to Nov/2023) if we take GPT-3.5-turbo as a baseline (and if we go one step before turbo, to plain GPT-3.5 in Nov/2022, this jumps to a 92.5% reduction for the first year!). For contrast, IT equipment historically dropped “just” 23% per year in its best years, and even technologies famous for sharp cost declines, like storage, solar, gene sequencing, or LED bulbs, were not that fast.
And the price cuts continue steadily as new challengers try to take both OpenAI’s and NVIDIA’s crowns. Sure, a lot of the cuts might be subsidized by VC and Big Tech deep pockets, but given that open-source LLMs are definitely here to stay, predatory pricing just to secure a monopoly might not be the best investment, and vendors need to recover their costs somehow. So it certainly feels like the price cuts are in large part due to real optimizations, not just hype dynamics, and the papers coming out provide good evidence for that.
Timeline with GPT-3.5 as a baseline
Nov/2022 - OpenAI launches GPT-3.5 (text-davinci-003) for $20 / 1M tokens
Mar/2023 - OpenAI launches GPT-3.5-turbo, for $2 / 1M tokens, a 10x reduction
Jun/2023 - OpenAI reduces GPT-3.5-turbo input cost by 25%, to $1.5 / 1M tokens, keeping output at $2 / 1M tokens
Nov/2023 - OpenAI announces at its DevDay another reduction in GPT-3.5-turbo input cost, to $1 / 1M tokens (output still at $2 / 1M tokens), while increasing the context window to 16K
Dec/2023 - Mistral announces Mixtral 8x7B, matching GPT-3.5’s performance and costing $0.7 / 1M tokens. Given its open-source nature, providers rush to host it at lower prices, with DeepInfra offering it at $0.27 / 1M tokens
Jan/2024 - OpenAI announces GPT-3.5 price reduction again, with $0.5 / 1M tokens input and $1.5 / 1M tokens output (lower than Mistral for input, but not the lowest anymore at this quality)
Feb/2024 - Groq opens its API access with its new LPU engine, offering a lowest-price guarantee per million tokens, with Mixtral 8x7B at the same $0.27 / 1M tokens, but at ~480 tokens/s speed
Mar/2024 - Anthropic announces the new Claude 3 family, with its smaller Haiku model matching GPT-3.5’s capability at half the input price, $0.25 / 1M tokens for input and $1.25 / 1M tokens for output
In summary, this represents a ~92.5% reduction from Nov/2022 to Nov/2023, and ~86.5% from Mar/2023 to Mar/2024. I’ll keep this timeline up to date; let’s see how it looks this November, and whether the cost reduction trend slows down or keeps accelerating!
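If you want to reproduce those percentages from the timeline, here is a minimal sketch. One assumption on my part: the Nov/2023 figure blends GPT-3.5-turbo’s $1 input / $2 output into a $1.5 average, and the Mar/2024 figure uses the $0.27 Mixtral price from Groq/DeepInfra as the cheapest GPT-3.5-class option.

```python
def reduction(old_price: float, new_price: float) -> float:
    """Percentage drop from old_price to new_price (prices per 1M tokens)."""
    return (old_price - new_price) / old_price * 100

# Nov/2022 GPT-3.5 at $20 -> Nov/2023 GPT-3.5-turbo blended input/output at $1.5
print(f"{reduction(20, 1.5):.1f}%")   # 92.5%

# Mar/2023 GPT-3.5-turbo at $2 -> Mar/2024 Mixtral via Groq/DeepInfra at $0.27
print(f"{reduction(2, 0.27):.1f}%")   # 86.5%
```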
Now, you might be wondering: do we even care about GPT-3.5 when GPT-4 is out there? Well, yes, it is quite a capable model and a good baseline to track price reductions against, especially now that other models are catching up in performance.
However, let’s also take a look at a timeline with GPT-4 as a baseline.
Timeline with GPT-4 as a baseline
Mar/2023 - OpenAI launches GPT-4 behind a waitlist for developers, costing $30 / 1M tokens for input and $60 / 1M tokens for output
July/2023 - GPT-4 API generally available for all paying customers
Nov/2023 - OpenAI announces GPT-4-turbo at the DevDay, with 128K context window, and reducing the prices to $10 / 1M tokens for input and $30 / 1M tokens for output
Mar/2024 - Anthropic seems to be the first to really beat GPT-4 with Claude 3 Opus, but at a slightly saltier price of $15 / 1M tokens for input and $75 / 1M tokens for output. However, the runner-up Claude 3 Sonnet does beat GPT-4 in some benchmarks and gets quite close on others, at lower prices of $3 / 1M tokens for input and $15 / 1M tokens for output
That’s a ~33% to ~90% input price reduction from Mar/2023 to Mar/2024, depending how you look at it.
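A quick sketch of where that range comes from, under my reading of the numbers: GPT-4-turbo’s $10 leaves the price at roughly a third (~33%) of GPT-4’s original $30 input price, while Claude 3 Sonnet’s $3 is a ~90% cut outright.

```python
# GPT-4-side input prices from the timeline above (per 1M tokens).
gpt4_mar_2023 = 30.0        # GPT-4 at launch
gpt4_turbo_nov_2023 = 10.0  # GPT-4-turbo at DevDay
sonnet_mar_2024 = 3.0       # Claude 3 Sonnet

# GPT-4-turbo leaves the price at ~33.3% of the original (a ~67% cut)...
print(f"{gpt4_turbo_nov_2023 / gpt4_mar_2023 * 100:.1f}% remaining")

# ...while Claude 3 Sonnet is a ~90% reduction from the original price.
print(f"{(gpt4_mar_2023 - sonnet_mar_2024) / gpt4_mar_2023 * 100:.1f}% reduction")
```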
Now, you might notice that Google’s Gemini is missing from these lists. That’s because Google is being weird with their prices, saying that Gemini Ultra pricing is “coming soon”, offering Gemini Pro completely for free (desperate much?), and losing public confidence with marketing stunts like comparing 5-shot results against CoT ones and publishing heavily edited demo videos. When they can actually show comparable performance and prices without gimmicks, then we can add them as a good comparison point as well.
That’s it! GPT-4 seems like a tougher game to play, with Anthropic reaching it only now, but regardless, the LLM race doesn’t seem to be slowing down anytime soon. The remaining question is whether the GPU capacity freed up by these optimizations will be transformed into business value and cost recovery, or whether it means we can go stronger and bigger, to GPT-5 and beyond.
Cheers!