The Inevitable Commoditization of LLMs
In just two years, we've gone from the world discovering what a Large Language Model (LLM) is, thanks to OpenAI's release of GPT-3.5 via ChatGPT, to discussing the possibility of LLMs becoming a commodity. So, despite the billions of dollars required to push LLMs forward, what is driving these conversations?
Note: This post focuses on general-purpose LLMs and not specialized ones.
What Makes Technology a Commodity?
Curious about what defines the commoditization of technology, I decided to ask Perplexity for its thoughts. It suggested four key criteria for a technology to be considered a commodity:
Widespread Availability
Price Competition
Standardization
Low Switching Costs
Let’s dive into the LLM market to see why we’re already talking about the commoditization of this relatively new tech.
The LLM Race Leading to Commoditization
Criterion: Widespread Availability
While some LLMs were already in development—and even available—before the release of GPT-3.5, it wasn’t until November 2022 that the world truly began paying attention to their rapid evolution.
Although most experts agree that we’re still in the early stages of this technological revolution, what was supposed to be a marathon is starting to look more like a 100-meter sprint!
Anthropic (Claude), Mistral (Mixtral), Google (Gemini), Meta (Llama), and countless others have jumped into the race, launching their own general-purpose LLMs in fierce competition with OpenAI’s GPT models.
Every new model and version seems to outshine the previous “leader.” In fact, Peter Gostev created a video illustrating the evolving rankings of LLMs based on LMSYS Chatbot Arena’s data.
And as if that weren’t enough, we’re also witnessing open-source LLMs closing the performance gap with their closed-source counterparts.
Commoditization evaluation on widespread availability: ✅ Checked.
Criterion: Price Competition
As the race for better performance heats up, another trend is quietly emerging: the cost per token, the primary way companies price model usage, is dropping.
With model performance beginning to converge, pricing pressure will continue and become a strong deciding factor for organizations when selecting a model to use.
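To make the pricing pressure concrete, here's a back-of-the-envelope sketch. The per-million-token prices and usage figures below are purely illustrative, not any vendor's actual rates:

```python
def monthly_cost(requests_per_day: int, tokens_per_request: int,
                 price_per_million_tokens: float, days: int = 30) -> float:
    """Estimate monthly spend given a flat per-token price."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * price_per_million_tokens

# At 10,000 requests/day averaging 2,000 tokens each, a 10x price drop
# turns a meaningful line item into a rounding error.
expensive = monthly_cost(10_000, 2_000, 10.0)  # $10 per 1M tokens -> $6,000/mo
cheap = monthly_cost(10_000, 2_000, 1.0)       # $1 per 1M tokens  -> $600/mo
```

When two models perform comparably on your task, that tenfold gap in spend is hard for a procurement team to ignore.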
Commoditization evaluation on price competition: ✅ Checked.
Criterion: Standardization
This one’s a bit more nuanced, but when we look at general-purpose LLMs from a high-level perspective, the major providers are all offering APIs that make connecting and using their models a breeze.
On top of that, every LLM provider is carefully watching what the competition is doing and rolling out similar capabilities and features in response.
Commoditization evaluation on standardization: ✅ Checked.
Criterion: Low Switching Costs
Did I just mention that you can access LLMs via a simple API call? Well, that means switching from one model to another should be as easy as clicking "next" on your favorite playlist.
And it gets even easier when you consider that developers often use model orchestrators like LangChain or LlamaIndex, which abstract away vendor-specific details and make swapping models trivial.
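The idea behind these orchestrators can be sketched as a common interface with vendor-specific clients behind it. The client classes below are stubs standing in for real SDK calls, and the names are hypothetical, not any library's actual API:

```python
from typing import Protocol


class ChatModel(Protocol):
    """The one method application code is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...


class VendorAClient:
    def complete(self, prompt: str) -> str:
        # A real client would call vendor A's SDK here.
        return f"[vendor-a] reply to: {prompt}"


class VendorBClient:
    def complete(self, prompt: str) -> str:
        # A real client would call vendor B's SDK here.
        return f"[vendor-b] reply to: {prompt}"


REGISTRY = {"vendor-a": VendorAClient, "vendor-b": VendorBClient}


def get_model(name: str) -> ChatModel:
    """Resolve a model by name; swapping vendors is a config change."""
    return REGISTRY[name]()


# Application code never changes; only the name in configuration does.
model = get_model("vendor-a")
answer = model.complete("Summarize this contract.")
```

This is the "low switching cost" story in its purest form: if every model hides behind the same `complete()` call, changing providers is a one-line edit. The later section on hidden challenges explains why reality is messier.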
Commoditization evaluation on low switching costs: ✅ Checked.
Yes, Commoditization is Happening
With a 4 out of 4 on the commoditization checklist, it’s no surprise that the conversation around LLMs becoming a commodity is already in full swing.
But this raises new questions, especially when you consider two additional trends:
Training Costs Keep Skyrocketing: Yes, we’ve seen some recent, smaller models outperform previously released LLMs. But overall, the cost of training these models is soaring. Take these rough estimates, for example (since OpenAI hasn’t shared the exact numbers):
GPT-3 training: over $4.6M (source)
GPT-4 training: over $100M (source)
GPT-5 training: over $2B (source)
Sky-High Company Valuations: Meanwhile, high training costs don't seem to be slowing down company valuations. OpenAI is rumored to be raising at a $150B valuation, and Cohere recently raised $500M at a $5.5B valuation, which The Information estimates at 140 times forward revenue, an even higher multiple than those seen with OpenAI and Anthropic.
Recently, Raphaëlle D’Ornano, founder of D’Ornano + Co, explained why GenAI is creating discontinuity rather than disruption. In our conversation, she noted, “The capex surge driven by the development of larger and more powerful models is to be put into perspective with the risk companies face by not participating in the next platform shift. The financial stakes are rising significantly across the industry.”
So, is this the whole story? Or are we missing something that could rationalize some of this?
The Hidden Challenges for Enterprises
There's no doubt that leadership teams are feeling the heat. Companies that develop LLMs as a side activity—like Meta and Google—are certainly in a better position than those whose core business revolves around LLMs—like OpenAI, Anthropic, Cohere, and Mistral.
However, our earlier conclusion that switching between LLMs is simple may be short-sighted. There are both technical and business-related factors that could make it more complicated than you'd assume:
Prompting Challenges: Developers are investing significant time and effort into crafting effective prompts to integrate LLMs into their tech stack and operations. While we can use natural language to interact with these models, many variables affect the quality of an LLM’s output—including the input prompt. There is currently no guarantee that a prompt optimized for one model will deliver the same results on a different model.
Evaluation Complexities: This brings us to another critical challenge: evaluation. Any new software or update must undergo rigorous testing by IT teams before going into production. With GenAI being so new, and probabilistic systems being tricky to assess, it's incredibly difficult to determine how well an LLM will perform for a specific business use case. When I chatted with Guillaume Nozière, Forward Deployed Engineer at Patronus AI, he shared that "although manual evaluation is the most accurate way to check the correctness of model outputs, it tends to be slow, expensive, and does not scale. For LLM applications to be successfully deployed at scale, it is crucial to implement a rigorous evaluation suite that combines benchmarking datasets and automated evaluators".
Standard benchmarks may not do much to inspire confidence in the enterprise.
“Every enterprise selecting an LLM for deployment is questioning how its performance on standard benchmarks will translate to real-world enterprise environments. At the end of the day, a standard test is generic and therefore cannot tell you specifically whether a model is best suited for your use case.
In addition, these standard benchmarks are often public and therefore can be gamed or even used as part of the training data by LLM providers. It is therefore very important for customers to use proprietary datasets tied specifically to their use case when selecting which LLMs to work with”.
Vendor Commitments: LLM vendors are eager to lock in enterprises through partnership agreements rather than leaving the choice to individual developers, ensuring their long-term presence within organizations. For example, companies already heavily embedded in Microsoft's ecosystem are more likely to adopt OpenAI's solutions due to the strategic partnership in place.
Legal Considerations: The legal hurdles surrounding GenAI and LLM adoption cannot be ignored. As usual, Europe is first to take action, with the release of the EU AI Act. Organizations simply don't have the bandwidth to review the growing number of LLM vendors on the market. Legal teams may push leadership to choose one vendor, ensure due diligence, and then stick with that choice—at least for now.
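The kind of evaluation suite Nozière describes can be sketched, in miniature, as a benchmark dataset paired with an automated scorer. The exact-keyword check below is deliberately naive (real suites use far richer evaluators), and the stub model stands in for a real API call:

```python
from typing import Callable

# Each entry pairs a prompt with a keyword the answer must contain.
Dataset = list[tuple[str, str]]


def evaluate(model: Callable[[str], str], dataset: Dataset) -> float:
    """Return the fraction of outputs containing the expected keyword."""
    hits = sum(
        1
        for prompt, expected in dataset
        if expected.lower() in model(prompt).lower()
    )
    return hits / len(dataset)


def stub_model(prompt: str) -> str:
    # Hypothetical stand-in for a vendor API call, for illustration only.
    if "France" in prompt:
        return "Paris is the capital of France."
    return "I don't know."


dataset = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Peru?", "Lima"),
]
score = evaluate(stub_model, dataset)  # 0.5: one hit, one miss
```

The value of such a harness is that the same proprietary dataset can be re-run against every candidate model, giving a use-case-specific score that public leaderboards cannot provide.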
The LLM Race Will Continue to Showcase Expertise
There’s no doubt that the perception of commoditization is ramping up the pressure on LLM vendors. But so far, we haven’t seen any major players back down from the LLM race. Instead, it looks like everyone is doubling down, raising enormous funds to train their models, all while figuring out the best path to monetize them.
Within enterprises adopting these LLMs, developers may face an uphill battle convincing leadership to switch to the "newest leading model" every few months.
But let’s not forget—the LLM itself was never meant to be your business differentiator. What truly sets you apart will always be your data, your teams, and your processes.
🔑 In the long run, I predict we’ll see broader adoption of smaller, more specialized models. General-purpose LLMs may end up like Formula 1 cars: impressive feats of technological progress that companies use to showcase their research and expertise—with different leaders at different times—but not something you will buy to use for your business as usual.
Thanks to Raphaëlle D’Ornano and Guillaume Nozière for reading drafts of this.