Bindu Reddy (@bindureddy)
The evolution of LLMs over the next couple of years - Will the tech become commoditized and commonplace?
Not a single day passes without someone announcing a new LLM, and foundation models are replaced by next-generation models in a matter of months.
So what is the future of this technology, and how is the space likely to evolve?
Data Advantage and Information Queries - To begin with, LLMs will soon become data-constrained. Even with a large number of GPUs, most companies don't have access to new and unique sources of data.
Google and Meta have a huge advantage here compared to anyone else. Google, because of its search dominance, can get away with crawling every website that wants its search traffic, and can use YouTube and potentially Gmail data to train its LLMs. Meta enjoys the same advantage in being able to use Facebook and Instagram data.
This will translate into Google and Meta LLMs being able to respond to general information queries better than LLMs from OpenAI or other start-ups.
Already, Bard outperforms GPT-4 on queries about recent events, while GPT-4 does much better on queries about data available before September 2021 (its training cut-off date).
LLMs For General Purpose Tasks - So the next question is: will we have specialized LLMs for general-purpose tasks like coding, reasoning, summarization, or writing?
For example, GPT-4 does really well on code compared to Google's LLMs, so will there be several purpose-built LLMs?
This is unlikely to be the case for general-purpose tasks. Large SOTA LLMs outperform specialized LLMs in most tasks. Again, GPT-4 outperforms specialized LLMs on pretty much everything from code generation to writing and reasoning tasks.
Here it is important to draw the distinction between a general-purpose task like Python code generation and a very specialized task like having knowledge of Abacus APIs and programming the Abacus platform. The former typically DOES NOT require fine-tuning or RAG (retrieval-augmented generation), while the latter requires some custom work.
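To make that distinction concrete, here is a minimal Python sketch: the general-purpose prompt stands on its own, while the specialized prompt has to be grounded with retrieved documentation (RAG in its simplest form). The doc snippets and function names below are hypothetical stand-ins, not real Abacus APIs.

```python
# General-purpose task: a SOTA LLM can handle this from the prompt alone.
general_prompt = "Write a Python function that merges two sorted lists."

# Specialized task: the model has likely never seen this platform's docs,
# so we prepend retrieved documentation to the prompt (simplest form of RAG).
# These API snippets are hypothetical, for illustration only.
retrieved_docs = "\n".join([
    "create_dataset(name, source) -> Dataset: registers a new dataset.",
    "deploy_model(model_id) -> Deployment: deploys a trained model.",
])
specialized_prompt = (
    "You are an assistant for a proprietary ML platform.\n"
    "Relevant API documentation:\n" + retrieved_docs + "\n\n"
    "Task: write a script that registers a dataset and deploys a model."
)
```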
All this means that we will end up with Google, Meta, and potentially OpenAI as the key players in the consumer LLM world (ChatGPT, Bard, etc.).
It is extremely unlikely that we will have more than 2-3 of these services. These services, like ChatGPT, will have free and paid subscription tiers, and paying subscribers will enjoy premium features like personalized responses and access to multi-modal capabilities.
Enterprise AI and LLM APIs - The other big category of LLM use cases is businesses using these LLMs in their core products, services, and business processes.
There are two classes of use cases in this space:
General-purpose use cases embedded in a product or service - e.g. summarize my Slack channel or Zoom meeting. For these use cases, a vanilla API call to a SOTA LLM is sufficient. Price will be the key consideration here, and as long as the large LLM providers keep their prices very competitive, simply making calls to their APIs will work.
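For illustration, a vanilla call of this kind might look like the sketch below, using the OpenAI Python SDK as one example provider; the model name and prompt are placeholders, and pulling the actual transcript out of Slack or Zoom is out of scope here.

```python
# A minimal sketch of the "vanilla API call" pattern for summarization.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = "...raw Zoom transcript or Slack channel history..."

response = client.chat.completions.create(
    model="gpt-4",  # placeholder; any SOTA chat model works here
    messages=[
        {"role": "system", "content": "Summarize the text in five bullet points."},
        {"role": "user", "content": transcript},
    ],
)
print(response.choices[0].message.content)
```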
Specialized large-scale use cases on custom knowledgebases - This is the category of custom enterprise use cases, where you may have several thousand calls per day and the LLM needs an understanding of a custom knowledgebase or task.
I suspect that smaller, more efficient LLMs with reasonably good reasoning capabilities can be fine-tuned or complemented with RAG and incorporated into the workflow to automate these use cases. Using GPT-4 or some other very large LLM will become cost-prohibitive in these cases.
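As a rough sketch of that workflow, the snippet below embeds a toy knowledgebase with the sentence-transformers library, retrieves the passage most relevant to a query, and builds a grounded prompt for a smaller model; the knowledgebase contents and the downstream generation step are assumptions for illustration.

```python
# Minimal RAG sketch: embed a custom knowledgebase, retrieve the most
# relevant passage, and prepend it to the prompt for a smaller, cheaper LLM.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small open embedding model

knowledgebase = [
    "Refund requests must be filed within 30 days of purchase.",
    "Enterprise plans include a dedicated support channel.",
    "API rate limits reset every 60 seconds.",
]
doc_vectors = embedder.encode(knowledgebase, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k knowledgebase passages most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q  # cosine similarity, since vectors are normalized
    return [knowledgebase[i] for i in np.argsort(scores)[::-1][:k]]

query = "How long do customers have to request a refund?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then go to a small fine-tuned or off-the-shelf LLM.
```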
Companies will use LLM Ops platforms such as Abacus to automate an end-to-end workflow, and these platforms will offer a combination of open-source LLMs and closed-source APIs. Companies should be free to pick and choose an LLM based on cost, performance, and time to market.
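Behind one interface, that pick-and-choose routing might look roughly like this; it is a hypothetical sketch, not Abacus's actual API, and gpt2 / gpt-4 are just stand-ins for "cheap open-source model" and "strong closed-source model".

```python
# Hypothetical sketch: route each workload to an open-source or
# closed-source LLM behind a single interface.
from openai import OpenAI
from transformers import pipeline

def generate(prompt: str, backend: str = "open-source") -> str:
    if backend == "open-source":
        # Small local model: cheap per call, data stays in-house.
        generator = pipeline("text-generation", model="gpt2")
        return generator(prompt, max_new_tokens=64)[0]["generated_text"]
    # Closed-source API: stronger reasoning, pay per token.
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

# Pick the backend per use case, trading off cost, quality, and latency.
print(generate("Summarize: LLMs are becoming core infrastructure."))
```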
In some very specific cases, we will also see very specialized LLMs emerge - e.g. a FinanceLLM or a LegalLLM. These domain-specific LLMs may require a lot of custom training, RLHF, and fine-tuning. For example, Bloomberg created BloombergGPT, a 50-billion-parameter large language model purpose-built from scratch for finance.
Having said that, BloombergGPT is probably out of date already, as such custom models lack the strong reasoning skills that general-purpose LLMs possess, and it is much better to simply fine-tune or use RAG with a SOTA LLM than to train a custom model for your specialized task.
Net-net, we are seeing what you would expect to see in a new and exciting space - a large number of companies being started, and over time, a lot of consolidation with a handful of key players emerging.
Just like with other core infrastructure such as operating systems or databases, there is likely to be a healthy open-source ecosystem that complements the services from the giants.