The rapid development and improvement of generative AI technology pose a challenge to training and fine-tuning as a sustainable path to adoption. If organizations constantly need to fine-tune new models for specific tasks, they risk being caught in a costly cycle of catching up with each new release. In contrast, prompt engineering and retrieval-augmented generation (RAG) focus on improving the retrieval and integration of information, a process that can continuously benefit from advances in generative technology. This makes them a more sustainable short-term adoption strategy.
[ This article is an excerpt from Generative Artificial Intelligence Revealed, by Rich Heimann and Clayton Pummill. Download your free ebook copy at the book’s website. ]
In a popular blog post titled “The Bitter Lesson,” Richard Sutton argues that general methods leveraging computation outperform specialized methods in AI research, fundamentally because the cost of computation keeps falling over time.
This argument sidelines specialized research in favor of Moore’s Law and effectively asks the next generation of researchers to do less. However, we believe there is a “bitter lesson” style analysis for generative AI adoption. Specifically, we prefer retrieval-augmented generation and prompt engineering to training and fine-tuning language models—at least as an initial adoption strategy.
The trouble with training and tuning
A bitter lesson would suggest that relying on training or fine-tuning is less efficient and riskier than waiting for newer, perhaps more robust models. Fine-tuning demands substantial resources. Each new domain or significant shift in data distribution may require retraining or updating the model. This process is expensive, and a fine-tuned model doesn’t necessarily generalize across different tasks or datasets without further fine-tuning, making the approach inefficient when new models or technologies emerge. RAG and prompt engineering allow organizations to adopt generative technology without training anything in the technology stack, which accelerates adoption, lowers costs, and reduces lock-in.
New models will likely incorporate higher-quality training data, better generalization capabilities, and more advanced features such as effectively unlimited context windows that reduce the need for fine-tuning. Consequently, software engineers should write abstractions on top of existing models, which can be done much faster and more cheaply than training and fine-tuning language models. These abstractions can migrate to newer models, whereas training and tuning cannot. Investing in RAG and prompt engineering allows organizations to stay flexible and to adopt the technology without the continuous need for retraining, aligning with the bitter lesson’s emphasis on computation and general methods, such as retrieval mechanisms, over specialized solutions.
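To make the idea concrete, here is a minimal sketch of such an abstraction in Python. It is our own illustration, not a prescribed implementation: the toy retriever and the answer() function are hypothetical stand-ins for a real vector store and prompt template. The point is that the application depends only on a generic completion callable, so moving to a newer or cheaper model later is an adapter swap rather than a retraining project.

```python
# A minimal, hypothetical abstraction layer: the application depends only on
# answer(), never on a specific vendor SDK, so swapping models later means
# changing one adapter rather than retraining or re-tuning anything.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Document:
    text: str

# Adapter type: any function that takes a prompt and returns a completion.
CompletionFn = Callable[[str], str]

def retrieve(query: str, corpus: List[Document], k: int = 3) -> List[Document]:
    """Toy retriever: rank documents by naive keyword overlap with the query.
    In practice this would be a vector store or a search index."""
    terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(terms & set(d.text.lower().split())))
    return scored[:k]

def answer(query: str, corpus: List[Document], complete: CompletionFn) -> str:
    """RAG-style prompt assembly: ground the model in retrieved context."""
    context = "\n\n".join(d.text for d in retrieve(query, corpus))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return complete(prompt)

# Swapping providers is a one-line change in the adapter, not a retraining project:
# answer(q, docs, complete=openai_adapter)  or  answer(q, docs, complete=gemini_adapter)
```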
The rapid pace of innovation and the proliferation of new models have raised concerns about technology lock-in. Lock-in occurs when businesses become overly reliant on a specific model with bespoke scaffolding that limits their ability to adapt to innovations. Upon its release, GPT-4 cost roughly the same as GPT-3 despite being a superior model with much higher performance. Since the GPT-4 release in March 2023, OpenAI prices have fallen another six times for input data and four times for output data with GPT-4o, released May 13, 2024. Of course, an analysis of this sort assumes that generation is sold at cost or at a fixed margin, which is probably not true; significant capital injections and negative margins aimed at capturing market share have likely subsidized some of this. However, we doubt these levers explain all of the improvement gains and price reductions. Even Gemini 1.5 Flash, released May 24, 2024, offers performance near GPT-4 while costing about 85 times less for input data and 57 times less for output data than the original GPT-4. Although eliminating technology lock-in may not be possible, businesses can loosen its grip by relying on commercial models in the short run.
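As a rough check on the arithmetic (assuming launch list prices of about $30 per million input tokens and $60 per million output tokens for GPT-4, versus about $0.35 and $1.05 for Gemini 1.5 Flash): $30 ÷ $0.35 ≈ 85 for input and $60 ÷ $1.05 ≈ 57 for output, which is where the ratios above come from.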
Avoiding lock-in risks
In some respects, the bitter lesson is part of this broader discussion about lock-in risks. We expect scaling to continue, at least for another couple of iterations. Unless you have a particular use case with obvious commercial potential, or operate within a high-risk and highly regulated industry, adopting the technology before its full scaling potential is determined and exhausted may be hasty.
Ultimately, training a language model or adopting an open-source model is like swapping a leash for a ball and chain. Either way, you’re not walking away without leaving some skin in the game. You may need to train or tune a model in a narrow domain with specialized language and tail knowledge. However, training language models involves substantial time, computational resources, and financial investment. This increases the risk for any strategy. Training a language model can cost hundreds of thousands to millions of dollars, depending on the model’s size and the amount of training data. The economic burden is exacerbated by the nonlinear scaling laws of model training, in which gains in performance may require exponentially greater compute resources—highlighting the uncertainty and risk involved in such endeavors. Bloomberg’s strategy of including a margin of error of 30 percent of their computing budget underscores the unpredictable nature of training.
Yet, even when successful, training may leave you stuck with your investment. Training may prevent you from using new models with better performance and novel features, or even new scaling laws and strategies for training models. Don’t handcuff yourself to the wheel of your ship. While steering clear of proprietary pitfalls, you’re still shackled by the immense sunk costs of training, not to mention the ongoing obligations of maintenance and updates. Before training a model, you should ensure there is a clear and compelling need that cannot be met by existing pre-trained models, by advanced prompting strategies such as Everything of Thoughts (XoT) and Medprompt, or by lighter-weight modifications such as RAG.
The Anna Karenina principle
AI adoption can be likened to the famous opening line of Leo Tolstoy’s Anna Karenina: “All happy families are alike; each unhappy family is unhappy in its own way.” Applied to AI adoption, we might say: “All successful AI adoptions are alike; each failed adoption fails in its own way.” The “Anna Karenina principle” was popularized by Jared Diamond in his 1997 book Guns, Germs, and Steel. Diamond uses the principle to explain why so few wild animals have been successfully domesticated throughout history. He argues that a deficiency in any one of many factors can render a species impossible to domesticate. Thus, the successfully domesticated species succeeded not because they possessed a particular positive trait, but because they lacked any of the disqualifying negative ones.
AI adoption is complex and requires more than downloading an open-source model from Hugging Face. Successful adoptions start with clear objectives and knowing precisely what the business needs to achieve. Don’t pursue AI because it’s trendy, but because you have specific goals. Successful adoption requires strong leaders who have a clear vision of how the technology will impact the business and who are committed to the strategy. They must manage risk and anticipate future needs with robust and scalable adoption strategies, allowing seamless integration and growth. They must also handle change management and ensure employees are on board and understand the changes. Ethical considerations must also be addressed to ensure that AI is used responsibly. Everyone plays a vital role in adopting AI.
One guiding principle that may help leaders is Liebig’s Law, or the law of the minimum, a principle developed in agricultural science and later popularized by Justus von Liebig. It states that growth is dictated not by total resources available but by the scarcest resource or the limiting factor. You are already familiar with this law. It has been codified in clichés like “a chain is only as strong as its weakest link.” Liebig’s Law implies that the success of AI deployment is constrained by the most limiting factor in the adoption process. These factors include data, human capital, computational resources, governance, and compliance. Yet, even then, you may adopt the technology in a way that limits its potential or creates dependencies that are hard to escape. Businesses must balance innovation and practicality, avoiding vendor lock-in and focusing on modular, flexible technologies that allow them to remain agile and responsive to new developments. This approach ensures they can adapt quickly and cost-effectively to the ever-evolving AI landscape.
Rich Heimann is a leader in machine learning and AI whose former titles include Chief AI Officer, Chief Data Scientist and Technical Fellow, and adjunct professor. He is the author of Doing AI: A Business-Centric Examination of AI Culture, Goals, and Values and co-author of Social Media Mining using R.
Clayton Pummill is a licensed attorney specializing in complex machine learning, data privacy, and cybersecurity initiatives while building enterprise solutions and support practices for organizations facing machine learning regulations. Active in the technology startup space, he has developed patented technology, co-founded organizations, and brought them through to successful exits.
—
Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact [email protected].