
IMAGE: Deeptech Times
At a fascinating tech talk at ATxSG, Prof. Yejin Choi, Dieter Schwarz Foundation Professor of Computer Science and Senior Fellow at the Stanford Institute for Human-Centered AI, explored how smaller language models can be deployed to outperform larger commercialised ones. Instead of relying on brute-force computing power, the key lies in high-quality data, innovative algorithms, and global cross-border collaboration.
Democratising GenAI – Mission impossible?
Choi outlined the challenge: with today’s large language models (LLMs) dominating adoption and monopolising huge compute and processing power, how do users without such means close the gap with frontier intelligence capabilities?
“Scaling laws demand extreme-scale compute that few can afford,” said Choi. For example, Sam Altman, OpenAI’s CEO, when asked how Indian startups could create foundation models for India, responded: “It’s hopeless to compete with OpenAI.”
How do we make advanced AI technologies accessible to all, not just big companies or countries with huge resources? Choi posited there may be a solution to the challenge: using smaller models, with clever strategies, to rival or surpass the performance of large-scale models.
Improving data quality and innovating algorithms
A major highlight was the set of innovative methods her team at Stanford used to improve data quality. Working with an older model, GPT-2, they focused on three techniques: distillation, info-theoretic distillation, and prismatic synthesis.
AI usually summarises or paraphrases sentences well when one or more of these conditions are met: extreme-scale pre-training, reinforcement learning from human feedback (RLHF) at scale, or supervised datasets at scale.
Choi’s hypothesis was that AI is only as good as the data it was trained on, and that this would be the team’s advantage: training models on data that does not yet exist on the Internet (the frontier models would already have ingested what does), and on data that is qualitatively better than what is currently available.
GPT-2 was not built to understand prompts the way newer models do, so when her team tried writing prompts, it returned under 0.1 per cent accuracy for word pairings.
By rewriting some of the algorithms, they tuned the model to pair words correctly slightly over 10 per cent of the time, a step up from near zero. Encouraged by the result, her team had GPT-2 over-generate data, filtered it through a three-layered funnel for better results, passed it through a ‘critic’ filter (thumbs-up/down signals for reinforcement learning), and ran multiple iterations so that the student model becomes the teacher (self-distillation) for the next round.
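The over-generate, filter, and self-distill loop described above can be sketched roughly as follows. This is a minimal illustrative mock-up, not the Stanford team’s actual code: the funnel layers, the critic heuristic, and the toy model are all stand-ins invented here to show the shape of the pipeline.

```python
import random

def overgenerate(model, prompt, n=8):
    """Sample many candidate outputs from the current (student) model."""
    return [model(prompt) for _ in range(n)]

def funnel_filter(candidates):
    """Three-layered funnel (illustrative stand-ins for the real filters):
    drop empty outputs, over-long outputs, and duplicates."""
    seen, kept = set(), []
    for c in candidates:
        if c and len(c.split()) <= 20 and c not in seen:
            seen.add(c)
            kept.append(c)
    return kept

def critic(candidate):
    """Thumbs-up/down critic; here a toy heuristic rewarding concision."""
    return len(candidate.split()) <= 10

def self_distill(model, prompts, iterations=3):
    """Each round, filtered outputs become training data; in the real
    pipeline the student fine-tuned on them would serve as the next
    teacher. Here we simply accumulate the surviving pairs."""
    dataset = []
    for _ in range(iterations):
        for p in prompts:
            kept = [c for c in funnel_filter(overgenerate(model, p)) if critic(c)]
            dataset.extend((p, c) for c in kept)
    return dataset

def toy_model(prompt):
    """Toy 'model' that truncates the prompt to mimic summarisation."""
    words = prompt.split()
    return " ".join(words[:random.randint(1, len(words))])

random.seed(0)
data = self_distill(toy_model, ["a long sentence that we would like summarised"])
```

In a real implementation, `toy_model` would be replaced by sampling from GPT-2 and the critic by a learned reward signal; the point here is only the loop structure of over-generation, funnelled filtering, and iterated distillation.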
The result? It closed the gap with a GPT-3 model 200 times its size on fluency, concision, and faithfulness to the prompts.
The team went on to develop the other approaches (info-theoretic distillation and prismatic synthesis), which matched or even outperformed benchmarks set by larger models, including ChatGPT.
Global collaboration
Choi stressed the importance of open community and cross-border collaboration. By sharing knowledge and resources worldwide, the potential for advancements in AI is greatly enhanced. This collaborative effort democratises access to cutting-edge technologies and ideas, ensuring that more people can benefit from AI innovations.
Brute-force scaling is effective, yet increasingly unsustainable. The alternative is smart scaling: creating compact yet powerful language models.
As Choi wryly remarked, to much amusement, “You guys are not generating social media data fast enough to feed the large language models at scale.”
While her team’s work is largely academic at this stage, it is promising: the change it could deliver means more efficient and accessible AI technologies. After all, as she points out, AI should benefit all humans, not just some, and more importantly, it should serve humans, not the other way around.