One of the world’s foremost AI scientists, Dr Fei-Fei Li believes that data has to be a first-class citizen in the world of AI.
Sharing her views on AI advancement with George Kurian, chief executive of NetApp, Dr Li said that machine learning pushes data to the forefront of algorithms, allowing engineers and scientists to train models on that data to perform a task.
“When I started out in AI more than 20 years ago, one of the things we knew was the large data models can make a difference to AI projects. That was the power of models. From there we saw how large language models really work. The power we see today with the software applications available today is just the first step out of the gate,” she said in a fireside chat with Kurian at NetApp Insight in Las Vegas.
Dr Li led the creation of ImageNet, a large visual database designed for use in visual object recognition research. Started in 2007 and first presented in 2009, it comprises about 15 million labelled images and went on to revolutionise machine learning. With ImageNet, computers could not just see but also perceive.
“A fundamental theory of machine learning is that what scientists and researchers really want is generalisation. We don’t really want to memorise a particular flower, but you want to actually be able to generalise and recognise similar flowers or even, you know, plants in general,” she explained.
Data is key to this process. GenAI’s capability is based on the massive amount of data that it is trained on. For organisations, data tends to be siloed, residing in different business units and departments as well as externally with business partners and other stakeholders. To harness GenAI, this data has to be unified and harmonised so that the models generate relevant responses.
Kurian pointed out that fragmented data leaves organisations without a clear picture of what data they hold and where it sits, making it hard to manage and use efficiently for AI workloads. A coherent view is urgently needed to bring AI capabilities into enterprises’ IT frameworks quickly, he said.
Handling and managing this data raises safety, privacy and confidentiality issues, he added. As Dr Li has been called to testify before the US Congress on responsible AI, Kurian asked for her views on AI ethics policy.
Dr Li said that her generation of AI technologists and scientists, having created these innovations, shares a responsibility to guide them to be human-centric and to serve the public good.
She pointed to the asymmetry between public-sector and private-sector investment in AI. GenAI is very compute-intensive and requires substantial investment in GPUs, which large private-sector companies can afford.
Unfortunately, even if all the universities in the US came together, they would still be unable to afford the costs.
With this level of asymmetry, the public sector has to play a larger role in AI leadership by investing more in GenAI research and innovation and by understanding what is under the hood of these applications. This is so that the public can be informed of the capabilities and potential risks of AI, as well as the policies needed to mitigate the harms it brings to the surface.
She urged the private and public sectors across the world to work together and strive for a “moonshot” goal of collectively bringing about new innovations for the public good.