Generative AI is an evolution from the database computing we’ve been doing for decades. It’s not a revolution.
There are two main differences between Generative AI and conventional database computing:
- The database that can be accessed is larger, and some of it may be public.
- The output can come in different forms, including pictures, stories, and code.
Garbage In, Garbage Out
The GIGO rule still applies: Garbage In, Garbage Out. So do all the other rules of society, including copyright. That means you can’t train an AI on data you don’t legally control or have the right to use.
There are two sides to the AI puzzle. One is the Large Language Model (LLM), the algorithm you’re using to get answers. The other is the Training Data, the database you’re getting those answers from. The questions you can ask, and the answers you can get, will be limited by both the LLM algorithm and its training database.
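To make that concrete, here’s a minimal sketch in Python, not any vendor’s API, of why answers are bounded by training data. The tiny_corpus dictionary and answer() function are hypothetical illustrations, not how a real LLM works:

```python
# A toy "model": the algorithm is a trivial lookup, and the "training data"
# is a two-entry corpus. Both are hypothetical, for illustration only.
tiny_corpus = {
    "what is gigo": "Garbage In, Garbage Out: output quality tracks input quality.",
    "what limits an llm": "Both its algorithm and the database it was trained on.",
}

def answer(question: str) -> str:
    """Answer only what the 'training data' covers; get stumped on the rest."""
    key = question.lower().rstrip("?")
    # A real LLM interpolates instead of looking up, but the boundary is the
    # same: it can't produce knowledge its training set never contained.
    return tiny_corpus.get(key, "Stumped: that question is outside my training data.")

print(answer("What is GIGO?"))             # answered from the corpus
print(answer("What will AGI look like?"))  # stumped
```

Swap the lookup for a trillion-parameter network and the corpus for the open web, and the same constraint holds, which is why the legal provenance of training data matters so much.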
Microsoft let OpenAI steal a march on all this in 2023 by giving it access to all the data under its command to train OpenAI’s model. Most LLMs will have to be built carefully. So will OpenAI’s, if it wants commercial contracts. Adobe has built a training database of data it can legally give customers access to, so using its Firefly creates no legal problems. The database creates a “moat” around Firefly’s market that’s more effective than the algorithm.
This means a lot of groundwork must be done before any Generative AI model can work. It’s also why current “AI” projects look more like traditional database computing. Palantir can manage a hospital’s resources because it has access to all the hospital’s data, including its demand patterns. Salesforce can do the same kind of thing for customer management, ServiceNow for operations management; this is the low-hanging fruit. Each successful installation may be unique, but each one also helps the LLM, training it and making it better.
Stumping Your AI
The success of today’s basic LLMs, measured in productivity, is driving people to build larger models on larger datasets. This is where Generative AI is going to get into trouble. Do you have the right to use the data you’re training your algorithm on? Can your bigger model deliver the productivity gains of the smaller ones? This is where the current controversy lies, and it’s where I expect the “trough of disillusionment” that hits every tech breakthrough to develop.
It takes a lot of math to train and run any LLM against any database. It also takes basic enabling software. That’s why Nvidia is doing so well right now. That’s why the Cloud Czars will continue to do well, selling the math capability and going “up the stack” with their own LLMs. Application vendors will just stick “AI” on their current wares and grow that way.
But in 2024, let’s be clear about one big thing. No one has created Artificial General Intelligence (AGI) yet. Any model can be stumped when it’s asked a question outside the data its algorithm was trained on. AI is software, and software is not human.
Ask your grandpa this question over Christmas: Why does the porridge bird lay its eggs in the air?