Is it possible to come out of a double-header of top lectures slightly depressed?
That’s how I felt after watching Turing Award winner Yoshua Bengio and Nobel Prize winner Edvard Moser, back to back, at the Heidelberg Laureate Forum this morning.
What’s depressing is that we’re just starting to understand how thought happens and how to model it. In both cases we only have a few clues.
Bengio gave his Turing Lecture on efforts to model thought in computers. Moser’s Lindau Lecture drew on work watching how thought happens inside rats. Bengio has learned that assumptions, or biases, are necessary to turn data into insight; without them, you can’t make sense of things. Moser has learned that brain cells specialize, with different cells firing depending on the direction a rat is facing.
Bengio managed to sprain his foot on Sunday, but limped to the stage 15 minutes before he began, double-checking his set-up, as nervous as a first-year Ph.D. student preparing to address colleagues for the first time.
He said his work involves teaching neural nets a few “tricks” that let them make sense of vast amounts of data. Rules-based systems don’t work, he learned; they can’t cut through the sheer density of the data they’re being given.
The key lesson, he said, is attention. “Attention lets us process any data structure, not just vectors. Attention consists of loops of optimization, embedded inside each other, which let you train a system to generalize on data it hasn’t been specifically trained to recognize.” Systems like Google Translate now use these techniques to bring words with similar usage together. Germany and Norway are both places, so they’re grouped together. We can represent high-level concepts with distributed representations.
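To make that grouping concrete, here is a toy sketch of distributed representations: each word becomes a vector, and nearby vectors mean related concepts. The vectors below are invented purely for illustration; real systems learn them from data rather than writing them by hand.

```python
# A toy illustration, not any real system's embeddings: words become vectors,
# and cosine similarity tells us which concepts sit close together.
import numpy as np

# Hypothetical 4-dimensional embeddings, invented for this example.
embeddings = {
    "germany": np.array([0.9, 0.1, 0.8, 0.0]),
    "norway":  np.array([0.8, 0.2, 0.9, 0.1]),
    "banana":  np.array([0.1, 0.9, 0.0, 0.8]),
}

def cosine(a, b):
    """Similarity of two vectors: near 1.0 means closely related directions."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["germany"], embeddings["norway"]))  # high: both places
print(cosine(embeddings["germany"], embeddings["banana"]))  # low: unrelated
```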
“To learn what sentences mean you also need a model of the world. The ability to focus on a few elements is what attention is about. Attention mechanisms figure out where to put attention.” But we are nowhere near a human-level AI. Systems must still be supervised by people.
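Stripped to its core, an attention mechanism is a learned weighted average: the system scores how relevant each element of the input is, turns the scores into weights, and averages. The sketch below is a minimal, generic scaled dot-product attention in plain numpy with made-up matrices; it illustrates the general idea, not the specific architecture Bengio presented.

```python
# Minimal scaled dot-product attention: each query scores every element of
# the input, the scores become weights via softmax, and the output is a
# weighted average. The "focus on a few elements" lives in those weights.
import numpy as np

def attention(queries, keys, values):
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)                    # relevance of each element
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax
    return weights @ values, weights                          # weighted average of values

# Three input elements with 4-dimensional features, invented for this example.
rng = np.random.default_rng(0)
keys = values = rng.normal(size=(3, 4))
query = rng.normal(size=(1, 4))
output, weights = attention(query, keys, values)
print(weights.round(2))  # weights sum to 1; the largest is where attention "looks"
```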
“This dream of discovering high-level meanings hasn’t been achieved. We still have a lot more to do. Part of the answer is looking at how children and infants know what they’re doing. A two-year-old understands physics by interacting with the environment and learning how it works. They don’t have equations. They conduct experiments and form assumptions based on the results.” It’s the same process that let Bengio relearn how to get around on stage after spraining his foot: he made assumptions and generalized.
Moser’s work is more experimental. He is studying spatial mapping, measuring the firing of neurons within a rat’s brain as it moved around a box. “Their brains and cortex are organized similarly to the human cortex, just smaller,” he said.
What he found is that cells specialize in a position or direction. Cells in the hippocampus and entorhinal cortex operate in the same way. “Each cell has a grid pattern, but in a different phase.”
When Moser’s colleagues went from studying dozens of cells to hundreds, they found the same patterns once the noise in the data was removed. “The cells are organized in a ring and fire depending on the direction the rat is pointing. The model is repeated in every possible room or condition. This is what you would expect if you assume intrinsic connectivity.” And the same cells fire, in the same way, during sleep.
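To picture what that ring looks like, here is a toy simulation under simple tuning-curve assumptions; it is an illustration, not the Mosers’ recordings or analysis. Each simulated cell prefers one heading, and the bump of activity moves around the ring as the heading changes, the same way regardless of which “room” the numbers come from.

```python
# Toy ring of direction-tuned cells (an illustration, not the Mosers' data):
# each cell has a preferred heading, and its firing rate peaks when the
# simulated animal faces that direction.
import numpy as np

n_cells = 12
preferred = np.linspace(0, 2 * np.pi, n_cells, endpoint=False)  # the "ring"

def firing_rates(heading, width=0.5, peak=30.0):
    """Rates (spikes/s) of all cells for a given heading, in radians."""
    return peak * np.exp((np.cos(heading - preferred) - 1) / width**2)

# As the heading rotates, the bump of activity moves around the ring.
for heading in (0.0, np.pi / 2, np.pi):
    rates = firing_rates(heading)
    print(f"heading {heading:4.2f} rad -> most active cell #{rates.argmax()}")
```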
After his lecture Moser was asked why he was studying spatial understanding. “Space is very simple after all,” he said.
Maybe.
Then why, after Moser finished, did a few hundred of the world’s top data scientists spend 10 minutes navigating with painful slowness through the central space of the hall toward the exit, when there were plenty of fast, direct routes out along the sides? Meanwhile, I had to use canned photos here because I’d left my phone in my hotel room.