In the old days, people would spend a long time teaching computers rules. Expert systems attempted to simulate the expertise of professionals by following the same rules the professionals used. This approach has several drawbacks:
- experts don’t always understand the rules they use;
- many rules experts use are wrong; and
- it takes a long time to teach a computer rules.
Rather than remembering facts or fixed responses to questions, an AI needs a model of its world. It’s not enough to know some good chess moves – an AI needs to know what moves are possible and to be able to assess how good they are. Once it has a model, it can make its own predictions, explore possible outcomes, and build new refinements automatically.
This model is informed by training data, and preferably inferred from it. When AI was getting started, the internet was packed with text, but that text was tainted by human bias. The easiest way to get raw training data was to use live video, a rare source of readily available fresh information to analyse and understand, so this was the logical place for me to start.
My AI model comprises forms, and the relationships between them.
Being made of pure information, these forms are amenable to processing by computer. The relationships between these forms can be studied in abstraction – and these relationships correspond to relationships in the real world.
The first step is to convert the real world into forms which can be uploaded into a computer. What do these forms look like? Being on a computer, they look like 0s and 1s. Or rather, each form is recorded as a 0 or a 1 according to whether a given object has that property.
I split the world into “Low-level concepts” and “High-level concepts”. Low-level concepts are characteristics which the computer can easily check an object has, each using a program which returns a 0 or a 1. Examples are whether a region is light or dark, big or small, smooth or textured, or inside another region. “High-level concepts” are human-recognisable concepts, like a bridge, a won chess position or a catchy song.
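As a minimal sketch of the idea, each low-level concept can be written as a small function that returns 0 or 1. The `Region` fields, the thresholds and the function names below are all assumptions made for illustration, not the actual tests used.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical summary of an image region (all fields are assumptions)."""
    mean_brightness: float   # 0.0 (black) to 1.0 (white)
    area: int                # size in pixels
    texture_variance: float  # variance of brightness within the region


# Each low-level concept is a cheap check that returns 0 or 1.
# The thresholds are arbitrary illustrative values.
def is_light(region):
    return int(region.mean_brightness > 0.5)


def is_big(region):
    return int(region.area > 1000)


def is_smooth(region):
    return int(region.texture_variance < 0.01)


LOW_LEVEL_CONCEPTS = [is_light, is_big, is_smooth]


def describe(region):
    """Turn a region into a list of 0s and 1s, one per low-level concept."""
    return [concept(region) for concept in LOW_LEVEL_CONCEPTS]


bits = describe(Region(mean_brightness=0.8, area=250, texture_variance=0.002))
print(bits)  # [1, 0, 1]: light, not big, smooth
```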
The AI can also make up its own concepts – “Mid-level concepts”. Many concepts are related, and you can simplify combinations into single concepts. “Wheel-like” may contain many features which, in various combinations, make it more likely you have a wheel, e.g. round, has a rubber tyre, has spokes, has metal in it. This packaging of multiple simple concepts into higher level concepts is fundamental to intelligence.
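One hypothetical way to package several low-level cues into a single mid-level concept is to count how many of them are present. The cue names and the threshold of two are assumptions for illustration; the text does not say how the AI actually forms its mid-level concepts.

```python
# Hypothetical cues that each make "wheel" a little more likely.
WHEEL_CUES = ("round", "rubber_tyre", "spokes", "metal")


def wheel_like(features, min_cues=2):
    """Return 1 if at least `min_cues` of the wheel cues are present, else 0.

    `features` maps a cue name to 0 or 1; missing cues count as 0.
    """
    present = sum(features.get(cue, 0) for cue in WHEEL_CUES)
    return int(present >= min_cues)


print(wheel_like({"round": 1, "spokes": 1}))  # 1
print(wheel_like({"metal": 1}))               # 0
```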
So the question for the computer is: what is the relationship between the things you can measure (low-level concepts), and the concepts you want the computer to understand (high-level concepts)?
Luckily, you can enumerate every possible relationship. This is an exponential job – bigger than that even – but you can list these ideas in order of complexity. This allows you to start at the beginning, finding the easy ideas, and continue to more and more complex ideas. This process is a one-off investment – and easily parallelisable.
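To give a feel for the scale: with n binary inputs there are 2^(2^n) distinct true/false functions of those inputs, which is why the job is bigger than exponential. The sketch below illustrates listing ideas in order of complexity under a simplifying assumption, namely that an "idea" is a conjunction of required input values. The function names are hypothetical, and this is not necessarily the enumeration actually used.

```python
from itertools import combinations, product


def count_boolean_functions(n_inputs):
    """Number of distinct 0/1-valued functions of n_inputs binary inputs."""
    return 2 ** (2 ** n_inputs)


def ideas_by_complexity(n_inputs, max_size):
    """Yield conjunctions of input values, simplest (fewest inputs) first.

    Each idea is a tuple of (input_index, required_value) pairs.
    """
    for size in range(1, max_size + 1):
        for indices in combinations(range(n_inputs), size):
            for values in product((0, 1), repeat=size):
                yield tuple(zip(indices, values))


def matches(idea, bits):
    """Return 1 if every input named in the idea has its required value."""
    return int(all(bits[i] == v for i, v in idea))


print(count_boolean_functions(3))            # 256
print(len(list(ideas_by_complexity(3, 2))))  # 18
print(matches(((0, 1), (2, 0)), [1, 1, 0]))  # 1
```

Because each idea is generated and checked independently of the others, separate ranges of the enumeration could be handed to separate machines.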
The connections between real world data are enumerated and analysed. The more training data there is, the more the truth shines out above the noise in the real world data. It just takes time.
And not as long as you might think. Amongst all the billions of ideas is the best one, the one nearest the perfect form. All you have to do is find it, and it works pretty well.
Mathematics is an incredibly powerful tool, and Statistics comes in handy here. Statistical significance can tell you which ideas represent a real connection, and which are just a chance combination of inputs.
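As an illustration of the kind of test meant here, the sketch below computes a one-sided exact (Fisher-style) p-value for a 2x2 table counting how often an idea fires and how often the concept is present. The choice of this particular test is an assumption; the text does not name the one actually used.

```python
from math import comb


def chance_probability(a, b, c, d):
    """One-sided exact p-value for the 2x2 table of counts:

        a = idea fires,  concept present
        b = idea fires,  concept absent
        c = idea silent, concept present
        d = idea silent, concept absent

    Returns the probability of seeing at least `a` co-occurrences if the
    idea and the concept were unrelated (row and column totals held fixed).
    """
    fires = a + b
    present = a + c
    n = a + b + c + d
    most = min(fires, present)
    favourable = sum(
        comb(fires, k) * comb(n - fires, present - k)
        for k in range(a, most + 1)
    )
    return favourable / comb(n, present)


# Idea and concept usually occur together: unlikely to be chance.
print(chance_probability(8, 2, 1, 9))  # about 0.0027
# Idea and concept look unrelated: easily explained by chance.
print(chance_probability(5, 5, 5, 5))  # about 0.67
```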
I love Information Theory. I was lucky enough to meet and discuss it with its brilliant inventor, Claude Shannon, at a computer games competition.
While Statistics tells the AI which of the ideas it conceives are more than just a chance coincidence, Information Theory tells it how much information each idea contains.
The results may shock a casual reader, used to terabyte (1,000,000,000,000 bytes) disks, with each byte equivalent to 8 bits, and each bit enough information to answer the question “did the coin toss come out heads or tails”. Surprisingly, the question “did the die come out a 6” requires, on average, less than 1 bit to answer – about 0.65 bits. This is because you already know the answer is probably “no”, so it takes less than 1 bit to confirm this (on average). If you are looking for a rare event, even the most accurate test can only give you a fraction of a bit. Did my ticket win the national lottery? About a millionth of a bit will tell you!
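The figures above follow from Shannon's entropy formula for a yes/no question whose answer is "yes" with probability p: H(p) = -p log2(p) - (1 - p) log2(1 - p). A minimal sketch follows; the lottery odds of 1 in 14 million are an assumption chosen for illustration, since the text does not say which lottery is meant.

```python
from math import log2


def binary_entropy(p):
    """Average information, in bits, in the answer to a yes/no question
    whose answer is "yes" with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -p * log2(p) - (1 - p) * log2(1 - p)


coin = binary_entropy(1 / 2)            # exactly 1 bit
die = binary_entropy(1 / 6)             # about 0.65 bits
lottery = binary_entropy(1 / 14_000_000)  # assumed odds; a few millionths of a bit at most

print(f"coin: {coin:.2f} bits")
print(f"die shows a 6: {die:.2f} bits")
print(f"lottery win: {lottery:.2e} bits")
```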
The AI discards billions of inferior ideas, and keeps just the best. Surprisingly (or maybe unsurprisingly, given the process) these are good enough to recognise complex real world concepts from simple inputs.
Can we look at more and more complex ideas until we have an unlimited intelligence? Yes – and no.
The more ideas we look at, the more subtle the connections we discover. But if we go too far, the limited information in the training set will wear too thin, and the statistical test will show that there isn’t enough data to confirm the connection is real. A bigger training set might confirm the hypothesis – or the putative idea might not stand up to scrutiny.
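One standard way to see the information "wearing thin" is the Bonferroni correction: the more ideas are tested against the same training set, the stronger the evidence each one needs before it counts as real. This is a sketch using that textbook correction as a stand-in; the function name and the 0.05 threshold are assumptions, and the text does not say which correction is actually applied.

```python
def survives_scrutiny(p_value, ideas_tested, alpha=0.05):
    """Bonferroni correction: an idea is accepted only if its chance
    probability is below alpha divided by the number of ideas tested."""
    return p_value < alpha / ideas_tested


p = 1e-6  # the same evidence for the same idea in both cases

print(survives_scrutiny(p, ideas_tested=1_000))          # True
print(survives_scrutiny(p, ideas_tested=1_000_000_000))  # False
```

A bigger training set can push the chance probability of a genuine connection low enough to pass the stricter threshold, while a spurious one will not improve.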
Once training is complete, the AI can use its chosen rules to categorise the inputs – whether a bridge over a river, a good chess position, or a good stock to invest in. The bigger the training set, the better the outcome. If something is in fact undecidable from the input data, it will tell you that too. And, once learning is complete, these rules can be applied almost instantly.
As with Heisenberg’s uncertainty principle, there is a limit to what the AI can know. But with the ability to accumulate vast knowledge over time and space, this limit is far higher than any one person could attain.
Stephen B Streater
Founder and Director of R&D