I recently watched a TED video in which Kenneth Cukier gave an incredibly interesting talk about big data.
I have every respect for Mr. Cukier, but I’ve noticed something rather interesting in his video: when he talks about big data, he refers to numbers and to machines learning from numeric data. What about data in the form of text? Let me start explaining this with one of his examples: self-driving cars.
Information = Numbers?
A self-driving car knows that if the traffic light is green, it is free to go – it uses numeric data to learn this. Let me explain it somewhat broadly. In a programming language, it’s fairly easy to define an algorithm that says green=go, not green=stop. You can even write it up in an Excel sheet:
IF(TrafficLight="green"; "go"; "stop")
(I know it’s a rather simplified example, but stay with me, I’ll get to the point.)
When the car’s sensor detects the traffic light, it needs to recognise the colour green. Colours can be identified with numbers: each colour has a numeric reference. The HEX code of green, for example, is #008000. If we stick this into the Excel formula, we get the following:
IF(TrafficLight="#008000"; "go"; "stop")
See? It wasn’t too difficult. We’ve just created this command using numbers. (You’d still have to turn the stop and go actions, the traffic lights and other objects into numbers, but at a higher level that’s possible too.)
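To make this a bit more concrete, here is a minimal Python sketch of the same rule. The function names hex_to_rgb and decide are made up for illustration, and it assumes the sensor hands us a clean HEX string, which a real car certainly would not get.

```python
def hex_to_rgb(hex_code: str) -> tuple:
    """Turn a HEX colour such as '#008000' into (red, green, blue) numbers."""
    digits = hex_code.lstrip("#")
    # Each pair of hexadecimal digits encodes one channel from 0 to 255.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


# The HEX code for green quoted above, now as plain numbers.
GREEN = hex_to_rgb("#008000")


def decide(light_colour: str) -> str:
    """Hypothetical rule: 'go' only if the detected colour is exactly green."""
    return "go" if hex_to_rgb(light_colour) == GREEN else "stop"


print(hex_to_rgb("#008000"))  # (0, 128, 0)
print(decide("#008000"))      # go
print(decide("#FF0000"))      # stop
```

Once the colour is a tuple of numbers, the whole decision is a simple comparison, which is exactly why this kind of data is so convenient for machines.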
Numbers do not capture all information
But how do you go about processing written data in the form of text? Using just numbers, how would you figure out what each email your complaints team received is about, and decide which of your products or services needs urgent improvement? How do you filter inappropriate content while your child is browsing the net?
These are just a few examples of big data problems that can’t be solved with numbers. You need to understand what’s written before you can do anything, which is not possible with the traditional technologies for processing big data. We need to make all this possible. And that’s exactly what we are working on at Slamby. Numbers do not capture all information; they are just easier to calculate with and to handle.
Mr. Cukier finishes his talk with the following statement: “Humanity can learn from the information that it can collect.” At Slamby we believe that we can learn from numeric data, but also that we can learn just as easily from big data in other shapes and forms.