We, Robot: A Vision of What’s to Come

December 18th, 2009 by

Packed in like sardines by Don Solo

When I was growing up and spending too much time watching cartoons, I met George Jetson, and Jane (his wife). When I got to know Rosie, the Jetsons’ robot maid (whom they loved and would never replace or upgrade), I became very hopeful for a future free of household chores.

Alas, at the time of writing, I am still without a robotic aide. Although the use of robotics in manufacturing processes and other fields is growing, artificial intelligence has not nearly progressed to the level many science fiction writers predicted. One of the key sticking points is robot vision.

While the details of robot vision technologies are naturally complicated, the nuts and bolts are roughly analogous to the way we ourselves see – with an eye and a brain. First, you need a camera or sensor to take in images (eye) and a unit to process the information, generally by analyzing every pixel (brain). An added bonus is the ability to act on the information, which is necessary for some tasks, especially in a future when you can’t be bothered making your own coffee.

Anyone who can take 12-megapixel photos with their phone will know that current image-capturing technology is extremely sophisticated. The real trick is in interpreting the image correctly. It is in this image-processing part of the equation that we start to see difficulties, and the huge gulf between human and robot vision.

Humans – with certain relations of mine excepted – effortlessly filter out non-pertinent information. For example, you needn’t re-analyze every last detail in your visual field (consciously noting the color of the curtains, say) in order to realize that your wife changed the TV channel while you were in the bathroom. You just notice that the room is pretty much the same but for the absence of Top Gear and the appearance of Project Runway. A robot, on the other hand, may well have to process every last detail in the scene before kicking up a fuss.
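To make the robot's side of that comparison concrete, here is a minimal sketch of the brute-force approach: compare two snapshots of the room pixel by pixel and report where they differ. Everything in it is an assumption made for illustration (the function name `changed_region`, the tiny grayscale grids, the threshold of 30); real vision systems are far more elaborate.

```python
def changed_region(before, after, threshold=30):
    """Return (top, left, bottom, right) of the area that changed, or None.

    `before` and `after` are equal-sized grids of grayscale values (0-255).
    Every pixel is visited: the robot has no shortcut for "the room looks
    pretty much the same".
    """
    rows, cols = [], []
    for r, (row_a, row_b) in enumerate(zip(before, after)):
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if abs(a - b) > threshold:  # this pixel changed noticeably
                rows.append(r)
                cols.append(c)
    if not rows:
        return None  # nothing changed
    return (min(rows), min(cols), max(rows), max(cols))


# A toy 4x6 "living room": the bright 2x2 patch stands in for the TV screen.
room_before = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 200, 10, 10, 10],
    [10, 200, 200, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
]

# Someone changes the channel: two screen pixels get darker.
room_after = [row[:] for row in room_before]
room_after[1][1] = 90
room_after[2][2] = 90

print(changed_region(room_before, room_after))  # bounding box of the change
```

Even this toy version has to look at all 24 pixels to find the two that matter, and it still has no idea that the changed patch is a television, let alone which show is on.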

Limited vision doesn’t inhibit certain robot applications. Robots are adept at performing certain specialized tasks on a manufacturing production line, or sorting out a jumble of spare parts where the robot has been preprogrammed to recognize all the various shapes. But put one in my idealized scenario where you want it to fetch you a couple of kiwifruit from the fridge, and it might well return with a couple of broken eggs instead.

The wide disparity between natural and artificial vision is no real surprise, of course. Human eyesight has the benefit of millions of years of natural selection’s tiny, incremental improvements. Robot or computer vision has only a couple of lousy decades on the board, which, even with a multitude of talented researchers on the case, is hardly enough time to catch up.

But catching up it is. Researchers are constantly developing algorithms to enable robots to better pick out the relevant information in their images. A clever way of addressing this problem has recently been crafted by harnessing the power of crowdsourcing. Through tools such as Amazon’s Mechanical Turk, people all over the world are in a position to help the scientists with the rather massive task of sifting through and sorting raw image data. People can assist researchers, for instance, by labeling objects or drawing around the outlines of pertinent information.
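One simple way such crowd labels might be combined is a majority vote: ask several people to label the same image and keep the answer only when enough of them agree. The sketch below is a hypothetical illustration, not a description of how any particular research group or Mechanical Turk itself works; the function name `consensus_label` and the 60% agreement cutoff are assumptions.

```python
from collections import Counter


def consensus_label(votes, min_agreement=0.6):
    """Return the most common label if enough workers agree, else None.

    `votes` is a list of labels from different people for one image.
    `min_agreement` is an assumed cutoff: the share of votes the winning
    label needs before it is trusted.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # too much disagreement; send the image out again


# Three hypothetical workers label the same photo of the fridge shelf.
print(consensus_label(["kiwifruit", "kiwifruit", "egg"]))
# Two workers who disagree give no usable answer.
print(consensus_label(["kiwifruit", "egg"]))
```

Images with no consensus can be shown to more people, which is one reason a large and willing crowd is so useful.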

This kind of crowd assistance can help researchers to develop software that allows computers/robots to process visual information more quickly and efficiently. The spoilt among us, who resentfully feel that science fiction has created expectations that reality has been slow to deliver upon, may just want to sign up to Mechanical Turk and get the ball rolling a little quicker.