One Eye on the World
A group of Stanford scientists develops a new algorithm for robot vision
by Wenqi Shao
Imagine future expressways in the sky and on the ground whizzing with robots. You'll only find this in science fiction, as robots today are too clumsy to maneuver around obstacles at high speeds because they have trouble judging depth. A group of Stanford computer scientists led by Professor Andrew Ng, however, could make this a reality. The team has developed a novel algorithm to improve vision processing by robots.
The Vision Algorithm
Figure: The automatic robot car used to test the monocular vision algorithm.
The imaging algorithm developed by graduate students Ashutosh Saxena and Andrew Sung H. Chung and Professor Ng improves upon traditional algorithms by combining two concepts: monocular vision, or seeing with a single eye, and prior knowledge, which is acquired through a process of supervised learning that is also present in humans.
The robot's "eye" is a single camera that captures a set of images from the surrounding environment. The distance from the camera to the point shown at each pixel is recorded in a structure called a depthmap. Cues such as texture variations, edges, object size, and haze are used to determine the depths at individual points and the relation between depths at different points.
Figure: Depthmap results for a varied set of environments, showing the original image (column 1), actual depthmaps (column 2), and depthmaps predicted by models (column 3).
Unlike traditional algorithms, the new algorithm relies heavily on stored knowledge from previously encountered images. Once captured, an image is divided into smaller sections called patches. The depth of each patch is analyzed both individually and in the context of the whole image. Each patch draws on information from its four neighbors at three different size scales and from its location in the image. The algorithm infers patch depths from the following cues: more detailed surfaces are closer; merging edges indicate greater distance; smaller objects are farther away; and haze indicates greater distance. Through this process, features in the image are used to estimate 3-D depths.
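To make the patch idea concrete, here is a minimal Python sketch of how a feature vector might be assembled for each patch. It is an illustration under stated assumptions, not the team's implementation: the function names, the 8-pixel patch size, the scale factors (1, 3, 9), the use of gradient magnitude as a stand-in texture cue, and the use of row position as the location feature are all hypothetical choices.

```python
import numpy as np

def patch_means(values, size):
    """Average `values` over non-overlapping size x size patches."""
    rows, cols = values.shape[0] // size, values.shape[1] // size
    trimmed = values[:rows * size, :cols * size]
    return trimmed.reshape(rows, size, cols, size).mean(axis=(1, 3))

def texture_energy(image, size):
    """Assumed texture cue: mean gradient magnitude inside each patch."""
    gy, gx = np.gradient(image.astype(float))
    return patch_means(np.hypot(gx, gy), size)

def with_neighbors(cue):
    """Stack each patch's cue with those of its four neighbors (edge-padded)."""
    p = np.pad(cue, 1, mode="edge")
    return np.stack([
        p[1:-1, 1:-1],  # the patch itself
        p[:-2, 1:-1],   # neighbor above
        p[2:, 1:-1],    # neighbor below
        p[1:-1, :-2],   # neighbor to the left
        p[1:-1, 2:],    # neighbor to the right
    ], axis=-1)

def patch_features(image, base=8, scales=(1, 3, 9)):
    """Per-patch vector: 5 cues at each scale plus a normalized row position.

    Assumes the image is at least base * max(scales) pixels on each side.
    """
    rows, cols = image.shape[0] // base, image.shape[1] // base
    parts = []
    for s in scales:
        coarse = with_neighbors(texture_energy(image, base * s))
        # Map every base-size patch to the coarse patch that contains it.
        r = np.minimum(np.arange(rows) // s, coarse.shape[0] - 1)
        c = np.minimum(np.arange(cols) // s, coarse.shape[1] - 1)
        parts.append(coarse[r][:, c])
    row_pos = np.broadcast_to(
        np.linspace(0.0, 1.0, rows)[:, None, None], (rows, cols, 1)
    )
    return np.concatenate(parts + [row_pos], axis=-1)
```

For a 72-by-96-pixel grayscale image, `patch_features` returns one 16-value vector for each of the 9-by-12 patches: five texture values at each of three scales, plus the position value. A real system would use a richer set of cues, such as the edges and haze described above.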
Testing for Robot Vision
In an initial study, Saxena, Chung, and Ng created a depthmap database using a 3-D laser scanner to collect 425 images from a variety of environments including campus areas, forests, and indoor places. This database enabled the robot to learn to judge distances as it captured new images.
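The learning step can be illustrated with a small Python sketch in which laser-measured depths serve as the "right answers" for a model that maps patch features to depth. The article does not say which model the team used, so ordinary least squares on log depth is swapped in here purely as a stand-in. The function names are hypothetical, and the training data below is synthetic, not the team's 425-image database.

```python
import numpy as np

def fit_depth_model(features, depths):
    """Least-squares weights mapping patch features to log depth."""
    X = np.column_stack([features, np.ones(len(features))])  # add a bias column
    weights, *_ = np.linalg.lstsq(X, np.log(depths), rcond=None)
    return weights

def predict_depth(weights, features):
    """Apply the fitted weights to new patch features."""
    X = np.column_stack([features, np.ones(len(features))])
    return np.exp(X @ weights)

# Synthetic stand-in for scanner-labeled training patches (not real data).
rng = np.random.default_rng(0)
train_features = rng.uniform(0.0, 1.0, size=(200, 3))
true_weights = np.array([-1.5, 0.8, 0.3, 3.0])
train_depths = np.exp(
    np.column_stack([train_features, np.ones(200)]) @ true_weights
)

weights = fit_depth_model(train_features, train_depths)

# Depth estimates for patches from a new, unlabeled image.
new_features = rng.uniform(0.0, 1.0, size=(5, 3))
predicted = predict_depth(weights, new_features)
```

Because the synthetic depths contain no noise, the fit recovers the weights that generated them. Real scanner data would be noisy, and a practical model would also need to capture how the depths of neighboring patches relate to one another.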
In a study done by Ng's team, robots were able to judge distances in both indoor and outdoor locations with an average error of 35%, meaning that a robot could judge an object 100 feet away as being anywhere between 65 and 135 feet away. The largest depth errors occurred in images dominated by irregular leaves and branches. However, even human judgment on these images would probably be poor. The level of accuracy demonstrated by the study is sufficient for a robot refreshing its viewed images at ten frames per second and moving at 20 mph to adjust its path and avoid obstacles.
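The figures in this paragraph can be checked with a few lines of arithmetic. The Python sketch below, with hypothetical function names, computes the range implied by a 35% error at 100 feet and the distance a robot covers between frames at 20 mph and ten frames per second.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def depth_bounds(true_distance_ft, relative_error):
    """Nearest and farthest estimates consistent with a relative error."""
    return (
        true_distance_ft * (1 - relative_error),
        true_distance_ft * (1 + relative_error),
    )

def feet_per_frame(speed_mph, frames_per_second):
    """Distance traveled between two consecutive camera frames."""
    feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return feet_per_second / frames_per_second

low, high = depth_bounds(100, 0.35)   # about 65 and 135 feet
step = feet_per_frame(20, 10)         # a little under 3 feet
```

At 20 mph the robot moves roughly 29 feet per second, or just under 3 feet between frames, so it receives many fresh depth estimates of an obstacle well before reaching it.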
The monocular vision algorithm was implemented in an automatic robot car, measuring 2 feet by 2.5 feet by 1 foot, driving at 11 mph. At the Stanford sculpture garden, a high-density obstacle environment filled with sculptures, trees, bushes, and rocks, the robot vehicle was able to navigate on its own for up to one minute before crashing. On terrain with fewer obstacles, such as a parking lot with trees, the robot was able to navigate with only camera input for approximately two to three minutes.
Figure: A depthmap for an image patch, which includes features from its immediate neighbors, its more distant neighbors (at larger scales), and its corresponding column.
Seeing the Future
While initial trials have demonstrated the success of the monocular vision algorithm, a remaining challenge is to reduce its reliance on extensive prior knowledge of the surroundings. The robot's operational time in a random outdoor environment, without prior knowledge, is approximately five seconds. Thus, although the robot would perform fairly well in Palo Alto or another familiar setting, it would perform poorly if placed in an unfamiliar environment such as the surface of Mars. Ideally, images from the internet or other outside sources could be downloaded to the robot to enhance its prior knowledge.
The monocular vision algorithm is just the beginning of exciting developments in visual processing. Saxena, Chung, and Ng hope to generalize the machine vision algorithm so that it can be applied in other instruments and procedures beyond driving a robot-controlled car. Their work provides a glimpse of what the future may hold for artificial vision.