Gesture recognition: Leading toward 3D UIs?
Keywords: touchscreen, 3D, gesture recognition
Limitations of coordinate-based 2D
Designers of computer vision technology have struggled to give computers a human-like intelligence in understanding scenes. If computers don't have the ability to interpret the world around them, humans cannot interact with them in a natural way. Key problems in designing computers that can "understand" scenes include segmentation, object representation, machine learning, and recognition.
Because a 2D representation of a scene is intrinsically limited, a gesture recognition system has to combine various cues to extract more useful information. Whole-body tracking is possible in principle, but even when multiple cues are combined it is difficult to get beyond hand-gesture recognition using only a 2D representation.
"z" (depth) innovation
The challenge in moving to 3D vision and gesture recognition has been obtaining the third coordinate, "z". One obstacle to machines seeing in 3D has been image analysis technology. Today, there are three popular solutions to the problem of 3D acquisition, each with its own abilities and specific uses: stereo vision, structured light pattern, and time of flight (TOF). The 3D image output from these technologies makes gesture recognition practical.
Stereo vision: Probably the best-known 3D acquisition system is a stereo vision system. This system uses two cameras to obtain a left and a right stereo image, slightly offset from each other (by a distance on the same order as the spacing of human eyes). By comparing the two images, a computer can compute a disparity image that records how far each object is displaced between the two views. This disparity image, or map, can be either color-coded or gray scale, depending on the needs of the particular system.
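The comparison step described above can be illustrated with a minimal sketch. It assumes rectified images (so matching pixels lie on the same row), a simple sum-of-absolute-differences match along one scanline, and the standard pinhole relation depth = focal length × baseline / disparity. The function names, the window size, and the camera parameters are hypothetical and chosen only for illustration; a production system would use a far more robust matcher.

```python
def disparity_for_pixel(left_row, right_row, x, window=1, max_disparity=8):
    """Estimate the disparity (in pixels) at column x of one scanline.

    Assumes rectified images and that x is at least `window` pixels
    away from both ends of the row. Compares a small window around x
    in the left row with windows shifted left by d in the right row,
    and keeps the shift with the lowest sum of absolute differences.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if x - d - window < 0:
            break  # the shifted window would fall off the row
        cost = sum(
            abs(left_row[x + k] - right_row[x - d + k])
            for k in range(-window, window + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Convert disparity to depth with Z = f * B / d (pinhole model).

    Zero disparity means the point is effectively at infinity.
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Toy scanlines: the same three-pixel feature appears 3 pixels further
# left in the right image than in the left image.
left = [0, 0, 0, 0, 0, 9, 5, 7, 0, 0]
right = [0, 0, 9, 5, 7, 0, 0, 0, 0, 0]

d = disparity_for_pixel(left, right, x=6)
# Hypothetical camera: 700-pixel focal length, 6 cm baseline.
z = depth_from_disparity(d, focal_length_px=700.0, baseline_m=0.06)
```

Because depth is inversely proportional to disparity, nearby objects produce large disparities and distant objects produce small ones, which is why a disparity map can be rendered directly as a gray-scale or color-coded depth cue.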
Structured light pattern: Structured light patterns can be used for measuring or scanning 3D objects. In this type of system, a structured light pattern is projected onto an object. The pattern can be created by laser light interference or by projecting images. Using cameras arranged as in a stereo vision system allows a structured light pattern system to obtain the 3D coordinates of the object. A single 2D camera can also be used to measure the displacement of each stripe, with the coordinates then obtained through software analysis. Whichever system is used, these coordinates can then be used to create a digital 3D image of the shape.
Time of flight: Time of flight (TOF) sensors are a relatively new way of acquiring depth information. TOF systems are a type of light detection and ranging (LIDAR) system and, as such, transmit a light pulse from an emitter to an object. For each pixel, a receiver determines the distance of the measured object by calculating the travel time of the light pulse from the emitter to the object and back to the receiver.
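The travel-time calculation can be sketched as follows. Since the pulse covers the emitter-to-object distance twice, the distance is the speed of light multiplied by the round-trip time, divided by two. The function names and the sample timings are hypothetical; real sensors often measure phase shift rather than raw pulse timing, which this sketch does not model.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_time_s):
    """Distance to the object for one pixel.

    The pulse travels to the object and back, so the one-way distance
    is half of (speed of light * round-trip time).
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


def range_image(round_trip_times_s):
    """Convert a 2D grid of per-pixel round-trip times into distances."""
    return [[tof_distance_m(t) for t in row] for row in round_trip_times_s]


# Hypothetical 2x2 sensor readout, round-trip times in seconds.
times = [
    [10e-9, 20e-9],
    [10e-9, 40e-9],
]
depths = range_image(times)  # 10 ns round trip is roughly 1.5 m
```

The timings involved are tiny: a round trip of 10 nanoseconds corresponds to about 1.5 meters, which hints at why fast, high-bandwidth hardware is a prerequisite for this approach.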
TOF systems are not scanners, in that they do not measure point by point. Instead, a TOF system takes in the entire scene at once to determine the 3D range image. With the measured coordinates of an object, a 3D image can be created and used for device control in areas such as robotics, manufacturing, medical technologies, and digital photography.
Until recently, the semiconductor devices needed to implement a TOF system were not available. But today's devices provide the processing power, speed, and bandwidth needed to make TOF systems a reality.
3D vision technologies
No single 3D vision technology is right for every application or market. The table compares the different 3D vision technologies and their relative strengths and weaknesses regarding response time, software complexity, cost, and accuracy.
Table: 3D vision technology comparison.
Stereo vision technology requires considerably complex software to produce high-precision 3D depth data; this processing can run on digital signal processors (DSPs) or multicore scalar processors. Stereo vision systems can be low cost and fit in a small form factor, making them a good choice for mobile phones and other consumer devices. However, stereo vision systems cannot deliver the accuracy and response time of the other technologies, so they are not ideal for systems requiring high accuracy, such as manufacturing quality-assurance systems.