How about a short computer vision lesson?
Take a look at the image below. Do you notice anything interesting?
An interesting question here is how to tell where the back side of the cup is relative to the front. You may wonder what kind of people are interested in such questions. You! Every time you pick up a cup you need to know where the back is. Unless the material is transparent, you can’t see the place where your fingers will hold it. So how can we do it? I discussed this with several hypothetical experts.
Database expert: Use a barcode on the cup and store the size in a database.
- Assuming you have a full-time DBA and a standards committee.
Physicist: Use a special wavelength and measure the scattering differences between the front and back sides, which should correlate with the size.
- Don’t try this at home.
Machine learning expert: Learn from examples. Collect enough images of cups of known sizes, and use your favorite neural net / Boltzmann machine / support vector machine / belief propagation / Markov random field / singular value decomposition / decision trees. If nothing works, try Gaussians with Bayes’ rule.
- Don’t forget to include Chinese cups in the training set.
Robotics expert: Just close the fingers until they touch the surface.
- Simple enough, but there are studies that filmed people grasping objects of different sizes, showing that the opening between the fingers is proportional to the final size already during the grasping motion, as if the brain executes a grasping plan in which the size is estimated at the beginning.
The top of the cup is easier to detect, since the contrast is higher. If the top is a circle, its projection in the image will be seen as an ellipse. Let’s assume we can detect the ellipse, and that it originated from a planar circle. There are two possible 3D orientations of the circle that would project to the ellipse, but only one is left if we can tell the front and back sides of the cup. Another assumption we need is that the surface of the cup is orthogonal to the circle’s plane. Of course, it doesn’t have to be so, but in man-made cups it usually is, because that makes them easier for humans (who make such assumptions) to detect and hold. So if we know the plane of the top circle, we can estimate where the back side is by following the orthogonal lines. Altogether, we have here some assumptions that may be invalid, a mapping from the ellipse to the plane of the circle in 3D, and propagation of information from the top to the invisible bottom - a piece of cake for the human brain.
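For the curious, here is a minimal sketch of the ellipse-to-plane mapping described above. It assumes an orthographic (parallel) projection, which the discussion above does not specify, so treat it as one possible simplification rather than the way the brain or any particular vision system does it. Under that assumption a circle tilted by an angle away from the image plane keeps its full width along the major axis and shrinks by the cosine of that angle along the minor axis, which leaves exactly two candidate normals. The function name and its parameters are hypothetical, chosen only for illustration.

```python
import math

def circle_normals_from_ellipse(semi_major, semi_minor, major_axis_angle):
    """Return the two candidate unit normals of a 3D circle whose
    orthographic projection is the given ellipse.

    Assumptions (not from the original text): orthographic projection,
    camera looking along the z axis, angles in radians, and
    major_axis_angle measured from the image x axis.
    """
    if not (0 < semi_minor <= semi_major):
        raise ValueError("expected 0 < semi_minor <= semi_major")

    # Foreshortening along the minor axis gives the tilt of the circle's plane.
    tilt = math.acos(semi_minor / semi_major)

    # Unit vector along the minor axis in the image (perpendicular to the major axis).
    mx = -math.sin(major_axis_angle)
    my = math.cos(major_axis_angle)

    s, c = math.sin(tilt), math.cos(tilt)
    # The normal leans toward one end of the minor axis or the other;
    # the image alone cannot tell which.
    return (s * mx, s * my, c), (-s * mx, -s * my, c)


if __name__ == "__main__":
    n1, n2 = circle_normals_from_ellipse(2.0, 1.0, 0.0)
    print(n1, n2)
```

Knowing which side of the ellipse is the front of the cup is what picks one of the two normals. The walls of the cup then run along that normal, by the orthogonality assumption.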
You may have already figured this out by yourself. So here comes the lesson I promised. Basically, we can do without computer vision. System engineers don’t care if they use laser range finders, barcodes, or ask humans to do the intelligent work. But these are not solutions, only ways to bypass the real issues. And these cheap workarounds have made us blind to things that are in front of our eyes every day.
Cheers,