CSC 2523: Visual Recognition with Text
Graduate seminar course
Images do not appear in isolation. For example, on the web images are typically surrounded by informative text in the form of tags (e.g., on Flickr), captions (short summaries conveying something about the picture), and blogs/news articles, etc. In robotics, language is the most convenient way to teach an autonomous agent novel concepts or to communicate the mistakes it is making. For example, when providing a novel task to a robot, such as "pass me the stapler", we could provide additional information, e.g., "it is next to the beer bottle on the table". This information could be used to greatly simplify the parsing task. Conversely, it also crucial that the agent communicates its understanding of the scene to the human, e.g., "I can't, I am watching tv on a sofa, next to the wine bottle."