An object recognition framework using contextual interactions among objects
Author(s)
Advisor
Aksoy, Selim
Date
2009
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
Object recognition is one of the fundamental tasks in computer vision. The
main endeavor in object recognition research is to devise techniques that enable
computers to understand what they see as precisely as human beings do. State-of-the-art
recognition methods use low-level image features (color, texture, etc.),
interest points/regions, and filter responses to find and identify objects in a
scene. Although these techniques work well for specific object classes, the results
are not satisfactory enough to accept them as universal solutions. Thus, the
current trend is to make use of the context embedded in the scene. Context
defines the rules for object-object and object-scene interactions. A scene
configuration generated by a set of object recognizers can sometimes be inconsistent
with the scene context. For example, observing a car in a kitchen is unlikely
given the kitchen context, so knowledge of that context can be used to
correct the inconsistent recognition.
Motivated by the benefits of contextual information, we introduce an object
recognition framework that uses the contextual interactions between individually
detected objects to improve overall recognition performance. Our first contribution
is in object detector design. We define three methods for object detection.
Two of them, shape-based and pixel-classification-based detection, mainly rely
on techniques from the literature. We also describe a third method, surface-orientation-based
object detection. The goal of this novel technique is to find objects whose shape,
color, and texture features are not discriminative but whose surface orientations
(horizontality or verticality) are consistent across different instances; walls,
table tops, and roads are typical examples. The second contribution is a
probabilistic contextual interaction model for objects based on their spatial
relationships. To represent these relationships, we propose three features that
encode the relative position/location, scale, and orientation of a given object
pair. Using these features and our object interaction likelihood model, we
encode the semantic, spatial, and pose context of a scene concurrently. Our
third main contribution is a contextual agreement maximization framework that
assigns final labels to the detected objects by maximizing a scene probability
function defined jointly over the individual object labels and their pairwise
contextual interactions. The most consistent scene configuration is obtained by
solving this maximization problem using linear optimization.
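To make the pairwise spatial features concrete, the sketch below computes a plausible relative position, scale, and orientation for a pair of detected objects given as oriented boxes. The exact feature definitions are the thesis's own; the function name, box representation `(cx, cy, w, h, angle)`, and normalizations here are illustrative assumptions.

```python
import math

def pairwise_spatial_features(box_a, box_b):
    """Illustrative relative position, scale, and orientation features for
    two detected objects, each given as (cx, cy, w, h, angle).
    These are plausible stand-ins, not the thesis's exact definitions."""
    cxa, cya, wa, ha, ta = box_a
    cxb, cyb, wb, hb, tb = box_b
    # Relative position: center displacement normalized by the first object's size
    rel_pos = ((cxb - cxa) / wa, (cyb - cya) / ha)
    # Relative scale: ratio of bounding-box areas
    rel_scale = (wb * hb) / (wa * ha)
    # Relative orientation: angle difference wrapped to [0, pi)
    rel_orient = (tb - ta) % math.pi
    return rel_pos, rel_scale, rel_orient
```

Features of this form are what the object interaction likelihood model would be estimated over, one triple per object pair.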
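The agreement maximization step can be sketched as choosing the label assignment that maximizes a sum of individual detector scores and pairwise contextual agreement terms. The thesis solves this with linear optimization; the exhaustive search below is only an assumption-laden, self-contained illustration of the same objective, with hypothetical score tables.

```python
from itertools import product

def best_configuration(unary, pairwise):
    """Maximize a scene score over label assignments.

    unary[i][k]: score for giving object i label k (e.g. a log-likelihood).
    pairwise[(i, j)][k][l]: contextual agreement when objects i and j take
    labels k and l. Brute force keeps the sketch self-contained; the thesis
    uses linear optimization for the same maximization."""
    n = len(unary)
    labels = range(len(unary[0]))
    best, best_score = None, float("-inf")
    for assign in product(labels, repeat=n):
        score = sum(unary[i][assign[i]] for i in range(n))
        score += sum(pairwise[(i, j)][assign[i]][assign[j]]
                     for (i, j) in pairwise)
        if score > best_score:
            best, best_score = assign, score
    return best, best_score
```

Note how a strong pairwise agreement term can override the individually most likely label, which is exactly the mechanism that lets context correct an inconsistent detection such as a car in a kitchen.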
We performed experiments on the LabelMe [27] and Bilkent data sets, both
with and without using the scene type (indoor or outdoor) information.
On the LabelMe data set, the average F2 score increased from 0.09 to 0.20
without the scene type assumption, and from 0.17 to 0.25 when the scene type
is known. The results are similar on the Bilkent data set: the F2 score
increased from 0.16 to 0.36 when the scene type information is not available,
and from 0.31 to 0.44 when this additional information is used. It is clear
that incorporating contextual interactions improves the overall recognition
performance.
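The F2 score used above is the standard F-beta measure with beta = 2, which weights recall more heavily than precision. A minimal sketch of the formula (the function name is mine, not the thesis's):

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: (1 + b^2) * P * R / (b^2 * P + R).
    With beta = 2 (the F2 score), recall counts four times as much
    as precision in the weighted harmonic mean."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, a recall-favoring metric like this suits recognition settings where missing an object is costlier than a spurious detection.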