Naming faces on the web
Şahin, Pınar Duygulu
Please cite this item using this persistent URL: http://hdl.handle.net/11693/15144
In this study, we introduce a method to name less-frequently appearing people on the web by first naming the frequently appearing ones. Current image search engines are widely used for querying a person; however, retrieval is based on textual content, so the results are not satisfactory. Face recognition, on the other hand, is a long-standing problem, but it has been tested only on datasets of limited size, and successful results have been obtained only for face images captured in controlled environments. Faces on the web, in contrast, are huge in number and vary in pose, illumination, occlusion and facial attributes. Recent research in the area suggests combining visual and textual content rather than using either alone. With this approach, the face recognition problem is simplified to a face-name association problem. Following these approaches, our method combines textual and visual information to name faces. We divide the problem into two subproblems: first the more frequently appearing faces in web images are named, then the less-frequently appearing ones. A supervised algorithm is used to name a specified number of categories belonging to the more frequently appearing faces. The faces that are not matched with any category are then considered to be the less-frequently appearing faces and are labeled using the textual content. We extract all the names from the textual content and then eliminate the ones already used to label frequently appearing faces. The remaining names are the candidate categories for less-frequently appearing faces. Each detected less-frequently appearing face is then matched to the names extracted from its corresponding textual content. Finally, in order to prune the irrelevant face images, the most similar faces in this collection are found and matched with their corresponding category. In our experiments, the method is applied to two different datasets.
Both datasets are constructed from images captured in realistic environments, varying in pose, illumination, facial expression, occlusion, etc. The results of the experiments show that combining textual and visual content on realistic face images outperforms methods that use either one alone. Moreover, handling the face recognition problem as a face-name association problem improves the results for face images collected from uncontrolled environments.
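The two-stage procedure described in the abstract can be illustrated with a minimal sketch. This is not the implementation used in the study. The function names, the `classify_frequent` callable, the cosine similarity measure, the `keep_ratio` parameter and the rule for resolving a face that is a candidate for several names are all assumptions introduced here for illustration, since the abstract does not specify the features, the classifier or the pruning criterion.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def name_faces(items, classify_frequent, frequent_names, keep_ratio=0.5):
    """Two-stage naming sketch (illustrative assumptions throughout).

    items: list of (face_vector, caption_names) pairs.
    classify_frequent: hypothetical supervised classifier; returns a name
        from frequent_names, or None when no category matches.
    keep_ratio: assumed fraction of each candidate group kept after pruning.
    Returns a dict mapping item index -> assigned name.
    """
    labels = {}
    candidates = defaultdict(list)  # candidate name -> indices of faces

    for idx, (face, names) in enumerate(items):
        # Stage 1: try the frequently appearing categories first.
        predicted = classify_frequent(face)
        if predicted is not None:
            labels[idx] = predicted
            continue
        # Stage 2: an unmatched face is treated as less frequent; its
        # candidate names are the caption names not used for frequent people.
        for name in names:
            if name not in frequent_names:
                candidates[name].append(idx)

    # Pruning: within each candidate name, keep the faces that are most
    # similar on average to the other faces sharing that candidate name.
    best = {}  # idx -> (score, name)
    for name, indices in candidates.items():
        scored = []
        for i in indices:
            others = [j for j in indices if j != i]
            if others:
                total = sum(cosine(items[i][0], items[j][0]) for j in others)
                score = total / len(others)
            else:
                score = 0.0
            scored.append((score, i))
        scored.sort(reverse=True)
        keep = max(1, int(len(scored) * keep_ratio))
        for score, i in scored[:keep]:
            # Assumed tie-break: a face surviving in several groups takes
            # the name of the group where it scored highest.
            if i not in best or score > best[i][0]:
                best[i] = (score, name)

    for i, (_, name) in best.items():
        labels[i] = name
    return labels
```

In this sketch a face left unlabeled after pruning is simply absent from the returned dictionary, which corresponds to treating it as irrelevant to every candidate name.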