Browsing by Subject "Scene understanding"
Now showing 1 - 2 of 2
Item Open Access
Multi-label multi-modal classification of movie scenes (2022-09) Türköz, Irmak

Promoting movies through their trailers provides valuable information that helps viewers and investors form expectations about a movie's future success. Recent research has confirmed that audiences prefer to watch movies through at-home streaming services rather than in theaters, which means trailers are now mostly watched privately. Moreover, advertisements tailored to different interest groups can drastically improve the experience for users and advertisers alike. There have been a few attempts to automate the trailer generation process; however, AI-generated trailers were considered less attractive than editors' creations. Fortunately, recent advances in deep learning and the greater availability of datasets can accelerate automated trailer generation. Every movie is labeled with the set of genres it represents, so it is possible to generate multiple trailers of the same movie, one per genre, to offer personalized advertisements to the audience. To the best of our knowledge, this is the first attempt at personalized movie advertisement via genre-specific trailers in automated trailer generation research. For this task, we needed a tool that extracts scenes representative of a particular genre from a given movie. These scenes can then be concatenated to form a draft trailer for each genre, and the draft can be finalized through the creative post-production process. In this thesis, we developed a deep learning network that classifies scenes into a set of genres. To construct a training dataset for this network, we compiled a set of scenes labeled with their representative genres. Our network accomplishes a multi-label classification task with hyper-parameters learned from experimental binary models. The learning process comprises the use of visual features, audio features, and their combination. The final model is evaluated by comparing its classification performance with human perception.
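As a rough illustration of the kind of network this abstract describes, the following is a minimal multi-label, multi-modal classifier sketch in PyTorch. The feature dimensions, layer sizes, genre count, and the name GenreSceneClassifier are illustrative assumptions rather than the thesis's actual architecture; what the sketch demonstrates is late fusion of visual and audio features with an independent binary decision per genre, which is what makes the task multi-label.

import torch
import torch.nn as nn

class GenreSceneClassifier(nn.Module):
    # Hypothetical fusion network: all dimensions here are placeholders.
    def __init__(self, visual_dim=2048, audio_dim=128, num_genres=8):
        super().__init__()
        self.visual_encoder = nn.Sequential(nn.Linear(visual_dim, 256), nn.ReLU())
        self.audio_encoder = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU())
        self.head = nn.Linear(512, num_genres)  # one logit per genre

    def forward(self, visual_feats, audio_feats):
        # Late fusion: encode each modality separately, then concatenate.
        fused = torch.cat([self.visual_encoder(visual_feats),
                           self.audio_encoder(audio_feats)], dim=1)
        return self.head(fused)  # raw logits; the loss applies the sigmoid

model = GenreSceneClassifier()
criterion = nn.BCEWithLogitsLoss()  # independent binary loss per genre
visual = torch.randn(4, 2048)                 # e.g., pooled CNN features per scene
audio = torch.randn(4, 128)                   # e.g., pooled audio embeddings
labels = torch.randint(0, 2, (4, 8)).float()  # a scene may carry several genres
loss = criterion(model(visual, audio), labels)
loss.backward()

Because each genre gets its own sigmoid output, a single scene can score highly for several genres at once, which is exactly what drafting genre-specific trailers requires.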
Item Open Access
Structural scene analysis of remotely sensed images using graph mining (2010) Özdemir, Bahadır

The need for intelligent systems capable of automatic content extraction and classification in remote sensing image datasets has been constantly increasing due to advances in satellite technology and the availability of detailed images with wide coverage of the Earth. The increasing detail in very high spatial resolution images obtained from new-generation sensors has enabled new applications but also introduced new challenges for object recognition. Contextual information about image structures has the potential to improve individual object detection. Therefore, identifying image regions that are intrinsically heterogeneous is an alternative way to reach a high-level understanding of image content. These regions, also known as compound structures, are comprised of primitive objects of many diverse types. Popular representations such as the bag-of-words model use primitive object parts extracted with local operators but cannot capture their structure because of the lack of spatial information. Hence, the detection of compound structures necessitates new image representations that jointly model spectral, spatial, and structural information.

We propose an image representation that combines the representational power of graphs with the efficiency of the bag-of-words representation. The proposed method has three parts. In the first part, every image in the dataset is transformed into a graph structure using local image features and their spatial relationships. The transformation method first detects local patches of interest using maximally stable extremal regions obtained by gray-level thresholding. Next, these patches are quantized to form a codebook of local information, and a graph is constructed for each image by representing the patches as graph nodes and connecting them with edges obtained using Voronoi tessellations. Transforming images to graphs provides a level of abstraction, and the remaining classification operations are performed on graphs. The second part of the proposed method is a graph mining algorithm that finds the set of subgraphs most important for classifying the image graphs. The graph mining algorithm we propose first finds the frequent subgraphs for each class, then selects the most discriminative ones by quantifying the correlations between the subgraphs and the classes in terms of the within-class occurrence distributions of the subgraphs, and finally reduces the set size by selecting the most representative ones while considering the redundancy between subgraphs. After mining the set of subgraphs, each image graph is represented by a histogram vector over this set, where each component stores the number of occurrences of a particular subgraph in the image. The subgraph histogram representation enables classifying the image graphs with statistical classifiers. The last part of the method involves model learning from labeled data. We use support vector machines (SVM) to classify images into semantic scene types. In addition, the themes distributed among the images are discovered using a latent Dirichlet allocation (LDA) model trained on the same data. In this way, images with heterogeneous content drawn from different scene types can be represented by a theme distribution vector, which enables further classification of images through theme analysis. Experiments using an Ikonos image of Antalya show the effectiveness of the proposed representation in classifying complex scene types. The SVM model achieved promising classification accuracy on image tiles cut from the Antalya image for the eight high-level semantic classes. Furthermore, the LDA model discovered interesting themes in the whole satellite image.
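To make the first part concrete, here is a hypothetical sketch of the image-to-graph transformation, assuming OpenCV's MSER detector, a k-means codebook, and a Delaunay triangulation (the planar dual of the Voronoi tessellation, so its edges connect exactly the regions whose Voronoi cells touch). The descriptor, the codebook size, and the helper name image_to_graph are illustrative choices, not the thesis's exact settings.

import cv2
import networkx as nx
import numpy as np
from scipy.spatial import Delaunay
from sklearn.cluster import KMeans

def image_to_graph(gray_image, codebook_size=32):
    # Detect maximally stable extremal regions as local patches of interest.
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray_image)
    centers = np.array([r.mean(axis=0) for r in regions])
    # Quantize a simple per-region descriptor (mean intensity, region size)
    # into a codebook; a real system would use richer local features.
    desc = np.array([[gray_image[r[:, 1], r[:, 0]].mean(), len(r)]
                     for r in regions])
    words = KMeans(n_clusters=codebook_size, n_init=10).fit_predict(desc)
    # Nodes are quantized patches; edges come from Delaunay neighbors,
    # i.e. pairs of regions whose Voronoi cells share a boundary.
    g = nx.Graph()
    for i, (c, w) in enumerate(zip(centers, words)):
        g.add_node(i, word=int(w), pos=tuple(c))
    for simplex in Delaunay(centers).simplices:
        for a, b in [(0, 1), (1, 2), (0, 2)]:
            g.add_edge(int(simplex[a]), int(simplex[b]))
    return g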
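Once a set of discriminative subgraphs has been mined, the histogram representation and the SVM step are straightforward. Below is a hedged sketch using networkx's subgraph-isomorphism matcher and scikit-learn; the mining algorithm itself is not reproduced, and mined_subgraphs, count_occurrences, and to_histogram are hypothetical stand-ins.

import networkx as nx
from networkx.algorithms import isomorphism
from sklearn.svm import SVC

def count_occurrences(image_graph, pattern):
    # Count subgraph-isomorphic matches whose codebook words agree.
    matcher = isomorphism.GraphMatcher(
        image_graph, pattern,
        node_match=lambda a, b: a["word"] == b["word"])
    return sum(1 for _ in matcher.subgraph_isomorphisms_iter())

def to_histogram(image_graph, mined_subgraphs):
    # One count per mined subgraph: a fixed-length vector per image.
    return [count_occurrences(image_graph, p) for p in mined_subgraphs]

# image_graphs: graphs from image_to_graph(); y: scene-type labels.
# X = [to_histogram(g, mined_subgraphs) for g in image_graphs]
# clf = SVC(kernel="rbf").fit(X, y)

The histogram turns arbitrarily structured graphs into fixed-length vectors, which is what allows an off-the-shelf statistical classifier such as an SVM to be applied at all.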
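The theme-analysis step can be sketched in the same spirit: treating mined subgraphs as words and images as documents, LDA fitted on the same count histograms yields a per-image theme distribution. The number of themes and the random stand-in data below are placeholders, not values from the thesis.

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Stand-in subgraph-histogram matrix: 100 images x 50 mined subgraphs.
X = np.random.randint(0, 5, size=(100, 50))
lda = LatentDirichletAllocation(n_components=8, random_state=0).fit(X)
theme_vectors = lda.transform(X)  # each row is a theme distribution summing to 1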