Controllable diffusion-based visual editing

Date

2026-02

Advisor

Boral, Ayşegül Dündar

Abstract

Advancements in generative networks have significantly improved visual generation, particularly for image and video editing applications. However, key challenges remain in achieving controllable editing. Diffusion inpainting models often hallucinate or re-insert the intended object during object removal, and text-to-video diffusion models struggle to follow a desired motion pattern without sacrificing prompt alignment in motion-conditioned generation. This thesis addresses these gaps through two interconnected studies. First, we introduce a background-focused image conditioning framework for object removal that utilizes focused embeddings and proposes a suppression method for removing the foreground concept from the conditioning signal. By explicitly using such conditioning, it prevents common failure modes such as foreground leakage and mask-shape-driven hallucinations. Second, we develop a motion-conditioned video generation and editing method that achieves successful motion transfer from a reference to the generated video. By directly updating the positional embeddings, it achieves high-fidelity, motion-aligned generation without sacrificing alignment with the textual condition. Together, these contributions advance controllable visual editing by demonstrating that pretrained generative models contain useful behaviors beyond their explicit training objectives, and that the right guidance can unlock robust control with improved fidelity, consistency, and user-directed precision.
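The idea of suppressing a foreground concept in a conditioning signal can be illustrated with a minimal sketch. This is not the thesis's actual method; the function name and the orthogonal-projection approach are assumptions for illustration only. Each conditioning embedding is projected onto the subspace orthogonal to a hypothetical foreground-concept direction, so the conditioning no longer carries that concept:

```python
import numpy as np

def suppress_concept(cond_embeddings: np.ndarray, concept: np.ndarray) -> np.ndarray:
    """Remove the component of each conditioning embedding that lies
    along a foreground-concept direction (orthogonal projection)."""
    d = concept / np.linalg.norm(concept)          # unit concept direction
    coeffs = cond_embeddings @ d                   # per-embedding projection coefficients
    return cond_embeddings - np.outer(coeffs, d)   # subtract the concept component

# Toy check: after suppression, embeddings carry no concept component.
rng = np.random.default_rng(0)
cond = rng.standard_normal((4, 8))                 # 4 tokens, 8-dim embeddings
concept = rng.standard_normal(8)                   # hypothetical foreground direction
suppressed = suppress_concept(cond, concept)
print(np.allclose(suppressed @ concept, 0.0))      # → True
```

In practice, an actual system would derive the concept direction from the model's own embedding space (e.g., a text or image encoder) rather than from random vectors as in this toy check.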

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Language

English
