Spatial reasoning via deep vision models for robotic sequential manipulation

buir.contributor.author: Öğüz, Salih Özgür
dc.citation.epage: 11335
dc.citation.spage: 11328
dc.contributor.author: Zhou, H.
dc.contributor.author: Schubert, I.
dc.contributor.author: Toussaint, M.
dc.contributor.author: Öğüz, Salih Özgür
dc.coverage.spatial: Detroit, USA
dc.date.accessioned: 2024-03-15T11:09:40Z
dc.date.available: 2024-03-15T11:09:40Z
dc.date.issued: 2023-10-01
dc.department: Department of Computer Engineering
dc.description: Conference Name: IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
dc.description: Date of Conference: 1-5 October 2023
dc.description.abstract: In this paper, we propose using deep neural architectures (i.e., vision transformers and ResNet) as heuristics for sequential decision-making in robotic manipulation problems. This formulation enables predicting the subset of objects that are relevant for completing a task. Such problems are often addressed by task and motion planning (TAMP) formulations combining symbolic reasoning and continuous motion planning. In essence, the action-object relationships are resolved for discrete, symbolic decisions that are used to solve manipulation motions (e.g., via nonlinear trajectory optimization). However, solving long-horizon tasks requires consideration of all possible action-object combinations, which limits the scalability of TAMP approaches. To overcome this combinatorial complexity, we introduce a visual perception module integrated with a TAMP solver. Given a task and an initial image of the scene, the learned model outputs the relevancy of objects to accomplish the task. By incorporating the predictions of the model into a TAMP formulation as a heuristic, the size of the search space is significantly reduced. Results show that our framework finds feasible solutions more efficiently when compared to a state-of-the-art TAMP solver.
dc.identifier.doi: 10.1109/IROS55552.2023.10342010
dc.identifier.eissn: 2153-0866
dc.identifier.issn: 2153-0858
dc.identifier.uri: https://hdl.handle.net/11693/114794
dc.language.iso: English
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.isversionof: https://dx.doi.org/10.1109/IROS55552.2023.10342010
dc.source.title: IEEE International Conference on Intelligent Robots and Systems
dc.subject: Computer vision
dc.subject: Decision making
dc.subject: Deep learning
dc.subject: Intelligent robots
dc.title: Spatial reasoning via deep vision models for robotic sequential manipulation
dc.type: Conference Paper
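
The abstract above describes a learned vision model that scores each scene object's relevancy to a task, with the scores used as a heuristic to prune the symbolic search of a TAMP solver. For illustration only, here is a minimal PyTorch sketch of that idea; it is not the authors' implementation, and the names (ObjectRelevancyModel, prune_symbolic_actions), the ResNet-18 backbone, the 16-dimensional task encoding, and the 0.5 threshold are all assumptions made for the example.

    # Hypothetical sketch (not the authors' code): a ResNet-based relevancy
    # predictor that scores each scene object for a given task, used to prune
    # the action-object groundings handed to a TAMP solver.
    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    class ObjectRelevancyModel(nn.Module):
        """Scores each of `num_objects` objects as task-relevant or not."""
        def __init__(self, num_objects: int, task_dim: int = 16):
            super().__init__()
            backbone = resnet18(weights=None)   # image encoder, no pretrained weights
            backbone.fc = nn.Identity()         # keep the 512-d feature vector
            self.backbone = backbone
            self.head = nn.Sequential(          # fuse image features with task encoding
                nn.Linear(512 + task_dim, 256),
                nn.ReLU(),
                nn.Linear(256, num_objects),    # one logit per candidate object
            )

        def forward(self, image: torch.Tensor, task: torch.Tensor) -> torch.Tensor:
            feats = self.backbone(image)        # (B, 512)
            logits = self.head(torch.cat([feats, task], dim=-1))
            return torch.sigmoid(logits)        # per-object relevancy in [0, 1]

    def prune_symbolic_actions(actions, relevancy, threshold=0.5):
        """Keep only action-object groundings whose objects the model deems relevant.

        `actions` is a list of (action_name, object_index) pairs; the surviving
        subset is what the symbolic planner would search over.
        """
        return [(a, o) for a, o in actions if relevancy[o] >= threshold]

    # Toy usage: one scene image, a 16-d task encoding, and 5 candidate objects.
    model = ObjectRelevancyModel(num_objects=5)
    model.eval()
    with torch.no_grad():
        scores = model(torch.randn(1, 3, 224, 224), torch.randn(1, 16))[0]
    candidates = [("pick", i) for i in range(5)] + [("place", i) for i in range(5)]
    print(prune_symbolic_actions(candidates, scores))

The 0.5 cutoff is arbitrary here: a tighter threshold prunes more aggressively but risks discarding a needed object, so a practical integration could fall back to the unpruned action set if the reduced search fails to find a feasible plan.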

Files

Original bundle
Name: Spatial_reasoning_via_deep_vision_models_for_robotic_sequential_manipulation.pdf
Size: 330.64 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 2.01 KB
Description: Item-specific license agreed upon to submission