Speaker: Yash Bhalgat, University of Oxford
Time: 12:00 a.m., May 10, 2024, GMT+8
Venue: Room 101, Courtyard No.5, Jingyuan & Online Talk
Abstract:
The growing demand for immersive, interactive experiences has underscored the importance of 3D data for understanding our surroundings. Traditional methods for capturing 3D data are often complex and equipment-intensive. In contrast, my research aims to use unconstrained videos, such as those from augmented-reality glasses, to effortlessly capture scenes and objects in their full 3D complexity. As a first step, I will describe a method that incorporates epipolar-geometry priors into multi-view Transformer models, enabling objects to be identified across extreme pose variations. Next, I will discuss my recent work on 3D object segmentation using 2D pre-trained foundation models, and I will conclude by touching on my ongoing work on Language+3D.
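For readers unfamiliar with epipolar priors, one common way to inject them into cross-view matching is to penalize point pairs whose image locations violate the epipolar constraint: a point in one view must lie near the epipolar line induced by its match in the other view. The NumPy sketch below is a hypothetical illustration of this idea, not the speaker's actual method; the fundamental matrix `F`, the point coordinates, and the `temperature` scale are all assumptions introduced for the example.

```python
import numpy as np

def epipolar_distances(F, pts1, pts2):
    """Distance from each point in pts2 to the epipolar line induced
    by each point in pts1, under fundamental matrix F.
    pts1: (N, 2) pixel coords in view 1; pts2: (M, 2) in view 2.
    Returns an (N, M) distance matrix."""
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])   # homogeneous, (N, 3)
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])   # homogeneous, (M, 3)
    lines = x1 @ F.T                                  # epipolar lines in view 2, (N, 3)
    num = np.abs(lines @ x2.T)                        # |a*u + b*v + c| for every pair, (N, M)
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)  # sqrt(a^2 + b^2)
    return num / den

def epipolar_attention_bias(F, pts1, pts2, temperature=10.0):
    """Additive cross-view attention bias: geometrically consistent
    pairs get a bias near 0, inconsistent pairs a large negative bias.
    (temperature is an illustrative hyperparameter, not from the talk.)"""
    return -epipolar_distances(F, pts1, pts2) / temperature
```

For a rectified stereo pair (pure horizontal translation), `F = [[0,0,0],[0,0,-1],[0,1,0]]`, so matches are biased toward the same image row, which is exactly the epipolar constraint in that setting.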
Source: Center on Frontiers of Computing Studies, PKU