Speaker: Yash Bhalgat, University of Oxford
Time: 12:00 a.m., May 10, 2024, GMT+8
Venue: Room 101, Courtyard No.5, Jingyuan & Online Talk
Abstract:
The growing demand for immersive, interactive experiences has underscored the importance of 3D data for understanding our surroundings. Traditional methods for capturing 3D data are often complex and equipment-intensive. In contrast, my research aims to use unconstrained videos, such as those from augmented-reality glasses, to effortlessly capture scenes and objects in their full 3D complexity. As a first step, I will describe a method that incorporates epipolar-geometry priors into multi-view Transformer models, enabling objects to be identified across extreme pose variations. Next, I will discuss my recent work on 3D object segmentation using 2D pre-trained foundation models, and I will conclude by touching on my ongoing work on Language+3D.
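For readers unfamiliar with epipolar priors, one common way to inject them into cross-view matching is to penalize point pairs whose image locations violate the epipolar constraint: a point in one view must lie near the epipolar line induced by its match in the other view. The NumPy sketch below is a hypothetical illustration of this idea, not the speaker's actual method; the fundamental matrix `F`, the point coordinates, and the `temperature` scale are all assumptions introduced for the example.

```python
import numpy as np

def epipolar_distances(F, pts1, pts2):
    """Distance from each point in pts2 to the epipolar line induced
    by each point in pts1, under fundamental matrix F.
    pts1: (N, 2) pixel coords in view 1; pts2: (M, 2) in view 2.
    Returns an (N, M) distance matrix."""
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])   # homogeneous, (N, 3)
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])   # homogeneous, (M, 3)
    lines = x1 @ F.T                                  # epipolar lines in view 2, (N, 3)
    num = np.abs(lines @ x2.T)                        # |a*u + b*v + c| for every pair, (N, M)
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)  # sqrt(a^2 + b^2)
    return num / den

def epipolar_attention_bias(F, pts1, pts2, temperature=10.0):
    """Additive cross-view attention bias: geometrically consistent
    pairs get a bias near 0, inconsistent pairs a large negative bias.
    (temperature is an illustrative hyperparameter, not from the talk.)"""
    return -epipolar_distances(F, pts1, pts2) / temperature
```

For a rectified stereo pair (pure horizontal translation), `F = [[0,0,0],[0,0,-1],[0,1,0]]`, so matches are biased toward the same image row, which is exactly the epipolar constraint in that setting.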
Source: Center on Frontiers of Computing Studies, PKU