[Lecture] Minimax optimality and generalization mechanism of foundation models
Apr. 08, 2024

Speaker: Taiji Suzuki (The University of Tokyo)


Time: 14:00-15:00, April 8, 2024, GMT+8

Venue: Zoom ID: 830 7673 7435 Passcode: 145696

Abstract: 

In this presentation, I will discuss the learning ability of foundation models such as diffusion models and Transformers from a nonparametric estimation perspective. In the first half, I will present the estimation ability of diffusion models as distribution estimators. We show that the empirical score matching estimator obtained in the class of deep neural networks achieves nearly minimax optimal rates in terms of both the total variation distance and the Wasserstein distance, assuming the true density function belongs to a Besov space. Furthermore, we consider a situation where the support of the density lies in a low-dimensional subspace, and show that the estimator is adaptive to the low dimensionality and achieves the minimax optimal rate corresponding to the intrinsic dimensionality.

In the latter half, I will present a nonparametric convergence analysis of Transformer networks in a sequence-to-sequence problem. Transformer networks are the fundamental model behind recent large language models. They can handle long input sequences and avoid the curse of dimensionality even when the input dimension varies. We show that they can adapt to the smoothness of the true function, even when the smoothness along each coordinate varies with the input.
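As background for the first result, the display below (an illustrative sketch added here, not part of the announcement) recalls the standard denoising score matching objective that diffusion models minimize and the classical minimax benchmark for density estimation; the symbols s (smoothness), d (ambient dimension), d' (intrinsic dimension), and n (sample size) are notation chosen for illustration.

% Denoising score matching objective, minimized over a class of score networks s_theta;
% p_t(x_t | x_0) denotes the forward-diffusion transition kernel.
\[
  \widehat{\theta} \;\in\; \operatorname*{arg\,min}_{\theta}\;
  \mathbb{E}_{t}\,\mathbb{E}_{x_0 \sim p_0}\,\mathbb{E}_{x_t \sim p_t(\cdot \mid x_0)}
  \Bigl[\,\bigl\| s_{\theta}(x_t, t) - \nabla_{x_t} \log p_t(x_t \mid x_0) \bigr\|^{2} \Bigr].
\]
% Classical minimax benchmark for estimating a density of Besov smoothness s in
% d dimensions from n samples, up to logarithmic factors, e.g. in total variation:
\[
  \inf_{\widehat{p}} \; \sup_{p_0 \in \mathcal{B}^{s}} \;
  \mathbb{E}\bigl[\mathrm{TV}(\widehat{p}, p_0)\bigr] \;\asymp\; n^{-\frac{s}{2s+d}}.
\]
% When the support has intrinsic dimension d' < d, the benchmark improves to
% n^{-s/(2s+d')}; this is the "adaptivity to the intrinsic dimensionality"
% referred to in the abstract.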

Source: School of Mathematical Sciences, PKU