The Foundations of Data Science and AI Community of Practice at TDAI is pleased to offer a Deep Learning Summer School featuring tutorials and research presentations by community members in partnership with the AI-EDGE Institute.
June 1, 9:30 am - 4:30 pm
June 2, 9:30 am - 4:30 pm
June 3, 10:00 am - 12:15 pm
Location: 320 Pomerene Hall or Zoom
Zoom recordings of the Deep Learning Summer School are available using the following links. Note that these are unedited recordings; edited versions will be posted as we are able to process them. Use the backward and forward buttons on the Day 1 and Day 2 recordings to navigate between sessions.
Wednesday, June 1
9:30 - 9:40am
Opening and welcome
9:40am - 12:00pm
Tutorial 1: Overview of Neural Networks and Deep Learning (Eric Fosler-Lussier, CSE, TDAI Leadership Team)
This tutorial will start from very basic neural networks and discuss how they are trained. It will then cover the standard building blocks used to assemble larger networks. The latter half of the tutorial presents a model zoo of the different kinds of neural networks in common use today.
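As a taste of the tutorial's starting point, here is a minimal sketch (not from the tutorial materials) of a two-layer network trained by hand-written gradient descent on the classic XOR problem:

```python
import numpy as np

# Toy two-layer network trained by gradient descent on XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # hidden layer, 8 units
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # output layer

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 1.0
for step in range(10000):
    # Forward pass: tanh hidden layer, sigmoid output.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Mean squared error loss (cross-entropy is more common in practice).
    loss = np.mean((p - y) ** 2)
    # Backward pass: the chain rule written out by hand.
    dp = 2 * (p - y) / len(X)
    dz2 = dp * p * (1 - p)
    dW2 = h.T @ dz2; db2 = dz2.sum(0)
    dh = dz2 @ W2.T
    dz1 = dh * (1 - h ** 2)
    dW1 = X.T @ dz1; db1 = dz1.sum(0)
    # Gradient descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(loss)  # small after training: the network has fit XOR
```

In practice frameworks such as PyTorch compute the backward pass automatically; writing it out once makes clear what "training" means.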
2:00 - 4:30pm
Tutorial 2: Optimization for Deep Learning (Jia (Kevin) Liu, ECE)
This tutorial will introduce optimization algorithm design and theory for solving modern machine learning problems. The goal of this tutorial is to familiarize students with the theoretical and mathematical foundation at the intersection of optimization and machine learning so that they will be able to use optimization to solve deep learning problems and conduct research in related fields. Topics include various first-order methods starting from basic stochastic gradient descent to more advanced methods (e.g., variance reduction, adaptive methods), as well as first-order optimization methods for overparameterized models with special geometric structures.
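The starting point of the tutorial, basic stochastic gradient descent, can be sketched on a toy least-squares problem (our own illustrative example, with hypothetical sizes and step sizes):

```python
import numpy as np

# Minibatch SGD on min_x (1/2n) ||A x - b||^2.
rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)   # noisy linear observations

x = np.zeros(d)
lr, batch = 0.05, 16
for t in range(2000):
    idx = rng.integers(0, n, size=batch)             # sample a minibatch
    grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch  # stochastic gradient
    x -= lr * grad                                   # descent step

err = np.linalg.norm(x - x_true)
print(err)  # small: SGD converges to a neighborhood of the solution
```

Variance-reduced and adaptive methods covered in the tutorial modify how `grad` is formed or how `lr` is set per coordinate.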
Thursday, June 2
9:30am - 12:00pm
Tutorial 3: Applications in Natural Language Processing (Huan Sun, CSE, TDAI Core Faculty; and Zhen Wang, CSE)
This tutorial will introduce some common Natural Language Processing (NLP) tasks and powerful deep learning models for them. We will also discuss the dominant pre-trained language models such as BERT and GPT-3 that have received a lot of attention in recent years. Finally, we will give a step-by-step demo showing how to build deep learning models for NLP.
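The core operation inside pre-trained models like BERT and GPT-3 is scaled dot-product attention; a minimal NumPy sketch (with random stand-in embeddings, not a full transformer) looks like this:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention, the core op in BERT/GPT layers."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ V, weights          # weighted mixture of value vectors

# Three "tokens" with 4-dimensional embeddings (random stand-ins).
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4)); K = rng.normal(size=(3, 4)); V = rng.normal(size=(3, 4))
out, w = attention(Q, K, V)
print(out.shape, w.sum(axis=-1))  # (3, 4); each row of weights sums to 1
```

A full model stacks many such layers with learned projections producing Q, K, and V from token embeddings.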
2:00 - 4:30pm
Tutorial 4: Applications in Computer Vision (Wei-Lun (Harry) Chao, CSE)
This tutorial will start with an overview of popular applications in computer vision and how deep learning has played an indispensable role in them. We will then introduce widely used models and algorithms, such as ResNet for classification, Mask R-CNN for detection and segmentation, and GANs for image generation. Finally, we will present an important application -- perception for autonomous driving -- and introduce some key computer vision techniques in it.
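The basic operation underlying models like ResNet and Mask R-CNN is the 2-D convolution; a minimal sketch (a toy example, not from the tutorial) applies an edge-detecting filter to a tiny image:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation, the basic building block of CNNs."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Toy image: left half dark, right half bright -> one vertical edge.
img = np.zeros((5, 6)); img[:, 3:] = 1.0
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
edges = conv2d(img, sobel_x)
print(edges)  # each row is [0, 4, 4, 0]: strong response only at the edge
```

In a CNN the kernels are not hand-designed like the Sobel filter here; they are learned from data, with many kernels per layer.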
Friday, June 3
10:00 - 10:30am
A New Paradigm for Conversational AI (Yu Su, CSE, TDAI Affiliate)
Conversational AI is currently at a tipping point. On the one hand, there is growing consensus on the promise of conversational AI as a universal interface that seamlessly integrates a wide range of backend data and services and builds trust with users via informative and engaging interaction. On the other hand, research on conversational AI has not yet met these high hopes, and mainstream conversational AI agents still present significant limitations in expressiveness, contextuality, and trustworthiness, among others. In this talk, I will discuss a promising new paradigm for conversational AI that models dialogues as dataflow graphs. The substantially improved expressiveness makes it natural to represent highly contextualized multi-turn, cross-domain dialogues that are difficult, if not impossible, to represent with existing dialogue representations. This is the technical backbone of the new conversational interface for Microsoft Outlook.
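To convey the flavor of the dataflow idea, here is a generic toy illustration (our own, not the representation used in any deployed system): each user request becomes a small graph of operations, and a later turn can revise a single node instead of restating the whole request:

```python
def evaluate(graph, node_id, cache=None):
    """Evaluate a node in a dataflow graph of (function, argument-ids) pairs."""
    cache = {} if cache is None else cache
    if node_id in cache:
        return cache[node_id]
    fn, arg_ids = graph[node_id]
    args = [evaluate(graph, a, cache) for a in arg_ids]
    cache[node_id] = fn(*args)   # memoize so shared subgraphs evaluate once
    return cache[node_id]

# Turn 1: "schedule a meeting with Ana at 9"
graph = {
    "person":  (lambda: "Ana", []),
    "time":    (lambda: "09:00", []),
    "meeting": (lambda p, t: f"meeting({p}, {t})", ["person", "time"]),
}
print(evaluate(graph, "meeting"))  # meeting(Ana, 09:00)

# Turn 2: "make it at 10 instead" -- revise one node, reuse the rest.
graph["time"] = (lambda: "10:00", [])
print(evaluate(graph, "meeting"))  # meeting(Ana, 10:00)
```

The point of the graph view is exactly this compositional reuse across turns, which flat intent-and-slot representations struggle to express.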
10:30 - 11:00am
On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models (Peizhong Ju, ECE, postdoc)
We study the generalization performance of min L2-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network with ReLU activation that has no bias term. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics that are different from the "double-descent" of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, we provide a new upper bound of the generalization error that approaches a small limiting value, even when the number of neurons p approaches infinity. This limiting value further decreases with the number of training samples n. For functions outside of this class, we provide a lower bound on the generalization error that does not diminish to zero even when n and p are both large.
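The object of study, a minimum L2-norm overfitting solution, can be sketched for a plain overparameterized linear model (a stand-in for the NTK features; sizes are illustrative):

```python
import numpy as np

# Min L2-norm interpolating solution of an underdetermined linear model:
# with more parameters than samples, infinitely many weight vectors fit the
# data exactly; the pseudoinverse selects the one with smallest L2 norm.
rng = np.random.default_rng(0)
n, p = 10, 50                      # fewer samples (n) than parameters (p)
Phi = rng.normal(size=(n, p))      # feature matrix (stand-in for NTK features)
y = rng.normal(size=n)             # training targets

w = np.linalg.pinv(Phi) @ y        # min-norm interpolator
print(np.linalg.norm(Phi @ w - y)) # ~0: the model perfectly fits (overfits)
```

The talk's question is then what test error such interpolators incur as p and n grow, which turns out to depend on the ground-truth function.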
11:00 - 11:15am
Break
11:15am - 11:45am
Stochastic composition optimization in the absence of Lipschitz continuous gradient (Sam Davanloo, ISE, TDAI Core Faculty)
We consider the optimization of a nested composition of two functions where at least the inner function involves an expectation. In such nested structures, obtaining unbiased estimates of the gradient of the composition is complicated. Stochastic composition optimization has gained popularity mainly due to applications in reinforcement learning and meta learning. In the absence of Lipschitz continuity of the gradient of the inner and/or outer functions, we develop stochastic algorithms to optimize the composition function. The sample complexity of the proposed algorithms for obtaining first-order stationary points is investigated and will be presented. (Joint work with Yin Liu)
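A standard two-timescale remedy for the biased-gradient difficulty is sketched below on a toy problem of our own choosing (not from the talk): the algorithm keeps a running average that tracks the inner expectation rather than plugging a single sample into the outer gradient:

```python
import numpy as np

# Stochastic compositional gradient sketch for min_x f(E_xi[g(x, xi)]).
# Toy problem: g(x, xi) = x + xi with xi ~ N(0, 1), f(y) = y^2, so the true
# objective is x^2 with minimizer x* = 0. The auxiliary variable y tracks the
# inner expectation E[g(x, xi)] = x on a faster timescale.
rng = np.random.default_rng(0)
x, y = 5.0, 0.0
alpha, beta = 0.01, 0.1
for k in range(5000):
    xi = rng.normal()
    y = (1 - beta) * y + beta * (x + xi)  # running estimate of E[g(x, xi)]
    x -= alpha * 1.0 * 2 * y              # dg/dx = 1 times f'(y) = 2y
print(x)  # close to the minimizer 0
```

The analyses in the talk concern how many samples such schemes need to reach approximate stationarity, especially without gradient Lipschitz continuity.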
11:45am - 12:15pm
Model-Agnostic Meta-Learning from Optimization Perspective (Yingbin Liang, ECE, TDAI Core Faculty)
Meta-learning, or learning to learn, has been shown to be a powerful tool for fast learning on unseen tasks by efficiently extracting knowledge from a range of observed tasks. Such empirical success strongly motivates understanding the performance guarantees of meta-learning, which will serve to guide better designs of meta-learning methods and further expand their applicability. In this talk, I will present our recent studies of meta-learning from an optimization perspective. I will focus on a popular meta-learning approach, the model-agnostic meta-learning (MAML) type of algorithms, and first present the convergence guarantee and the computational complexity that we establish for the vanilla MAML algorithm. I will then talk about our result on a more scalable variant of MAML, called the almost no inner loop (ANIL) algorithm. We characterize the performance improvement of ANIL over MAML as well as the impact of the loss function landscape on the overall computational complexity. I will finally present experimental validations of our theoretical findings and discuss a few future directions on the topic.
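The MAML structure of an inner adaptation step nested inside an outer meta-update can be sketched on toy 1-D quadratic tasks (an illustration of ours, not the talk's setting), where the gradient through the inner step has a closed form:

```python
import numpy as np

# MAML sketch: task i has loss L_i(theta) = 0.5 * (theta - c_i)^2. The
# meta-objective is the loss *after* one inner gradient step, and its
# gradient with respect to theta differentiates through that inner step.
tasks = np.array([-2.0, 0.5, 3.0])   # task optima c_i (hypothetical)
alpha, beta = 0.1, 0.1               # inner / outer step sizes
theta = 10.0
for step in range(500):
    meta_grad = 0.0
    for c in tasks:
        theta_i = theta - alpha * (theta - c)     # inner adaptation step
        meta_grad += (1 - alpha) * (theta_i - c)  # grad of post-adaptation loss
    theta -= beta * meta_grad / len(tasks)        # outer (meta) update
print(theta)  # approaches the mean of the task optima, 0.5
```

The ANIL variant discussed in the talk adapts only part of the parameters in the inner loop, which reduces this nested computation substantially.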