19 Autoregressive Models, Variational Autoencoders
computer-vision
2025-01-14
Taxonomy of generative models, Autoregressive models, Variational Inference, ELBO Optimization, Variational autoencoders, VQ-VAE
@Credits: EECS 498.008 | Video Lecture: UM-CV
Personal work for the assignments of the course: github repo.
Notice on Usage and Attribution
These are personal class notes based on the University of Michigan EECS 498.008 / 598.008 course. They are intended solely for personal learning and academic discussion, with no commercial use.
For detailed information, please refer to the complete notice at the end of this document
Intro
- Supervised learning vs unsupervised learning
  - Supervised learning: data (x, y) — learn a function to map x ↦ y. E.g. classification, regression, detection, segmentation, captioning
    - Usage: assign labels to data, feature learning (with labels)
  - Unsupervised learning: data x, no labels — learn some underlying hidden structure of the data. E.g. clustering, dimensionality reduction, density estimation, feature learning (without labels)
- Discriminative vs generative models
  - Discriminative model: learn a probability distribution p(y∣x) — supervised
    - "Competition" between labels conditioned on the input image
    - Usage: assign labels to data, feature learning (with labels)
  - Generative model: learn a probability distribution p(x) — unsupervised
    - All possible images compete with each other for probability mass
    - Requires deep image understanding!
    - Model can "reject" unreasonable inputs by assigning them small estimated probabilities
    - Usage: detect outliers, feature learning (unsupervised), sample to generate new data
  - Conditional generative model: learn p(x∣y) — mostly supervised
    - Conditioned on a label, all possible images compete for probability mass
    - Can be built from a discriminative model and an unconditional generative model via Bayes' rule: p(x∣y) = p(y∣x) p(x) / p(y)

Fig: Recall: Bayes' Rule
Taxonomy of Generative Models
Figure adapted from Ian Goodfellow, tutorial on Generative Adversarial Networks

Fig: Taxonomy of Generative Models
Our course: Autoregressive/VAE/GANs
Autoregressive Models
Explicit Density Estimation
Goal: write down an explicit function for $p(x) = f(x, W)$.

Given a dataset $x^{(1)}, x^{(2)}, \dots, x^{(N)}$, train the model by solving:

$$
W^* = \arg\max_W \prod_i p\big(x^{(i)}\big) = \arg\max_W \sum_i \log p\big(x^{(i)}\big) = \arg\max_W \sum_i \log f\big(x^{(i)}, W\big)
$$

Assume $x$ consists of multiple subparts: $x = (x_1, x_2, \dots, x_T)$.

Break down the probability using the chain rule:

$$
p(x) = p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1})
$$

This is similar to recurrent neural networks!
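As a toy illustration of the chain-rule factorization, here is a NumPy sketch; the model (a hypothetical logistic conditional on the previous binary token) and its parameters are made up for illustration and are not from the lecture:

```python
import numpy as np

# Toy autoregressive model over binary sequences: p(x_t = 1 | x_{<t})
# is a hypothetical logistic function of the previous token only.
def cond_prob_one(prev_token, w=0.8, b=0.1):
    """P(x_t = 1 | x_{t-1}); the first step uses prev_token = 0."""
    return 1.0 / (1.0 + np.exp(-(w * prev_token + b)))

def log_likelihood(x):
    """log p(x) = sum_t log p(x_t | x_{<t}) via the chain rule."""
    logp = 0.0
    prev = 0
    for token in x:
        p1 = cond_prob_one(prev)
        logp += np.log(p1 if token == 1 else 1.0 - p1)
        prev = token
    return logp
```

Because each factor is a proper conditional distribution, the per-step probabilities of all continuations sum to one, so the product over steps defines a valid distribution over whole sequences.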
PixelRNN
Generate image pixels one at a time, starting at the upper left corner
ICML 2016, Pixel Recurrent Neural Networks
Problem: generation is sequential, one pixel at a time, so it is very slow and hard to parallelize
PixelCNN

Fig: PixelCNN
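The key idea behind PixelCNN is the masked convolution: kernel weights are zeroed out so that each output pixel depends only on pixels above it and to its left. A minimal sketch of the mask construction, following the type-'A' / type-'B' convention of the PixelCNN papers (type 'A' hides the centre pixel, type 'B' keeps it):

```python
import numpy as np

def causal_mask(k, mask_type="A"):
    """Binary mask for a k x k PixelCNN convolution kernel.

    Pixels strictly above the centre row are visible, plus those to the
    left of centre in the centre row; type 'B' also keeps the centre.
    """
    mask = np.zeros((k, k))
    mask[: k // 2, :] = 1.0          # all rows above the centre
    mask[k // 2, : k // 2] = 1.0     # left of centre in the centre row
    if mask_type == "B":
        mask[k // 2, k // 2] = 1.0   # type 'B' keeps the centre pixel
    return mask
```

Multiplying a convolution kernel by this mask before applying it preserves the autoregressive ordering while letting all pixels be processed in parallel during training.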
Autoencoders
Regular Autoencoders


Fig: Encoders
Problem: not probabilistic — there is no way to sample new data from the learned model
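As a point of reference, the simplest autoencoder — a linear encoder/decoder with L2 reconstruction loss — can even be solved in closed form: its optimal latent space is spanned by the top principal components of the data. A NumPy sketch on toy data (the lecture's autoencoders are nonlinear and trained with SGD instead):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))           # toy data: 100 samples, 8-D inputs

# Linear autoencoder: the optimal rank-k solution is PCA, i.e. the
# top-k right singular vectors of the centred data matrix.
k = 3
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:k].T                            # encoder weights (8 x 3)
Z = Xc @ W                              # latent codes z, dimension 3 < 8
X_rec = Z @ W.T + X.mean(axis=0)        # decoder reconstructs x from z
```

The low-dimensional bottleneck forces the model to keep only the directions that explain the most variance — but nothing here defines a distribution over z, which is exactly the gap VAEs fill.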
Variational Autoencoders
- Learn latent features z from raw data
- Sample from the model to generate new data

Fig: Variational Autoencoders
How do we train the model? Maximize the likelihood of the data.

Fig: Encoders and decoders

Fig: Training autoencoders
Training with ELBO
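The objective maximized in training is the evidence lower bound (ELBO) on the data log-likelihood: since the true posterior $p_\theta(z \mid x)$ is intractable, we introduce an approximate posterior $q_\phi(z \mid x)$ (the encoder) and optimize

$$
\log p_\theta(x) \ge \mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
$$

The first term rewards accurate reconstruction through the decoder; the second keeps the encoder's distribution close to the prior $p(z) = \mathcal{N}(0, I)$, so that sampling $z$ from the prior at test time produces reasonable data.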

Fig: Fully-Connected VAE

Fig: Fully-Connected VAE
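Concretely, the encoder outputs a mean and log-variance for the diagonal Gaussian $q_\phi(z \mid x)$; a latent is drawn with the reparameterization trick, and the KL term against $\mathcal{N}(0, I)$ has a closed form. A NumPy sketch with made-up encoder outputs (the numbers are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for one input x: mean and log-variance
# of the diagonal Gaussian q(z|x).
mu = np.array([0.5, -0.3])
logvar = np.array([-1.0, 0.2])

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
# which keeps the sample differentiable w.r.t. mu and logvar.
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * logvar) * eps

# Closed-form KL(q(z|x) || N(0, I)) for diagonal Gaussians.
kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
```

In a full VAE this `kl` is added to the reconstruction loss from decoding `z`, and both encoder and decoder are trained jointly by gradient descent.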
Generating data

Fig: Generating data
Disentangling factors of variation

Fig: Generating data

Fig: Editing latent space

Summary
VQ-VAE
NeurIPS 2019 (VQ-VAE-2)
VAE model that generates multi-scale grids of discrete latent codes
A PixelCNN prior is fit in the latent space to sample those codes
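The "VQ" (vector quantization) step replaces each continuous encoder output with its nearest codebook vector, yielding a grid of discrete codes that the latent-space PixelCNN can model. A NumPy sketch with a hypothetical random codebook:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical VQ-VAE codebook: K embedding vectors of dimension D.
K, D = 8, 4
codebook = rng.normal(size=(K, D))

def quantize(z_e):
    """Snap each encoder output vector to its nearest codebook entry.

    z_e: (N, D) continuous encoder outputs; returns (indices, z_q).
    """
    # Squared L2 distance from every vector to every codebook entry.
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)           # discrete latent codes
    return idx, codebook[idx]        # z_q is passed to the decoder
```

The discrete indices `idx` are what the PixelCNN prior is trained on; at generation time the sampled indices are looked up in the codebook and decoded back to an image.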

Summary
Notice on Usage and Attribution
This note is based on the University of Michigan's publicly available course EECS 498.008 / 598.008 and is intended solely for personal learning and academic discussion, with no commercial use.
- Nature of the Notes: These notes include extensive references and citations from course materials to ensure clarity and completeness. However, they are presented as personal interpretations and summaries, not as substitutes for the original course content.
- Original Course Resources: Please refer to the official University of Michigan website for complete and accurate course materials.
- Third-Party Open Access Content: This note may reference Open Access (OA) papers or resources cited within the course materials. These materials are used under their original Open Access licenses (e.g., CC BY, CC BY-SA).
- Proper Attribution: Every referenced OA resource is appropriately cited, including the author, publication title, source link, and license type.
- Copyright Notice: All rights to third-party content remain with their respective authors or publishers.
- Content Removal: If you believe any content infringes on your copyright, please contact me, and I will promptly remove the content in question.
Thanks to the University of Michigan and the contributors to the course for their openness and dedication to accessible education.