import numpy as np
import matplotlib.pyplot as plt
Generative Diffusion Models#
Our goal in this assignment is to understand generative models like stable diffusion. In stable diffusion, you type a string and it generates a picture matching that string. For this assignment, we are going to get about 70% of the way to that understanding.
Diffusion#
We are going to start by thinking about diffusion. Consider a pollen particle which is being buffeted by air. If we weren't paying attention to the air, but instead just the pollen particle, what we would expect to see is that the pollen particle would occasionally jump in random directions as the air is hitting it. We might model this by saying that at time $t+1$
$$x_{t+1} = x_t - \delta\,\frac{\partial E(x_t)}{\partial x} + \sqrt{2\delta}\,\xi_t$$
where $E(x)$ is an energy landscape the particle moves in, $\delta$ is a small step size, and $\xi_t \sim \mathcal{N}(0,1)$ is a fresh Gaussian random number at each step. This (discretized) stochastic update is the Langevin equation.
Notice that this is a Markov chain: the new position $x_{t+1}$ depends only on the current position $x_t$ and not on any of the earlier history.
Because it is a Markov chain it has to have a stationary distribution. It turns out (in the limit of small-ish $\delta$) that this stationary distribution is $\pi(x) \propto \exp(-E(x))$. We are going to convince ourselves of this in three ways:
Simulate the Langevin equation and check its probability distribution
Make another Markov chain that we know has the right probability distribution and show that it is equivalent to Langevin dynamics
Demonstrate that the Langevin equation has the right dynamics
Approach 1#
We will start with the first approach. Write a function
def ForwardDiffusion(x_0, T, delta):
    # do stuff
    return locations
which takes an initial condition x_0, an integer number of steps T, and a step size delta, and returns the list of locations $x_0, x_1, \ldots, x_T$ visited by the Langevin dynamics.
Grading
After you've written this function, go ahead and run it out for many steps and histogram the locations it visits (or the final locations of many independent runs), checking that the histogram matches the expected stationary distribution $\propto \exp(-E(x))$.
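If it helps to have a concrete starting point, here is a minimal sketch of such a function and of the histogram check, assuming the quadratic energy $E(x) = x^2/2$ (so $\partial E/\partial x = x$) that we will use later; treat it as one possible implementation rather than the required one.

def ForwardDiffusion(x_0, T, delta):
    """Run T steps of (discretized) Langevin dynamics starting from x_0 with step size delta."""
    locations = [x_0]
    x = x_0
    for t in range(T):
        # drift downhill by -delta*dE/dx (here dE/dx = x), then kick with noise of std sqrt(2*delta)
        x = x - delta * x + np.sqrt(2 * delta) * np.random.randn()
        locations.append(x)
    return locations

# one long run; drop some burn-in, then compare the histogram against exp(-x^2/2)/sqrt(2*pi)
positions = np.array(ForwardDiffusion(0.0, 100000, 0.01))
plt.hist(positions[1000:], bins=100, density=True)
xs = np.linspace(-4, 4, 200)
plt.plot(xs, np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi))
plt.show()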
Approach 2#
We can alternatively start with a Markov chain whose stationary distribution we already know - i.e. the Metropolis Markov chain.
In this section we are trying to reason about why the forward diffusion equation generated a probability distribution proportional to $\exp(-E(x))$.
One Markov chain we do understand is the Metropolis method. In the Metropolis method, we know that we should choose a move from $x$ to a proposed $x'$ with some transition probability $T(x \to x')$ and then accept or reject that proposed move.
Let's start by simply proposing moves from a Gaussian of some standard deviation centered on the current position $x$.
Then we should accept with probability
$$A(x \to x') = \min\!\left(1,\ e^{-\left[E(x')-E(x)\right]}\right)$$
(the Gaussian proposal is symmetric, so the proposal densities cancel).
This will give us a Markov chain which samples the probability distribution $\pi(x) \propto \exp(-E(x))$.
Grading
Nonetheless, go ahead and implement this Markov chain using the quadratic energy $E(x) = x^2/2$ and check that it samples the expected Gaussian distribution.
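A sketch of one way this Metropolis chain could look (the proposal width sigma is a free knob to play with; nothing about its value is prescribed here):

def Metropolis(x_0, steps, sigma):
    """Metropolis sampling of p(x) ~ exp(-E(x)) with E(x) = x**2/2 and a symmetric Gaussian proposal."""
    E = lambda x: 0.5 * x**2
    x = x_0
    locations = [x]
    for _ in range(steps):
        x_new = x + sigma * np.random.randn()               # symmetric Gaussian proposal
        if np.random.rand() < np.exp(-(E(x_new) - E(x))):   # accept with probability min(1, exp(-dE))
            x = x_new
        locations.append(x)
    return locations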
Now, we are going to go ahead and try to massage our Metropolis MCMC into something that looks more like our target dynamics. Let us now try a new sample probability using
$$T(x \to x') = \mathcal{N}\!\left(x';\ x - \delta\,\frac{\partial E(x)}{\partial x},\ 2\delta\right).$$
What this probability distribution tells us is that we should first drift our particle from $x$ to $x - \delta\,\partial E(x)/\partial x$ and then add Gaussian noise of variance $2\delta$ - exactly a Langevin step.
Using Metropolis, let's work out the acceptance probability
$$A(x \to x') = \min\!\left(1,\ e^{-\left[E(x')-E(x)\right]}\,\frac{T(x' \to x)}{T(x \to x')}\right).$$
We will find that the acceptance ratio goes to one in the limit of small $\delta$ - so this Metropolis chain becomes exactly the Langevin dynamics, which therefore also samples $\exp(-E(x))$.
Grading
Go ahead and show that the acceptance ratio is 1 by running this in a Monte Carlo and computing the acceptance ratio.
In addition, you can check this by choosing a series of positions $x$ and proposed moves $x'$ by hand and evaluating the acceptance ratio directly, verifying that it is very close to one for small $\delta$.
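As a sketch of what that check could look like (again with $E(x) = x^2/2$, and with a small step size delta chosen arbitrarily here), you can generate moves with the drifted Gaussian proposal and compute the Metropolis acceptance probability directly; it should sit extremely close to one:

def log_T(x_to, x_from, delta):
    # log of the drifted proposal density N(x_to; x_from - delta*dE/dx(x_from), 2*delta),
    # dropping the normalization constant since it cancels in the ratio (dE/dx = x here)
    mean = x_from - delta * x_from
    return -(x_to - mean)**2 / (4 * delta)

delta = 0.001
x = np.random.randn()
acceptances = []
for _ in range(10000):
    x_new = x - delta * x + np.sqrt(2 * delta) * np.random.randn()
    log_A = -(0.5 * x_new**2 - 0.5 * x**2) + log_T(x, x_new, delta) - log_T(x_new, x, delta)
    acceptances.append(min(1.0, np.exp(log_A)))
    x = x_new   # the acceptance is essentially 1, so we just keep moving
print(np.mean(acceptances))   # should be very close to 1 for small delta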
Approach 3#
The two previous approaches were essentially numerical demonstrations that the Langevin equation has the stationary distribution we want. We can also show this analytically.
Starting from a generalized version of the Langevin equation (an arbitrary energy $E(x)$ and a small step size $\delta$),
you can show that the probability density $\rho(x,t)$ of the walker satisfies the diffusion (Fokker-Planck) equation,
$$\frac{\partial \rho}{\partial t} = \frac{\partial}{\partial x}\!\left(\rho\,\frac{\partial E}{\partial x}\right) + \frac{\partial^2 \rho}{\partial x^2}.$$
A nice exposition of this is in James Sethna’s Stat Mech Book around equations 2.7.
It's pretty easy from this equation to see that it corresponds to the drift, diffusion step of the Langevin equation - i.e. formally the solution over one small step first shifts the density along the drift $-\delta\,\partial E/\partial x$ and then convolves it with a Gaussian of variance $2\delta$.
It's also easy to work out the stationary distribution of this equation as $t \to \infty$: setting $\partial\rho/\partial t = 0$ requires
$$\frac{\partial}{\partial x}\!\left(\rho\,\frac{\partial E}{\partial x} + \frac{\partial \rho}{\partial x}\right) = 0,$$
which is satisfied by $\rho(x) \propto \exp(-E(x))$, since then $\partial\rho/\partial x = -\rho\,\partial E/\partial x$ and the term in parentheses vanishes.
Interestingly, this is very closely related to a method for simulating quantum ground states - diffusion Monte Carlo.
The Next Step: Faster Diffusing#
So far, we've learned that the Langevin dynamics (in the limit of small-ish $\delta$) has the stationary distribution $\exp(-E(x))$ that we want.
We are now going to focus explicitly on the energy functional for a harmonic oscillator, $E(x) = x^2/2$.
Using this energy function, we get
$$x_{t+1} = x_t - \delta\,x_t + \sqrt{2\delta}\,\xi_t = (1-\delta)\,x_t + \sqrt{2\delta}\,\xi_t.$$
We are now going to let $\beta \equiv 2\delta$,
and then let $(1 - \beta/2) \approx \sqrt{1-\beta}$.
This last identification is true in the limit of small $\beta$.
This leaves us with
$$x_{t+1} = \sqrt{1-\beta}\,x_t + \sqrt{\beta}\,\xi_t.$$
We will now let $\beta$ depend on the time step $t$, increasing linearly from 0.0001 to 0.02 over the 200 steps of the diffusion.
This can be generated in Python by just doing
timesteps = 200
beta = np.linspace(0.0001, 0.02, timesteps, dtype=np.float32)
Update your function
def ForwardDiffusion(x_0,beta_t,timeSteps):
to run forward diffusion over a certain number of time-steps using a time-dependent $\beta_t$, i.e. at step $t$ the added noise has standard deviation $\sqrt{\beta_t}$.
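One possible sketch of the updated function, following the update rule above ($x_{t+1} = \sqrt{1-\beta_t}\,x_t + \sqrt{\beta_t}\,\xi$):

def ForwardDiffusion(x_0, beta_t, timeSteps):
    """Forward diffusion with a time-dependent beta schedule."""
    locations = [x_0]
    x = x_0
    for t in range(timeSteps):
        x = np.sqrt(1.0 - beta_t[t]) * x + np.sqrt(beta_t[t]) * np.random.randn()
        locations.append(x)
    return locations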
Grading
Make a plot showing, for five separate runs, the location $x(t)$ as a function of the time step $t$.
This time it doesn’t make sense to collect the probability distribution over many steps because the probability distribution changes over time.
Instead, run your function 10000 times out to a time-step of T=199 and graph a histogram of the resulting distribution x(199). Plot a theory curve on top of it showing that you get the correct probability distribution.
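One way to do this check is sketched below. The starting point x_0 = 0.4 is just an illustrative choice (the same value we use as an initial condition later); whatever starting point you use, the theory curve after running through the whole beta schedule is a Gaussian whose mean is x_0 times prod(sqrt(1-beta)) and whose variance is 1 - prod(1-beta) (you can show this by iterating the update rule).

timesteps = 200
beta = np.linspace(0.0001, 0.02, timesteps, dtype=np.float32)
x_0 = 0.4

samples = np.array([ForwardDiffusion(x_0, beta, timesteps)[-1] for _ in range(10000)])
plt.hist(samples, bins=100, density=True, label='simulation')

mean = x_0 * np.prod(np.sqrt(1.0 - beta))
var = 1.0 - np.prod(1.0 - beta)
xs = np.linspace(-4, 4, 400)
plt.plot(xs, np.exp(-(xs - mean)**2 / (2 * var)) / np.sqrt(2 * np.pi * var), label='theory')
plt.legend()
plt.show()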
Now, we would like to produce a new function which produces the same probability distribution but does it quickly. In other words, you want to produce the same probability distribution after some number of time steps but without taking each and every step to get there (i.e. you just want to be able to jump straight to step 150). It turns out that this is possible because the sum of a bunch of random Gaussian steps is itself a single random Gaussian step.
If we let $\alpha_t = 1-\beta_t$ and
$$\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s,$$
then you can jump to step $t$ in a single shot:
$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,z, \qquad z \sim \mathcal{N}(0,1).$$
Grading
Go ahead and write a
def ForwardDiffusionFast(x_0,alpha_bar_t,timeSteps):
which quickly generates the same distribution as ForwardDiffusion. Plot a histogram of ForwardDiffusionFast out to T=200 and check that you get the same probability distribution as ForwardDiffusion
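Here is a minimal sketch, assuming alpha_bar_t is the array of cumulative products np.cumprod(1 - beta) and that beta is the schedule defined above (later, for training, you will want to modify this function to also return the noise it drew):

def ForwardDiffusionFast(x_0, alpha_bar_t, timeSteps):
    """Jump straight to step timeSteps: x_t = sqrt(alpha_bar)*x_0 + sqrt(1-alpha_bar)*z."""
    z = np.random.randn()
    alpha_bar = alpha_bar_t[timeSteps - 1]
    return np.sqrt(alpha_bar) * x_0 + np.sqrt(1.0 - alpha_bar) * z

alpha_bar_t = np.cumprod(1.0 - beta)

# the histograms of the slow and fast versions at T=200 should agree
slow = [ForwardDiffusion(0.4, beta, 200)[-1] for _ in range(10000)]
fast = [ForwardDiffusionFast(0.4, alpha_bar_t, 200) for _ in range(10000)]
plt.hist(slow, bins=100, density=True, alpha=0.5, label='ForwardDiffusion')
plt.hist(fast, bins=100, density=True, alpha=0.5, label='ForwardDiffusionFast')
plt.legend()
plt.show()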
Undiffusing#
Our next step is to figure out how to undiffuse.
In our forward diffusion process we have samples $x_0$ drawn from some initial probability distribution $p_{\rm init}(x)$, which we then diffuse forward:
x0 = SamplePInit()
xt = ForwardDiffusionFast(x0, alpha_bar_t, T)
giving us a (largely Gaussian) final distribution over $x_T$.
Now, we'd like to start with samples $x_T$ drawn from that final distribution and run the process backwards. In other words, we want to write a function Undiffuse(xt)
which returns a sample $x_0$ distributed according to the original initial distribution - i.e.
x0 = SamplePInit()
xt = ForwardDiffusionFast(x0, alpha_bar_t, T)
new_x0 = Undiffuse(xt)
# new_x0 doesn't have to equal x0, but if we histogram new_x0 and x0 over many runs those histograms should be the same
In some sense, this should be possible - the laws of physics don't know about the direction of time. In practice, though, you don't often see things undiffuse even if we reverse the force (i.e. after a drop of food coloring diffuses in a cup, it doesn't undiffuse if you turn the cup upside-down). The reason for this is that we aren't actually successfully reversing the directions of "all the air molecules bouncing off the pollen" even if we do reverse the force on the pollen. Nonetheless, if we are careful (and have the right information) we can get this undiffusion to happen. Mathematically, what we need to do is figure out how to (stochastically) run the Langevin Markov chain backwards.
The Langevin Markov chain is a rule which tells us, given the current position $x_{t-1}$, how to (stochastically) generate the next position $x_t$.
To reverse it, we would like a rule for going from $x_t$ back to $x_{t-1}$. It turns out that if we can make a good guess for the noise $z$ that took $x_0$ to $x_t$ (i.e. the number that came out of np.random.randn() in our fast forward diffusion) then we can write this reverse step down explicitly.
Derivation
Let's go ahead and derive the rule for undiffusing a single step, i.e. for sampling $x_{t-1}$ given $x_t$.
First start by looking at
$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t\right)$$
and
$$q(x_{t-1} \mid x_0) = \mathcal{N}\!\left(x_{t-1};\ \sqrt{\bar{\alpha}_{t-1}}\,x_0,\ 1-\bar{\alpha}_{t-1}\right).$$
Some algebra (Bayes' rule plus the Markov property) then gives us
$$q(x_{t-1} \mid x_t, x_0) = \frac{1}{q(x_t \mid x_0)}\; q(x_t \mid x_{t-1})\; q(x_{t-1} \mid x_0).$$
Notice the first term on the rhs doesn't depend on $x_{t-1}$, so it only contributes to the normalization.
Using our fast diffusion, the remaining two factors are explicit Gaussians, and their product is the exponential of a quadratic in $x_{t-1}$ - i.e. another Gaussian.
You can simplify this by rewriting this as
$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t\right)$$
where
$$\tilde{\mu}_t(x_t, x_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,x_t$$
and
$$\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t.$$
This tells us that our undiffusion step should be (after guessing $x_0$, or equivalently the noise)
$$x_{t-1} = \tilde{\mu}_t(x_t, x_0) + \sqrt{\tilde{\beta}_t}\;\mathcal{N}(0,1).$$
(technical note: we don't do the $\mathcal{N}(0,1)$ step on the very last undiffusion step, when we are producing $x_0$)
Now from our fast diffusion we have that
$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,z \quad\Longrightarrow\quad x_0 = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,z}{\sqrt{\bar{\alpha}_t}}.$$
We can plug this expression for $x_0$ into $\tilde{\mu}_t$.
When we work this all out the undiffusing step gives us (after we guess the random noise $z$)
$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,z\right) + \sqrt{\tilde{\beta}_t}\;\mathcal{N}(0,1)$$
where
$$\alpha_t = 1-\beta_t$$
and
$$\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t.$$
Let’s go ahead and write some functions now to get this working:
Write an Undiffuse(xt)
function which uses a GuessZ(xt) function, which you will also write, to get its guess for the noise at each step.
The Undiffuse
function is generic - it doesn't care how GuessZ comes up with its guess. We will also write a SamplePInit()
function. For our first example, we will make it always return 0.4. In other words, our probability distribution is a delta function at 0.4: $p_{\rm init}(x) = \delta(x - 0.4)$.
Also go ahead and write GuessZ
. In this case, since you (secretly) know that every walker started at $x_0 = 0.4$, you can solve the fast-diffusion relation $x_t = \sqrt{\bar{\alpha}_t}\,(0.4) + \sqrt{1-\bar{\alpha}_t}\,z$ for the noise $z$.
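Here is one possible sketch of these pieces. A few details are choices of the sketch rather than requirements: GuessZ is given the time step t as well (since the guess depends on how much noise has been mixed in by step t) and is simply called as a global rather than passed in as an argument, and Undiffuse returns the whole trajectory along with the final point so that you can make the x(t)-vs-t plots. It uses the beta and alpha_bar_t arrays from above and the $\tilde{\beta}_t$ variance from the derivation.

def SamplePInit():
    # initial distribution: a delta function at 0.4
    return 0.4

def GuessZ(x_t, t):
    # we secretly know x_0 = 0.4, so solve x_t = sqrt(abar)*0.4 + sqrt(1-abar)*z for z
    abar = alpha_bar_t[t - 1]
    return (x_t - np.sqrt(abar) * 0.4) / np.sqrt(1.0 - abar)

def Undiffuse(xt, T=200):
    """Run the reverse chain from step T back down to step 0."""
    x = xt
    history = [x]
    for t in range(T, 0, -1):
        b = beta[t - 1]
        abar = alpha_bar_t[t - 1]
        abar_prev = alpha_bar_t[t - 2] if t > 1 else 1.0
        z = GuessZ(x, t)
        mean = (x - b / np.sqrt(1.0 - abar) * z) / np.sqrt(1.0 - b)
        beta_tilde = (1.0 - abar_prev) / (1.0 - abar) * b
        noise = np.random.randn() if t > 1 else 0.0   # no extra noise on the final step
        x = mean + np.sqrt(beta_tilde) * noise
        history.append(x)
    return x, history

x0 = SamplePInit()
xt = ForwardDiffusionFast(x0, alpha_bar_t, 200)
new_x0, path = Undiffuse(xt)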
Grading
Once these functions are written, let’s work on the undiffusion.
Run your undiffusion 5 times, all starting from a single diffused point. Graph the location $x(t)$ vs the time step $t$ for each run.
Run your undiffusion many times out to T=0. Show that you get a delta function at 0.4.
Run your diffusion from T=0 to T=125 and your undiffusion from T=200 to T=125 and plot the histograms showing they are the same.
We now want to make life a little harder. We are going to have a new probability distribution.
Write the new SampleFromInitP
function.
Now we also need a new guessZ(xt)
function. Notice that this is much trickier and doesn't always have an obvious answer: if I tell you the diffused value $x_t$, you can no longer be certain which initial point it came from, so you can't solve for the noise exactly - you have to make your best guess.
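The new distribution isn't written out here, so as an illustration suppose (hypothetically) that the initial distribution puts equal weight on two points, one below and one above zero; adjust points and weights to whatever the assignment actually specifies. One reasonable guess for the noise is then to average the noise you would infer from each possible starting point, weighted by how likely x_t is under that starting point:

# hypothetical two-point initial distribution -- replace with the actual one
points = np.array([-0.4, 0.4])
weights = np.array([0.5, 0.5])

def SampleFromInitP():
    return np.random.choice(points, p=weights)

def GuessZ(x_t, t):
    abar = alpha_bar_t[t - 1]
    # posterior probability that x_t came from each starting point
    post = weights * np.exp(-(x_t - np.sqrt(abar) * points)**2 / (2 * (1.0 - abar)))
    post = post / np.sum(post)
    # the noise each starting point would imply, averaged with those weights
    z_candidates = (x_t - np.sqrt(abar) * points) / np.sqrt(1.0 - abar)
    return np.sum(post * z_candidates)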
Grading
Do the undiffusion from this probability distribution. Plot the same plots as for the undiffusion from a single delta function.
In addition, notice that at each step in your undiffusion you are making a guess for the noise $z$ - equivalently, a guess for the starting point $x_0$. Plot this guess as a function of the time step $t$ for a single undiffusion run.
Training#
So far we've managed to do undiffusion where we know what the right guess for the random number $z$ is (because we secretly know where the walkers started). In general we won't know this; instead, we are going to train a neural network which, given $x_t$ and the time step $t$, guesses the noise $z$ for us.
To do this, we are going to use pytorch. To generate a simple neural network using pytorch, we can have
import torch
import torchvision
from torch import nn
n_input, n_hidden, n_out = 2, 15, 1
net = nn.Sequential(nn.Linear(n_input, n_hidden),
nn.ReLU(),
nn.Linear(n_hidden, n_hidden),
nn.ReLU(),
nn.Linear(n_hidden, n_out),
)
net(torch.tensor([3.0, 4.0]))  # <-- this runs the network on a noisy value of 3.0 at time-step 4
Now, we need to learn how to use pytorch to train a network to match the noise. Essentially what we are going to do is the following:
Pick a random starting point $x_0$ (from SamplePInit) and a random time step $t$.
Get from your ForwardDiffusionFast function both the noise (<-- this is a new thing you have to return) and the noisy data.
Have your network guess the noise. Using a loss function, modify your network to match the noise more closely.
Here is the general framework for pytorch optimization. You have to define some optimization pieces.
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(0, 200000):
    opt.zero_grad()
    x0 = SamplePInit()
    # Choose a random time t
    # call your ForwardDiffusionFast (make sure you return the noisyData and the noise)
    noisyData = torch.tensor([noisyData, t]).float()  # include the time for the data
    noise = torch.tensor([noise]).float()             # make it so pytorch reads the noise
    loss = loss_fn(noise, net(noisyData))
    loss.backward()
    opt.step()
Note that loss.item()
gives you the loss.
Fill out your optimization. Run it and you should then have a net which guesses your random noise.
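In case it's useful as a cross-check, here is one way the filled-in loop could look. It assumes the 200-step beta schedule and alpha_bar_t from above, and that ForwardDiffusionFast has been modified to return both the noisy value and the noise it drew:

loss_fn = nn.MSELoss()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(0, 200000):
    opt.zero_grad()
    x0 = SamplePInit()
    t = np.random.randint(1, 201)                                 # random time step in 1..200
    noisyData, noise = ForwardDiffusionFast(x0, alpha_bar_t, t)   # returns (noisy value, noise)
    noisyData = torch.tensor([noisyData, t]).float()              # include the time with the data
    noise = torch.tensor([noise]).float()
    loss = loss_fn(noise, net(noisyData))
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(step, loss.item())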
Grading
Train your network. Plot the loss as a function of training step. You may have to do a windowed average over approximately 100 steps to generate a readable plot.
Now in your undiffuse, you can make your guess=net(torch.tensor([float(x_t),t])) (you will likely want to call .item() on the result to turn it back into a plain number)
as opposed to calling your hand-coded GuessZ. Go ahead and make this replacement and then run your undiffusion, generating the standard plots: 10 sample undiffusion runs, the histogram of the final outcomes, and the best guess for $x_0$ as a function of time step.
Grading
Generate these plots.
Prompts#
Finally, in programs such as stable diffusion, you give the program a prompt and ask it to produce a matching image - e.g. an astronaut on a horse. In our simple example here, we are also going to give our simulation a prompt: a simple version where we tell it either "left" or "right" to ask it to find the point below zero or the point above zero.
To accomplish this, during the training we need to give our network not only the diffused location $x_t$ and the time step $t$ but also a number that encodes the prompt - e.g.
embedPrompt = -1 if prompt=="left" else 1
Modify your network to take an extra input:
n_input, n_hidden, n_out = 3, 15, 1 # <-- all I've changed is n_input is now 3
net = nn.Sequential(nn.Linear(n_input, n_hidden),
nn.ReLU(),
nn.Linear(n_hidden, n_hidden),
nn.ReLU(),
nn.Linear(n_hidden, n_out),
)
Now, when you call the network you need to call it as
net(torch.tensor([float(x_t),t,embedPrompt]))
During training, to get the embedded prompt you want to check whether the starting point $x_0$ is less than 0 (send it the embedPrompt for "left") or greater than 0 (send it the embedPrompt for "right").
During undiffusion, you then have to decide whether your prompt is “left” or “right” and then run it.
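Pulling the pieces together, one possible sketch of the prompted training loop is below. It assumes the 3-input net defined above, the new SampleFromInitP, the alpha_bar_t schedule, and a ForwardDiffusionFast that returns both the noisy value and the noise:

loss_fn = nn.MSELoss()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(0, 200000):
    opt.zero_grad()
    x0 = SampleFromInitP()
    embedPrompt = -1.0 if x0 < 0 else 1.0      # "left" means the true point is below zero
    t = np.random.randint(1, 201)
    noisyData, noise = ForwardDiffusionFast(x0, alpha_bar_t, t)
    loss = loss_fn(torch.tensor([noise]).float(),
                   net(torch.tensor([noisyData, t, embedPrompt]).float()))
    loss.backward()
    opt.step()

# At undiffusion time, pick the prompt once (say prompt = "left"), set
# embedPrompt = -1.0 if prompt == "left" else 1.0, and inside Undiffuse call
#   guess = net(torch.tensor([float(x_t), t, embedPrompt])).item()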
Grading
Modify your code to do “prompting.” Train your network and then produce two sets of our three typical plots:
location $x(t)$ vs time step $t$
histogram of final outcome
guess of $x_0$ as a function of $t$
One set should be when the prompt is “left” and one set should be when the prompt is “right.”
Please continue onto 3.2 Diffusion Models - page 2