Friday 31 May 2013

The Ising Model


At exactly 100 °C water is in a complicated state. There are bubbles of steam roiling beneath the surface, popping and condensing or evaporating away. The boiling point is a critical point of the system. Critical points are worth studying, and there is a simple model that encapsulates some of the physics that occurs there: the Ising model.

Ernst Ising (actually pronounced E-zing but universally mispronounced as I-zing) solved the one-dimensional version of his model in his PhD thesis and didn't find what he was looking for, namely a phase transition like the one in the water-steam system. He then wrongly inferred that there was no phase transition in two or higher dimensions either. Fortunately the model was interesting and simple enough that people continued to investigate it, and it was shown that in two dimensions there is indeed a phase transition, though not of the water-steam type. The transition is "weaker": second order instead of first. In the water-steam system the density suddenly increases as the steam condenses at the critical temperature, i.e. the boiling point, jumping from almost zero to 1000 kg/m${}^3$. In a second-order, Ising-like, phase transition the corresponding parameter, the magnetization, goes from zero to a non-zero value smoothly, without a sudden jump.

The model itself is simple. We have a grid, and on each grid point lives a spin. Each spin interacts with its neighbours to the north, south, east and west. A spin points in only one of two directions: if two neighbours point in the same direction, $ \uparrow \uparrow $ or $\downarrow \downarrow$, the energy is reduced by one; if they point in opposite directions, $\uparrow \downarrow$ or $\downarrow \uparrow$, the energy is increased by one. Slightly more formally, the spins are represented by variables $s_i$, where $i$ denotes the lattice site, taking the values $\pm 1$. The energy is given by \[ E = - \sum_{\langle ij \rangle} s_i s_j \] where $\langle ij \rangle$ means a sum over nearest neighbours. We will call a "configuration" some particular choice of $\pm 1$ for all the spins; for example, below is a particular configuration of a $4 \times 4$ lattice. \[ \begin{array}{cccc} \downarrow & \downarrow & \downarrow & \uparrow\\ \downarrow & \uparrow & \downarrow & \uparrow\\ \downarrow & \downarrow & \uparrow & \uparrow\\ \uparrow & \uparrow & \downarrow & \downarrow\end{array} \]
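To make the bookkeeping concrete, here is a minimal sketch that evaluates this energy for a random configuration (Python rather than the JavaScript the animations use, and periodic boundary conditions are an assumption on my part):

```python
import numpy as np

def ising_energy(spins):
    """Energy E = -sum over nearest-neighbour pairs s_i s_j,
    assuming periodic boundary conditions."""
    # np.roll shifts the lattice by one site, so spins * rolled
    # pairs every spin with its right and down neighbour exactly once.
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return -np.sum(spins * right) - np.sum(spins * down)

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(4, 4))  # a random 4x4 configuration
print(ising_energy(spins))
```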

Why does anything interesting happen? Why don't the spins just all pick a direction and line up, minimizing the total energy? Boltzmann said that the probability of a configuration with energy $E$ is proportional to $g(E)e^{-\beta E}$, where $g(E)$ is the number of configurations with energy $E$ and $\beta$ is $1/kT$, with $T$ the temperature and $k$ a constant with the right units to make the argument of the exponential dimensionless (we will set it to one). At high temperature, i.e. small $\beta$, the exponential is approximately $1$, so the energies with the largest $g(E)$ are most probable. There are only two configurations with the lowest energy (all $+1$ or all $-1$) and it turns out that for this model there are many, many more configurations at higher energies. At low temperatures $\beta$ is large and the exponential dominates; only by making $E$ small does a configuration have an appreciable probability of occurring. We have a tendency to order fighting a tendency to disorder. Energy versus entropy.

The Ising model lends itself well to computer simulations. We try to mimic this order-disorder conflict with the Metropolis algorithm. This says: pick a spin and flip it. If this reduces the energy, keep the spin flipped; if it increases the energy, keep the spin flipped with probability $e^{-\beta \Delta E}$, where $\Delta E$ is the change in energy from flipping the spin. Otherwise flip it back. The probabilistic step mimics thermal fluctuations in real systems; the hotter it is the more likely we are to accept flips, until whether a spin is up or down is basically random. The colder it is the less likely we are to accept flips that increase the energy, and we proceed steadily down to the minimum energy. The most interesting point is the critical point, where order and disorder, energy and entropy, exactly balance. This separates the two (boring) phases, the 'frozen' low-temperature state and the 'gaseous' high-temperature state. It is a mixture of both, containing bubbles of disorder within order within disorder ad infinitum. Play with the animation below to see it for yourself. Up spins are red pixels and down spins are white. The grid is $256 \times 256$. The $\beta$ slider changes the temperature, the unit button sets all spins equal to $1$, random sets all spins to $\pm 1$ randomly, and critical sets the temperature to the critical temperature.
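For readers who want to reproduce the dynamics offline, here is a bare-bones sketch of the update rule described above (again Python rather than the animation's JavaScript; the lattice size, number of sweeps and periodic boundaries are my own choices):

```python
import numpy as np

def metropolis_sweep(spins, beta, rng):
    """One sweep of single-spin-flip Metropolis updates, periodic boundaries."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # Energy change from flipping spin (i, j) is 2 * s_ij * (sum of neighbours).
        nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2 * spins[i, j] * nb
        # Accept if the energy goes down, otherwise with probability exp(-beta * dE).
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1
    return spins

rng = np.random.default_rng(1)
spins = rng.choice([-1, 1], size=(64, 64))
beta_c = 0.5 * np.log(1 + np.sqrt(2))   # critical beta of the 2D Ising model
for _ in range(100):
    metropolis_sweep(spins, beta_c, rng)
print("magnetization per spin:", spins.mean())
```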

Saturday 18 May 2013

Monte Carlo Integration 3


The animation above does the integral $\int_0^1 dx x^\alpha$ a total of $M$ times using $N$ samples and plots the distribution of values obtained. The mean is shifted to zero and the distribution is normalized to one. You can also click the graph which will recalculate the distribution using different random numbers. For further explanation read on...

This post is about how to estimate the error in Monte Carlo integration. It will get very mathematical, but here's the answer: assume that the distribution of results is Gaussian and estimate the error from the sample variance. However, as we will see when we analyse it in detail, this assumption is not valid for arbitrary integrals, which has implications in high finance among other fields.

The first question is how to take the distribution of random $x$ values we use for Monte Carlo and obtain the distribution of function values. The probability that we get the value $F = f(x)$ when $x$ is chosen according to $P(x)$ is \[ \int dx\, \delta(F - f(x)) P(x) \equiv P(F)\] where we will distinguish the probability distributions for $x$ and $F$ by their arguments. The equation above simply says that the probability to get $F$ is equal to the probability to get an $x$ such that $F = f(x)$, summed over all values of $x$ for which this is true. The characteristic function is defined as the expectation value of $e^{ikF}$, and we define the function $W(ik)$ as the log of this (for reasons that will become clear), \[ W(ik) = \ln \int dF\, P(F) e^{ikF} = \ln \int dx\, P(x) e^{ikf(x)} \equiv \ln \langle e^{ikf}\rangle. \] To get the third term we used the definition of $P(F)$, and the integral with respect to $P(x)dx$ defines the angle brackets.

Now comes an important assumption: assume $W(ik)$ has a series expansion \[ W(ik) = \sum_{n=1}^{\infty} \frac{C_n (ik)^n}{n!}. \] By expanding out $\ln \langle e^{ikf}\rangle$ we can calculate the coefficients $C_n$. The most important ones are the first two, \[C_1 = \langle f \rangle\] \[C_2 = \langle (f - \langle f \rangle)^2 \rangle.\] We recognise (if we have taken a lot of statistics courses) $C_1$ as the mean and $C_2$ as the variance.

In a Monte Carlo integration we take more than just one sample, so what is the distribution of the estimates $\bar{f} = \frac{1}{N} \sum_i^N f(x_i)$? In the same way as before, the probability to get the estimate $\bar{F}$ is the probability to get $x_1 \ldots x_N$ such that $\bar{f} = \bar{F}$ (the notation is getting subtle). \[P(\bar{F}) = \int \ldots \int dx_1 \ldots dx_N P(x_1) \ldots P(x_N) \delta\left(\bar{F} - \frac{1}{N} \sum_i^N f(x_i)\right) \] This has its own log of a characteristic function, $\Omega(ik)$, \[ \Omega(ik) = \ln \int d\bar{F} P(\bar{F} ) e^{ik \bar{F} }. \] Integrating over $\bar{F}$ against the delta function, and using the fact that the exponential of a sum is a product of exponentials while all the $x_i$ are independent, the integral for $P(\bar{F} )$ factorises and leads to \[ \Omega(ik) = \ln \left[ \int dx P(x) e^{ikf(x)/N } \right]^N. \] The thing in square brackets is just $e^{W(ik/N)}$, and the magic properties of $\ln$ let us put the power in front, \[ \Omega(ik) = N W(ik/N).\] This means $\Omega$ can be expanded in terms of the same $C_n$ as before, \[ \Omega(ik) = \sum_{n=1}^{\infty} \frac{C_n (ik)^n}{n! N^{n-1} } \]

For $N$ very large we can truncate the series, since each extra term is suppressed by another power of $1/N$ (and the integrals become too difficult if we keep $n > 2$ anyway), \[ \Omega(ik) = C_1 ik - \frac{C_2 k^2}{2 N }. \] We want $P(\bar{F} )$, the distribution of answers we get from doing Monte Carlo integration with $N$ samples. $\exp( \Omega )$ is the Fourier transform of this, therefore inverse Fourier transforming $\exp( \Omega) $ gives $P(\bar{F})$. We need \[ P(\bar{F}) = \frac{1}{2\pi}\int dk \exp\left( C_1 (ik) + \frac{C_2 (ik)^2}{2 N } -ik\bar{F} \right). \] This is a Gaussian integral; you can look the answer up (or complete the square) to get \[ P(\bar{F}) = \sqrt{\frac{N}{2 \pi C_2}} \exp\left( -\frac{ (\bar{F}-C_1)^2 }{2 C_2 / N} \right). \] This is a Gaussian with mean $C_1 = \langle f \rangle$ and variance $C_2/N = \langle (f - \langle f \rangle)^2 \rangle/N$.
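As a sanity check on this result, here is a minimal sketch (assumed choices: $f(x) = x^2$ on $[0,1]$, $N = 1000$ samples per estimate, $M = 5000$ repeated estimates) that compares the mean and spread of repeated Monte Carlo estimates with $C_1$ and $\sqrt{C_2/N}$:

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: x**2          # illustrative integrand; true mean C1 = 1/3
N, M = 1000, 5000           # samples per estimate, number of repeated estimates

estimates = f(rng.random((M, N))).mean(axis=1)

C1 = 1.0 / 3.0              # <f>
C2 = 1.0 / 5.0 - C1**2      # <f^2> - <f>^2 = 4/45
print("mean of estimates:", estimates.mean(), "expected:", C1)
print("std of estimates: ", estimates.std(), "expected:", np.sqrt(C2 / N))
```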

This is great news. As long as $N$ is big enough the distribution of answers we obtain from Monte Carlo is Gaussian. Answers far from the true mean are extremely unlikely: 99.7% of the measurements will be within $3\sigma$ of the mean (where $\sigma = \sqrt{C_2/N}$ is the width of the Gaussian). The largeness of $N$ is usually the assumption people worry about; however there is another one: what if the higher terms in the expansion aren't small, e.g. if $C_3 \rightarrow \infty$? Then no matter how large $N$ is the truncation is not justified and the distribution won't be Gaussian.

There is a simple example where this is the case: \[\int_0^1 dx\, x^\alpha = \frac{1}{\alpha+1}, \] which converges as long as $\alpha > -1$. What is $P(F)$? \[P(F) = \int_\epsilon^1 dx\, \frac{ \delta(F - x^\alpha) }{1-\epsilon} = \frac{F^{(1-\alpha)/\alpha}}{\alpha(1-\epsilon)}.\] The $1/(1-\epsilon)$ is a uniform probability distribution on the interval $[\epsilon,1]$ (this is our $P(x)$). There is no problem with taking $\epsilon$ to zero just yet, but wait. The higher coefficients in the expansion, $C_3, C_4, \ldots$, are sums of terms like $M_n = \int dF P(F) F^n$ (usually said as: the cumulants $C_n$ are sums of moments $M_n$). Let's calculate the $M_n$, \[M_n = \int dF \frac{F^{(1-\alpha)/\alpha + n}}{\alpha(1-\epsilon)} = \frac{1 - \epsilon^{1+\alpha n}}{(1-\epsilon)(1 + \alpha n)}. \] If $1 + \alpha n < 0$ then the $\epsilon^{1 + \alpha n}$ in the numerator blows up as $\epsilon \rightarrow 0$, and thus $M_n$ and $C_n$ diverge.
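To see the trouble numerically, the sketch below (the exponents and sample sizes are arbitrary choices) tracks the sample variance of $f(x)=x^\alpha$ for a harmless exponent and for $\alpha=-0.9$, where $1 + 2\alpha < 0$, so the second moment diverges and the sample variance never settles down:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_variance(alpha, N):
    """Sample variance of f(x) = x**alpha with x uniform on (0, 1]."""
    x = 1.0 - rng.random(N)   # shift to (0, 1] so x**alpha never divides by zero
    return np.var(x**alpha, ddof=1)

for alpha in (0.5, -0.9):
    # For alpha = -0.9 the true second moment diverges (1 + 2*alpha < 0),
    # so the sample variance keeps growing instead of converging.
    print(alpha, [sample_variance(alpha, N) for N in (10**3, 10**5, 10**7)])
```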

The animation at the top of the post lets you vary $\alpha$ from 1, where all the moments converge, down to -0.9, where some diverge. The distribution obtained from doing $M$ Monte Carlo integrations changes from a Gaussian to a long-tailed distribution. "Rare" events are not so rare anymore and $\sigma$ is a very bad estimate of the error. So be careful, not everything in the world is a bell curve.

Friday 17 May 2013

Monte Carlo Integration 2

In the previous post we talked about the simplest Monte Carlo procedure: throwing a bunch of points at your area and seeing which ones land inside. Now we are going to make things a bit more mathematical. By the way, if you think the equations don't look good: I have been having trouble getting Mathjax to work with Blogger, so I am using codecogs. This looks okay in Firefox but not so good in Chrome, and god help you if you are using IE. Getting the equations to look pretty is a process.

Let's try to evaluate the definite integral \[ I = \int_a^b f(x) dx \] by taking the interval $[a,b]$ and breaking it up into $N$ sub-regions. Each sub-region has width $\Delta = (b-a)/N$ and midpoint $x_i = a + (i - \tfrac{1}{2}) \Delta$. In a sub-region the function is almost constant, so we can approximate it by its value at the midpoint. The integral is the area under the curve $f(x)$ and we are approximating that area as a sum of rectangles, \[ I \approx \frac{b-a}{N} \sum_i^N f( x_i ). \] Click the image below to see how this gets better as $N$ increases. The function is \[ 0.2 + 0.2 \sin(\pi x) \sin(2 \pi x) + 0.3 \sin(\pi x),\] which was chosen for no reason whatsoever.
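A small sketch of the midpoint sum in code (the function and interval are the ones above; the values of $N$ are arbitrary):

```python
import numpy as np

def f(x):
    return 0.2 + 0.2 * np.sin(np.pi * x) * np.sin(2 * np.pi * x) + 0.3 * np.sin(np.pi * x)

def midpoint_rule(f, a, b, N):
    """Approximate the integral of f on [a, b] with N rectangles."""
    delta = (b - a) / N
    x = a + (np.arange(N) + 0.5) * delta   # midpoints of the sub-regions
    return delta * f(x).sum()

for N in (4, 16, 64, 256):
    print(N, midpoint_rule(f, 0.0, 1.0, N))
```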

This is nothing other than the mean value theorem, which says the integral is equal to the width of the interval times the average value, $\langle f \rangle$, of the function in the interval. To evaluate the integral we use \[ \langle f \rangle \approx \bar{f} = \frac{1}{N} \sum_i^N f( x_i ). \] We have used a regular grid to evaluate the average of $f(x)$; we could use a more complicated grid to give us a better estimate with fewer points, leading to Simpson's rule and so on. This is called quadrature. Instead we will try to evaluate the average by random sampling.

The Law of Large Numbers says that the sample average converges almost surely to the true average. While this seems kind of obvious, e.g. flipping a coin an infinite number of times will give on average 50% heads and 50% tails, the proof is long. Accepting that it is true, by sampling random points in the interval $[a,b]$ and using them to evaluate the function we can calculate an average which approaches the true average for a large enough number of samples $N$, \[ \langle f \rangle \approx \frac{1}{N} \sum_i^N f( x_i ) \] where now the points $x_i$ are randomly distributed in $[a,b]$. This is the simplest kind of Monte Carlo integration. If instead of a one-dimensional integral we have a multi-dimensional one, then the formula for $\bar{f}$ is the same as long as $x_i$ is now interpreted as a random point in the multidimensional integration region, which has volume $V$, \[ I \approx \frac{V}{N} \sum_i^N f( x_i ). \] Evaluating the same integral as before by Monte Carlo looks like:
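(In lieu of the animation, here is a minimal sketch of the same Monte Carlo estimate; the function is the one above and the sample counts are arbitrary.)

```python
import numpy as np

def f(x):
    return 0.2 + 0.2 * np.sin(np.pi * x) * np.sin(2 * np.pi * x) + 0.3 * np.sin(np.pi * x)

rng = np.random.default_rng(4)
a, b = 0.0, 1.0
for N in (4, 16, 64, 256, 4096):
    x = a + (b - a) * rng.random(N)   # random points in [a, b]
    print(N, (b - a) * f(x).mean())   # (b - a) times the sample average of f
```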

As you can see, this converges much more slowly, so why would this be a good idea? The quadrature scheme introduced above with rectangles is called the midpoint rule, and the error it introduces in each little box is proportional to $h^{d+2}$, where $h$ is the side of the box and $d$ the number of dimensions. With a total of $N$ points we can have $N^{1/d}$ boxes along each axis, so $h = N^{-1/d}$ and the total error goes like $N h^{d+2} = N^{-2/d}$. What about the Monte Carlo case? If we assume that evaluating the integral many times would give us a Gaussian distribution of values around the true value (which we will prove, or not, in the next post) we estimate the error from the sample variance: \[ \sigma^2 = \frac{1}{N-1} \sum_i^N \left( f(x_i) - \bar{f} \right)^2. \] This is $N-1$ instead of $N$ in order to be an unbiased estimate; it's a (slightly) complicated story, and for any reasonable value of $N$ you would use, the difference is not important. The variance of the integral estimate is then \[ \sum_i^N \left( \frac{ V ( f(x_i) - \bar{f} ) }{N} \right)^2 \approx \frac{ V^2 \sigma^2 }{N}, \] and the error goes like the square root of this: \[ \frac{ V \sigma }{\sqrt{N} }. \] The key thing to notice is that this depends only on the number of sample points, not on the dimension of the integral.
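In code the error estimate is just $V$ times the sample standard deviation divided by $\sqrt{N}$. A sketch with an illustrative integrand $e^x$ on $[0,1]$, whose exact value $e-1$ lets you check that the quoted error bar is sensible:

```python
import numpy as np

rng = np.random.default_rng(5)
N, V = 10_000, 1.0                       # number of samples, volume of [0, 1]
fx = np.exp(rng.random(N))               # illustrative integrand f(x) = e^x
estimate = V * fx.mean()
error = V * fx.std(ddof=1) / np.sqrt(N)  # V * sigma / sqrt(N)
print(f"I = {estimate:.4f} +/- {error:.4f}  (exact: {np.e - 1:.4f})")
```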

So now the thing to do is evaluate the same integral using the same number of points for Monte Carlo and the midpoint rule and see which one does better. Assume that the integration region is the unit cube in $d$ dimensions, that is the range $[0,1]$ for each axis. We will use the integral \[ \frac{3}{d} \int dx_1 \ldots dx_d \left( x_1^2 + \ldots + x_d^2\right) = 1\] as our multidimensional example. The animation below (click it) plots the difference between the answer and the estimate from the midpoint rule versus the difference between the answer and the estimate from Monte Carlo.
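A rough offline version of that comparison (a sketch; the dimension, grid size and seed are arbitrary choices):

```python
import numpy as np

def f(x, d):
    """Integrand (3/d)(x_1^2 + ... + x_d^2); its integral over the unit cube is 1."""
    return (3.0 / d) * np.sum(x**2, axis=-1)

def midpoint(d, n_per_axis):
    pts = (np.arange(n_per_axis) + 0.5) / n_per_axis
    grid = np.stack(np.meshgrid(*([pts] * d), indexing="ij"), axis=-1).reshape(-1, d)
    return f(grid, d).mean()           # the cube's volume is 1

def monte_carlo(d, N, rng):
    return f(rng.random((N, d)), d).mean()

rng = np.random.default_rng(6)
d, n_per_axis = 6, 4                   # 4**6 = 4096 grid points
N = n_per_axis**d                      # same number of random points
print("midpoint error   :", abs(midpoint(d, n_per_axis) - 1.0))
print("monte carlo error:", abs(monte_carlo(d, N, rng) - 1.0))
```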


Input number of dimensions 1 < d < 20:

Plotted is the log of the number of samples versus the log of the error. If the error $\delta I = AN^p$ then the log-log plot has slope $p$: the steeper the curve, the faster the error is reduced. When $d>4$ the error from Monte Carlo for a given $N$ tends to be smaller. Any quadrature scheme gives an error proportional to $N^{-q/d}$ for some $q$, so there will always be a $d$ for which Monte Carlo is better. In some cases the number of dimensions (in the phase space of a statistical mechanical system, for example) is absolutely enormous, $10^{80}$ or larger. No quadrature will work, and Monte Carlo, although not a good choice, is the only choice.

Friday 10 May 2013

Monte Carlo Integration 1

Imagine calculating the area of a circle using a bag of marbles. One way to do this would be to enclose the circle inside a square of known area, say $1$, so that it just touches the sides, and throw the marbles in. The probability that a marble lands inside the circle is the area of the circle divided by the area of the square: $\pi (\frac{1}{2})^2/1 = \frac{\pi}{4}$. Assuming you don't spill any marbles, this means that the proportion that lie in the circle compared to the total number is given by the same ratio, $\pi/4$. The animation below illustrates this and in the process calculates $\pi$ (for this and the next animation, click the box to start or stop).
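A sketch of the marble throwing in code (the number of 'marbles' is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100_000
# Throw points uniformly into the unit square; the circle of radius 1/2
# is centred at (1/2, 1/2) and just touches the sides.
pts = rng.random((N, 2))
inside = np.sum((pts - 0.5)**2, axis=1) <= 0.25
print("pi is roughly", 4.0 * inside.mean())
```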

If you want to use this to calculate $\pi$ to any accuracy you will have to wait a long time. We'll talk about a better method of finding areas, or for doing any integral you can think of, in the next post. Until then you can enjoy this even less efficient calculation of $\pi$: a rose is a curve with the polar equation $r = a\cos(k \theta)$. If $k$ is an odd integer the rose has $k$ petals, and for $k$ even there are $2 k$. The area of these roses (for $a = 1$) is $\pi/2$ or $\pi/4$ for $k$ even or odd respectively. To see this let's calculate the area for $k$ even. Put $k = 2n$; the area $A$ is \[A = \frac{1}{2} \int_0^{2 \pi} r^2 d \theta = \frac{1}{2} \int_0^{2 \pi} a^2 \cos^2(2n \theta) d \theta = \] \[ \int_0^{2 \pi} \frac{a^2}{4}\left(1 + \cos(4n \theta)\right) d \theta = \frac{a^2}{4}\left( \theta + \frac{\sin(4n\theta)}{4n}\right)\bigg|_0^{2 \pi} = \frac{a^2 \pi}{2}. \] Using the same method we had above for the circle, we can calculate the ratio of hits landing inside the petals to the total number, which will give us the area and hence $\pi$.
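And a sketch of the rose version, assuming even $k$ and $a = 1$ so the petals fit inside the square $[-1,1]^2$ of area 4; the point-in-rose test $r \le |\cos(k\theta)|$ is my own bookkeeping for covering both branches of the curve, not something stated in the post:

```python
import numpy as np

rng = np.random.default_rng(8)
k, N = 4, 200_000                       # k even, so 2k petals and total area pi/2
x, y = 2.0 * rng.random((2, N)) - 1.0   # uniform points in the square [-1, 1]^2
r = np.hypot(x, y)
theta = np.arctan2(y, x)
# For even k a point is inside the rose r = cos(k*theta) iff r <= |cos(k*theta)|.
inside = r <= np.abs(np.cos(k * theta))
area = 4.0 * inside.mean()              # fraction of hits times the square's area
print("estimated area:", area, " -> pi is roughly", 2.0 * area)
```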

Enter k > 1 value in the box:

(Bonus) Rose curves look very cool when $k$ is a rational number $k = n/d$, try it! To stop browsers from crashing, $n_{max} = k_{max} = 30$.

Enter n value:

Enter d value: