$W(b) \;=\; \frac{M!}{b_1!\,b_2!\cdots b_N!}$    (1)

where $b_i$ is the number of photons falling in pixel $i$ of the $N$-pixel image $b$, and $M=\sum_i b_i$ is the total number of photons.
The a priori probability of getting a given image b is therefore
this number divided by the number of all possible maps:
$P(b) \;=\; \frac{1}{N^{M}}\,\frac{M!}{b_1!\,b_2!\cdots b_N!}$    (2)
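For a concrete (if tiny) illustration of this counting: with $N=2$ pixels and $M=4$ photons, the map $b=(2,2)$ can be formed in $4!/(2!\,2!)=6$ of the $2^4=16$ equally likely photon arrangements, so its a priori probability is $6/16$, whereas the map $(4,0)$ can be formed in only one way and has probability $1/16$. Evenly spread maps are a priori favoured.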
If there are many photons in each pixel ($b_i \gg 1$) we can use Stirling's approximation for the factorial, $\ln n! \approx n\ln n - n$. Using this, $\ln P(b) \approx M\ln M - \sum_i b_i\ln b_i - M\ln N$.
It is much more convenient to normalize all the pixel intensities to the so-called ``grey map'', where all the intensities equal the mean value $m = M/N$.
$P(b) \;=\; P_0 \prod_i \left(\frac{m}{b_i}\right)^{b_i}$    (3)
where $P_0$ is a normalizing constant that guarantees that $P(b)$ sums to 1 over all possible maps $b$. $P_0$ is also the probability of the ``grey'' map, since all the factors in equation (3) are 1 in that case. It can be shown that $P_0$ is the maximum value of $P(b)$.
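As a quick numerical sanity check of the Stirling step (a sketch only; the function names and the 5-pixel test map are made up for illustration), the exact multinomial form of equation (2) can be compared with the approximate form of equation (3), using the fact that $\ln P(b) - \ln P({\rm grey}) \approx -\sum_i b_i \ln(b_i/m)$:

    import numpy as np
    from scipy.special import gammaln

    def ln_prob_exact(b):
        """Exact ln P(b) from equation (2): ln M! - sum_i ln b_i! - M ln N."""
        b = np.asarray(b, dtype=float)
        M, N = b.sum(), len(b)
        return gammaln(M + 1) - gammaln(b + 1).sum() - M * np.log(N)

    def entropy(b):
        """S(b) = -sum_i b_i ln(b_i/m), the exponent in equations (3)-(4)."""
        b = np.asarray(b, dtype=float)
        return -(b * np.log(b / b.mean())).sum()

    b = np.array([90.0, 110.0, 100.0, 105.0, 95.0])   # made-up 5-pixel map
    grey = np.full(5, 100.0)                          # grey map with the same total
    print(ln_prob_exact(b) - ln_prob_exact(grey))     # exact log-probability ratio
    print(entropy(b))                                 # Stirling approximation to it

The two printed numbers agree to a few parts in a thousand, as expected when there are of order a hundred photons per pixel.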
None of this looks at all like the ``entropy'' term we all know and love in MEM, but equation (3) can be re-written in an exponential form which gives us the so-called ``entropy'' term $S$.
$P(b) \;=\; P_0\,e^{\alpha S}, \qquad S \;=\; -\sum_i b_i \ln\!\left(\frac{b_i}{m}\right)$    (4)
In our derivation, $\alpha = 1$, but Skilling and Gull (1989) and Skilling (1989) argued that it is an indeterminate free parameter of the probability distribution, and this free parameter is used in virtually all MEM programs.
Now that we have the ``prior'' probability, we need to find the ``likelihood''. The ``likelihood'' $P(D|b)$ (``P of D given b'') is the probability that the observed data D could have been generated from a given image b.
This probability is computed from the difference between the simulated data, predicted from some guess at the image b, and the real data D. For Gaussian statistics,
$P(D|b) \;\propto\; e^{-\chi^2/2}, \qquad \chi^2 \;=\; \sum_k \frac{\left(D_k - \hat{D}_k(b)\right)^2}{\sigma_k^2}$    (5)

where $\hat{D}_k(b)$ is the $k$-th simulated datum and $\sigma_k$ the noise on the measured datum $D_k$.
Somewhat more generally, if the noise on different measurements is correlated with covariance matrix $C$,

$P(D|b) \;\propto\; \exp\!\left[-\tfrac{1}{2}\,\big(D - \hat{D}(b)\big)^{\mathsf T} C^{-1}\,\big(D - \hat{D}(b)\big)\right]$    (6)
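The two likelihood forms translate directly into code. This is only a sketch, assuming the simulated data $\hat{D}(b)$ have already been computed from the trial image, with independent errors $\sigma_k$ in the first case and a noise covariance matrix C in the second; the function names are illustrative:

    import numpy as np

    def ln_likelihood(D, D_sim, sigma):
        """ln P(D|b) up to a constant, for independent Gaussian errors (equation 5)."""
        chi2 = np.sum(((D - D_sim) / sigma) ** 2)
        return -0.5 * chi2

    def ln_likelihood_cov(D, D_sim, C):
        """ln P(D|b) up to a constant, for correlated Gaussian errors (equation 6)."""
        r = D - D_sim
        return -0.5 * r @ np.linalg.solve(C, r)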
But what we want for the deconvolution is not $P(D|b)$ but $P(b|D)$, that is, the probability of the image b given the data D. This is given by Bayes' theorem:
$P(b|D) \;=\; \frac{P(D|b)\,P(b)}{P(D)}$    (7)
At last we can write down the quantity to be maximized. Inserting the expressions (4) and (5) for $P(b)$ and $P(D|b)$ into equation (7), we get:
$P(b|D) \;\propto\; \exp\!\left(-\frac{\chi^2}{2} + \alpha S\right)$    (8)

or, taking the logarithm,

$\ln P(b|D) \;=\; -\frac{\chi^2}{2} + \alpha S + {\rm const.}$    (9)
The first term is the constraint of the data; it is 0 when the image matches the data perfectly, but in realistic cases $\chi^2$ is comparable to the number of data points when the data are fitted but not ``over fitted''. The second term is commonly called the ``entropy'' term, which guarantees the ``simplicity'' of the image. (It is not really the entropy in the sense used by Shannon (1948), but the weight of common use forces us to call it that.) The ``entropy'' term is maximum (0) when $b_i = m$ in every pixel. That is, the ``grey'' map is a priori most probable.
Note the arbitrary coefficient $\alpha$ giving relative weight to the ``entropy'' term. If $\alpha = 1$, Bayesian logic would give equal weight to each term, but MEM algorithms usually weight them differently to provide simpler maps (larger $\alpha$) or maps more faithful to the data (smaller $\alpha$). It is possible, using Bayesian arguments, to estimate the most probable $\alpha$ (Skilling 1989).
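A short side calculation makes the role of $\alpha$ concrete. Differentiating equation (9) with respect to a pixel value (holding the total flux, and hence $m$, fixed with a Lagrange multiplier) and using $\partial S/\partial b_i = -\left(1 + \ln(b_i/m)\right)$, the maximum satisfies

$b_i \;\propto\; m\,\exp\!\left(-\frac{1}{\alpha}\,\frac{\partial(\chi^2/2)}{\partial b_i}\right),$

so the image is automatically positive; as $\alpha \to \infty$ the exponential tends to 1 and $b$ tends to the grey map, while for small $\alpha$ the data gradient dominates and $b$ tends to the map that fits the data exactly.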
Since the dimensionality of the parameter space for MEM is the number of map pixels, we cannot show contours of the function $\ln P(b|D)$ in a realistic case. Instead we show a case for 3 pixels, $\{b_1, b_2, b_3\}$, where the sum of the pixel brightnesses equals 100. This reduces it to a 2-dimensional problem. In the figure below, the dashed contours represent the entropy function $S$, and the solid contours show the $\chi^2$ function. The squares are located at the maximum of $\ln P(b|D)$ for different values of $\alpha$. When $\alpha$ is large, the maximum is located at the entropy peak, and when $\alpha$ is small, it is located at the minimum of $\chi^2$.
This 3-pixel example is more general than one might at first think. In fact, for an arbitrary number of pixels (say N), any two-dimensional cross-section of the N-dimensional entropy function through parameter space will look exactly like this. The cross-sections of the $\chi^2$ function will always be ellipses with, of course, different centroids and axial ratios.
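The behaviour described above is easy to reproduce numerically. The sketch below sets up a deliberately artificial 3-pixel problem in which the ``data'' are direct, unit-noise measurements of the three pixel values (true map (70, 20, 10), total flux 100); all names and numbers are made up for illustration and are not taken from the figure itself:

    import numpy as np
    from scipy.optimize import minimize

    # Toy 3-pixel MEM problem: total flux fixed at 100, so only 2 free parameters.
    D = np.array([70.0, 20.0, 10.0])   # made-up "observed" values, one per pixel
    sigma = 1.0                        # assumed noise on each measurement
    total = 100.0
    m = total / 3.0                    # the "grey" map value

    def entropy(b):
        return -np.sum(b * np.log(b / m))

    def chi2(b):
        return np.sum(((D - b) / sigma) ** 2)

    def neg_ln_posterior(x, alpha):
        # Parametrize by the first two pixels; the third is fixed by the total flux.
        b = np.array([x[0], x[1], total - x[0] - x[1]])
        if np.any(b <= 0):
            return np.inf
        return -(alpha * entropy(b) - 0.5 * chi2(b))

    for alpha in [0.01, 1.0, 100.0, 1e4]:
        res = minimize(neg_ln_posterior, x0=[m, m], args=(alpha,), method="Nelder-Mead")
        b = np.array([*res.x, total - res.x.sum()])
        print(f"alpha = {alpha:>8}:  b = {np.round(b, 1)}")
    # Small alpha -> b close to the data (70, 20, 10); large alpha -> b close to the grey map.

The printed maps trace out the same curve of maxima that the squares mark in the figure, moving from the $\chi^2$ minimum at small $\alpha$ to the grey map at large $\alpha$.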
The goal of MEM is to find the optimum value of $\alpha$ along this curve, somewhere in parameter space between the ``prior'' grey map and the map which fits the data exactly. Skilling has argued that the ``most probable'' $\alpha$ is the one that makes the number of ``good'' measurements equal to $-2\alpha S$, which is the (dimensionless) amount of structure in the map $b$. The number of ``good'' measurements is defined by a sum over curvature terms in the likelihood matrix (Skilling 1989).
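In the usual notation of the classic-MaxEnt papers this criterion is written explicitly (the form is quoted here from that literature, since the text only describes it in words) as

$-2\alpha S \;=\; N_{\rm good} \;=\; \sum_i \frac{\lambda_i}{\lambda_i + \alpha},$

where the $\lambda_i$ are the eigenvalues of the curvature matrix of $\chi^2/2$ expressed in the entropy metric: a measurement counts as ``good'' when $\lambda_i \gg \alpha$, i.e. when the data constrain that direction in image space more strongly than the entropy prior does.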