
APPENDIX: ${\chi }^2$ and C-statistic for Sparse Sampling

An observed time series of photon counts, $f^{obs}(t)$, is binned into time intervals of length $\Delta t$, so that the number $n_i=f^{obs}[t_i < t <t_i+\Delta t]$ represents the photon counts in bin $[t_i,t_i+\Delta t]$. It is modeled by an analytical function, $f^{model}(t)$, which predicts an expectation value $e_i=f^{model}[t_i < t <t_i+\Delta t]$ of photon counts in time bin $i$. In the limit of large counts, $n_i \gg 1$, the Poisson statistics can be approximated by a Gaussian distribution, which predicts an r.m.s. noise of ${\sigma}_i = \sqrt{e_i} \approx \sqrt{n_i}$. The goodness of fit of a model can then be evaluated with the well-known ${\chi }^2$-statistic,


\begin{displaymath}{\chi}^2 = {1 \over N} \sum_{i=1}^{N} {(n_i-e_i)^2 \over {\sigma}_i^2}
= {1 \over N} \sum_{i=1}^{N} {(n_i-e_i)^2 \over e_i} \ ,
\end{displaymath} (1)

yielding a normalized value of ${\chi}^2 \approx 1$ in the case when the model is consistent with the data, within the uncertainties of the expected noise.
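As a minimal numerical sketch (assuming NumPy; the function name is illustrative), Eq. A1 can be evaluated directly for binned counts:

```python
import numpy as np

def reduced_chi2(n, e):
    """Reduced chi-square of Eq. A1 for observed counts n_i
    against model expectation values e_i (all e_i > 0)."""
    n = np.asarray(n, dtype=float)
    e = np.asarray(e, dtype=float)
    # Gaussian approximation: variance of each bin is sigma_i^2 = e_i
    return np.sum((n - e) ** 2 / e) / n.size
```

A value near 1 indicates the residuals are consistent with the expected Poisson noise.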

For sparse sampling, in particular when the observed time series $n_i$ contains zeroes or few counts, say $n_i \lesssim 10$, as is the case for the modulation profiles of the finest HESSI collimators, where the total photon counts are binned into some $2^{12}=4096$ time bins to resolve the rotational modulation, the Gaussian ${\chi }^2$-statistic is no longer appropriate. Photon counting statistics for sparse sampling can be derived from the general probability function $P$, as outlined by Cash (1979), which is


\begin{displaymath}P = \prod_{i=1}^N {e_i^{n_i} e^{-e_i} \over n_i! }
\end{displaymath} (2)

for a particular result $n_i$ given the correct set of $e_i$. The likelihood ratio, called C-statistic by Cash (1979), can be expressed in logarithmic form from Eq. A2,


\begin{displaymath}C_{Cash} = - 2 \ln P = - 2 \sum_{i=1}^N (n_i \ln e_i - e_i - \ln n_i!) \ .
\end{displaymath} (3)
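A small numerical sketch of Eq. A3 (assuming NumPy; `math.lgamma(k+1)` supplies $\ln(n_i!)$, and the function name is illustrative):

```python
import numpy as np
from math import lgamma

def cash_statistic(n, e):
    """C = -2 ln P of Eq. A3 for observed counts n_i and
    model expectation values e_i (all e_i > 0)."""
    n = np.asarray(n, dtype=float)
    e = np.asarray(e, dtype=float)
    # ln(n_i!) via the log-gamma function, valid for n_i = 0 as well
    log_fact = np.array([lgamma(k + 1.0) for k in n])
    return -2.0 * np.sum(n * np.log(e) - e - log_fact)
```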

Let the theoretical model $e_i({\Theta}_1, ... , {\Theta}_p)$ be defined by $p$ free parameters ${\Theta}_i, i=1, ... , p$. The best-fitting model is found by varying all $p$ parameters ${\Theta}_1, ... , {\Theta}_p$ until the C-statistic reaches a minimum, which we denote by $(C_{min})_p$. During the iterative fitting procedure, only a partial subset of model parameters, say $q$ parameters (with $q < p$), may have already converged to the true solution, ${\Theta}_1^T, ... , {\Theta}_p^T$, so that ${\Theta}_1, ... , {\Theta}_q$ are set to ${\Theta}_1^T, ... , {\Theta}_q^T$, while the remaining $p-q$ parameters, ${\Theta}_{q+1}, ... , {\Theta}_p$, still need to be varied until a global minimum of $C$ is reached. We denote this partial solution, where $q$ parameters are held at the true solution ${\Theta}_i^T$, by the value $(C_{min})^T_{p-q}$. According to the theorem of Wilks (1938; 1963), the difference,


\begin{displaymath}\Delta C = (C_{min})^T_{p-q} - (C_{min})_p
\end{displaymath} (4)

will be distributed as ${\chi }^2$ with $q$ degrees of freedom. Therefore, the quantity $\Delta C$ can be used to establish a confidence criterion for the fit of the model $e_i$ to the data $n_i$. Following Cash (1979), the term $\ln(n_i!)$ of Eq. A3 drops out in the difference $\Delta C$, because only the parameters $e_i({\Theta}_1, ... , {\Theta}_p)$ are varied during the fitting procedure. Thus, it is more convenient to use the simplified statistic


\begin{displaymath}C_{simplified} = - 2 \sum_{i=1}^N (n_i \ln e_i -
e_i ) \ . \end{displaymath} (5)
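Wilks' theorem can be illustrated with a small Monte Carlo sketch using the simplified statistic of Eq. A5 (assuming NumPy; the constant Poisson rate, number of bins, and number of trials are arbitrary illustrative choices). We fit a one-parameter constant-rate model ($p=1$) and compare it against the same model held at the true rate ($q=1$ fixed parameter):

```python
import numpy as np

def c_simplified(n, e):
    # Eq. A5; the ln(n_i!) term is omitted since it cancels in differences
    return -2.0 * np.sum(n * np.log(e) - e)

rng = np.random.default_rng(0)
mu_true = 20.0            # assumed true Poisson rate per bin (illustrative)
n_bins, n_trials = 64, 2000
delta_c = np.empty(n_trials)
for k in range(n_trials):
    n = rng.poisson(mu_true, n_bins)
    mu_hat = n.mean()     # maximum-likelihood estimate of the one free parameter
    delta_c[k] = (c_simplified(n, np.full(n_bins, mu_true))
                  - c_simplified(n, np.full(n_bins, mu_hat)))
# Per Wilks' theorem, delta_c should follow chi^2 with q = 1 degree
# of freedom, whose expectation value is 1.
```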

For the evaluation of the difference $\Delta C$ in Eq. A4 we have the partially optimized term $(C_{min})^T_{p-q}$,


\begin{displaymath}(C_{min})^T_{p-q} = - 2 \sum_{i=1}^N (n_i \ln {(e_i)}_{p-q}^T
- {(e_i)}_{p-q}^T ) \ ,
\end{displaymath} (6)

and the absolute minimum $(C_{min})_p$, which represents the asymptotic limit when the data $n_i$ perfectly match the model, i.e. $e_i=n_i$,

\begin{displaymath}(C_{min})_p = - 2 \sum_{i=1}^N [n_i \ln (n_i) - n_i ] \ .
\end{displaymath} (7)

The combination of Eqs. A4, A6, and A7 then yields


\begin{displaymath}\Delta C = - 2 \sum_{i=1}^N [n_i \ln (e_i) - e_i - n_i \ln
(n_i) + n_i] \ .
\end{displaymath} (8)

Instead of the reduced ${\chi }^2$-statistic (A1), we can therefore use the equally simple C-statistic (where we drop the symbol $\Delta$ for brevity),


\begin{displaymath}C := {1 \over N}\Delta C = {2 \over N} \sum_{i=1}^N [n_i \ln
({n_i \over e_i}) - (n_i - e_i)] \ .
\end{displaymath} (9)

This form has the advantage that, in addition to being asymptotic to ${\chi }^2$, $C$ vanishes identically when the model fits the data exactly. Numerical care has to be taken for the time bins that contain zeroes in the data, $n_i=0$, in which case the limit $n_i \ln (n_i) \mapsto 0$ has to be used to avoid the singularity of $\ln(n_i)$. On the other hand, a singularity arises when the model predicts zero counts, $e_i=0$, while the observed counts are not zero, $n_i \ne 0$, because the term $-n_i \ln (e_i)$ then diverges. It is therefore advisable to restrict the model fit to time intervals with a finite probability for photon counts, i.e. $e_i>0$.
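A minimal implementation of Eq. A9 that incorporates both numerical precautions (assuming NumPy; the function name is illustrative) might read:

```python
import numpy as np

def c_statistic(n, e):
    """Reduced C-statistic of Eq. A9 for observed counts n_i and
    model expectation values e_i, with the n_i = 0 limit handled."""
    n = np.asarray(n, dtype=float)
    e = np.asarray(e, dtype=float)
    if np.any(e <= 0):
        # restrict the fit to bins with finite count probability, e_i > 0
        raise ValueError("model must predict e_i > 0 in every fitted bin")
    safe_n = np.where(n > 0, n, 1.0)  # placeholder avoids log(0); term zeroed below
    log_term = np.where(n > 0, n * np.log(safe_n / e), 0.0)  # n ln n -> 0 as n -> 0
    return 2.0 / n.size * np.sum(log_term - (n - e))
```

A perfect fit gives $C=0$, and for large counts $C$ approaches the reduced ${\chi}^2$ of Eq. A1.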


Ed Schmahl
1999-12-17