
APPENDIX: ${\chi }^2$ and C-statistic for Sparse Sampling

An observed time series of photon counts, $f^{obs}(t)$, is binned into time intervals of length $\Delta t$, so that the number $n_i=f^{obs}[t_i < t <t_i+\Delta t]$ represents the photon counts in bin $[t_i,t_i+\Delta t]$. It is modeled by an analytical function, $f^{model}(t)$, which predicts an expectation value $e_i=f^{model}[t_i < t <t_i+\Delta t]$ of photon counts in time bin $i$. In the limit of large counts, $n_i \gg 1$, the Poisson statistics can be approximated by a Gaussian distribution, which predicts an r.m.s. noise of ${\sigma}_i = \sqrt{e_i} \approx \sqrt{n_i}$. The goodness of fit of a model can then be evaluated with the well-known ${\chi }^2$-statistic,


\begin{displaymath}{\chi}^2 = {1 \over N} \sum_{i=1}^{N} {(n_i-e_i)^2 \over {\sigma}_i^2}
= {1 \over N} \sum_{i=1}^{N} {(n_i-e_i)^2 \over e_i} \ ,
\end{displaymath} (1)

yielding a normalized value of ${\chi}^2 \approx 1$ in the case when the model is consistent with the data, within the uncertainties of the expected noise.
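As a minimal numerical sketch (assuming NumPy; the function name is illustrative), Eq. A1 can be evaluated directly for binned counts:

```python
import numpy as np

def reduced_chi2(n, e):
    """Reduced chi-square of Eq. A1 for observed counts n_i
    against model expectation values e_i (all e_i > 0)."""
    n = np.asarray(n, dtype=float)
    e = np.asarray(e, dtype=float)
    # Gaussian approximation: variance of each bin is sigma_i^2 = e_i
    return np.sum((n - e) ** 2 / e) / n.size
```

A value near 1 indicates the residuals are consistent with the expected Poisson noise.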

For sparse sampling, in particular when the observed time series $n_i$ contains zeroes or few counts, say $n_i \lesssim 10$, as is the case for the modulation profiles of the finest HESSI collimators, where the total photon counts are binned into some $2^{12}=4096$ time bins to resolve the rotational modulation, the Gaussian ${\chi }^2$-statistic is no longer appropriate. Photon counting statistics for sparse sampling can be derived from the general probability function $P$, as outlined by Cash (1979), which is


\begin{displaymath}P = \prod_{i=1}^N {e_i^{n_i} e^{-e_i} \over n_i! }
\end{displaymath} (2)

for a particular result $n_i$ given the correct set of $e_i$. The likelihood ratio, called C-statistic by Cash (1979), can be expressed in logarithmic form from Eq. A2,


\begin{displaymath}C_{Cash} = - 2 \ln P = - 2 \sum_{i=1}^N (n_i \ln e_i - e_i - \ln n_i!) \ .
\end{displaymath} (3)
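A small numerical sketch of Eq. A3 (assuming NumPy; `math.lgamma(k+1)` supplies $\ln(n_i!)$, and the function name is illustrative):

```python
import numpy as np
from math import lgamma

def cash_statistic(n, e):
    """C = -2 ln P of Eq. A3 for observed counts n_i and
    model expectation values e_i (all e_i > 0)."""
    n = np.asarray(n, dtype=float)
    e = np.asarray(e, dtype=float)
    # ln(n_i!) via the log-gamma function, valid for n_i = 0 as well
    log_fact = np.array([lgamma(k + 1.0) for k in n])
    return -2.0 * np.sum(n * np.log(e) - e - log_fact)
```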

Let the theoretical model $e_i({\Theta}_1, ... , {\Theta}_p)$ be defined by $p$ free parameters ${\Theta}_i, i=1, ... , p$. The best-fitting model is found by varying all $p$ parameters ${\Theta}_1, ... , {\Theta}_p$ until the C-statistic reaches a minimum, which we denote by $(C_{min})_p$. During the iterative fitting procedure, only a partial subset of model parameters, say $q$ parameters (with $q < p$), may have already converged to the true solution, ${\Theta}_1^T, ... , {\Theta}_p^T$, so that ${\Theta}_1, ... , {\Theta}_q$ are set to ${\Theta}_1^T, ... , {\Theta}_q^T$, while the remaining $p-q$ parameters, ${\Theta}_{q+1}, ... , {\Theta}_p$, still need to be varied until a global minimum of $C$ is reached. We denote this partial solution, where $q$ parameters are held at the true solution ${\Theta}_i^T$, by the value $(C_{min})^T_{p-q}$. According to the theorem of Wilks (1938; 1963), the difference,


\begin{displaymath}\Delta C = (C_{min})^T_{p-q} - (C_{min})_p
\end{displaymath} (4)

will be distributed as ${\chi }^2$ with $q$ degrees of freedom. Therefore, the quantity $\Delta C$ can be used to establish a confidence criterion for the fit of the model $e_i$ to the data $n_i$. Following Cash (1979), the term $\ln(n_i!)$ of Eq. A3 drops out in the difference $\Delta C$, because only the parameters $e_i({\Theta}_1, ... , {\Theta}_p)$ are varied during the fitting procedure. Thus, it is more convenient to use the simplified statistic


\begin{displaymath}C_{simplified} = - 2 \sum_{i=1}^N (n_i \ln e_i -
e_i ) \ . \end{displaymath} (5)
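Wilks' theorem can be illustrated with a small Monte Carlo sketch using the simplified statistic of Eq. A5 (assuming NumPy; the constant Poisson rate, number of bins, and number of trials are arbitrary illustrative choices). We fit a one-parameter constant-rate model ($p=1$) and compare it against the same model held at the true rate ($q=1$ fixed parameter):

```python
import numpy as np

def c_simplified(n, e):
    # Eq. A5; the ln(n_i!) term is omitted since it cancels in differences
    return -2.0 * np.sum(n * np.log(e) - e)

rng = np.random.default_rng(0)
mu_true = 20.0            # assumed true Poisson rate per bin (illustrative)
n_bins, n_trials = 64, 2000
delta_c = np.empty(n_trials)
for k in range(n_trials):
    n = rng.poisson(mu_true, n_bins)
    mu_hat = n.mean()     # maximum-likelihood estimate of the one free parameter
    delta_c[k] = (c_simplified(n, np.full(n_bins, mu_true))
                  - c_simplified(n, np.full(n_bins, mu_hat)))
# Per Wilks' theorem, delta_c should follow chi^2 with q = 1 degree
# of freedom, whose expectation value is 1.
```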

For the evaluation of the difference $\Delta C$ in Eq. A4 we have the partially optimized term $(C_{min})^T_{p-q}$,


\begin{displaymath}(C_{min})^T_{p-q} = - 2 \sum_{i=1}^N (n_i \ln {(e_i)}_{p-q}^T
- {(e_i)}_{p-q}^T ) \ ,
\end{displaymath} (6)

and the absolute minimum $(C_{min})_p$, which represents the asymptotic limit when the data $n_i$ perfectly match the model, i.e. $e_i=n_i$,

\begin{displaymath}(C_{min})_p = - 2 \sum_{i=1}^N [n_i \ln (n_i) - n_i ] \ .
\end{displaymath} (7)

The combination of Eqs. A4, A6, and A7 then yields


\begin{displaymath}\Delta C = - 2 \sum_{i=1}^N [n_i \ln (e_i) - e_i - n_i \ln
(n_i) + n_i] \ .
\end{displaymath} (8)

Instead of the reduced ${\chi }^2$-statistic (A1), we can therefore use the equally simple C-statistic (where we drop the symbol $\Delta$ for brevity),


\begin{displaymath}C := {1 \over N}\Delta C = {2 \over N} \sum_{i=1}^N [n_i \ln
({n_i \over e_i}) - (n_i - e_i)] \ .
\end{displaymath} (9)

This form has the advantage that, in addition to being asymptotic to ${\chi }^2$, $C$ vanishes identically when the model fits the data exactly. Numerical care has to be taken for the time bins that contain zeroes in the data, $n_i=0$, in which case the limit $n_i \ln (n_i) \mapsto 0$ has to be used to avoid the singularity of $\ln(n_i)$. On the other hand, a singularity arises when the model predicts zero counts, $e_i=0$, while the observed counts are not zero, $n_i \ne 0$, because the term $-n_i \ln (e_i)$ then diverges. It is therefore advisable to restrict the model fit to time intervals with a finite probability for photon counts, i.e. $e_i>0$.
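A minimal implementation of Eq. A9 that incorporates both numerical precautions (assuming NumPy; the function name is illustrative) might read:

```python
import numpy as np

def c_statistic(n, e):
    """Reduced C-statistic of Eq. A9 for observed counts n_i and
    model expectation values e_i, with the n_i = 0 limit handled."""
    n = np.asarray(n, dtype=float)
    e = np.asarray(e, dtype=float)
    if np.any(e <= 0):
        # restrict the fit to bins with finite count probability, e_i > 0
        raise ValueError("model must predict e_i > 0 in every fitted bin")
    safe_n = np.where(n > 0, n, 1.0)  # placeholder avoids log(0); term zeroed below
    log_term = np.where(n > 0, n * np.log(safe_n / e), 0.0)  # n ln n -> 0 as n -> 0
    return 2.0 / n.size * np.sum(log_term - (n - e))
```

A perfect fit gives $C=0$, and for large counts $C$ approaches the reduced ${\chi}^2$ of Eq. A1.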


Ed Schmahl
1999-12-17