This article includes a list of references, but chi square statistics pdf sources remain unclear because it has insufficient inline citations. Please help to improve this article by introducing more precise citations. Without other qualification, ‘chi-squared test’ often is used as short for Pearson’s chi-squared test.
The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. In the standard applications of the test, the observations are classified into mutually exclusive classes, and there is some theory, or say null hypothesis, which gives the probability that any observation falls into the corresponding class.
The purpose of the test is to evaluate how likely it is between the observations and the null hypothesis. Chi-squared tests are often constructed from a sum of squared errors, or through the sample variance.
Test statistics that follow a chi-squared distribution arise from an assumption of independent normally distributed data, which is valid in many cases due to the central limit theorem. A chi-squared test can be used to attempt rejection of the null hypothesis that the data are independent. In the 19th century, statistical analytic methods were mainly applied in biological data analysis and it was customary for researchers to assume that observations followed a normal distribution, such as Sir George Airy and Professor Merriman, whose works were criticized by Karl Pearson in his 1900 paper. Until the end of 19th century, Pearson noticed the existence of significant skewness within some biological observations.
In order to model the observations regardless of being normal or skewed, Pearson, in a series of articles published from 1893 to 1916, devised the Pearson distribution, a family of continuous probability distributions, which includes the normal distribution and many skewed distributions, and proposed a method of statistical analysis consisting of using the Pearson distribution to model the observation and performing the test of goodness of fit to determine how well the model and the observation really fit. In this paper, Pearson investigated the test of goodness of fit. This conclusion caused some controversy in practical applications and was not settled for 20 years until Fisher’s 1922 and 1924 paper. One test statistic that follows a chi-squared distribution exactly is the test that the variance of a normally distributed population has a given value based on a sample variance.
Such tests are uncommon in practice because the true variance of the population is usually unknown. For an exact test used in place of the chi-squared test, see Fisher’s exact test. Using the chi-squared distribution to interpret Pearson’s chi-squared statistic requires one to assume that the discrete probability of observed binomial frequencies in the table can be approximated by the continuous chi-squared distribution. This assumption is not quite correct and introduces some error.