|
Algorithms For Calculating VarianceIn statistics, a formula for calculating the variance of a population of size n is: -
A formula for calculating the unbiased estimation of the population variance from n finite samples is: -
The method of calculation may be more easily understood from the table below where the mean is 8. | i | xi | xi − mean | (xi − mean)2 | | (index) | (datum) | (deviation) | (squared deviation) | | 1 | 5 | −3 | 9 | | 2 | 7 | −1 | 1 | | 3 | 8 | 0 | 0 | | 4 | 10 | 2 | 4 | | 5 | 10 | 2 | 4 | | n = 5 | sum = 40 | 0 | 18 | mean = 40/5 = 8 variance = (5 · 338 − 402)/(5 · 4) = 4.5 standard deviation = Note: Details of the variance calculation: 338 = + 72 + 82 + 102 + 102 40 = + 7 + 8 + 10 + 10 Algorithm Therefore a simple algorithm to calculate variance can be described by the following pseudocode: long n = 0; double sum = 0; double sum_sqr = 0; double variance; foreach x in data: n += 1; sum += x; sum_sqr += x*x; end for variance = (sum_sqr - sum*sum/n)/(n-1); Algorithm Another algorithm which avoids large numbers in sum_sqr while summing up double avg = 0; double var = 0; long n = data.length; // number of elements for i = 1 to n avg = (avg*i + datai) / (i + 1); var = (var * (i - 1) + (datai - avg)*(datai - avg)) / i; end for return var; // resulting variance
|