Regression Analysis

Regression analysis is any statistical method in which the mean of one or more random variables is predicted conditional on other (measured) random variables. Examples include linear regression and logistic regression, and regression is one of the core tasks of supervised learning. Regression analysis is the statistical view of curve fitting: choosing a curve that best fits given data points.

In the simplest setting there are only two variables. One, called X, can be regarded as constant (non-random) because it can be measured without substantial error and its values can even be chosen at will; for this reason it is called the independent or controlled variable. The other, called Y, is a random variable called the dependent variable, because its values depend on X. In regression we are interested in how Y varies with X, which is referred to as the regression of Y on X. Typical examples are the regression of the blood pressure Y of a person on their age X, and the regression of the weight Y of certain animals on their daily ration of food X.

See also: multivariate normal distribution, important publications in regression analysis.

Regression is usually posed as an optimization problem: we seek the model parameters for which the error is smallest. The most common error measure is the sum of squared errors (least squares), which corresponds to a Gaussian likelihood of generating the observed data given the (hidden) random variable. Least squares is also optimal in a specific sense: by the Gauss-Markov theorem, when the errors have zero mean, equal variance and are uncorrelated, it has the lowest variance among linear unbiased estimators. The optimization problem is typically solved by algorithms such as gradient descent, the Gauss-Newton algorithm, and the Levenberg-Marquardt algorithm. Randomized algorithms such as RANSAC can be used to find a good fit for a sample set that contains outliers, given a parametrized model of the curve function.

Regression can be expressed as maximum likelihood estimation of the parameters of a model. With small amounts of data, however, this estimate can have high variance. Some practitioners therefore use maximum a posteriori (MAP) methods, which place a prior over the parameters and then choose the parameters that maximize the posterior. MAP methods are related to Occam's Razor: there is a preference for simplicity among a family of regression models (curves), just as there is a preference for simplicity among competing theories.
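The ideas above can be made concrete with a minimal sketch in Python with NumPy. It fits a straight line in three ways: by a direct least-squares solve (the maximum likelihood estimate under Gaussian noise), by gradient descent on the same squared error, and by a MAP estimate that assumes a zero-mean Gaussian prior on the parameters (commonly known as ridge regression). The toy data, the function names, the step size and the prior strength `lam` are all assumptions chosen for illustration; they do not come from the article.

```python
import numpy as np


def fit_least_squares(X, y):
    """Maximum likelihood fit under Gaussian noise: minimize ||Xw - y||^2."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def fit_gradient_descent(X, y, step=0.05, n_iter=2000):
    """Minimize ||Xw - y||^2 by plain gradient descent.

    The step size is an assumption tuned for the toy data below; it must be
    smaller than 1 / (largest eigenvalue of X^T X) for the iteration to converge.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        gradient = 2.0 * X.T @ (X @ w - y)
        w = w - step * gradient
    return w


def fit_map(X, y, lam):
    """MAP fit assuming a zero-mean Gaussian prior on w (ridge regression):
    minimize ||Xw - y||^2 + lam * ||w||^2."""
    n_params = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_params), X.T @ y)


# Hypothetical toy data: a noisy line y = 2x + 1.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 8)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.size)
X = np.column_stack([np.ones_like(x), x])  # columns: intercept, slope

w_ml = fit_least_squares(X, y)
w_gd = fit_gradient_descent(X, y)
w_map = fit_map(X, y, lam=1.0)

print("least squares:   ", w_ml)
print("gradient descent:", w_gd)
print("MAP (ridge):     ", w_map)
```

On this well-conditioned problem gradient descent should converge to the same parameters as the direct solve, while the MAP estimate is pulled toward zero by the prior: it trades a slightly larger squared error for a smaller parameter vector.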

Example

The simplest example of regression is the one-dimensional case. We are given a vector of x values and a vector of y values, and we seek a function f such that f(x_{i}) = y_{i}, or comes as close to it as possible.
Let
\vec{x} = \begin{pmatrix} -2 \\ -1 \\ 0 \\ 1 \\ 2 \end{pmatrix}, \quad \vec{y} = \begin{pmatrix} 5 \\ 2 \\ 1 \\ 2 \\ 5 \end{pmatrix}
Let us assume that our solution is in the family of functions defined by a 3rd degree Fourier expansion, written in the form:
f(x) = a_{0}/2 + a_{1}\cos(x) + b_{1}\sin(x) + a_{2}\cos(2x) + b_{2}\sin(2x) + a_{3}\cos(3x) + b_{3}\sin(3x)
where a_{i}, b_{i} are real numbers. This problem can be represented in matrix notation, with the row vector evaluated at each entry x of \vec{x}, as:
\begin{pmatrix} 1/2 & \cos(x) & \sin(x) & \cos(2x) & \sin(2x) & \cos(3x) & \sin(3x) \end{pmatrix} \begin{pmatrix} a_{0} \\ a_{1} \\ b_{1} \\ a_{2} \\ b_{2} \\ a_{3} \\ b_{3} \end{pmatrix} = \vec{y}
Substituting the given values yields a problem of the form X\vec{w} = \vec{y}:
\begin{pmatrix} 1/2 & \cos(-2) & \sin(-2) & \cos(-4) & \sin(-4) & \cos(-6) & \sin(-6) \\ 1/2 & \cos(-1) & \sin(-1) & \cos(-2) & \sin(-2) & \cos(-3) & \sin(-3) \\ 1/2 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1/2 & \cos(1) & \sin(1) & \cos(2) & \sin(2) & \cos(3) & \sin(3) \\ 1/2 & \cos(2) & \sin(2) & \cos(4) & \sin(4) & \cos(6) & \sin(6) \end{pmatrix} \begin{pmatrix} a_{0} \\ a_{1} \\ b_{1} \\ a_{2} \\ b_{2} \\ a_{3} \\ b_{3} \end{pmatrix} = \begin{pmatrix} 5 \\ 2 \\ 1 \\ 2 \\ 5 \end{pmatrix}
This can now be posed as an optimization problem: find the \vec{w} that minimizes the sum of squared errors.
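\min_{\vec{w}} \sum_{i=1}^{n} (\vec{x}_{i}\vec{w} - y_{i})^2
or, equivalently,
\min_{\vec{w}} \|X\vec{w} - \vec{y}\|^2
where \vec{x}_{i} denotes the i-th row of X and n = 5 is the number of data points.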
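Seven coefficients are being fitted to only five data points, so the minimizer is not unique. One least-squares solution, with coefficients rounded to two decimal places, is:
\vec{w} = \begin{pmatrix} 0 \\ 4.25 \\ 0 \\ -6.13 \\ 0 \\ 2.88 \\ 0 \end{pmatrix}
Thus a 3rd degree Fourier function that fits the data (up to rounding) is given by:
f(x) = 4.25\cos(x) - 6.13\cos(2x) + 2.88\cos(3x)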
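The worked example can be checked numerically. The sketch below, in Python with NumPy, builds the 5 x 7 matrix X from the x values, evaluates the residual of the coefficients quoted in the text, and also asks `numpy.linalg.lstsq` for a solution. The helper name `fourier_design_matrix` is hypothetical. Note that for a system with more unknowns than equations, `lstsq` returns the minimum-norm solution, which need not coincide with the coefficients quoted above even though both fit the data.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([5.0, 2.0, 1.0, 2.0, 5.0])


def fourier_design_matrix(x, degree=3):
    """Build the matrix with columns 1/2, cos(x), sin(x), ..., cos(degree*x), sin(degree*x)."""
    columns = [np.full_like(x, 0.5)]
    for k in range(1, degree + 1):
        columns.append(np.cos(k * x))
        columns.append(np.sin(k * x))
    return np.column_stack(columns)


X = fourier_design_matrix(x)  # shape (5, 7): five data points, seven coefficients

# Coefficients quoted in the text, in the order a0, a1, b1, a2, b2, a3, b3.
w_text = np.array([0.0, 4.25, 0.0, -6.13, 0.0, 2.88, 0.0])
residual_text = X @ w_text - y

# Minimum-norm least-squares solution of the underdetermined system.
w_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)
residual_min_norm = X @ w_min_norm - y

print("largest error of quoted coefficients:", np.max(np.abs(residual_text)))
print("minimum-norm coefficients:", np.round(w_min_norm, 2))
print("largest error of minimum-norm fit:", np.max(np.abs(residual_min_norm)))
```

Both coefficient vectors reproduce the five data points; the small residual of the quoted coefficients comes from rounding them to two decimal places. Which of the many exact fits one prefers is a modeling choice, for example the smallest coefficient vector or the fit with no constant term.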
