Feature Selection

Feature selection (or subset selection) is a process commonly used in machine learning, in which a subset of the features available in the data is selected before a learning algorithm is applied. Feature selection is necessary either because it is computationally infeasible to use all available features, or because estimation becomes unreliable when only a limited number of data samples is available for a large number of features. The latter problem is related to the so-called curse of dimensionality. Simple feature selection algorithms are ad hoc, but there are also more methodical approaches. From a theoretical perspective, it can be shown that optimal feature selection for supervised learning problems requires an exhaustive search of all possible subsets of features of the chosen cardinality, which is rarely feasible once the number of candidate features is large.

For practical supervised learning algorithms, a popular approach has two steps. The first step is to assemble a set of candidate features. The second is to choose among them. One simple feature selection algorithm is to score each candidate feature by some metric and then keep the features with the highest scores. Two popular metrics for classification problems are correlation and mutual information. These metrics are computed between a candidate feature (or set of features) and the desired output category.
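
The score-and-rank procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the helper names (`pearson_correlation`, `mutual_information`, `select_top_k`), the toy data, and the feature names are all assumptions made for the example. It also assumes discrete feature values for the mutual information estimate and scores each feature individually, not sets of features.

```python
import math
from collections import Counter


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def select_top_k(features, labels, k, score):
    """Score every named feature column against the labels; return the k best names."""
    scored = {name: score(column, labels) for name, column in features.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Hypothetical toy data: three binary candidate features and a binary class label.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "copies_label": [0, 0, 1, 1, 0, 1, 0, 1],
    "mostly_label": [0, 0, 1, 1, 0, 1, 1, 0],
    "unrelated": [0, 1, 0, 1, 1, 0, 0, 1],
}

# Correlation is ranked by absolute value, since a strong negative
# correlation is as informative as a strong positive one.
by_corr = select_top_k(features, labels, 2, lambda x, y: abs(pearson_correlation(x, y)))
by_mi = select_top_k(features, labels, 2, mutual_information)
print(by_corr, by_mi)
```

On this toy data both metrics rank the feature that duplicates the label first and the statistically independent feature last. Scoring features one at a time is cheap, but it ignores redundancy and interactions between features, which is one reason the exhaustive subset search mentioned above can, in principle, find better subsets.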
Copyright 2005-2009 OnPedia.com. All Rights Reserved