Vc Dimension

The VC dimension (for Vapnik Chervonenkis dimension) is a measure of the capacity of a statistical classification algorithm. It is one of the core concepts in Vapnik Chervonenkis theory. It was originally defined by Vladimir Vapnik and Alexey Chervonenkis. Intuitively, the capacity of a classification model is related to how complicated it can be. Think of thresholding a high-degree polynomial, where if the polynomial evaluates above zero, we classify that point into a positive class, negative otherwise. If we use a very high-degree polynomial, it can be very wiggly, and can fit a training set exactly. But, we should expect that outside of the training points, the classifier would not generalize well, because it is too wiggly. Such a polynomial has a high capacity. Alternatively, we can think about thresholding a linear polynomial. This polynomial may not fit the entire training set, because it has a low capacity. This notion of capacity can be made more rigorous. First, we need the concept of shattering. Consider a classification model f with some parameter vector \theta. The model f can shatter a set of data points (x_1,x_2,\ldots,x_n) if, for all assignments of labels to those data points, there exists a \theta such that the model f makes no errors when evaluating that set of data points. Now, we are ready to define a mathematical notion of capacity, called the VC dimension. The VC dimension of a model f is the maximum h such that some data point set of cardinality h can be shattered by f. The VC dimension has utility in statistical learning theory, because it can predict a probabilistic upper bound on the test error of a classification model. The bound on the test error of a classification model (on data that is drawn i.i.d. from the same distribution as the training set) is given by
Training error + \sqrt{h(\log(2N/h)+1)-\log(\eta/4)\over N}
with probability 1-\eta, where h is the VC dimension of the classification model, and N is the size of the training set.

References and sources

  • Andrew Moore's VC dimension tutorial
  • V. Vapnik and A. Chervonenkis. "On the uniform convergence of relative frequencies of events to their probabilities." Theory of Probability and its Applications, 16(2):264--280, 1971.
  • A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. "Learnability and the Vapnik-Chervonenkis dimension." Journal of the ACM, 36(4):929--865, 1989.

 

<< PreviousWord BrowserNext >>
squamata
disk formatting
nomina anatomica veterinaria
tsuen wan line (mtr)
x japan
dir en grey
aldabra island day gecko
abigail's party
aris christofellis
lahti
charqui
list of slovak poets
kuopio
hsbc tower
m.a.s.k.
top 40 greatest tv shows
cheke's day gecko
assumption island day gecko
diversitas
kreuzberg
petrozavodsk
olivenza
dog fighting
boroughs of berlin
juris hartmanis
vapnik chervonenkis theory
megali idea
alexey chervonenkis
afghanistan timeline 1901 1910
aemilia scaura
new holland
legio xxii deiotariana
equine colic
chesley bonestell
andaman islands day gecko
standard language
metro ag
monthly review
quanta cura
vicente lusitano
george mikan
neudeck
syllabus of errors
zoolander