Boltzmann Machine

A Boltzmann machine is a type of stochastic recurrent neural network originally invented by Geoffrey Hinton and Terry Sejnowski. Boltzmann machines can be seen as the stochastic, generative counterpart of Hopfield nets. They were an early example of neural networks capable of forming internal representations. Because they are very slow to simulate they are not very useful for most practical purposes. However, they are theoretically intriguing due to the biological plausibility of their training algorithm.

Structure

A Boltzmann machine, like a Hopfield net is a network of binary units with an "energy" defined for the network. Unlike Hopfield nets though, Boltzmann machines only ever have units that take values of 1 or 0. The global energy, E, in a Boltzmann machine is identical to that of a Hopfield network, that is:
E = -\sum_{i
Where:
  • w_{ij} is the connection weight from unit j to unit i.
  • s_i is the state (1 or 0) of unit i.
  • \theta_i is the threshold of unit i.
Thus, the difference in the global energy that results from a single unit i being 0 or 1, written {\Delta}{E_i}, is given by:
\Delta\ E_i = \sum_i{w_{ij}s_i}-\theta_i
A Boltzmann machine is made up of stochastic units. The probability, p_i of the ith unit being on is given by:
p_i=\frac{1}{(1+e^{-(\Delta\ E_i/T)})}
(The scalar T is referred to as the "temperature" of the system.) Units are divided into "visible" units, V, and "hidden" units, H. The visible units are those which receive information from the "environment", i.e. those units that receive binary state vectors for training. The connections in a Boltzmann machine have three restrictions on them:
  • w_{ii}=0, \forall i. (No unit has a connection with itself.)
  • w_{ij}=w_{ji}, \forall i,j. (All connections are symmetric.)
  • w_{ij}=0, \forall i,j: i \in V, j \in V. (Visible units have no connections between them.)

Training

Boltzmann machines can be viewed as a type of maximum likelihood model, i.e. training involves modifying the parameters (weights) in the network to maximize the probability of the network producing the data as it was seen in the training set. In other words, the network must successfully model the probabilities of the data in the environment. There are two phases to Boltzmann machine training. One is the "positive" phase where the visible units' states are clamped to a particular binary state vector from the training set. The other is the "negative" phase where the network is allowed to run freely, i.e. no units have their state determined by external data. A vector over the visible units is denoted V_{\alpha} and a vector over the hidden units is denoted as H_{\beta}. The probabilties P^{+}(S) and P^{-}(S) represent the probability for a given state, S, in the positive and negative phases respectively. Note that this means that P^{+}(V_{\alpha}) is determined by the environment for every V_{\alpha}, because the visible units are set by the environment in the positive phase. Boltzmann machines are trained using a gradient descent algorithm, so a given weight, w_{ij} is changed by subtracting the partial derivative of a cost function with respect to the weight. The cost function used for Boltzmann machines, G, is given as:
G = \sum_{\alpha}{P^{+}(V_{\alpha})\ln{\frac{P^{+}(V_{\alpha})}{P^{-}(V_{\alpha})}}}
This means that the cost function is lowest when the probability of a vector in the negative phase is equivalent to the probability of the same vector in the positive phase. As well, it ensures that the most probable vectors in the data have the greatest effect on the cost. This cost function would seem to be complicated to perform gradient descent with. Suprisingly though, the gradient with respect to a given weight, w_{ij}, at thermal equilibrium is given by the very simple equation:
\frac{\partial{G}}{\partial{w_{ij}}} = -\frac{1}{T}p_{ij}^{+}-p_{ij}^{-}
Where:
  • p_{ij}^{+} is the probability of units i and j both being on in the positive phase.
  • p_{ij}^{-} is the probability of units i and j both being on in the negative phase.
This result follows from the fact that at thermal equilibrium the probability of any state when the network is free-running is given by the Boltzmann distribution (hence the name "Boltzmann machine"). A state of thermal equilibrium is crucial for this though, hence the network must be brought to thermal equilibrium before the probabilities of two units both being on are calculated. Thermal equilibrium is achieved with simulated annealing in a Boltzmann machine. It is the necessity of simulated annealing which can make training a Boltzmann machine on a digital computer a very slow process. However, this learning rule is fairly biologically plausible because the only information needed to change the weights is provided by "local" information. That is, the connection (or synapse biologically speaking) does not need information about anything other than the two neurons it connects. This is far more biologically realistic than the information needed by a connection in many other neural network training algorithms, such as backpropagation.

See also

References

* Hinton, G. E., Sejnowski, T. J. (1986) Learning and Relearning in Boltzmann Machines. In D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations. (pp 282-317) Cambridge: MIT Press

 

<< PreviousWord BrowserNext >>
linda salzman sagan
ray davis (soldier)
rvrworkers
mater dei high school
jim fitzpatrick (politician)
raymond davis
ray davis
magic mountain
university of british columbia alma mater society
kimahri ronso
list of constituents of the london and north eastern railway
rikku
creel commission
judy marsales
multisync (software)
aira force
tayleur
kingston flyer
knysna
uss sablefish (ss 303)
owen sheers
crime prevention through environmental design
timewatch
formosan clouded leopard
cordon sanitaire
national league cup
agrapha
o(1) scheduler
jeremiah vanscoyoc
little tramp
simms
leaders debate
dave levac
uss camden (as 6)
john salazar
hetchins
habitual abortion
all the king's men (2005 movie)
bishop's university students' representative council
uss camden
threatened abortion
microsoft codenames
inevitable abortion
witold rybczynski