Protein Data Bank

The Protein Data Bank (PDB) is a repository for 3-D structural data of proteins and nucleic acids. The data, typically obtained by X-ray crystallography or NMR spectroscopy, are submitted by biologists and biochemists from around the world, released into the public domain, and can be accessed for free. The PDB is the central repository for biological structural data.

History

Founded in 1971 by Brookhaven National Laboratory, the Protein Data Bank was transferred in 1998 to the Research Collaboratory for Structural Bioinformatics (RCSB), which is composed of Rutgers University, the University of Wisconsin-Madison, NIST, and the San Diego Supercomputer Center. Funding comes from the National Science Foundation, the Department of Energy, the National Library of Medicine, and the National Institute of General Medical Sciences. The European Bioinformatics Institute in the UK and the Institute for Protein Research in Japan also collect, process, and submit data files. The PDB is a key resource in structural biology and is critical to more recent work in structural genomics. Many derived databases and projects have been developed to integrate and classify PDB data in terms of protein structure, protein function, and protein evolution.

Contents

As of 1 October 2004, the database contained 27,428 released atomic coordinate entries (or "structures") and was receiving about 2,000 to 3,000 new ones per year. Data are stored in mmCIF, a format developed specifically for this purpose. Note that the database stores the exact location of every atom in a large biomolecule; anyone interested only in sequence data, i.e. the list of amino acids making up a particular protein or the list of nucleotides making up a particular nucleic acid, should instead use the much larger databases from Swiss-Prot and the International Nucleotide Sequence Database Collaboration.

Statistics

As of 22 February 2005, the "PDB Holdings List" at RCSB reported the following statistics:

Method | Proteins, Peptides, and Viruses | Protein/Nucleic Acid Complexes | Nucleic Acids | Carbohydrates | Total
X-ray Diffraction and other | 23431 | 1134 | 774 | 11 | 25350
NMR | 3633 | 105 | 643 | 2 | 4383
Total | 27064 | 1239 | 1417 | 13 | 29733

Over the years the PDB has undergone many changes and revisions. Its original format was dictated by the width of computer punch cards.
  • RCSB PDB Guide: the PDB format specification. It is vital to read this before looking at the raw data.
  • ftp.rcsb.org: the raw data can be downloaded from here.
  • www.rcsb.org: statistics about the PDB can be found here.
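Because the original format descends from punch cards, each field of a record sits in a fixed range of columns rather than being separated by delimiters. The following is a minimal sketch of reading one coordinate record that way. The column ranges reflect the commonly documented layout of the legacy ATOM record and should be checked against the format specification in the guide above; the names `Atom` and `parse_atom_record`, and the sample values, are illustrative assumptions rather than part of any official tool.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """A subset of the fields found in a legacy ATOM/HETATM record."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_record(line: str) -> Atom:
    """Parse one fixed-column ATOM/HETATM line (hypothetical helper).

    Assumed 1-based column ranges: serial 7-11, atom name 13-16,
    residue name 18-20, chain 22, residue number 23-26,
    x 31-38, y 39-46, z 47-54.
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


# Build a sample record with explicit field widths so the columns line up.
# The values are made up for illustration.
sample = (
    "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}"
).format("ATOM", 1, " N", "", "MET", "A", 1, "", 27.340, 24.430, 2.614)

atom = parse_atom_record(sample)
print(atom)
```

Slicing by position rather than splitting on whitespace matters here, since adjacent fields in a fixed-width record may touch with no space between them.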
This legacy format has caused many problems, and consequently three distinct 'clean-up' projects have grown up around the PDB: the MMDB, the MSD, and the Data Uniformity Project, which is hosted by the RCSB (the current home of the PDB). Each of these grant-funded projects pursues the same goal by a different route, using the original PDB data to derive a new format: the MMDB uses ASN.1 (and an XML conversion of this format), the MSD uses a relational database, and the Data Uniformity Project uses mmCIF (and another XML conversion of this format). Some regard this diversity as a good thing; others argue that without a universal repository of information (i.e., a common dictionary) it is hard to be sure that everyone is talking about the same thing.

Each structure published in the PDB receives a four-character alphanumeric identifier, its PDB ID. This should not be used as an identifier for biomolecules, since the PDB often contains several structures of the same molecule (in different environments or conformations) under different PDB IDs.

When a biologist submits structure data for a protein or nucleic acid, PDB staff review and annotate it. The data are then automatically checked for plausibility. The source code for this validation software has been released free of charge. The main database accepts only experimentally derived structures, not theoretically predicted ones (see protein structure prediction). Various funding agencies and scientific journals now require scientists to submit their structure data to the PDB.
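The point about PDB IDs can be made concrete with a short sketch. It checks only the "four alphanumeric characters" rule stated above (the real archive may impose further constraints) and shows why one molecule can map to several IDs. The function names and the entries in `deposited` are hypothetical placeholders, not real PDB records.

```python
import re
from collections import defaultdict

# Assumption: an ID is exactly four ASCII letters or digits, as described above.
PDB_ID_PATTERN = re.compile(r"[0-9A-Za-z]{4}")


def is_pdb_id(text: str) -> bool:
    """Return True if text looks like a four-character alphanumeric ID."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


def group_ids_by_molecule(entries):
    """Map each molecule name to the list of IDs deposited for it."""
    grouped = defaultdict(list)
    for pdb_id, molecule in entries:
        if not is_pdb_id(pdb_id):
            raise ValueError(f"not a valid PDB ID: {pdb_id!r}")
        grouped[molecule].append(pdb_id.upper())
    return dict(grouped)


# Placeholder (ID, molecule) pairs, invented for illustration only.
deposited = [
    ("1abc", "example protein"),
    ("2abc", "example protein"),  # same molecule, different conformation
    ("3xyz", "example nucleic acid"),
]

by_molecule = group_ids_by_molecule(deposited)
print(by_molecule)
```

The mapping runs from molecule to a list of IDs, never the other way round as a unique key, which is the practical consequence of the caution above.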

Viewing the data

The structural data can be used to visualize the biomolecules with appropriate software, such as RasMol, Chime, or a VRML plugin for a web browser. The PDB website also contains resources for education, structural genomics, and related software.

References

  • Bernstein FC, Koetzle TF, Williams GJ, Meyer Jr EF, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol 1977;112:535-542. PMID 875032.
