FASTA Format

In bioinformatics, FASTA format is a file format used to exchange information between genetic sequence databases. Its format looks like this:
   >SEQUENCE_1
   ;comment line 1 (optional)
   MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
   LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
   IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
   MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
   >SEQUENCE_2
   ;comment line 1 (optional)
   ;comment line 2 (optional)
   SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
   ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
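Because every record starts with its own '>' header line, a file like the one above can be read in a single pass: a '>' line opens a new record, lines beginning with ';' are comments, and all remaining lines are joined into the sequence. The following is a minimal Python sketch of that idea, not a reference implementation. The names `parse_fasta` and `FastaRecord` are hypothetical, and it assumes that blank lines can be skipped, that whitespace inside sequence lines can be dropped, and that any text before the first header is an error.

```python
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class FastaRecord(NamedTuple):
    header: str          # text after the '>' on the header line
    comments: List[str]  # ';' comment lines, without the leading ';'
    sequence: str        # all sequence lines joined, whitespace removed


def parse_fasta(handle: TextIO) -> Iterator[FastaRecord]:
    """Yield one FastaRecord per '>' header in a FASTA text stream (sketch)."""
    header: Optional[str] = None
    comments: List[str] = []
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                yield FastaRecord(header, comments, "".join(chunks))
            header, comments, chunks = line[1:].strip(), [], []
        elif header is None:
            raise ValueError("data found before the first '>' header")
        elif line.startswith(";"):
            comments.append(line[1:].strip())
        else:
            # Drop any internal whitespace so wrapped lines join cleanly.
            chunks.append("".join(line.split()))
    if header is not None:
        yield FastaRecord(header, comments, "".join(chunks))


if __name__ == "__main__":
    example = (
        ">SEQUENCE_1\n"
        ";comment line 1 (optional)\n"
        "MTEITAAMVKELRESTGAGM\n"
        "MDCKNALSET\n"
        ">SEQUENCE_2\n"
        "SATVSEINSE\n"
    )
    for rec in parse_fasta(io.StringIO(example)):
        print(rec.header, len(rec.sequence), rec.comments)
    # SEQUENCE_1 30 ['comment line 1 (optional)']
    # SEQUENCE_2 10 []
```

Making the function a generator means a large file is processed one record at a time rather than loaded into memory all at once.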
Each record consists of a header line (beginning with a '>') which gives a name and/or a unique identifier for the sequence, and often a good deal of other information too. Many sequence databases use standardized headers, which makes it easier to extract information from the header automatically.

After the header line, one or more comment lines, distinguished by a semicolon at the beginning of the line, may occur. Most databases and bioinformatics applications do not recognize these comments, so their use is discouraged, but they are part of the official format.

After the header line and any comments, one or more sequence lines may follow. Sequences may be protein or DNA sequences; they can be of any length and can contain gaps or alignment characters (see sequence alignment).

FASTA files often have file extensions such as .fa, .mpfa or .fsa, among others. The simple format makes them easy to manipulate using text processing tools and scripting languages like Perl.

The NCBI has gone so far as to define a standard for its FASTA headers, although in practice it is somewhat messy. The identifier formats it uses for the different source databases are:
   Database                          Identifier format
   GenBank                           gi|gi-number|gb|accession|locus
   EMBL Data Library                 gi|gi-number|emb|accession|locus
   DDBJ, DNA Database of Japan       gi|gi-number|dbj|accession|locus
   NBRF PIR                          pir||entry
   Protein Research Foundation       prf||name
   SWISS-PROT                        sp|accession|entry name
   Brookhaven Protein Data Bank      pdb|entry|chain
   Patents                           pat|country|number
   GenInfo Backbone Id               bbs|number
   General database identifier       gnl|database|identifier
   NCBI Reference Sequence           ref|accession|locus
   Local Sequence identifier         lcl|identifier
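Since these identifiers are fields separated by '|', they can be split mechanically once the number of fields after each database tag is known. The sketch below is a hypothetical helper (`split_ncbi_id` and `FIELD_COUNTS` are illustrative names, not an NCBI API). It assumes the field counts can be read off the table above, that the identifier is the first whitespace-delimited word of the header, and that missing trailing fields may be treated as empty. Chained identifiers such as the GenBank form come back as one (tag, fields) pair per tag; tags not listed in the table are rejected.

```python
from typing import Dict, List, Tuple

# Assumed field counts per tag, read off the table above;
# not an official or complete NCBI list.
FIELD_COUNTS: Dict[str, int] = {
    "gi": 1, "gb": 2, "emb": 2, "dbj": 2, "pir": 2, "prf": 2,
    "sp": 2, "pdb": 2, "pat": 2, "bbs": 1, "gnl": 2, "ref": 2, "lcl": 1,
}


def split_ncbi_id(header: str) -> List[Tuple[str, List[str]]]:
    """Split an NCBI-style identifier into (tag, fields) pairs (sketch)."""
    words = header.lstrip(">").split(None, 1)
    if not words:
        return []
    # The identifier is the first word; a trailing '|' adds no field.
    parts = words[0].rstrip("|").split("|")
    result: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(parts):
        tag = parts[i]
        if tag not in FIELD_COUNTS:
            raise ValueError(f"unknown database tag: {tag!r}")
        n = FIELD_COUNTS[tag]
        fields = parts[i + 1 : i + 1 + n]
        # Tolerate missing trailing fields by padding with empty strings.
        fields += [""] * (n - len(fields))
        result.append((tag, fields))
        i += 1 + n
    return result


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    print(split_ncbi_id(">gi|12345|gb|AAB01234.1|LOCUS1 some protein"))
    # [('gi', ['12345']), ('gb', ['AAB01234.1', 'LOCUS1'])]
```

Driving the split from a table of field counts keeps the chained forms (gi followed by gb, emb or dbj) and the single-tag forms in one loop, and adding a tag means adding one dictionary entry.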