The FASTA format originates from the FASTA package of sequence database search programs. A sequence in FASTA format consists of a single description line (sometimes called a comment line) beginning with a '>' symbol, followed by any number of lines of sequence data, which may be of any length. Sequence lines other than the last are often limited to 60 characters. Sequences are expected to use the standard amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and mapped to upper case; N may be used for an unknown nucleic acid residue, and X for an unknown amino acid residue. Several sequences can follow one another in the same file; for example, two DNA sequences in this format might look like:
>18BI1 Human MLC1emb gene for embryonic myosin alkaline light chain, promoter and exon 1
GTGAAGAGAGAGCTGTGGCATGAAGGGGAGGGGGCTGGTGGCCCCAAACCTGGTGACAA
TACACAGTTGTCAGCTGTACCCTGCTGGCGTTTCTTCCTTTTATAGTCAGCAGCAGTTG
CTCTTGCTTTCACCCAGCCCCTCTGTGGGGCTCCTGCCCAGGATAAAAGGGAAGGGAGG
CAGCCCAGGCTCCTATCTCATCTCCCAGACGCCACGTCTCTCGGTTTCTTCTTAG
>32A5UTR Human alpha-Bcrystallin gene, 5' end
GTCGACACCACCCAAAATAGTGCCGAGCCTCTTGGGGGGGGAGGGGCTGGGAGTGGGGG
CCCTGAGTGAGAGCAACGAGGGTGTGACCAGCGCCGCCCGGACCCCTAGTCCCCTCCCC
CGCACACTCTTCAGCTGTCGCAGGGGGCCTGAGAGGACAGCTGAGGGTCCTGGCTGGGA
ACGAGCTGGGGAGGGGGAGCTGGTGGTGCCTGGGGCATGAAGAGGCCTCGCTGAGACCC
TCACAAACGGTTTGCACGTTTCCACACCTCATTTTCTCCTCTTCGGTGGCAGGCACTGT
GCACCCAATTCCTAAAGCACTCCTGGATTTAATGTTCTGAGAGCCACATAGAACGAAAG
ATGCAAGAAATCTGTTTGCTCTTTTTTCAGGGGGTGGGGTCTTTCTGCCCAGATGTGGG
ATCCTCTCCTAAACCCAGGTCAACCCAGGGCACGAGGCAGATGGCTGGTGCTGACATGT
TGACCATCACTGCTCTCTTCCAAGGACTCACAAAGAGTTAATGTCCCTGGGGCTCAGCC
TAGGAAGATTCCAGTCCCTGCCCAGGCCCAAGATAGTTGCTGGCCTGATTCCCCTGGCA
TTCAGGACTGGAAAGGAGGAGGAGGGGCACACTACGCCGGCTCCCATCCTCCCCCCACC
CCGCGTGCCTGCTTGGGATTCCTGACTCTGTACCAGCTTCAGAGAACAGGGGTGGGGGT
GGGTGCCATTGGGTGTGGACAGAAAGCTAGTGAAACAAGACCATGACAAGTCACTGGCC
GGCTCAGACGTGTTTGTGTCTCTCTTTTCTTAGCTCAGTGAGTACTGGGTATGTGTCAC
ATTGCCAAATCCCGGATCACAAGTCTCCATGAACTGCTGGTGAGCTAGGATAATAAAAC
CCCTGACATCACCATTCCAGAAGCTTCACAAGACTGCATATATAAGGGGCTGGCTGTAG
CTGCAGCTGAAGGAGCTGACCAGCCAGCT
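The rules above are simple enough to read and write FASTA by hand. The following is a minimal Python sketch, not a reference implementation: the function names parse_fasta and format_fasta are hypothetical, and it assumes that a new record starts only at a '>' line and that blank lines can be skipped. It joins sequence lines of any length, maps lower-case residues to upper case, and rewraps output at 60 characters per line. It does not check that residues are valid amino acid or nucleic acid codes.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, str]  # (description line without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Split FASTA text into (description, sequence) records.

    Sequence lines of any length are concatenated, and lower-case
    letters are mapped to upper case.
    """
    records: List[Record] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no data and are skipped
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' line")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def format_fasta(records: List[Record], width: int = 60) -> str:
    """Write records back out, wrapping sequence lines at `width` characters."""
    if width <= 0:
        raise ValueError("width must be positive")
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])  # only the last line may be shorter
    return "\n".join(lines) + "\n"


# Usage with two hypothetical toy records (not taken from any database).
example = """>seq1 toy nucleotide record
acgtNNacgt
ACG
>seq2 toy protein record
MKVX
"""
records = parse_fasta(example)
print(records)
# [('seq1 toy nucleotide record', 'ACGTNNACGTACG'), ('seq2 toy protein record', 'MKVX')]
print(format_fasta(records, width=5), end="")
```

With width=5 the first sequence is printed as the lines ACGTN, NACGT and ACG. For real workloads, an established parser such as Biopython's Bio.SeqIO (format name "fasta") is likely a safer choice than a hand-rolled one.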