Fundamentals of XML and BSML

Table 2.3 Main attributes of the Sequence element

Attribute Name  Description
comment         Usually used to indicate a displayable description of the sequence record. See also the title attribute.
db-source       Used to identify a public database, such as GenBank, EMBL, or the DNA Database of Japan (DDBJ). See also the ic-acckey attribute.
ic-acckey       An accession number used to uniquely identify a sequence record within the international consortium of nucleotide sequence databases. The consortium consists of GenBank, the EMBL Nucleotide Sequence Database, and the DNA Database of Japan (DDBJ). This attribute is usually used in conjunction with the db-source attribute.
length          Indicates the length of the sequence.
local-acckey    An accession number used to uniquely identify a sequence record within a local or private database.
molecule        Indicates the type of molecule represented. Options include: "dna," "rna," "aa" (amino acid), "na" (nucleic acid), "other-mol," and "mol-not-set." If you do not specify a molecule attribute, it defaults to "dna."
title           A displayable name for the sequence record. See also the comment attribute.
topology        Specifies the topology of the sequence. Usually indicated with the values "linear" or "circular."

In this case, we are defining a new sequence for the same SARS virus as Listing 2.1, but specifying that the actual sequence data is stored in an external text file.

When using the Seq-data element, you must stick to IUPAC codes for nucleic acids and amino acids (see Appendix A). However, the data can include white space characters and numbers. For example, the following document excerpt is considered valid:

<Definitions>
  <Sequences>
    <Sequence id="AY278741" length="29727">
      <Seq-data>
        1 atattaggtt tttacctacc caggaaaagc caaccaacct cgatctcttg tagatctgtt
        61 ctctaaacga actttaaaat ctgtgtagct gtcgctcggc tgcatgccta gtgcacctac
        121 gcagtataaa caataataaa ttttactgtc gttgacaaga aacgagtaac tcgtccctct
        181 tctgcagact gcttacggtt tcgtccgtgt tgcagtcgat catcagcata cctaggtttc
        [For brevity, sequence is truncated.]
      </Seq-data>
    </Sequence>
  </Sequences>
</Definitions>
</Bsml>
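Because Seq-data may mix residues with white space and position numbers, a program reading it will usually need to strip those extras before working with the sequence. The following Python sketch shows one way this might be done. It is an illustration only, not part of BSML or any BSML toolkit: the helper names clean_seq_data and invalid_codes are hypothetical, the alphabet covers only the IUPAC nucleotide codes (amino acid data would need a different set), and the embedded fragment is a shortened copy of the excerpt above.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: nucleic acid data only. These are the IUPAC nucleotide codes
# (bases plus ambiguity codes), compared in lowercase.
IUPAC_NUCLEOTIDES = set("acgturykmswbdhvn")


def clean_seq_data(raw):
    """Remove white space and digits (position counters) from Seq-data text."""
    return re.sub(r"[\s\d]+", "", raw).lower()


def invalid_codes(sequence):
    """Return the sorted list of characters that are not IUPAC nucleotide codes."""
    return sorted(set(sequence) - IUPAC_NUCLEOTIDES)


# Shortened, self-contained copy of the excerpt (first two lines of residues).
excerpt = """
<Sequence id="AY278741" length="29727">
  <Seq-data>
    1 atattaggtt tttacctacc caggaaaagc caaccaacct cgatctcttg tagatctgtt
    61 ctctaaacga actttaaaat ctgtgtagct gtcgctcggc tgcatgccta gtgcacctac
  </Seq-data>
</Sequence>
"""

sequence_element = ET.fromstring(excerpt)
residues = clean_seq_data(sequence_element.findtext("Seq-data"))

print(len(residues))            # number of residues after cleaning
print(invalid_codes(residues))  # an empty list means every code is recognized
```

Note that the cleaned length here reflects only the two lines embedded in the sketch, so it will not match the length attribute, which describes the full sequence.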

Each Sequence element can include a number of attributes. The main attributes are defined in Table 2.3. Listing 2.2 provides a more complete example for the SARS virus, with a fuller set of attributes.
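As a rough illustration of how an application might read these attributes, the Python sketch below collects a few of them from a Sequence element and applies the "dna" default for molecule described in Table 2.3. The function name describe_sequence and the choice of attributes to report are assumptions made for this example; the fragment reuses only the id and length values from the excerpt above.

```python
import xml.etree.ElementTree as ET

# Per Table 2.3, an omitted molecule attribute defaults to "dna".
DEFAULT_MOLECULE = "dna"


def describe_sequence(element):
    """Collect selected Sequence attributes, filling in the molecule default."""
    length = element.get("length")
    return {
        "id": element.get("id"),
        "title": element.get("title"),        # None when the attribute is absent
        "molecule": element.get("molecule", DEFAULT_MOLECULE),
        "topology": element.get("topology"),  # None when the attribute is absent
        "length": int(length) if length is not None else None,
    }


fragment = '<Sequence id="AY278741" length="29727"/>'
info = describe_sequence(ET.fromstring(fragment))

for name, value in info.items():
    print(name, value)
```

Returning None for absent attributes keeps the distinction between "not specified" and an explicit value visible to the caller; only molecule is given a default, because that is the only default the table documents.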

