Exporting Data to a Generic XML Format

   
  Once a set of data has been loaded into a set of SRS objects, it can be written out in a generic XML format using a few simple commands. This operation can be performed either from the UNIX command line (using an Icarus script) or from within the SRS browser. This section explains how an Icarus script may be used to perform conversions to generic XML formats. Section "Exporting Data to XML Formats Using the SRS Browser" explains how the generic format conversion process can be performed using the SRS browser. Example 19.1 shows an Icarus script (swissprot2xml.i) that:

  • Creates a DTD from a simple SwissProt loader (SeqSimple).
  • Runs a query and stores the resulting entry in an SRS object.
  • Writes the object to a well-formed XML document with an internal DTD subset.

Example 19.2 shows the SeqSimple loader, which loads data from three fields (Accession, Description and SeqLength).

Example 19.1 An Icarus script (swissprot2xml.i) used to export SwissProt data to XML format.

#!/usr/bin/env icarus
# Icarus script to demonstrate exporting data from a flat file 
# library to a generic XML format based on the SeqSimple loader.

# Open a session and create a SwissProt loader object.
$sess=$Session:[]
$loader=$sess.getLoaderNamed:SeqSimple

# Generate a DTD using the selected loader object.
$dtd=$loader.getDTD

# Run a query in the ID field of swissprotrelease and load the entry 
# using the SeqSimple loader.
$set=$sess.query:"[swissprotrelease-id:'100K_RAT']"
$loadedObj=$set.load:SeqSimple

# Invoke the ToXml function to convert the loaded object to an XML
# format.
$xmlObj = $ToXml:$loadedObj

# Open an output file to store the results.
$fh=$File:['swissprot2xml.xml' mode:write]

# Write the XML declaration and a commented out DOCTYPE declaration. 
# To activate the internal DTD subset, simply remove the XML comment 
# tags '<!--' and '-->'.
$fh.app:|<?xml version="1.0" encoding="ISO-8859-1"?>
        |
        |<!--
        |<!DOCTYPE LoadedSet [
        |

# Write the DTD declarations into the internal subset.
$fh.app:$dtd

# Close the DOCTYPE declaration and write the opening tag for the 
# root element.
$fh.app:|]>
        |-->
        |
        |<LoadedSet>

# Write the XML formatted object into the output file.
$fh.app:$xmlObj

# Write the closing tag for the root element.
$fh.app:|
        |</LoadedSet>
        |

# Close the output file.
$fh.close

Example 19.2 The SeqSimple loader from $SRSDB/loader.i.

$SeqSimple_Class=$LoadClass:[SeqSimple
  groups:{$SEQUENCE_LIBS $SEQUENCESUB_LIBS}
  attrs:{
    $LoadAttr:[Accession type:string
        load:$Tok:[field:$DF_Accession]]
    $LoadAttr:[Description type:string
        load:$Tok:[field:$DF_Description]]
    $LoadAttr:[SeqLength type:int 
        load:$Tok:[field:$DF_SeqLength]]
  }
]

Example 19.3 shows the XML document (swissprot2xml.xml) generated by the script swissprot2xml.i. The ToXml function creates a separate XML element for each of the three fields identified in the loader. It also creates an XML element for the implicit Id field. The name of each element is taken from the name of the corresponding $LoadAttr object. This output is identical to the output that would be produced by saving the data from within the SRS browser, using the Generic XML format radio button on the DownLoad Options page (see Figure 19.1).
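The naming convention described above can be imitated outside SRS. The following Python sketch is not part of SRS and does not call ToXml. It only assumes, from the layout of Example 19.3, that each child element is named with the loader name as a namespace prefix followed by the attribute name. The helper build_entry and its parameters are hypothetical, and the namespace URI is supplied by the caller because the manual elides it.

```python
import xml.etree.ElementTree as ET


def build_entry(loader_name, namespace_uri, entry_id, attrs):
    """Build one <loader_name> element with a prefixed child per attribute.

    Hypothetical helper: it mirrors the layout of Example 19.3, not the
    internals of the SRS ToXml function.
    """
    # Make the serializer use the loader name as the namespace prefix.
    ET.register_namespace(loader_name, namespace_uri)
    entry = ET.Element(loader_name)
    # The implicit Id field comes first, followed by the loader attributes.
    for name, value in [("Id", entry_id), *attrs.items()]:
        child = ET.SubElement(entry, "{%s}%s" % (namespace_uri, name))
        child.text = str(value)
    return ET.tostring(entry, encoding="unicode")


# Usage with the values from Example 19.3; the URI is the elided
# placeholder exactly as it appears in the manual.
print(build_entry(
    "SeqSimple",
    "http://.../srs/SeqSimple",
    "SWISSPROTRELEASE:100K_RAT",
    {"Accession": "Q62671", "SeqLength": 889},
))
```

Each key of the attrs dictionary plays the role of a $LoadAttr name, so adding an attribute to the dictionary adds one more prefixed element to the output.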

Note: The output incorporates an internal DTD subset that is commented out. Deactivating the DTD in this way allows XML viewers to display the XML data without the namespace attributes. If you want to use the DTD to validate the document, simply remove the comment delimiters (<!-- and -->) surrounding the DOCTYPE block.
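Removing the comment delimiters by hand is practical for a single file. For many files, the edit described in the note can be scripted. The Python sketch below is one possible approach, not an SRS tool: the function name activate_dtd is hypothetical, and it assumes the DOCTYPE block is the only content of the comment, as in the file written by swissprot2xml.i.

```python
import re

# Matches a comment whose entire content is a DOCTYPE declaration with an
# internal subset, which is the form written by the export script.
_COMMENTED_DOCTYPE = re.compile(
    r"<!--\s*(<!DOCTYPE\b.*?\]>)\s*-->", re.DOTALL
)


def activate_dtd(xml_text):
    """Return xml_text with the comment delimiters around DOCTYPE removed.

    Text without a commented-out DOCTYPE block is returned unchanged.
    """
    return _COMMENTED_DOCTYPE.sub(r"\1", xml_text, count=1)


# Usage on a shortened version of the exported file.
sample = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    "<!--\n"
    "<!DOCTYPE LoadedSet [\n"
    "<!ELEMENT LoadedSet (SeqSimple*)>\n"
    "]>\n"
    "-->\n"
    "<LoadedSet></LoadedSet>\n"
)
print(activate_dtd(sample))
```

Python's standard xml.etree.ElementTree parser does not validate against a DTD, so the activated file still needs a validating parser if validation is the goal.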

Example 19.3 XML output file (swissprot2xml.xml) generated by the script in Example 19.1.

<?xml version="1.0" encoding="ISO-8859-1"?>

<!--
<!DOCTYPE LoadedSet [

<!ELEMENT LoadedSet (SeqSimple*)>
<!ELEMENT SeqSimple (SeqSimple:Id, SeqSimple:Accession?,
                     SeqSimple:Description?)>
<!ATTLIST SeqSimple
          SeqLength CDATA #IMPLIED
>
<!ELEMENT SeqSimple:Id (#PCDATA)>
<!ATTLIST SeqSimple:Id
          xmlns:SeqSimple CDATA #FIXED "http://.../srs/SeqSimple"
>
<!ELEMENT SeqSimple:Accession (#PCDATA)>
<!ATTLIST SeqSimple:Accession
          xmlns:SeqSimple CDATA #FIXED "http://.../srs/SeqSimple"
>
<!ELEMENT SeqSimple:Description (#PCDATA)>
<!ATTLIST SeqSimple:Description
          xmlns:SeqSimple CDATA #FIXED "http://.../srs/SeqSimple" 
>

]>
-->

<LoadedSet>

<SeqSimple xmlns:SeqSimple="http://.../srs/SeqSimple" >
  <SeqSimple:Id>SWISSPROTRELEASE:100K_RAT</SeqSimple:Id>
  <SeqSimple:Accession>Q62671</SeqSimple:Accession>
  <SeqSimple:Description>100 kDa protein (EC 6.3.2.-).
  </SeqSimple:Description>
  <SeqSimple:SeqLength>889</SeqSimple:SeqLength>
</SeqSimple>

</LoadedSet>
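
A document in this format can be read back with any namespace-aware XML parser. The Python sketch below, which is not part of SRS, uses the standard xml.etree.ElementTree module to turn each entry in a LoadedSet document into a dictionary keyed by field name. The names read_entries and local_name are hypothetical, and the embedded sample repeats the entry from Example 19.3, including the elided namespace URI exactly as printed.

```python
import xml.etree.ElementTree as ET

# The entry from Example 19.3, without the commented-out DTD.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<LoadedSet>
<SeqSimple xmlns:SeqSimple="http://.../srs/SeqSimple" >
  <SeqSimple:Id>SWISSPROTRELEASE:100K_RAT</SeqSimple:Id>
  <SeqSimple:Accession>Q62671</SeqSimple:Accession>
  <SeqSimple:Description>100 kDa protein (EC 6.3.2.-).
  </SeqSimple:Description>
  <SeqSimple:SeqLength>889</SeqSimple:SeqLength>
</SeqSimple>
</LoadedSet>
"""


def local_name(tag):
    """Strip the '{namespace-uri}' part that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def read_entries(xml_bytes):
    """Return one dict per entry, mapping field name to its text value."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for entry in root:
        fields = {}
        for child in entry:
            # Trim the line break that the export leaves inside some values.
            fields[local_name(child.tag)] = (child.text or "").strip()
        entries.append(fields)
    return entries


for fields in read_entries(SAMPLE):
    print(fields["Id"], fields["Accession"], fields["SeqLength"])
```

Matching on the local name avoids hard-coding the namespace URI, which depends on the SRS installation. To process a real export, pass the bytes of swissprot2xml.xml to read_entries instead of SAMPLE.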