Example 19.4 shows some of the $XMLPrintMetaphor objects from the final edited version of the game_printer.i file, which may be used to transform EMBL, Genbank, and Swissprot entries to GAME format. A full listing of this file is provided in Appendix II, Example II.5. The procedure for creating and using these $XMLPrintMetaphor objects to perform data format conversions is as follows.
- Select a target XML format, locate its DTD, and navigate to the directory in which the DTD resides. In this example, the target XML format is GAME, and the GAME DTD consists of a single file called game.dtd.
- Use the dtd2ica utility to create templates for the $XMLPrintMetaphor objects by running the command:
dtd2ica -dtd game.dtd -lib game -printer
The output file will be called game_printer.i, and it will be written to the $SRSSITE directory. The file name is derived from the value of the -lib command line argument. Before running the utility, check the $SRSSITE directory to make sure the new print metaphor file template will not overwrite an existing file of the same name.
- Edit the $XMLPrintMetaphor object for the element that delimits GAME entries, which in this example is the game element. To the $XMLPrintMetaphor object for this element, add the following attributes:
-
id:"game"
This identifies the game element as the entry-delimiting element, and the starting point for processing all of the other $XMLPrintMetaphor objects, which are descendants of the game element.
-
sourceLoader:{$EmblEntry_Class $GenbankEntry_Class $SwissEntry_Class}
This identifies a list of source data loaders which may be used to retrieve data for conversion to GAME format.
Note: This is the only $XMLPrintMetaphor object that will have an id attribute. Other $XMLPrintMetaphor objects may have a sourceLoader attribute, but its purpose will be to exclude certain types of loaded objects which do not have data sources for the corresponding element. For example, the $GAME_dbxref metaphor in Example 19.4 uses a sourceLoader attribute to indicate that the Genbank library does not have a database cross reference field that can supply data to dbxref elements in the GAME format. It does this by excluding the Genbank loader ($GenbankEntry_Class) from the list.
Example 19.4 $XMLPrintMetaphor object declarations for transforming EMBL, Genbank, and SwissProt data to GAME format.
$GAME_game = $XMLPrintMetaphor:["game"
id:"game"
sourceLoader:{$EmblEntry_Class
$GenbankEntry_Class $SwissEntry_Class}
attributes:{
...
$XMLPrintAttr:["taxon"
from:{
$XMLPrintContent:[objattrname:"OrgC"
sourceLoader:{$EmblEntry_Class $SwissEntry_Class}]
$XMLPrintContent:[objattrname:"Org"
sourceLoader:{$GenbankEntry_Class}]
}
]
}
children:{
. . .
$GAME_feature_set
$GAME_seq
}
]
$GAME_DOCUMENTATION_author = $XMLPrintMetaphor:["author"
from:{
$XMLPrintContent:[objattrname:"references/authors"
sourceLoader:{$EmblEntry_Class
$GenbankEntry_Class $SwissEntry_Class}]
}
]
$GAME_seq = $XMLPrintMetaphor:["seq"
attributes:{
$XMLPrintAttr:["length"
from:{
$XMLPrintContent:[objattrname:"seqLen"
sourceLoader:{$EmblEntry_Class
$GenbankEntry_Class $SwissEntry_Class}]
}
]
...
}
children:{
$GAME_dbxref
$GAME_description
$GAME_residues
}
]
$GAME_dbxref = $XMLPrintMetaphor:["dbxref"
sourceLoader:{$EmblEntry_Class $SwissEntry_Class}
card:multi
children:{
$GAME_DOCUMENTATION_xref_db
$GAME_DOCUMENTATION_db_xref_id
}
selectors:{
$XMLPrintSelector:[path:'links']
}
]
$GAME_description = $XMLPrintMetaphor:["description"
from:{
$XMLPrintContent:[objattrname:"Des"
sourceLoader:{$EmblEntry_Class $GenbankEntry_Class}]
$XMLPrintContent:[objattrname:"description"
sourceLoader:{$SwissEntry_Class}]
}
]
$GAME_residues = $XMLPrintMetaphor:["residues"
from:{
$XMLPrintContent:[objattrname:"seq/seq"
sourceLoader:{$EmblEntry_Class
$GenbankEntry_Class $SwissEntry_Class}]
}
]
- Activate all GAME elements and attributes that will receive data from the loaded objects.
-
To activate an XML element, uncomment the from list and edit the $XMLPrintContent object to create an association between a specific field in the loaded object and the selected GAME element. To create the association, you must first identify the loader which will be used to retrieve the data and add its path to the sourceLoader attribute. For example, to configure a metaphor to retrieve data from a loaded EMBL entry, replace the four question marks (????) in the sourceLoader attribute value with the path of the main EMBL loader, $EmblEntry_Class. If more than one loader can be used with this $XMLPrintContent object, make the sourceLoader attribute into a list and put all of the loader paths inside a pair of curly brackets ({}). You must then identify the particular $LoadAttr object within the $EmblEntry_Class loader that loads the EMBL data to be placed in the element, and assign the name of this $LoadAttr object to the objattrname attribute of the $XMLPrintContent object.
For the description element, the $LoadAttr object is called Des. For the author element, the $LoadAttr object is called authors, but this $LoadAttr object belongs to the subentry loader called EmblRef. In this case, the value of the objattrname attribute must be a path leading from the main EMBL loader to the specific $LoadAttr object that loads the data, i.e. "references/authors".
For the residues element, the $LoadAttr object is called seq, and it is contained in the main $EmblEntry_Class loader. However, the seq $LoadAttr object loads a $Sequence object. The sequence is contained in the seq attribute of this $Sequence object, so the path that must be supplied to the objattrname attribute is "seq/seq". The first seq in this path refers to the $LoadAttr object of the $EmblEntry_Class loader. The second seq refers to the loaded seq attribute of the $Sequence object, which contains the sequence.
-
To activate an XML attribute, locate the $XMLPrintMetaphor object for the element that contains the attribute and uncomment the $XMLPrintAttr object for the target attribute. Within this $XMLPrintAttr object, create an association between the EMBL field that will supply the data and the selected GAME attribute using the same procedure described above.
- Save the edited game_printer.i file.
- Add this new file to the list of files which are included via the site.i file in the $SRSSITE directory. For the GAME example, the following line was added to the site.i file:
file:"SRSSITE:game_printer.i"
- Save the edited site.i file and incorporate the new $XMLPrintMetaphor objects into your SRS installation by running srssection.
- To convert a set of EMBL entries to GAME format, write a simple Icarus script to do the following:
-
Query EMBL and retrieve the entries.
-
Load the entries into an OM object using the $EmblEntry_Class loader.
-
Invoke the Icarus ToXml function with two parameters, the loaded object and the ID of the set of $XMLPrintMetaphor objects for the GAME format. The output will be an XML stream that can be piped into a file.
Example 19.5 shows an Icarus script that loads a single EMBL entry and writes it in GAME format. The value of the xmlPrintMetaphor attribute in the ToXml function is set to the same value as the id attribute of the $XMLPrintMetaphor object for the entry-delimiting element (game). Example 19.6 shows the output produced by the script listed in Example 19.5, which is a single EMBL entry with ID E48966 converted to GAME format.
Example 19.5 Icarus script to load a single EMBL entry and export it to the GAME XML format.
# Write a single EMBL entry to GAME format.
$s=$Session:[]
$set=$s.query:'[emblrelease-id:E48966]'
$Print:|<?xml version="1.0" encoding="ISO-8859-1"?>
|
|<LoadedSet>
$obj = $set.load:EmblEntry
$Print:$ToXml:[$obj xmlPrintMetaphor:game]
$Print:"</LoadedSet>\n\n"
Example 19.6 Sample of GAME-formatted XML output from EMBL entry #E48966.
<?xml version="1.0" encoding="ISO-8859-1"?>
<LoadedSet>
<game taxon="artificial sequences.">
<feature_set>
<description>Identification and detection and monitoring
method of ray fungus by 16SrRNA gene.</description>
<name>E48966</name>
<type>DNA</type>
<annotation_source>"Identification and detection and
monitoring method of ray fungus by 16SrRNA gene";
</annotation_source>
<comments>OS Artificial Sequence
PN JP 2001178498-A/1
PD 03-JUL-2001
PF 27-DEC-1999 JP 1999371257
PI ASAKA SUZUKI,TOSHIHIRO HOAKI
PC C12Q1/68//C12N15/09,C12N15/00
CC Description of Artificial Sequence:
CC Designed Oligonucleotide that
CC specifically hybridizes to 16S rRNA genes
CC of Actinomycetes
FH Key Location/Qualifiers
</comments>
<author>Suzuki,A.Hoaki,T.</author>
<creation_date>05-SEP-2002</creation_date>
<version>E48966.1</version>
<feature_span>
<type>source</type>
<comments type="db_xref">taxon:32630</comments>
<comments type="organism">synthetic construct</comments>
<start>1</start>
<end>17</end>
</feature_span>
</feature_set>
<seq length="17" id="E48966" type="DNA">
<description>Identification and detection and monitoring
method of ray fungus by 16SrRNA gene.</description>
<residues>cgcggcctatcagcttg</residues>
</seq>
</game>
</LoadedSet>
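The loader-dependent choices encoded in Example 19.4 can be hard to read off the declarations. The following Python sketch is a toy model of that selection logic, not SRS code. The dictionaries, the function names, and the rule that the first matching $XMLPrintContent entry wins are assumptions made for illustration; the loader names and objattrname values are copied from Example 19.4.

```python
# Toy model of sourceLoader handling; an illustration only, not part of SRS.
ALL_LOADERS = frozenset(
    {"EmblEntry_Class", "GenbankEntry_Class", "SwissEntry_Class"}
)

# Simplified stand-ins for two metaphors from Example 19.4.
DESCRIPTION = {
    "element": "description",
    "from": [
        {"objattrname": "Des",
         "sourceLoader": {"EmblEntry_Class", "GenbankEntry_Class"}},
        {"objattrname": "description",
         "sourceLoader": {"SwissEntry_Class"}},
    ],
}
DBXREF = {
    "element": "dbxref",
    # Genbank is left out, so dbxref is skipped for Genbank entries.
    "sourceLoader": {"EmblEntry_Class", "SwissEntry_Class"},
}


def is_processed(metaphor, loader, root_loaders=ALL_LOADERS):
    """Return True if the metaphor applies to objects from this loader."""
    if root_loaders is None:
        # No sourceLoader on the root: the family is generic and
        # every sourceLoader list in it is ignored.
        return True
    if loader not in root_loaders:
        return False
    allowed = metaphor.get("sourceLoader")
    return allowed is None or loader in allowed


def pick_source(metaphor, loader, root_loaders=ALL_LOADERS):
    """Return the objattrname of the first data source valid for the loader.

    "First match wins" is an assumption of this sketch.
    """
    for content in metaphor.get("from", []):
        if root_loaders is None or loader in content["sourceLoader"]:
            return content["objattrname"]
    return None
```

In this model, pick_source(DESCRIPTION, "SwissEntry_Class") selects the description field, while is_processed(DBXREF, "GenbankEntry_Class") is false, which mirrors the exclusion of the Genbank loader described in the note above.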
Attributes used by $XMLPrintMetaphor Objects
The $XMLPrintMetaphor object contains all the information required to retrieve data from a loaded OM object and write it in a specific XML format. It has the following attributes.
-
elementName is the name of the XML element to which the metaphor writes data.
-
card specifies the cardinality of the XML element. Use mono for an element that can occur only once as a child of a given parent element. Use multi for an element that can occur multiple times as a child of a given parent.
-
id is the unique ID for a particular family of $XMLPrintMetaphor objects. This attribute should be included only in the 'root' metaphor (i.e. the metaphor for the entry-delimiting element).
-
sourceLoader contains a list of loaders for which the $XMLPrintMetaphor object is valid. In the 'root' metaphor, its purpose is to specify all possible loaders that can be used with this family of $XMLPrintMetaphor objects. In a child metaphor, its purpose is to exclude that metaphor from being processed for certain types of loaded objects (see section "Procedure for Creating and Using $XMLPrintMetaphor Objects").
Note: If no sourceLoader attribute is included in the 'root' metaphor, all of the metaphors in the family are treated as 'generic' metaphors, and all sourceLoader attribute lists in this family of metaphors will be ignored (i.e. no loader validation will be performed).
-
attributes lists the $XMLPrintAttr objects, each of which contains all the information required to retrieve data from the loaded OM object and write it as an attribute value for the current element.
-
children contains a list of the $XMLPrintMetaphor objects that represent the child elements belonging to the current element.
-
from contains a list of $XMLPrintContent objects, each of which specifies a possible data source for the current element. The data source specification consists of a loader path (identified in the sourceLoader attribute list) and a $LoadAttr name (identified in the objattrname attribute).
-
selectors contains a list of $XMLPrintSelector objects, each of which defines a special mechanism for identifying and processing a data source within the loaded OM object.
-
useEmptyElementSyntax is a boolean attribute that should only be used in the 'root' metaphor. If it is set to y, all elements that have attributes but no content will be written to the output stream using empty element syntax.
-
showSpecialCharacterWarnings is a boolean attribute that should only be used in the 'root' metaphor. If it is set to y, a warning will be printed to standard error whenever any of the special characters [<>'&"] is detected. If both apostrophes ['] and quotation marks ["] are detected in data that is being fed to an XML attribute, the quotation marks will be replaced with the predefined general entity reference &quot; to eliminate conflicts with the quotation marks used to wrap the attribute value.
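The special-character rule for attribute values can be illustrated with a short Python sketch. It is a model of the behaviour described above, not SRS code: the function name and the show_warnings parameter are hypothetical, and the sketch assumes that the quotation-mark replacement happens whether or not warnings are enabled. Cases the text does not describe are left untouched.

```python
import sys

# The characters listed in the description of showSpecialCharacterWarnings.
SPECIAL_CHARACTERS = set("<>'&\"")


def prepare_attribute_value(value, show_warnings=True):
    """Model the documented handling of special characters in attribute data."""
    found = sorted(SPECIAL_CHARACTERS.intersection(value))
    if found and show_warnings:
        # Warnings go to standard error, as stated in the text.
        print(f"warning: special characters {found} in {value!r}",
              file=sys.stderr)
    if "'" in value and '"' in value:
        # Both quote styles present: keep apostrophes, escape quotation marks.
        return value.replace('"', "&quot;")
    # Other cases are not described in the text and are returned unchanged.
    return value
```

For example, the value 5' "cap" site contains both quote styles, so the sketch returns 5' &quot;cap&quot; site.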
Attributes used by $XMLPrintContent Objects
The $XMLPrintContent object is used to identify possible data sources for both XML elements and attributes, and is always encapsulated in a from list.
-
objattrname specifies the name of a particular $LoadAttr field from which data should be retrieved. A path consisting of a sequence of $LoadAttr names separated by forward slashes (/) may also be used. For example, objattrname:"features/qualifier/value" could be used to retrieve data from the $LoadAttr named value, which belongs to the subentry loader named qualifier. This loader is a composition loader within the subentry loader named features, which in turn is a composition loader within the main EmblEntry loader.
-
sourceLoader contains a list of loaders for which the $XMLPrintContent object is valid.
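To make the slash-separated objattrname paths more concrete, the following Python sketch walks such a path through a nested structure. It is only a model: the loaded OM object is represented here by plain dictionaries and lists, the resolve function is hypothetical, and the location value is invented for the illustration. The other values are taken from Example 19.6.

```python
def resolve(obj, objattrname):
    """Follow a slash-separated objattrname path through a nested object.

    Lists model repeated subentries; the result holds one value per match.
    """
    current = [obj]
    for name in objattrname.split("/"):
        next_level = []
        for node in current:
            value = node.get(name) if isinstance(node, dict) else None
            if value is None:
                continue
            # Flatten lists of subentries so the next step visits each one.
            next_level.extend(value if isinstance(value, list) else [value])
        current = next_level
    return current


# Hypothetical stand-in for an object loaded with $EmblEntry_Class.
entry = {
    "Des": "Identification and detection and monitoring method "
           "of ray fungus by 16SrRNA gene.",
    "seq": {"seq": "cgcggcctatcagcttg"},
    "references": [{"authors": "Suzuki,A.Hoaki,T."}],
    "features": [
        {"location": "1..17",  # invented value, for illustration only
         "qualifier": [{"value": "taxon:32630"},
                       {"value": "synthetic construct"}]},
    ],
}
```

With this model, resolve(entry, "seq/seq") reaches the sequence string through two steps, and resolve(entry, "features/qualifier/value") collects one value per qualifier subentry.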
Attributes used by $XMLPrintAttr Objects
Each $XMLPrintAttr object specifies an XML attribute for the current element and identifies data sources for the attribute's value.
-
attributeName gives the name of the XML attribute into which data will be written.
-
from contains a list of $XMLPrintContent objects, each of which specifies a possible data source for the current attribute.
Attributes used by $XMLPrintSelector Objects
The $XMLPrintSelector object provides two special mechanisms for selecting and retrieving data from the loaded OM object. By using a path attribute, you can identify a specific path inside the OM object from which the current element and all its descendants will retrieve data. By using a split attribute, you can create multiple instances of an element, each of which contains a substring of the original source string in the OM object. The source string is split at the set of characters specified in a regular expression.
-
path is used to specify a path consisting of either a single $LoadAttr name or a sequence of $LoadAttr names separated by forward slashes (/). This path will be automatically prepended to the values of all objattrname attributes in the metaphor. This is demonstrated in Example 19.7. The data for the id attribute of the feature_span element will be retrieved from the features/location path (i.e. from the $LoadAttr named location in the features subentry loader).
Using a path attribute also has important consequences for the children of the current metaphor. Notice that the $GAME_DOCUMENTATION_feature_span metaphor has multiple cardinality. This means that multiple instances of the feature_span element and its children will be created, one for each features subentry in the OM object. When this metaphor is processed, SRS traverses the tree structure of the OM object looking for features objects. Each time it finds one, it creates a new feature_span element, fetches the location data, and writes it to an id attribute. It also passes this features object to all of its children. Each child metaphor therefore receives a subset of the original OM object consisting of a single features object (i.e. not the whole object), and can only retrieve data from this features object. This mechanism ensures that the data in the child elements comes from the same subentry that the parent metaphor has retrieved data from.
It is also important to note that the features path gets passed down to all of the child metaphors. The inherited path gets prepended automatically to all of the objattrname attributes in each child metaphor, just as it was prepended to those of the parent metaphor.
Example 19.7 Using an $XMLPrintSelector object to create a set of subentry elements.
<?xml version="1.0" encoding="ISO-8859-1"?>
<LoadedSet>
<game taxon="artificial sequences.">
<feature_set>
<description>Identification and detection and monitoring
method of ray fungus by 16SrRNA gene.</description>
<name>E48966</name>
<type>DNA</type>
<annotation_source>"Identification and detection and
monitoring method of ray fungus by 16SrRNA gene";
</annotation_source>
<comments>OS Artificial Sequence
PN JP 2001178498-A/1
PD 03-JUL-2001
PF 27-DEC-1999 JP 1999371257
PI ASAKA SUZUKI,TOSHIHIRO HOAKI
PC C12Q1/68//C12N15/09,C12N15/00
CC Description of Artificial Sequence:
CC Designed Oligonucleotide that
CC specifically hybridizes to 16S rRNA genes
CC of Actinomycetes
FH Key Location/Qualifiers
</comments>
<author>Suzuki,A.Hoaki,T.</author>
<creation_date>05-SEP-2002</creation_date>
<version>E48966.1</version>
<feature_span>
<type>source</type>
<comments type="db_xref">taxon:32630</comments>
<comments type="organism">synthetic construct</comments>
<start>1</start>
<end>17</end>
</feature_span>
</feature_set>
<seq length="17" id="E48966" type="DNA">
<description>Identification and detection and monitoring
method of ray fungus by 16SrRNA gene.</description>
<residues>cgcggcctatcagcttg</residues>
</seq>
</game>
</LoadedSet>
-
split is used to split a source string from the OM object into substrings, using a set of characters specified in a regular expression. If the current metaphor has multiple cardinality, each substring will be written into a separate instance of the element it represents.
-
sourceLoader contains a list of loaders for which the $XMLPrintSelector object is valid.
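The two selector mechanisms can be modelled in a few lines of Python. This is a sketch of the behaviour described above under stated assumptions, not SRS code: the function names are hypothetical, the loaded OM object is represented by dictionaries and lists, only single-name paths are handled, and the sample data in the usage note is invented.

```python
import re


def apply_path_selector(entry, path):
    """Yield one sub-object per subentry found under a single-name path.

    Each yielded sub-object is all that the element and its children can see.
    """
    subentries = entry.get(path, [])
    if not isinstance(subentries, list):
        subentries = [subentries]
    yield from subentries


def apply_split_selector(source, pattern):
    """Split a source string into non-empty substrings.

    With multiple cardinality, each substring would become its own element.
    """
    return [part for part in re.split(pattern, source) if part]


def feature_spans(entry):
    """Build one feature_span per features subentry (card:multi, path 'features')."""
    spans = []
    for feature in apply_path_selector(entry, "features"):
        # 'location' is resolved relative to the current features subentry,
        # which models the path being prepended to objattrname.
        spans.append('<feature_span id="%s"/>' % feature["location"])
    return spans
```

For an entry with two features subentries, feature_spans returns two feature_span elements, each built from its own subentry. Similarly, apply_split_selector("rRNA; Actinomycetes; probe", r"[;,]\s*") yields three substrings, one per element instance.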