Searching in Indices

Introduction

The simplest form of SRS query language syntax is the index search. Index searches include searches for simple strings, searches for numbers and ranges of numbers, and searches for dates. This section covers the various forms of index search.

General Syntax

An index search must specify, within square brackets, the databank or databank group name, the index or index group name, and a search expression. The two names must be separated by a hyphen (-) and must be separated from the search expression by either a colon (:) for a string search (see section 8.2.3 "Search Strings") or a hash (#) for a range search. Range searches can be performed only in indices of the types num and real (see section 8.2.4 "Searching Using Numerical Ranges" and section 8.2.5 "Searching for Dates"). Either the field name (e.g., description) or its abbreviation (des) can be used as the index name. All strings, including the search words, are case-insensitive. For example:
[pir-des:elastase]
searches for the string "elastase" in the des (description) field of the protein databank, PIR.
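The syntax rules above can be sketched as a small Python helper that assembles an index search string. This is an illustration only: the function name index_query and its parameters are hypothetical, and rejecting names that contain separator characters is an assumption made here, not a rule stated in this section. Only the bracket, hyphen, colon and hash conventions come from the text.

```python
def index_query(databank, index, expression, range_search=False):
    """Assemble an index search of the form [databank-index:expr] or [databank-index#expr]."""
    for name in (databank, index):
        # Assumption: a name must not contain characters that act as separators.
        if not name or any(ch in name for ch in " -:#[]"):
            raise ValueError(f"unsuitable name: {name!r}")
    # A colon introduces a string search, a hash introduces a range search.
    separator = "#" if range_search else ":"
    return f"[{databank}-{index}{separator}{expression}]"


print(index_query("pir", "des", "elastase"))
```

Passing range_search=True swaps the colon for a hash, which the text reserves for indices of type num and real.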

Search Strings

A search string may be a single search word, or several words separated by logical operators (see section 8.3.4 "Operators"). Parentheses may be used to create a group that is treated as a single operand (see Example Search strings). Wildcards and regular expressions may also be used (see section "Wildcards" and section "Regular Expressions"). Typical search strings cover cases such as: searching the keywords field of the EMBL databank for "insulin"; searching the description field of the EMBL databank for entries that include "acetylchol*" and "receptor" but not "muscarinic"; and searching the authors index of the SWISS-PROT databank for entries containing "sanger,f*" but not "coulson,a*".
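As a rough sketch of how such search strings might be composed, the Python snippet below joins required words and subtracts excluded ones. Several things here are assumptions rather than facts from this section: the operator symbols & ("and") and ! ("but not") belong to the "Operators" section and are not defined here, the databank name embl is a guess at the identifier for EMBL, and the helper name search_string is hypothetical. Parentheses are added around the required words because the text says a group is treated as a single operand.

```python
AND_OP = "&"     # assumed symbol for the "and" operator
BUTNOT_OP = "!"  # assumed symbol for the "but not" operator


def search_string(required, excluded=()):
    """Combine required words with AND, then remove each excluded word."""
    required = list(required)
    if not required:
        raise ValueError("at least one required word is needed")
    expr = f" {AND_OP} ".join(required)
    if len(required) > 1 and excluded:
        # Group the required words so they act as one operand.
        expr = f"({expr})"
    for word in excluded:
        expr += f" {BUTNOT_OP} {word}"
    return expr


# Hypothetical databank name "embl"; "des" abbreviates the description field.
query = f"[embl-des:{search_string(['acetylchol*', 'receptor'], ['muscarinic'])}]"
print(query)
```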

Wildcards

Wildcards are useful if, for example, you wish to search for a group of words (e.g., all words starting with "cell" and ending with "ase") or if it is unclear how a word is spelt in a databank. SRS uses two types of wildcard. Any number of wildcards can be placed anywhere in a search word.

Note: Placing a wildcard at the start of a word or string may increase the response time because all words in the index have to be checked against your string.

Regular Expressions

In addition to wildcards, it is also possible to enter regular expressions directly. Regular expressions must appear within forward slashes (/). Some characters (^$.[]()*+?) have a special meaning; these must be prefixed with a backslash (\) to indicate that the specified character is to be matched literally. The accompanying tables list, respectively, typical regular expression operands and examples of their use.

Note: Searches with regular expressions are sometimes slow since all the words in the index have to be searched.
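The escaping rule can be illustrated with a short Python sketch that turns a literal word into a slash-delimited expression. The helper name literal_regex is hypothetical, and the sketch only handles the special characters listed above; how a literal forward slash or backslash should be written is not covered in this section, so it is not handled here.

```python
SPECIAL_CHARS = set("^$.[]()*+?")  # the characters the text lists as special


def literal_regex(word):
    """Prefix each special character with a backslash and wrap the result in slashes."""
    escaped = "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in word)
    return f"/{escaped}/"


print(literal_regex("p53(human)"))
```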

Searching Using Numerical Ranges

In a numerical index (whether it contains integers or reals) it is possible to search numerical ranges. A numerical index is possible only where there is a one-to-one relationship between entry and value (e.g., sequence length, creation date, resolution). A range can be specified using a single value or two values separated by a colon (:). The value on the left must be smaller than the value on the right. To exclude a value from the range, put an exclamation mark (!) in front of it. An absent value on the left indicates that the search should start at the minimum value in the index. Similarly, an absent value on the right indicates that the search should include values up to the maximum for that index.
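The range rules above can be sketched as a Python helper that builds the part of the query that follows the hash (#). The function name range_expression and its keyword arguments are hypothetical; the colon separator, the exclamation mark for exclusion, the open-ended bounds and the left-smaller-than-right rule are taken from the text.

```python
def range_expression(low=None, high=None, exclude_low=False, exclude_high=False):
    """Build a range such as 300:!500, :500 or 300: from optional bounds."""
    if low is None and high is None:
        raise ValueError("at least one bound is required")
    if low is not None and high is not None and not low < high:
        raise ValueError("the left value must be smaller than the right value")
    # An omitted bound is left empty; an excluded bound is prefixed with "!".
    left = "" if low is None else ("!" if exclude_low else "") + str(low)
    right = "" if high is None else ("!" if exclude_high else "") + str(high)
    return f"{left}:{right}"


print(range_expression(300, 500, exclude_high=True))
print(range_expression(high=500))
```

The sketch deliberately covers only two-sided and open-ended ranges; a single value would simply be written on its own.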

Combining Ranges

Ranges can be combined using logical operators. For instance, the same set of sequences, i.e., all sequences from 300 to 500, excluding 500, and all sequences from 600 to 700, excluding 600, can be retrieved by either of two equivalent queries.

Searching for Dates

Searches for dates can be made using either of the two special formats recognized by the SRS query language: YYYYMMDD or DD-MMM-YYYY. For example, 1 January 1990 can be written as 19900101 or 01-Jan-1990. Dates can be used within ranges in the same way as other numbers.
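As an illustration of the two date formats, the following Python sketch renders a calendar date both ways. The helper name srs_date_formats is hypothetical, and the use of English three-letter month abbreviations for MMM is an assumption. The months are listed explicitly so the output does not depend on the system locale.

```python
from datetime import date

# Assumption: MMM stands for the English three-letter month abbreviation.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def srs_date_formats(d):
    """Return the date as (YYYYMMDD, DD-MMM-YYYY)."""
    compact = f"{d.year:04d}{d.month:02d}{d.day:02d}"
    dashed = f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year:04d}"
    return compact, dashed


print(srs_date_formats(date(1990, 1, 1)))
```

Because the YYYYMMDD form sorts numerically in date order, it fits naturally into the left and right positions of a range.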

Searching Multiple Databanks

As well as allowing you to search a field of a single databank, the SRS query language allows you to search multiple databanks in a single query expression. This is done using a list of databank names, enclosed in curly brackets, in place of the single databank name seen in earlier examples. The names in the list must be separated by spaces. For example, a single query can search for the word "kinase" in the Description index of the SWISS-PROT, SWISSNEW and SPtrEMBL databanks. It is often convenient to give a name to a group of databanks so that the name can be used later in the query rather than repeating the list of names. For instance, a query can create the group dbs, which combines the three databanks SWISS-PROT, SWISSNEW and SPtrEMBL, and then use the group name dbs in place of the databank name in the second part of the search.

Note: It is better not to include spaces and other special characters in names as some systems do not handle them properly. Use an underscore, or start new words with a capital letter instead.
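A multiple-databank search can be sketched in Python as below. The helper name multi_databank_query is hypothetical, and the databank identifiers swissprot, swissnew and sptrembl are assumed spellings of the names mentioned in the text. The curly brackets and space-separated list follow the description above. The syntax for naming a group is not described in this section, so it is not shown.

```python
def multi_databank_query(databanks, index, expression):
    """Build a string search over several databanks: [{db1 db2 ...}-index:expr]."""
    names = list(databanks)
    if not names:
        raise ValueError("at least one databank name is required")
    for name in names:
        # Names are separated by spaces, so a name cannot itself contain one.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"unsuitable databank name: {name!r}")
    group = "{" + " ".join(names) + "}"
    return f"[{group}-{index}:{expression}]"


# Assumed identifiers for SWISS-PROT, SWISSNEW and SPtrEMBL.
print(multi_databank_query(["swissprot", "swissnew", "sptrembl"], "des", "kinase"))
```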