Searching in Indices

   

Introduction

   
  Index searches are probably the simplest form of the SRS query language syntax. They include searches for simple strings, searches for numbers and ranges of numbers, and searches for dates. This section covers the various forms of index search.

General Syntax

   
  An index search is written within square brackets and must specify the databank or databank group name, the index or index group name, and a search expression. The two names must be separated by a hyphen (-). The search expression must be separated from the names by either a colon (:) for a string search (see section 8.2.3 "Search Strings") or a hash (#) for a range search. Range searches can be performed only in indices of the types num and real (see section 8.2.4 "Searching Using Numerical Ranges" and section 8.2.5 "Searching for Dates").

Either the field name (e.g., description) or its abbreviation (des) can be used as the index name. All strings, including the search words, are case-insensitive. For example:

[pir-des:elastase] 
searches for the string "elastase" in the des (description) field of the protein databank, PIR.
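
The bracket syntax described above can be assembled mechanically. The following Python sketch is for illustration only: `build_index_query` is a hypothetical helper, not part of SRS, and it only formats the query text without contacting any server.

```python
def build_index_query(databank: str, index: str, expression: str,
                      range_search: bool = False) -> str:
    """Format an index search as [databank-index:expr] or [databank-index#expr]."""
    if not (databank and index and expression):
        raise ValueError("databank, index and expression are all required")
    # ':' introduces a string search, '#' a range search.
    separator = "#" if range_search else ":"
    return f"[{databank}-{index}{separator}{expression}]"


print(build_index_query("pir", "des", "elastase"))
print(build_index_query("swissprot", "date", "20010415:20020414", range_search=True))
```

The first call reproduces the PIR example above; the second produces a range search of the kind covered in the later sections on ranges and dates.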

Search Strings

   
  A search string may be a single search word, or several words separated by logical operators (see section 8.3.4 "Operators"). Parentheses may be used to create a group which will be treated as a single operand (see Example 8.1 "Search strings"). Wildcards and regular expressions may also be used (see the sections "Wildcards" and "Regular Expressions").

Example 8.1 Search strings

To search the keywords field of the EMBL databank for "insulin" you might enter:

[embl-key:insulin]
To search the description field of the EMBL databank for entries which include "acetylchol*" and "receptor", but remove any entries that contain "muscarinic", you might enter:

[embl-des:(acetylchol*&receptor)!muscarinic]
To search the authors index field of the SWISS-PROT databank to look for entries containing "sanger,f*" but not "coulson,a*", you might use a query like:

[swissprot-aut:sanger,f*!coulson,a*]
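
One way to picture how the operators in these examples combine is as set operations on entry identifiers. The Python sketch below is an analogy only: the toy index and the identifiers E1 to E4 are invented, and the wildcard is assumed to have already been resolved to a single word.

```python
# Toy index: search word -> set of entry identifiers (all invented for illustration).
index = {
    "acetylcholine": {"E1", "E2", "E3"},
    "receptor": {"E2", "E3", "E4"},
    "muscarinic": {"E3"},
}

# Mirrors (acetylchol*&receptor)!muscarinic.
both = index["acetylcholine"] & index["receptor"]  # '&' keeps entries found in both sets
result = both - index["muscarinic"]                # '!' removes entries found in the right-hand set

print(sorted(result))  # ['E2']
```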

Wildcards

Wildcards are useful if, for example, you wish to search for a group of words (e.g., all words starting with "cell" and ending with "ase") or if it is unclear how a word is spelt in a databank.

SRS uses two types of wildcard:

* Matches zero or more characters of any value.
? Matches one character of any value.

Any number of wildcards can be placed anywhere in a search word.

Note: Placing a wildcard at the start of a word or string may increase the response time, because every word in the index then has to be checked against your string.
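
The two wildcards can be modelled outside SRS by translating them into a regular expression. This Python sketch is an approximation under stated assumptions: `wildcard_to_regex` is a hypothetical helper, the word list is invented, and matching is done against whole words, case-insensitively, as the text above describes.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate '*' and '?' wildcards into a case-insensitive regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # zero or more characters of any value
        elif ch == "?":
            parts.append(".")            # exactly one character of any value
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.IGNORECASE)


words = ["cellulase", "cellobiase", "cell", "kinase", "Cellulose"]
matcher = wildcard_to_regex("cell*ase")
# fullmatch requires the whole word to match, not just a part of it.
print([w for w in words if matcher.fullmatch(w)])  # ['cellulase', 'cellobiase']
```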

Regular Expressions

In addition to the use of wildcards, it is also possible to enter regular expressions directly. Regular expressions must appear within forward slashes (/).

Some characters (^$.[]()*+?) have a special meaning. To match one of these characters literally, prefix it with a backslash (\).

Table 8.2 lists examples of regular expressions and their use.

Table 8.2 Examples of regular expressions.

Expression              Meaning
/^j..$/                 This expression finds all three-character strings that start with j.
/^5[0-9][0-9][0-9]$/    This expression finds all four-digit numbers that start with 5.
/^nif[a-e]$/            This expression finds the gene names nifa, nifb, nifc, nifd, nife.
/^mue?ller$/            This expression finds both muller and mueller.

Note: Searches with regular expressions are sometimes slow, since all the words in the index have to be searched.
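
The expressions in Table 8.2 can be tried out with any regular expression engine. The sketch below uses Python's `re` module on an invented word list; it assumes that SRS and Python interpret these particular patterns the same way, which holds for the basic constructs used here but is not guaranteed for every pattern.

```python
import re

# Invented sample words standing in for the contents of an index.
words = ["jun", "jak", "junb", "5001", "50012", "nifa", "niff",
         "muller", "mueller", "muuller"]
patterns = [r"^j..$", r"^5[0-9][0-9][0-9]$", r"^nif[a-e]$", r"^mue?ller$"]

found = {}
for pattern in patterns:
    regex = re.compile(pattern, re.IGNORECASE)  # index searches are case-insensitive
    found[pattern] = [w for w in words if regex.search(w)]
    print(f"/{pattern}/ -> {found[pattern]}")
```

Every word is tested against every pattern, which is the same full scan that makes regular expression searches slower than plain word lookups.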

Searching Using Numerical Ranges

   
  In a numerical index (whether it contains integers or reals) it is possible to search numerical ranges. A numerical index is only possible where there is a one-to-one relationship between entry and value (e.g., sequence length, creation date, resolution).

A range can be specified using a single value or two values separated by a colon (:). The value on the left must be smaller than the value on the right. Both end values are included in the range; to exclude an end value, put an exclamation mark (!) in front of it. The absence of a number on the left indicates that the search should start at the minimum value in the index. Similarly, an absent value on the right indicates that the search should include values up to the maximum for that index.

Table 8.3 Examples of queries on an index of the sequence length.

Written Range    Meaning
400              All sequences with a length of exactly 400.
400:500          All sequences with lengths from 400 to 500, including both 400 and 500.
400:             All sequences with lengths of 400 or more.
:500             All sequences with lengths of 500 or less.
400:!500         All sequences with lengths from 400 to 500, excluding 500.
:                A range from the minimum value to the maximum value, i.e., all sequences.
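
The range notation can be modelled as a membership test. The Python sketch below is one possible reading of the rules above, not SRS code: `parse_range` is a hypothetical helper, values are parsed as floats, and end values are assumed to be included unless they are prefixed with an exclamation mark.

```python
from typing import Callable


def parse_range(text: str) -> Callable[[float], bool]:
    """Turn a written range such as '400:!500' into a membership test."""
    def parse_bound(part: str):
        part = part.strip()
        exclusive = part.startswith("!")
        if exclusive:
            part = part[1:].strip()
        # An empty bound stands for the minimum or maximum value in the index.
        value = float(part) if part else None
        return value, exclusive

    if ":" not in text:
        exact = float(text)  # a single value matches only itself
        return lambda x: x == exact

    left, right = text.split(":", 1)
    low, low_excl = parse_bound(left)
    high, high_excl = parse_bound(right)

    def contains(x: float) -> bool:
        if low is not None and (x < low or (low_excl and x == low)):
            return False
        if high is not None and (x > high or (high_excl and x == high)):
            return False
        return True

    return contains


in_range = parse_range("400:!500")
print(in_range(400), in_range(499), in_range(500))  # True True False
```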

Combining Ranges

Ranges can be combined using logical operators. For instance, either:

300:!500 | !600:700 
or

300:700 ! 500:600 
would retrieve the same set of sequences, i.e., all sequences from 300 to 500, excluding 500, and all sequences from 600 to 700, excluding 600.
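
The equivalence of the two expressions can be checked by treating each range as a set of integer lengths. This Python sketch uses an arbitrary test domain of 0 to 1000 and maps `|` to set union and `!` to set difference, which is an analogy rather than a description of how SRS evaluates the query.

```python
lengths = range(0, 1001)  # candidate sequence lengths; an arbitrary test domain

# 300:!500 | !600:700
union = ({n for n in lengths if 300 <= n < 500}
         | {n for n in lengths if 600 < n <= 700})

# 300:700 ! 500:600
difference = ({n for n in lengths if 300 <= n <= 700}
              - {n for n in lengths if 500 <= n <= 600})

print(union == difference)  # True
```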

Searching for Dates

   
  Searches for dates can be made using either of the two special formats recognized by the SRS query language: YYYYMMDD and DD-MMM-YYYY. For example:

20020619
19-Jun-2002
Dates can be used within ranges in the same way as other numbers. For example:

[swissprot-date#20010415:20020414]
[swissprot-date#15-APR-2001:14-APR-2002]
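
Both date formats describe the same calendar dates, so the two queries above cover the same period. The Python sketch below normalizes either format to a `datetime.date`; `parse_srs_date` is a hypothetical helper, and it assumes three-letter English month abbreviations in any letter case, as in the examples above.

```python
from datetime import date

MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}


def parse_srs_date(text: str) -> date:
    """Parse a date written as YYYYMMDD or DD-MMM-YYYY."""
    text = text.strip()
    if "-" in text:
        day, month, year = text.split("-")
        return date(int(year), MONTHS[month.lower()], int(day))
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"not a recognized date: {text!r}")
    return date(int(text[:4]), int(text[4:6]), int(text[6:]))


print(parse_srs_date("20020619") == parse_srs_date("19-Jun-2002"))  # True
```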

Searching Multiple Databanks

   
  As well as allowing you to search a field of a single databank, the SRS query language allows you to search multiple databanks in a single query expression. This is done using a list of databank names, enclosed in curly brackets, to replace the single databank name seen in earlier examples. The names in the list must be separated by spaces. For example:

[{swissprot swissnew sptrembl}-des:kinase]
searches for the word "kinase" in the Description index of the SWISS-PROT, SWISSNEW and SPtrEMBL databanks.

It is often convenient to give a name to a group of databanks, so that the name can be used later in the query instead of repeating the list. For instance:

[dbs={swissprot swissnew sptrembl}-des:kinase]
&[dbs-org:human]
creates the group dbs, which combines the three databanks SWISS-PROT, SWISSNEW and SPtrEMBL, and then uses the group name dbs in place of a databank name in the second part of the search.

Note: It is better not to include spaces and other special characters in names, as some systems do not handle them properly. Use an underscore, or start new words with a capital letter instead.
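
The databank list and group syntax can also be generated programmatically. In the Python sketch below, `databank_list` is a hypothetical helper that only formats text. Its name check, which allows letters, digits and underscores only, is a deliberately strict reading of the note above and not a rule enforced by SRS.

```python
def databank_list(names, group=None):
    """Format '{a b c}', optionally prefixed with 'group=' to name the list."""
    names = list(names)
    if not names:
        raise ValueError("at least one databank name is required")
    for name in names + ([group] if group else []):
        # Assumption: restrict names to letters, digits and underscores.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsuitable name: {name!r}")
    body = "{" + " ".join(names) + "}"  # names are separated by spaces
    return f"{group}={body}" if group else body


group = databank_list(["swissprot", "swissnew", "sptrembl"], group="dbs")
query = f"[{group}-des:kinase]&[dbs-org:human]"
print(query)  # [dbs={swissprot swissnew sptrembl}-des:kinase]&[dbs-org:human]
```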