Geena 2
Build March 10, 2015.
This system is under active development,
please forgive us for possible errors and
send us your comments, criticisms and congratulations,
if any.

Geena 2 information page

Welcome to Geena 2, the new tool for multi-spectra filtering, averaging and alignment brought to you by researchers from Genoa and Naples (GEnova E NApoli).
The use of Geena 2 should be straightforward. Results are displayed in a simple format. Nevertheless, should you have any problem, see the help page.

You may find it useful to perform a first analysis by using the example input data file Test1.txt that is provided for testing purposes.
In the following, you will find links to output files that were generated by Geena 2 from this input data by using the following parameters:

  • Analysis range: 1,200 - 1,700 m/z - Normalization peak at: 1,420.80 m/z
  • Abundance thresholds: 5 at 1,200 m/z, 4 at 1,700 m/z
  • Maximum number of isotopic replicas: 5 - Maximum delta between isotopic peaks: 0.05 Da
  • Maximum delta for aligning replicates: 0.1 Da - Minimum number of signals in replicates: 1
  • Maximum delta for aligning average spectra: 0.2 Da - Minimum number of signals in average spectra: 2
Some of these values, namely the analysis range and the normalization peak, MUST be entered in the Geena 2 web interface you use, either the Quick Search Interface (QSI) or the Standard Search Interface (SSI). The remaining values are defaults: the QSI uses them implicitly, while the SSI shows them explicitly and lets you modify them.

NB! Although we periodically check that this file is aligned with the version of the software, some differences can arise because Geena 2 is under active development.

Data file formats

Here, the file formats used by Geena 2 are briefly introduced.


Input data file
The input data file is a simple text file with data delimited by tab characters.
NB! This format can be easily achieved by saving data from MS Excel with the "Text (tab delimited)" format.

Sample section
The basic block of the input file is the "Sample section", which includes data referring to replicated spectra from the same origin sample. Each spectrum is reported as pairs of m/z and abundance values, ordered by increasing m/z value and listed in columns, as specified below.

First line
The Sample section begins with a line that includes reference names for all spectra for the Sample. Names are separated by two tab characters. The last name is not followed by any tab character.

Second line
The second line includes fixed labels (usually "m/z" and "abund") that are used as headers for the following data and define the contents of the respective columns. This line does not presently affect analysis.

Following lines
Following lines include pairs of m/z and abundance values, each pair representing a peak of the spectrum, in m/z value ascending order. The first number reports the m/z value, the second one the abundance (aka intensity).
On a single line of the file, the number of such pairs equals the number of spectra for the sample. So, on the i-th line of the section, the (i-2)-th pair of m/z and abundance values of each spectrum is included, since the first two lines hold names and labels. This means that m/z values on the same line may be (and usually are) different. In general, the pair of values of the k-th spectrum occupies columns 2*k-1 (m/z value) and 2*k (abundance value).
All values, both within and between pairs, are separated by a tab character. Since spectra may have a different number of peaks, some columns may include more values than others. Missing values should be replaced by zeroes.
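The column rule above can be illustrated with a minimal Python sketch that reads one sample section into per-spectrum peak lists. The function name parse_sample_section is hypothetical (it is not part of Geena 2), and the sketch assumes that a pair with zero m/z and zero abundance is padding, as in the example below.

```python
def parse_sample_section(lines):
    """Read one sample section of a Geena 2 input file (hypothetical helper).

    lines: the section's lines (names line, labels line, then data rows).
    Returns a dict mapping each spectrum name to its list of (mz, abundance).
    """
    # First line: names separated by two tabs, so drop the empty fields.
    names = [f.strip() for f in lines[0].split("\t") if f.strip()]
    spectra = {name: [] for name in names}
    # The second line ("m/z", "abund" labels) does not affect the data.
    for row in lines[2:]:
        fields = row.rstrip("\r\n").split("\t")
        for k, name in enumerate(names):
            # 0-based spectrum k uses fields 2k and 2k+1, i.e. the
            # 1-based columns 2*(k+1)-1 and 2*(k+1) described above.
            if 2 * k + 1 >= len(fields):
                continue
            mz, abund = float(fields[2 * k]), float(fields[2 * k + 1])
            if mz == 0 and abund == 0:
                continue  # assumed: zero padding for a shorter spectrum
            spectra[name].append((mz, abund))
    return spectra
```

Because spectra are stored by name, the differing m/z values found on the same line simply end up in different lists.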

Example of a sample section
The following excerpt of the example input data file Test1.txt shows the initial and final lines of the first sample section. The section refers to three spectra from the same sample, named "20A", "20B" and "20C". The first spectrum has a greater number of peaks than the others. Missing peaks are represented by zeroes.

20A		20B		20C	
m/z	abund	m/z	abund	m/z	abund
707.36	47	707.36	29	707.36	35
708.36	21	709.37	68	709.37	72
709.37	94	710.38	26	710.38	41
710.38	34	711.39	26	711.39	44
711.39	48	713.40	20	713.40	25
713.40	24	723.11	18	723.35	55
723.35	45	723.35	38	739.30	51
724.35	26	725.36	19	767.31	33
725.38	21	738.28	27	803.20	30
739.29	40	739.30	67	804.20	58
741.30	28	740.30	25	805.21	63
763.29	30	741.30	50	806.22	25
...	..	...	..	...	..
...	..	...	..	...	..
2532.90	2	0	0	0	0
2541.57	2	0	0	0	0
2547.73	2	0	0	0	0
2554.99	3	0	0	0	0
2556.06	2	0	0	0	0
2557.02	3	0	0	0	0
2669.06	3	0	0	0	0
2821.16	16	0	0	0	0
2821.79	3	0	0	0	0

Multiple sample sections
At the end of each sample section except the last one, a line with two backslashes marks the separation from the following sample section, as shown in this excerpt of the same example file:
...	..	...	..	...	..
...	..	...	..	...	..
2557.02	3	0	0	0	0
2669.06	3	0	0	0	0
2821.16	16	0	0	0	0
2821.79	3	0	0	0	0
\\
21A		21B		21C
m/z	abund	m/z	abund	m/z	abund
707.36	17	702.44	14	707.36	14
709.38	33	707.36	13	709.38	38
711.39	16	709.38	23	710.38	21
723.36	21	710.38	13	711.39	13
729.34	33	711.39	12	713.37	11
730.35	12	713.41	11	723.36	25
...	..	...	..	...	..
...	..	...	..	...	..
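A file with several sample sections can be split on the two-backslash separator line before each section is read. The following Python sketch does only that. The name split_sample_sections is hypothetical, and blank lines are assumed to carry no information.

```python
def split_sample_sections(text):
    """Split a Geena 2 input file into sample sections (hypothetical helper).

    Returns a list of sections, each a list of non-blank lines.
    """
    sections, current = [], []
    for line in text.splitlines():
        if line.strip() == "\\\\":  # a line holding exactly two backslashes
            sections.append(current)
            current = []
        elif line.strip():
            current.append(line)
    if current:  # the last section is not followed by a separator
        sections.append(current)
    return sections
```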

Example file Test1.txt
The example file Test1.txt may be downloaded from the Geena 2 web site. This file reports four sample sections, each of which includes three spectra replicates.


Intermediate information on isotopic peaks joining
These files include the results of filtering and joining of isotopic peaks for a spectrum. They are formatted as HTML, for readability, since they do not contain information that can be re-analysed by Geena 2. They are, however, useful for checking how isotopic peaks were joined, and inspecting them may suggest changes to the values of input parameters.

These files can be downloaded at the end of the analysis.
Their name is defined as follows:
"<job>_<sample>_<spectrum>_groups.html"
where <job> is the job name given by the researcher, <sample> is the label that is associated by Geena 2 to the sample in the analysis, and <spectrum> is the name of the spectrum in the input file.
These files include first some essential information on the spectrum. After that, m/z and abundance values of "peak groups", i.e. of peaks resulting from filtering and joining of isotopic peaks, are listed.

Summary data
In this part, the name associated with the spectrum is listed together with its total number of peaks, the number of peaks included in the analysis range, the m/z value of the normalization peak (if used) and its overall abundance (the sum of the abundances of all its isotopic peaks), and, finally, the number of peak groups that were identified.
Example:
Spectrum name: 20A
There are 202 peaks in the spectrum
There are 76 peaks in the range
Normalization peak was found at 1420.763 m/z
Normalization abundance is 1472
There are 36 peak groups in the range

Peak groups data
This section includes a list of all peak groups identified.
For each group, its m/z and abundance values are reported, i.e. the m/z value of the base (monoisotopic) peak and the overall abundance associated with that peak (the sum of the abundances of all its isotopic peaks).
Moreover, the list of isotopic peaks associated with that peak group is reported, each with its m/z and abundance values.
Example:
Peak Group 8, Basic peak 1360.739 m/z, Overall abundance 5.571
--> m/z 1360.739, ab 2.378
--> m/z 1361.724, ab 1.970
--> m/z 1362.730, ab 1.223
Peak Group 12, Basic peak 1403.749 m/z, Overall abundance 7.473
--> m/z 1403.749, ab 2.717
--> m/z 1404.736, ab 2.038
--> m/z 1405.756, ab 1.562
--> m/z 1406.742, ab 1.155
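The exact joining procedure is not described on this page, so the following Python sketch is only a hypothetical illustration of how the parameters listed above (maximum number of isotopic replicas, maximum delta between isotopic peaks) could drive the grouping. It assumes singly charged ions, so consecutive isotopic peaks are expected about 1.00335 Da apart (the 13C-12C mass difference), and it assumes that the replica limit counts the base peak. The overall abundance is the sum of the member abundances, as stated above.

```python
def group_isotopic_peaks(peaks, max_replicas=5, max_delta=0.05,
                         isotope_spacing=1.00335):
    """Hypothetical isotopic-peak joining sketch, not Geena 2's own code.

    peaks: list of (mz, abundance). Returns (base_mz, overall_abundance,
    members) tuples; members are the joined (mz, abundance) pairs.
    """
    peaks = sorted(peaks)
    used = [False] * len(peaks)
    groups = []
    for i, (base_mz, base_ab) in enumerate(peaks):
        if used[i]:
            continue
        used[i] = True
        members = [(base_mz, base_ab)]
        last_mz = base_mz
        for j in range(i + 1, len(peaks)):
            if len(members) >= max_replicas:
                break  # assumed: the limit includes the base peak
            mz, ab = peaks[j]
            gap = mz - last_mz
            if gap > isotope_spacing + max_delta:
                break  # sorted input: no later peak can be the next isotope
            if not used[j] and abs(gap - isotope_spacing) <= max_delta:
                used[j] = True
                members.append((mz, ab))
                last_mz = mz
        # Overall abundance: sum of the abundances of all joined peaks.
        overall = sum(ab for _, ab in members)
        groups.append((base_mz, overall, members))
    return groups
```

Applied to the seven isotopic peaks listed in the example, the sketch yields two groups with base peaks at 1360.739 and 1403.749 m/z. The listed member abundances of the second group sum to 7.472 rather than 7.473, presumably because the displayed values are rounded.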

Example files
The following files were generated by Geena 2 from the example input data file. See input parameters above.


Filtered spectra
These files include the results of pre-processing of spectra. They are used as input for the computation of the average spectrum for a given sample.
They are formatted as simple text, but with the defined syntax that is shown below, since they constitute an intermediate result and must be further analysed by Geena 2.
These files can be downloaded at the end of the analysis.
Their name is
"<job>_<sample>_<spectrum>_filtered.txt"
where <job> is the job name given by the researcher, <sample> is the label that is associated by Geena 2 to the sample in the analysis, and <spectrum> is the name of the spectrum in the input file.

Spectrum section
The filtered spectrum is shown as a list of pairs of m/z and abundance values, ordered by increasing m/z value and listed in columns, as specified below.

First line
The section begins with a line that includes the reference name for the replicate, preceded by a "#" character.

Following lines
Following lines include pairs of m/z and abundance values, each pair representing a peak of the filtered spectrum. The first number reports the m/z value, the second one the abundance (aka intensity). Values are separated by a tab character.

Last line
At the end of the spectrum, a line with two backslashes is included.

Example

#20A
1360.739	5.571
1403.749	7.473
1420.763	230.163
1432.766	35.054
1442.746	12.636
1522.815	7.201
1524.793	27.582
1536.821	38.315
1640.852	5.299
\\
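Because filtered spectra must be read back by Geena 2, their syntax is strict. A minimal Python reader for a single spectrum section could look like this, assuming exactly one tab between the two values of each line; parse_filtered_spectrum is a hypothetical name.

```python
def parse_filtered_spectrum(text):
    """Read one filtered-spectrum section (hypothetical helper).

    Returns (name, peaks), where peaks is a list of (mz, abundance).
    """
    name, peaks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            name = line[1:]  # replicate name follows the '#'
        elif line == "\\\\":
            break  # two backslashes close the spectrum
        else:
            mz, abund = line.split("\t")
            peaks.append((float(mz), float(abund)))
    return name, peaks
```

When no closing line is present, the reader simply stops at the end of the text.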

Example files
The following files were generated by Geena 2 from the example input data file. See input parameters above.


Average spectra
These files include the average spectrum obtained by aligning and averaging all spectra from the same sample. They are used as input for the computation of the average spectrum and the alignment of all samples under analysis.
They are formatted as simple text, in almost the same format as filtered spectra.
These files can be downloaded at the end of the analysis.
Their name is
"<job>_<sample>_average.txt"
where <job> is the job name given by the researcher, and <sample> is the label that is associated by Geena 2 to the sample in the analysis.

Spectrum section
The average spectrum is shown as a list of pairs of m/z and abundance values, ordered by increasing m/z value and listed in columns, as specified below.

First line
The section begins with a line that includes the label that is associated by Geena 2 to the sample in the analysis, preceded by a "#" and followed by " (avg)".

Following lines
Following lines include pairs of m/z and abundance values, each pair representing a peak of the average spectrum. The first number reports the m/z value, the second one the abundance (aka intensity). Values are separated by a tab character.
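Writing an average spectrum in this layout is straightforward. The Python sketch below assumes three decimal places, as in the example, and emits no closing two-backslash line, since none is shown there; format_average_spectrum is a hypothetical name.

```python
def format_average_spectrum(sample, peaks):
    """Render an average spectrum section (hypothetical helper).

    sample: label assigned to the sample; peaks: list of (mz, abundance).
    """
    lines = [f"#{sample} (avg)"]  # header: '#', label, then ' (avg)'
    for mz, abund in sorted(peaks):  # increasing m/z order
        lines.append(f"{mz:.3f}\t{abund:.3f}")  # one tab between values
    return "\n".join(lines) + "\n"
```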

Example

#Sample 1 (avg)
1360.738	5.466
1403.742	7.704
1420.763	228.586
1432.766	36.095
1442.745	10.274
1443.744	8.435
1476.694	5.527
1522.810	7.452
1524.791	26.454
1536.823	39.780
1548.814	6.207
1640.849	5.626

Example files
The following files were generated by Geena 2 from the example input data file. See input parameters above.


Alignments
These files include the alignment data generated either from filtered spectra of the same sample or from average spectra of all samples in the analysis.
They are formatted as simple text, but with the defined syntax that is presented below.
These files can be downloaded at the end of the analysis.
Their names are "<job>_Alignment.txt" (for the overall alignment) and "<job>_<sample>_alignment.txt" (for single samples),
where <job> is the job name given by the researcher
and <sample> is the label associated with the sample by Geena 2.
An HTML version of these results is also shown in the results page at the end of the analysis.
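The file-name patterns given on this page can be collected in one small helper. This Python sketch only fills in the patterns; geena_output_names is hypothetical, and how Geena 2 treats spaces or other special characters in job names and sample labels is not stated here.

```python
def geena_output_names(job, sample, spectrum):
    """Fill in the Geena 2 output file-name patterns (hypothetical helper)."""
    return {
        "groups": f"{job}_{sample}_{spectrum}_groups.html",
        "filtered": f"{job}_{sample}_{spectrum}_filtered.txt",
        "average": f"{job}_{sample}_average.txt",
        "sample_alignment": f"{job}_{sample}_alignment.txt",
        "overall_alignment": f"{job}_Alignment.txt",
    }
```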

First line
All files begin with a line that includes the job name for the analysis, preceded by a "#" character.

Alignment data
The alignment is reported in a table. The second row of the file includes column headers.
Each following row includes data that refer to the alignment of a single peak. Values are separated from each other by tab characters. Aligned peaks are ordered by increasing m/z value.
The first number in the row refers to the number of aligned signals for the peak.
The second number refers to the m/z value of the aligned peak.
From the third number onwards, the m/z values of the aligned peaks in each aligned spectrum are reported (filtered spectra for a single-sample alignment, average spectra for the overall alignment). Since a peak may be aligned in only some of the spectra (as said, the first number of the row shows how many signals were aligned for the given peak), some m/z values may be missing. In this case, the value is not shown, but both tab characters, those that should precede and follow the value, are included. This makes it possible to identify the exact spectrum for which the value is missing, i.e. the spectrum that was not aligned for the peak.
Similarly, the mean abundance (intensity) value and the abundance values of the aligned spectra are reported in the following positions of the row. Again, missing values are not included, but two consecutive tab characters are found.
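Because missing values leave empty fields between tabs, a row can be parsed by splitting on every single tab and keeping the empty fields. The Python sketch below assumes that the number of aligned spectra is known (three in the example below) and pads rows whose trailing empty fields were lost; parse_alignment_row is a hypothetical name.

```python
def parse_alignment_row(row, n_spectra):
    """Parse one tab-separated alignment row (hypothetical helper).

    Returns (count, mean_mz, mzs, mean_abund, abunds); mzs and abunds hold
    None where the corresponding spectrum was not aligned.
    """
    fields = row.rstrip("\r\n").split("\t")
    # Restore trailing empty fields, in case trailing tabs were dropped.
    fields += [""] * (2 * n_spectra + 3 - len(fields))

    def value(field):
        # An empty field marks a spectrum that was not aligned for this peak.
        return float(field) if field.strip() else None

    count = int(fields[0])
    mean_mz = float(fields[1])
    mzs = [value(f) for f in fields[2:2 + n_spectra]]
    mean_abund = float(fields[2 + n_spectra])
    abunds = [value(f) for f in fields[3 + n_spectra:3 + 2 * n_spectra]]
    return count, mean_mz, mzs, mean_abund, abunds
```

For the first example row, the mean abundance 5.466 equals the mean of the two values that are present (5.571 and 5.361), which suggests that means are taken over the aligned signals only.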

Example
NB! Header not shown for overall readability.
2	1360.738	1360.739		1360.736	5.466	5.571		5.361
2	1403.742	1403.749		1403.735	7.704	7.473		7.934
3	1420.763	1420.763	1420.763	1420.763	228.586	230.163	226.361	229.235
3	1432.766	1432.766	1432.767	1432.766	36.095	35.054	35.204	38.027
3	1442.745	1442.746	1442.746	1442.743	10.274	12.636	12.755	5.432
1	1443.744		1443.744		8.435		8.435	
1	1476.694		1476.694		5.527		5.527	
3	1522.810	1522.815	1522.804	1522.810	7.452	7.201	8.078	7.076
3	1524.791	1524.793	1524.791	1524.789	26.454	27.582	26.190	25.590
3	1536.823	1536.821	1536.823	1536.825	39.780	38.315	41.497	39.528
1	1548.814		1548.814		6.207		6.207	
2	1640.849	1640.852	1640.846		5.626	5.299	5.952	
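The alignment procedure itself is not detailed on this page. As a hypothetical illustration only, the Python sketch below implements a simple greedy clustering: peaks from all spectra are pooled and sorted by m/z, a cluster collects peaks within max_delta of its first peak with at most one peak per spectrum, and clusters with fewer than min_signals peaks are dropped. Means are taken over the aligned signals only, consistent with the example. Geena 2 may well use a different method; align_spectra is a hypothetical name.

```python
def align_spectra(spectra, max_delta=0.1, min_signals=1):
    """Hypothetical greedy alignment sketch, not Geena 2's own algorithm.

    spectra: one list of (mz, abundance) pairs per spectrum.
    Returns rows (count, mean_mz, mzs, mean_abund, abunds) in increasing m/z,
    where mzs/abunds hold None for spectra without an aligned peak.
    """
    n = len(spectra)
    # Pool every peak, remembering which spectrum it came from.
    pooled = sorted((mz, ab, s) for s, peaks in enumerate(spectra)
                    for mz, ab in peaks)
    rows = []

    def close(group):
        # Drop peaks aligned in fewer than min_signals spectra.
        if len(group) < min_signals:
            return
        mzs, abunds = [None] * n, [None] * n
        for mz, ab, s in group:
            mzs[s], abunds[s] = mz, ab
        count = len(group)
        # Means are taken over the aligned signals only.
        mean_mz = sum(p[0] for p in group) / count
        mean_ab = sum(p[1] for p in group) / count
        rows.append((count, mean_mz, mzs, mean_ab, abunds))

    group = []
    for peak in pooled:
        mz, _, s = peak
        # Start a new group when the peak is too far from the group's first
        # peak, or when its spectrum already contributed a peak.
        if group and (mz - group[0][0] > max_delta
                      or any(p[2] == s for p in group)):
            close(group)
            group = []
        group.append(peak)
    if group:
        close(group)
    return rows
```

With max_delta=0.1 and min_signals=1 (the replicate settings listed above), and the peaks of spectra 20A, 20B and 20C near 1360.7, 1442.7 and 1443.7 m/z taken from the example table, the sketch reproduces the counts 2, 3 and 1 of the corresponding rows.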

Example file
The following file was generated by Geena 2 from the example input data file. See input parameters above.


For information, get in touch with:
Paolo Romano,
IRCCS Ospedale Policlinico San Martino,
Genoa, Italy

If you use Geena, please cite the following paper:
Romano P et al.
Geena 2, improved automated analysis of MALDI/TOF mass spectra.
BMC Bioinformatics 2016, 17(Suppl 4):61
PMID: 26961516; DOI: 10.1186/s12859-016-0911-2