Seblastian output help

This page describes the output page of the SECISearch3/Seblastian web server. If you want to know more about the methods themselves, visit this page instead.

Page structure

The structure of the output page differs slightly depending on whether SECISearch3 or Seblastian is run, but it always consists of the same main elements: a summary box (black border) and, below it, a results box (red border). The summary box reports some counts for your search (how many SECIS elements and selenoproteins were predicted) and links to the output files.
The web server keeps each output page only for a limited period of time (one week); after that, all traces of your search are deleted. Download the output files if you want to save the results of your search on your computer. Output files include common formats such as GFF and fasta, which can be used for further processing with other programs.
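The GFF output files can be read with a few lines of code. Below is a minimal Python sketch, assuming the files follow the standard nine tab-separated GFF columns (seqname, source, feature, start, end, score, strand, frame, attributes). The names `GffFeature` and `parse_gff` and the demo line are hypothetical; the actual source, feature and attribute values written by the server may differ.

```python
from dataclasses import dataclass


@dataclass
class GffFeature:
    seqname: str
    source: str
    feature: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    score: str       # kept as text, since GFF uses "." for a missing score
    strand: str      # "+" or "-"
    frame: str
    attributes: str


def parse_gff(text: str) -> list[GffFeature]:
    """Parse tab-separated GFF lines, skipping blank lines and '#' comments."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"not a GFF line: {line!r}")
        cols = (cols + [""])[:9]  # the 9th (attribute) column may be absent
        features.append(GffFeature(cols[0], cols[1], cols[2], int(cols[3]),
                                   int(cols[4]), cols[5], cols[6], cols[7], cols[8]))
    return features


# Hypothetical line: the real source/feature/attribute values may differ.
demo = "your_sequence\tSECISearch3\tSECIS\t1201\t1262\t.\t+\t.\tid=1"
for f in parse_gff(demo):
    print(f.seqname, f.start, f.end, f.strand)  # your_sequence 1201 1262 +
```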
Below the summary box, the results box contains entries for each prediction in your sequence. When Seblastian is run, each entry will consist of two sections, one for the selenoprotein prediction and one for its SECIS. When only SECISearch3 is run, only the SECIS section is present.

SECIS element prediction

SECIS id: this is displayed in the title. Every SECIS is assigned a numeric incremental id.
Predicted by: the search method(s) that predicted this SECIS element, which is useful when multiple methods were selected. If the SECIS was predicted by Infernal or Covels, their scores are also given in parentheses. Most vertebrate SECIS elements have an Infernal score > 20 and a Covels score > 15, while lower scores are also observed in other species. The default thresholds provided in the input page find most eukaryotic SECIS elements with a reasonable number of false positives (see paper).
Covels score: this is computed for every SECIS in the output, regardless of which prediction method was run. It gives a quantitative measure of how well the prediction fits our SECIS model.
Free Energy of structure: RNAfold is run to calculate the approximate free energy of the SECIS structure, in kcal/mol. RNAfold is given a structural constraint so that it follows the secondary structure predicted by the Infernal SECIS model. Nonetheless, the RNAfold prediction may differ from Infernal's, so the displayed free energy may not correspond exactly to the displayed secondary structure (which derives from Infernal only).
Target name: the fasta header of the sequence where the prediction resides. If no fasta header was present in input, it will be reported as "your_sequence".
Positions on target: the start and end positions of the SECIS along the target sequence, counted from 1 (the first nucleotide is position 1).
Strand: the strand on which this SECIS was found: + means it was found on the sequence as it was input, - means it was found on its reverse complement.
Grade: an indicator of how good the SECIS prediction looks. The grading method incorporates our experience from the manual analysis of hundreds of SECIS elements. It checks several structural features of stem2 (such as the presence of consecutive mismatches, the bending of the structure, and the match or mismatch state of critical positions), the presence of the typical unpaired nucleotides in the apical loop, and the Covels score of the prediction. The SECIS grade is A, B or C, in decreasing order of quality.
SEQ: the nucleotide sequence of the SECIS, in RNA letters. The SECIS core is shown in bold (non-italic). If the typical conserved unpaired nucleotides in the apical loop were identified (typically AA, but sometimes CC), they are shown in bold italics.
SS: the secondary structure of the SECIS. This is always predicted by Infernal and may differ from the minimum free energy structure of the sequence. In particular, Infernal may force pairing in the stems and does not predict any pairing in the apical portion. Unpaired nucleotides are shown as dots ("."). Standard pairs (i.e. UA, GC, GU) are indicated with round brackets "()". GA pairs (present in the kink-turn core) are indicated with square brackets "[]". All other pairs, which are energetically much less favorable, are indicated with curly brackets "{}".
Image: generated with RNAplot, it shows the secondary structure given in the "SS:" line. The nucleotides forming the SECIS core are marked in green. If present, the conserved unpaired nucleotides in the apical loop are also marked in green and circled. Paired nucleotides are colored according to the type of pair: standard pairs in cyan, GA pairs in blue, and unfavorable pairs in red. You can download any image by right-clicking it (then selecting "Save image as" or similar), or download a zip archive of all images from the summary box at the top of the page. Note that the SECIS image is not generated when the option "generate SECIS images" is unchecked in the input page.
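When many SECIS elements are predicted, it can help to sort them by grade and Covels score. The Python sketch below is an illustration only: the `SecisHit` record and its field names are hypothetical (you would fill them from the output page yourself), and the Infernal > 20 / Covels > 15 check reflects the range typical of vertebrate SECIS elements mentioned above, not a cutoff, since lower scores occur in other species.

```python
from dataclasses import dataclass
from typing import Optional

GRADE_RANK = {"A": 0, "B": 1, "C": 2}  # A is the best grade


@dataclass
class SecisHit:                       # hypothetical record, filled by hand
    secis_id: int
    grade: str                        # "A", "B" or "C"
    covels_score: float               # reported for every SECIS
    infernal_score: Optional[float] = None  # only if Infernal predicted it

    def in_typical_vertebrate_range(self) -> bool:
        """Infernal > 20 and Covels > 15; informative, not a cutoff."""
        return (self.infernal_score is not None
                and self.infernal_score > 20
                and self.covels_score > 15)


def rank_hits(hits: list[SecisHit]) -> list[SecisHit]:
    """Sort by grade (A first), then by descending Covels score."""
    return sorted(hits, key=lambda h: (GRADE_RANK[h.grade], -h.covels_score))
```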
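The "Positions on target" and "Strand" fields can be combined to re-extract a SECIS from your own input sequence. The Python sketch below rests on an assumption we have not stated elsewhere: that start <= end and both are counted on the input (+) strand even for "-" hits, so a "-" SECIS is the reverse complement of that slice. Check this against your own output before relying on it. The function name `secis_sequence` is hypothetical.

```python
# Complement into RNA letters (A->U, C->G, G->C, T/U->A); N stays N.
# Other IUPAC ambiguity codes are left unchanged.
COMPLEMENT = str.maketrans("ACGTUN", "UGCAAN")


def secis_sequence(target: str, start: int, end: int, strand: str) -> str:
    """Return the SECIS in RNA letters from 1-based inclusive coordinates.

    ASSUMPTION: start <= end and both are counted on the input (+) strand,
    also for '-' strand hits.
    """
    if not 1 <= start <= end <= len(target):
        raise ValueError("coordinates out of range")
    segment = target[start - 1:end].upper()  # 1-based inclusive -> slice
    if strand == "+":
        return segment.replace("T", "U")
    if strand == "-":
        return segment.translate(COMPLEMENT)[::-1]  # reverse complement
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")
```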
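The three bracket types of the "SS:" line can be turned back into a list of base pairs, for example to count GA pairs or to feed the structure to other tools. The Python sketch below follows the notation described above ("." unpaired, "()" standard, "[]" GA, "{}" other pairs); the function name `parse_secis_ss` and the pair-type labels are our own choices, not part of the server output. It keeps one stack per bracket type, which is sufficient for nested structures.

```python
# Opening bracket -> (pair type, matching closing bracket)
PAIR_TYPES = {"(": ("standard", ")"), "[": ("GA", "]"), "{": ("other", "}")}
CLOSERS = {close: open_ for open_, (_, close) in PAIR_TYPES.items()}


def parse_secis_ss(ss: str) -> list[tuple[int, int, str]]:
    """Return (i, j, pair_type) tuples with 1-based positions and i < j."""
    stacks = {open_: [] for open_ in PAIR_TYPES}
    pairs = []
    for pos, ch in enumerate(ss, start=1):
        if ch in PAIR_TYPES:
            stacks[ch].append(pos)           # remember where the pair opens
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos, PAIR_TYPES[CLOSERS[ch]][0]))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    leftover = [p for s in stacks.values() for p in s]
    if leftover:
        raise ValueError(f"unmatched opening bracket(s) at {leftover}")
    return sorted(pairs)
```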

Selenoprotein prediction

Selenoprotein id: the unique identifier of this prediction, which is the same as the id of its predicted SECIS element. Since some SECIS elements have no selenoprotein predicted upstream, some selenoprotein ids may be missing from the output (e.g. you may have only ids 1 and 3 in the output if no selenoprotein was found for SECIS #2).
Category: this depends on the query residue aligned to the predicted Sec-UGA. It can be "known selenoprotein", when the annotated query has a selenocysteine at that position, or "new selenoprotein", when the query has a cysteine there. New selenoproteins are predicted only if the corresponding option is selected in the input page.
Predicted by: Seblastian uses two programs for protein prediction in nucleotide sequences. The first is blastx, whose hits are carefully filtered to retain only selenoprotein hits; its output is also processed to join exons belonging to the same gene. The second is exonerate, which is seeded with the genomic coordinates of a blastx prediction. Generally the exonerate prediction is the one reported, but under some conditions it is discarded and the blastx prediction is reported instead.
Blastx evalue: the E-value of the highest-scoring blastx HSP for this prediction (multiple blastx hits can be joined as exons in a single prediction). It gives a rough estimate of the similarity between the query and the predicted target.
Query protein: the title of the protein used as query, as annotated in the nr database.
Positions on query: these are the position boundaries of the aligned portion of the query.
Query length: this is the full length of the annotated query protein.
Target name: the fasta header of the sequence where the prediction resides. If no fasta header was present in input, it will be reported as "your_sequence".
Positions on target: these are the positions of the prediction along the target sequence. Multiple exons are separated by commas.
Strand: the strand on which this selenoprotein was predicted: + means it was found on the sequence as it was input, - means it was found on its reverse complement.
Alignment: the alignment representing the selenoprotein prediction. The target is the sequence input by the user. The codon for each amino acid is written vertically. The line between the two protein sequences shows the similarities between query and target: identical residues are marked with "|", and similar residues (those with a positive score in BLOSUM62) are marked with "/". The selenocysteine position is shown in bold and marked below with an asterisk "*".
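The similarity line of the alignment follows a simple rule, which the Python sketch below reproduces for two aligned protein strings. It hard-codes the residue pairs with a positive off-diagonal score in the standard BLOSUM62 matrix and assumes selenocysteine is written as "U"; the functions `match_line` and `sec_marker` are hypothetical, and the server's actual rendering (bold text, vertical codons) is not reproduced.

```python
# Unordered residue pairs with a positive off-diagonal BLOSUM62 score.
_POSITIVE = ["AS", "RQ", "RK", "ND", "NH", "NS", "DE", "QE", "QK", "EK", "HY",
             "IL", "IM", "IV", "LM", "LV", "MV", "FW", "FY", "ST", "WY"]
SIMILAR = {frozenset(p) for p in _POSITIVE}


def match_line(query: str, target: str) -> str:
    """Build the line shown between two aligned protein sequences."""
    if len(query) != len(target):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for q, t in zip(query.upper(), target.upper()):
        if q == t and q != "-":
            out.append("|")   # identical residue
        elif frozenset((q, t)) in SIMILAR:
            out.append("/")   # similar: positive BLOSUM62 score
        else:
            out.append(" ")   # dissimilar residue or gap
    return "".join(out)


def sec_marker(target: str) -> str:
    """Put '*' under the selenocysteine (U) positions of the target."""
    return "".join("*" if aa.upper() == "U" else " " for aa in target)
```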
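The numeric fields "Positions on query", "Query length" and "Positions on target" can be turned into simple summary figures, such as the fraction of the query covered by the prediction or the total length of the predicted exons. The Python sketch below assumes that each exon in "Positions on target" is written as start-end, with exons separated by commas; that text format and the function names are assumptions, so adapt the parsing to what your output actually shows.

```python
import re


def parse_exons(positions: str) -> list[tuple[int, int]]:
    """Parse e.g. '100-249,400-519' into [(100, 249), (400, 519)].

    ASSUMPTION: each exon is written as start-end, exons separated by commas.
    """
    exons = []
    for chunk in positions.split(","):
        m = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", chunk)
        if m is None:
            raise ValueError(f"cannot parse exon {chunk!r}")
        a, b = int(m.group(1)), int(m.group(2))
        exons.append((min(a, b), max(a, b)))  # tolerate reversed ranges
    return exons


def coding_length(exons: list[tuple[int, int]]) -> int:
    """Total nucleotides covered by the exons (1-based inclusive ranges)."""
    return sum(end - start + 1 for start, end in exons)


def query_coverage(q_start: int, q_end: int, query_length: int) -> float:
    """Fraction of the annotated query covered by the aligned portion."""
    return (abs(q_end - q_start) + 1) / query_length
```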

If you did not find what you were looking for, don't hesitate to contact us (see the link at the bottom of the input page).
