There are many different programs in EMBOSS that create many different types of reports. Some of these programs have been incorporated into EMBOSS from pre-existing programs and some were specially written for it.
The resulting assortment of programs was starting to produce report output in a variety of different formats. It was decided that from EMBOSS version 2.2 onwards there should be a set of standard report formats.
Even if you only intend to read the reports yourself and not feed them into other programs, a standard set of formats is still worthwhile: you quickly get used to the look and feel of a format and can compare the reports from different programs more easily.
It is often convenient to have the same program produce different report formats for different purposes. Depending on what you wish to do with the result, it may be better to have a human-readable report for publication, or a less readable report for input into another program for further analysis.
Different programs will have different default report formats. You may accept the default or choose your preferred format when you run the program.
For example, a report starts with a header block like this:
    ########################################
    # Program: garnier
    # Rundate: Mon Feb 11 15:14:40 2002
    # Report_file: report.dbmotif
    ########################################
    #=======================================
    #
    # Sequence: 100K_RAT from: 1 to: 889
    # HitCount: 206
    #
    # DCH = 0, DCS = 0
    #
    # Please cite:
    # Garnier, Osguthorpe and Robson (1978) J. Mol. Biol. 120:97-120
    #
    #
    #
    #=======================================
There is also a block of information at the end of the report for summary information.
For example:
    #---------------------------------------
    #
    # Residue totals: H:364 E:149 T:191 C:185
    # percent: H: 41.7 E: 17.1 T: 21.9 C: 21.2
    #
    #
    #---------------------------------------
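A program that reads these reports can pick the header fields out of the lines that start with '#'. The following is a minimal sketch in Python, assuming that header lines take the form `# Key: value` as in the garnier example; the function name `parse_report_header` is hypothetical and not part of EMBOSS.

```python
import re

# Matches header fields such as "# Program: garnier".
# Assumption: fields are laid out as "# Key: value" with a one-word key.
FIELD = re.compile(r"^#\s*(\w+):\s*(.*\S)\s*$")


def parse_report_header(lines):
    """Collect 'Key: value' fields from the '#'-prefixed lines of a report."""
    fields = {}
    for line in lines:
        if not line.startswith("#"):
            continue  # data line, not part of a header or summary block
        match = FIELD.match(line)
        if match:
            key, value = match.groups()
            fields.setdefault(key, value)  # keep the first occurrence of a key
    return fields


example = """\
########################################
# Program: garnier
# Rundate: Mon Feb 11 15:14:40 2002
# Report_file: report.dbmotif
########################################
""".splitlines()

header = parse_report_header(example)
print(header["Program"])
print(header["Report_file"])
```

Lines that do not fit the assumed layout, such as the rows of '#' characters or `# DCH = 0, DCS = 0`, are simply skipped.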
Some of the report formats can cope with an unlimited number of sequences, while others are only for single sequences or pairs of sequences.
Name | Comments |
---|---|
embl | Writes a report in EMBL feature table format |
genbank | Writes a report in Genbank feature table format |
gff | Writes a report in GFF feature table format |
pir | Writes a report in PIR feature table format |
swiss | Writes a report in SwissProt feature table format |
debug | This is of use only for debugging. |
listfile | This writes out a list file with the start and end points of the motifs given by '[start:end]' after the sequence's full USA. This is useful as it is a true List File that can be read in by other EMBOSS programs using '@' or 'list::' before the filename. |
dbmotif | Writes a report in DbMotif format. Format: Length = [length] Start = position [start] of sequence End = position [end] of sequence ... other tags ... [sequence] [start and end numbered below sequence with '\|' marks] Blank line. Data reported: Length, Start, End, Sequence (5 bases around feature) |
diffseq | This format is most useful when reporting the results of aligning two sequences, as in the program diffseq. The report describes matches, usually short, between two sequences and the features that overlap them. Format: |
excel | This is a TAB-delimited table format suitable for reading into spreadsheet programs such as Excel. Name, start, end and score are always reported. Other tags in the report definition are added as extra columns. All values are (for now) unquoted. Missing values are reported as '.' |
feattable | Writes a report in FeatTable format. The report is an EMBL feature table using only the tags in the report definition. There is no requirement for tag names to match standards for the EMBL feature table. The original EMBOSS application for this format was cpgreport. Format: |
motif | Writes a report in Motif format. Based on the original output format of antigenic, helixturnhelix and sigcleave. Format: (1) Score [score] length [length] at [name] [start]->[end] * (marked at position pos) [sequence] \| \| [start] [end] [tagname]: tagvalue. Data reported: Name, Start, End, Length, Score, Sequence |
regions | Writes a report in Regions format. The report (unusually for the current report formats) includes the feature type. Format: [type] from [start] to [end] ([length] [name]) ([tagname]: [tagvalue], [tagname]: [tagvalue] ...). Data reported: Type, Start, End, Length, Name |
seqtable | Writes a report in SeqTable format. This is a simple table format that includes the feature sequence. See table for a version without the sequence. Missing tag values are reported as '.'. The column width is 6, or longer if the name is longer. Format: |
simple | Writes a report in SRS simple format. This is a simple parsable format that does not include the feature sequence (see also the srs format), for applications where features can be large. Missing tag values are reported as '.'. Format: Feature [number] Name: [ID name] Start: [start] End: [end] Length: [length] [tagnames:] [tag values] Blank line |
srs | Writes a report in SRS format. This is a simple parsable format that includes the feature sequence. Missing tag values are reported as '.'. Format: Feature [number] Name: [ID name] Start: [start] End: [end] Length: [length] Sequence: [sequence] Score: [score] [tagnames:] [tag values] Blank line |
table | Writes a report in Table format. See seqtable for a version with the sequence. Missing tag values are reported as '.'. The column width is 6, or longer if the name is longer. Format: |
tagseq | Writes a report in Tagseq format. Features are marked up below the sequence. Originally developed for the garnier application, but has general uses. Format: Sequence position written every 10 bases/residues Sequence (50 residues) tagname ++++++++++++ +++++++++ Blank line. If the tag value is a 1-letter code, it is used instead of '+'. |
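Because the excel format is TAB-delimited with unquoted values and '.' for missing values, it is straightforward to read into another program. The following Python sketch shows one way to do so. It assumes that the first line of the table is a header row naming the columns; the function name `read_excel_report` and the sample column names and values are made up for illustration.

```python
import csv
import io


def read_excel_report(handle):
    """Read a TAB-delimited report, turning '.' (missing value) into None.

    All other values are kept as strings; convert them as needed.
    """
    # QUOTE_NONE because the values in this format are unquoted.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        rows.append({key: (None if value == "." else value)
                     for key, value in row.items()})
    return rows


# Hypothetical sample; the column names and values are invented.
sample = io.StringIO(
    "Name\tStart\tEnd\tScore\tType\n"
    "seq1\t10\t25\t3.5\thelix\n"
    "seq1\t40\t52\t.\t.\n"
)

for row in read_excel_report(sample):
    print(row["Name"], row["Start"], row["End"], row["Score"], row["Type"])
```

Mapping '.' to `None` keeps missing values distinct from real data, so later numeric conversion does not have to special-case the placeholder.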
You are not restricted to a program's default format. You can choose any of the report formats listed above.
You specify the required format by putting the qualifier -rformat followed by the format name on the command line, for example:
    garnier -rformat gff
The qualifiers that control report output are:
    -rformat     string  report format
    -ropenfile   string  report file name
    -rextension  string  file name extension
    -rname       string  base file name
    -raccshow    bool    show accession number in the report
    -rdesshow    bool    show description in the report
    -rusashow    bool    show the full USA in the report
    -rdirectory  string  report file output directory
Of these, -rformat and -rusashow are likely to be the most useful.
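When EMBOSS programs are run from a script, it can help to check the format name before building the command line. The following Python sketch does this under stated assumptions: the helper `build_report_args` is hypothetical, the set of names is copied from the table of report formats, and a boolean qualifier is assumed to be switched on by giving its name alone. The sequence and output file arguments that a real run needs are left out.

```python
# Format names taken from the table of report formats.
REPORT_FORMATS = {
    "embl", "genbank", "gff", "pir", "swiss", "debug", "listfile",
    "dbmotif", "diffseq", "excel", "feattable", "motif", "regions",
    "seqtable", "simple", "srs", "table", "tagseq",
}


def build_report_args(program, rformat, show_usa=False):
    """Return an argument list such as ['garnier', '-rformat', 'gff']."""
    if rformat not in REPORT_FORMATS:
        raise ValueError(f"unknown report format: {rformat!r}")
    args = [program, "-rformat", rformat]
    if show_usa:
        # Assumption: a boolean qualifier is enabled by naming it alone.
        args.append("-rusashow")
    return args


print(" ".join(build_report_args("garnier", "gff")))
```

The resulting list could be extended with the program's other arguments and passed to `subprocess.run`; rejecting an unknown format name early avoids starting the program with a mistyped qualifier value.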