EMBOSS: Sequence Features
We recently added support for feature tables to EMBOSS.
EMBOSS can read a feature table together with a sequence, and can write its results either as a feature table or as a sequence with a feature table.
The first format supported is the "General Feature Format" (GFF), which is used at the Sanger Centre and other institutes to exchange the results of gene-finding and other programs.
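GFF is a tab-delimited format in which each line describes one feature, with these fields: sequence name, source, feature type, start, end, score, strand, frame, and an optional group field.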
For example:
seq1 BLASTX similarity 101 235 87.1 + 0 Target "HBA_HUMAN" 11 55 ; E_value 0.0003
dJ102G20 GD_mRNA coding_exon 7105 7201 . - 2 Sequence "dJ102G20.C1.1"
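As a rough illustration of how such a line breaks down into fields, here is a minimal Python sketch. It is not EMBOSS code: the names GffRecord and parse_gff_line are hypothetical, the field names follow the usual GFF convention, and the fallback to whitespace splitting is an assumption made only so that examples typed with spaces can be parsed.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    """One GFF line (hypothetical container, not an EMBOSS type)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]  # None when the field is "."
    strand: str             # "+", "-" or "."
    frame: str              # "0", "1", "2" or "."
    group: str              # free-form trailing text, may be empty


def parse_gff_line(line: str) -> GffRecord:
    """Split one GFF line into eight fixed fields plus the trailing group text."""
    line = line.rstrip("\n")
    # GFF is tab-delimited; fall back to whitespace for examples typed with spaces.
    parts = line.split("\t", 8) if "\t" in line else line.split(None, 8)
    if len(parts) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(parts)}")
    seqname, source, feature, start, end, score, strand, frame = parts[:8]
    group = parts[8].strip() if len(parts) > 8 else ""
    return GffRecord(
        seqname, source, feature, int(start), int(end),
        None if score == "." else float(score),
        strand, frame, group,
    )


blast_hit = parse_gff_line(
    'seq1 BLASTX similarity 101 235 87.1 + 0 '
    'Target "HBA_HUMAN" 11 55 ; E_value 0.0003'
)
exon = parse_gff_line(
    'dJ102G20 GD_mRNA coding_exon 7105 7201 . - 2 Sequence "dJ102G20.C1.1"'
)
print(blast_hit.seqname, blast_hit.start, blast_hit.end, blast_hit.score)
print(exon.strand, exon.frame, exon.group)
```

A score of "." is mapped to None here so that callers can tell "no score" apart from a score of zero.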
The "Sequence" tag is used to group a set of start/end positions, as for "join" in the EMBL feature table.
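The grouping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, the first record uses the coordinates from the example on this page, and the other two records are made up. Strand is ignored, so a real conversion to an EMBL location would also need to handle the reverse strand.

```python
import re
from collections import defaultdict

# (start, end, group field). Only the first record comes from the example
# on this page; the other two are invented for this sketch.
records = [
    (7105, 7201, 'Sequence "dJ102G20.C1.1"'),
    (7500, 7620, 'Sequence "dJ102G20.C1.1"'),
    (9000, 9100, 'Sequence "dJ102G20.C2.1"'),
]

# Matches: Sequence "some name"
SEQUENCE_TAG = re.compile(r'\bSequence\s+"([^"]*)"')


def group_by_sequence_tag(records):
    """Collect the (start, end) pairs that share the same "Sequence" tag value."""
    groups = defaultdict(list)
    for start, end, group in records:
        match = SEQUENCE_TAG.search(group)
        if match:
            groups[match.group(1)].append((start, end))
    return {name: sorted(spans) for name, spans in groups.items()}


def as_join(spans):
    """Render grouped spans in an EMBL-like join(...) form (strand ignored)."""
    return "join(" + ",".join(f"{s}..{e}" for s, e in spans) + ")"


joined = {name: as_join(spans)
          for name, spans in group_by_sequence_tag(records).items()}
for name, location in joined.items():
    print(name, location)
```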
In EMBOSS, features are addressed through the Uniform Feature Object (UFO), whose syntax looks like that of a sequence USA (Uniform Sequence Address). By default, features are read from and written to a file called "seqname.gff".
The GFF maintainers have agreed that GFF can be used for protein features by ignoring the "strand" and "frame" fields.
GFF has the advantage that every line includes the sequence ID, so a feature table can always be linked back to its sequence. This means one GFF file can hold features for many sequences, and one sequence can have many GFF files (for example, the results of several programs that are to be merged).
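Because the sequence ID is the first field of every line, merging the output of several programs amounts to indexing lines by that field. The Python sketch below shows one way to do this. It is not how EMBOSS does it: the function name index_by_sequence is hypothetical, and the third input line (source "predictor") is made up for illustration.

```python
from collections import defaultdict


def index_by_sequence(*gff_texts):
    """Map each sequence ID (first GFF field) to its feature lines from all inputs."""
    by_seq = defaultdict(list)
    for text in gff_texts:
        for line in text.splitlines():
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and "#" comment lines
            seq_id = line.split(None, 1)[0]
            by_seq[seq_id].append(line)
    return dict(by_seq)


# Output of two different programs, as two separate GFF texts.
blast_out = (
    'seq1\tBLASTX\tsimilarity\t101\t235\t87.1\t+\t0\t'
    'Target "HBA_HUMAN" 11 55 ; E_value 0.0003\n'
)
gene_out = (
    'dJ102G20\tGD_mRNA\tcoding_exon\t7105\t7201\t.\t-\t2\t'
    'Sequence "dJ102G20.C1.1"\n'
    # Made-up record so that seq1 has features from two programs.
    'seq1\tpredictor\texon\t50\t300\t.\t+\t0\n'
)

merged = index_by_sequence(blast_out, gene_out)
for seq_id, lines in merged.items():
    print(seq_id, len(lines))
```

Keeping the original lines intact, rather than parsing them, means the merged table can be written back out as GFF unchanged.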
EMBOSS reads and writes feature tables with sequences automatically. Adding "feature: Y" to the input and output of "seqret" in the ACD file creates an application that can automatically read and write any feature table format, with no changes to the 10 lines of source code.