Every document on EMBOSS seems to have a section
titled "What is EMBOSS?" This gentle tutorial, originally by
Val Curwen and others and now maintained by David Martin, is no
exception. It will lead you through using EMBOSS by showing you some of
the more useful and ubiquitous programs in the package, and will allow you to
investigate the properties and feel of the programs when run at the UNIX
prompt. The PostScript and PDF versions of the
tutorial are also available for download.
EMBOSS programs are run by typing their names at the UNIX prompt,
or by using an interface.
Many interfaces are available;
Jemboss is our supported interface.
When EMBOSS has been set up for you and you are being prompted by UNIX
to type in the names of programs, you can type the name of any EMBOSS
program. What you type is called the command-line.
Any required information that you have not already given on the
command-line will be prompted for.
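The prompting behaviour described above can be pictured with a small Python sketch. It is only an illustration, not how EMBOSS itself is implemented: the function name fill_missing, the parameter names "sequence" and "outseq", and the injected ask callable are all hypothetical.

```python
def fill_missing(required, given, ask=input):
    """Return a copy of `given` with every missing required value prompted for.

    `required` is an ordered list of parameter names (hypothetical names),
    `given` maps names already supplied on the command line to their values,
    and `ask` is the function used to prompt (input() by default).
    """
    values = dict(given)
    for name in required:
        if name not in values:
            # Only parameters that were not supplied trigger a prompt.
            values[name] = ask(f"{name}: ")
    return values
```

Passing the prompt function as an argument keeps the sketch easy to try out without typing at a terminal.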
The EMBOSS command syntax follows
normal UNIX command conventions: options start with a '-', for example
"program -format 2".
If in doubt, type:
- "programname -help" to get some help on the options
- "programname -opt" to make the program prompt you for common options
- "tfm programname" to get the full documentation on a program
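As a rough illustration of the option syntax described above (options start with a '-'), the following Python sketch assembles a command line from a program name and a set of options. The helper build_command is hypothetical and is not part of EMBOSS; it only shows how the pieces of a command line fit together.

```python
def build_command(program, options):
    """Build an argument list such as ["program", "-format", "2"]."""
    argv = [program]
    for name, value in options.items():
        argv.append(f"-{name}")    # every option starts with a '-'
        if value is not None:      # None marks an option that takes no value
            argv.append(str(value))
    return argv


# Reproduces the example from the text: program -format 2
print(" ".join(build_command("program", {"format": 2})))
```

An option given without a value, such as "-help", is represented here by the value None.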
Many EMBOSS programs have functionality in common. They all understand
the same sequence formats, output formats and feature formats.
The following are descriptions of some of the common themes in EMBOSS.
The Uniform Sequence Address, or USA, is a standard sequence naming scheme used
by all EMBOSS applications.
The USA syntax is basically one of:
- "@listfile" (a file of file-names)
The "::" and ":" syntax is to allow, for example, "embl" and "pir" to be
both database names and sequence formats.
In addition, EMBOSS allows the command line to define the format
and the entry name separately, so that only the filename is required.
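To make the role of "::" and ":" concrete, here is a simplified Python sketch that splits a USA-like string into its parts. It rests on assumptions: that text before "::" names a format, that text after a single ":" names an entry, and that a leading "@" marks a list file. The function split_usa and the file and entry names used with it are hypothetical, and the sketch does not attempt to cover the full USA syntax.

```python
def split_usa(usa):
    """Split a USA-like string into its parts (simplified sketch)."""
    parts = {"listfile": None, "format": None, "source": None, "entry": None}
    if usa.startswith("@"):
        # "@listfile": a file of file-names
        parts["listfile"] = usa[1:]
        return parts
    if "::" in usa:
        # Text before "::" is taken to be the sequence format.
        parts["format"], usa = usa.split("::", 1)
    if ":" in usa:
        # Text after a single ":" is taken to be the entry name.
        usa, parts["entry"] = usa.split(":", 1)
    parts["source"] = usa  # a file name or a database name
    return parts
```

With this sketch, split_usa("embl::myfile.dat:X12345") treats "embl" as a format, while split_usa("embl:X12345") treats "embl" as a database name, which is the ambiguity the two separators are meant to resolve.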
You can specify the format to use on input by giving the format name
with two colons before the file holding your sequences, in the form
"format::filename".
The format is not required. When reading in a sequence, EMBOSS will
guess the sequence
format by trying all known formats until one succeeds.
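The idea of trying each known format until one succeeds can be sketched in Python as follows. The two toy readers and the order in which they are tried are assumptions made for illustration; they stand in for, and are much simpler than, the formats EMBOSS actually recognises.

```python
def parse_fasta(text):
    """Toy FASTA reader: a '>' header line followed by sequence lines."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not FASTA")
    return lines[0][1:].strip(), "".join(line.strip() for line in lines[1:])


def parse_plain(text):
    """Toy plain-text reader: letters only, no header line."""
    seq = "".join(text.split())
    if not seq.isalpha():
        raise ValueError("not a plain sequence")
    return "", seq


# Hypothetical list of known formats, tried in this order.
KNOWN_FORMATS = [("fasta", parse_fasta), ("plain", parse_plain)]


def guess_format(text):
    """Try each known format in turn and return the first that succeeds."""
    for name, parser in KNOWN_FORMATS:
        try:
            ident, seq = parser(text)
        except ValueError:
            continue  # this format failed; try the next one
        return name, ident, seq
    raise ValueError("no known format matched")
```

Because the first reader that succeeds wins, the order of the list matters; naming the format explicitly avoids relying on that order.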
When writing out a sequence, EMBOSS will use fasta format by
default. You can specify another format to use.
When writing out an alignment between two or more sequences, EMBOSS now
has a standard set of formats that are used.
When reading or writing features associated with a sequence, there is a
standard set of formats that are used.
The feature files can either be a standard sequence format with a
feature table as part of the sequence format, or the features can be
held in a file without the associated sequence.
There are many ways in which the results of an analysis can be reported.
Many EMBOSS programs are now able to output their results in a standard
report format - you can change the report format used by putting
'-rformat name' on the command-line, where 'name' is the name of one of
the standard report formats.
Rice, P., Longden, I. and Bleasby, A.
"EMBOSS: The European Molecular Biology Open Software Suite"
Trends in Genetics, June 2000, vol. 16, no. 6, pp. 276-277