EMBOSS Documentation For Users


EMBOSS Frequently Asked Questions

The Frequently Asked Questions (FAQ) list is also distributed with EMBOSS as the file "FAQ" in the top-level directory. You might be wondering why it's called EMBOSS.

EMBOSS Tutorial

This gentle tutorial, originally written by Val Curwen and others and now maintained by David Martin, leads you through using EMBOSS. It shows you some of the more useful and widely used programs in the package and lets you explore how the programs behave when run at the UNIX command line.

Introductory Bioinformatics Course

This introductory course, written by Lisa Mullan (EBI), takes a hands-on approach to bioinformatics utility software for computing novices. It provides an overview of available software, discusses some of the ideas behind the approaches and tackles some common tasks. EMBOSS is used quite extensively in the course, which is why we provide it here. The course has many practical examples, which follow typical mini projects so that their relevance is apparent.

Running EMBOSS applications

Graphical User Interfaces

There are many available interfaces.

Jemboss is our supported interface.

Command line

Once EMBOSS is installed and configured, type the name of an application at a Unix prompt (the command line) to run that application. Any required values that you have not already given on the command-line will be prompted for automatically.

The EMBOSS command syntax follows normal UNIX command conventions. Options are introduced with a '-', for example "program -format 2".
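
As a minimal sketch of a run in which nothing is prompted for, assuming EMBOSS is installed and "seqret" is on your path (the file names and the sequence are made up for illustration):

```shell
# Made-up FASTA file so that the example is self-contained.
printf '>demo1\nACGTACGTAC\n' > demo.fasta

# Assumption: EMBOSS is installed and 'seqret' is on PATH.
if command -v seqret >/dev/null 2>&1; then
    # Typing just 'seqret' would prompt for the input and the output.
    # Here both required values are given on the command line, and
    # -auto turns off prompting for anything else.
    seqret -sequence demo.fasta -outseq demo.out -auto
fi
```

Leaving out "-outseq demo.out" and "-auto" would make seqret prompt for the output sequence instead.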

If in doubt, type:
"programname -help" to get some help on the options
"programname -opt" to force the application to prompt you for values for all available options, including ones which are not normally prompted for and for which default values are used
"tfm programname" to get the full help on a program

Useful themes in EMBOSS

Many EMBOSS applications have functionality in common. For instance, the applications support the same sorts of sequence formats, output formats and feature formats. The following are descriptions of some of the common themes in EMBOSS.

The Uniform Sequence Address, or USA, is a standard sequence naming scheme used by all EMBOSS applications.

The USA syntax has the following types:
"file"
"file:entry"
"format::file"
"format::file:entry"
"dbname:entry"
"dbname"
"@listfile"

Where "format" is the database format of a file ("file") you have provided and "entry" is the database entry code. Alternatively, an entry can be retrieved from an installed database named "dbname". "listfile" is the name of a file which itself contains a list of file names.

The "::" and ":" syntax is to allow, for example, "embl" and "pir" to be both database names and sequence formats.
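
The file-based forms can be tried out with a small sketch like the following. It assumes EMBOSS is installed and uses "seqret" to read each address; the file names, entry names and sequences are invented for the example. The "@" prefix is how a list file is marked on the command line.

```shell
# Made-up FASTA file with two entries, and a list file that names it.
printf '>demo1\nACGTACGTAC\n>demo2\nTTGACCATGA\n' > demo.fasta
printf 'demo.fasta\n' > demo.list

# Assumption: EMBOSS is installed and 'seqret' is on PATH.
if command -v seqret >/dev/null 2>&1; then
    # format::file        reads every entry in the file
    seqret -sequence 'fasta::demo.fasta' -outseq all.out -auto
    # format::file:entry  reads only the entry called demo2
    seqret -sequence 'fasta::demo.fasta:demo2' -outseq one.out -auto
    # @listfile           reads every file named in the list file
    seqret -sequence '@demo.list' -outseq list.out -auto
fi

# A database entry would be addressed as dbname:entry. That needs a
# database configured at your site, so it is not run here.
```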

Sequence formats

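Many different sequence formats are supported. You can specify the format of your input file either with the '-sformat' qualifier or by putting the format in front of the file name in the USA ("format::file").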


The format is not required, however. When reading in a sequence, EMBOSS will guess the sequence format by trying all known formats until one succeeds.
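
A short sketch of the three ways of reading the same file, assuming EMBOSS is installed; "seqret" is used only because it reads and rewrites a sequence, and the file names are made up:

```shell
# Made-up FASTA file so that the example is self-contained.
printf '>demo1\nACGTACGTAC\n' > demo.fasta

# Assumption: EMBOSS is installed and 'seqret' is on PATH.
if command -v seqret >/dev/null 2>&1; then
    # 1. Input format named with the -sformat qualifier.
    seqret -sequence demo.fasta -sformat fasta -outseq named.out -auto
    # 2. Input format named inside the USA.
    seqret -sequence fasta::demo.fasta -outseq usa.out -auto
    # 3. No format given: EMBOSS works out the format itself.
    seqret -sequence demo.fasta -outseq guessed.out -auto
fi
```

Naming the format is mainly useful when a file could be misread as another format; otherwise the third form is usually enough.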

When writing out a sequence, EMBOSS will use fasta format by default. You can specify another format with the '-osformat' qualifier.
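
For instance, the following sketch writes the same sequence once in the default format and once in EMBL format. It assumes EMBOSS is installed; the file names are made up.

```shell
# Made-up FASTA file so that the example is self-contained.
printf '>demo1\nACGTACGTAC\n' > demo.fasta

# Assumption: EMBOSS is installed and 'seqret' is on PATH.
if command -v seqret >/dev/null 2>&1; then
    # No output format given: written as fasta.
    seqret -sequence demo.fasta -outseq demo.default -auto
    # Output format chosen explicitly.
    seqret -sequence demo.fasta -outseq demo.embl -osformat embl -auto
fi
```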


Alignment formats

When writing out an alignment between two or more sequences, EMBOSS uses a standard set of alignment formats.
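
As an illustration, the alignment format can be chosen with the '-aformat' qualifier. The sketch below assumes EMBOSS is installed and uses "needle" as the aligning program; the two sequences and the file names are invented for the example.

```shell
# Two made-up nucleotide sequences to align.
printf '>seqA\nACGTACGTACGGTTAC\n' > a.fasta
printf '>seqB\nACGTTCGTACGTTAC\n' > b.fasta

# Assumption: EMBOSS is installed and 'needle' is on PATH.
if command -v needle >/dev/null 2>&1; then
    # -aformat selects one of the standard alignment formats ('pair' here).
    needle -asequence a.fasta -bsequence b.fasta \
           -gapopen 10 -gapextend 0.5 \
           -outfile demo.pair -aformat pair -auto
fi
```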

Feature formats

When reading or writing features associated with a sequence, a standard set of feature formats is used.

Features can be written as part of a standard sequence format that includes a feature table, or to a separate feature file without the associated sequence.

Report formats

Many EMBOSS programs can output their results in a standard report format. You can change the report format by putting '-rformat name' on the command line, where 'name' is the name of one of the standard report formats.
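
A minimal sketch, assuming EMBOSS is installed and taking "fuzznuc" as one example of a program that writes a report; the sequence, the pattern and the file names are made up:

```shell
# Made-up FASTA file so that the example is self-contained.
printf '>demo1\nACGTACGTAC\n' > demo.fasta

# Assumption: EMBOSS is installed and 'fuzznuc' is on PATH.
if command -v fuzznuc >/dev/null 2>&1; then
    # Search for the pattern ACGT and write the hits as a GFF report
    # instead of the program's default report format.
    fuzznuc -sequence demo.fasta -pattern ACGT \
            -outfile demo.gff -rformat gff -auto
fi
```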

How to cite EMBOSS

EMBOSS: The European Molecular Biology Open Software Suite (2000)
Rice, P., Longden, I. and Bleasby, A.

Trends in Genetics 16(6), pp. 276-277