User Documentation

 

Tutorial

Every document on EMBOSS seems to have a section titled "What is EMBOSS?" This gentle tutorial, originally written by Val Curwen and others and now maintained by David Martin, is no exception. It leads you through EMBOSS by showing you some of the more useful and widely used programs in the package, and lets you explore how the programs behave when run at the UNIX command line.

The PostScript and PDF versions of the tutorial are also available for download.

Running EMBOSS Programs

EMBOSS programs are run by typing their names at the UNIX prompt, or by using an interface.

Interfaces

There are many available interfaces.

Jemboss is our supported interface.

Command Line

Once EMBOSS has been set up for you and you are at the UNIX prompt, you can type the name of any EMBOSS program. What you type is called the command line.

The program will prompt you for any required information that you have not already given on the command line.

The EMBOSS command syntax follows normal UNIX command conventions: options start with a '-', for example "program -format 2".

If in doubt, type:
"programname -help" to get some help on the options,
"programname -opt" to make the program prompt you for common options, or
"tfm programname" to get the full help on a program.
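
The convention described above (a program name followed by options that each start with a '-') can be sketched in a few lines of Python. This is only an illustration of how such a command line is assembled, not part of EMBOSS; the helper name emboss_command is hypothetical, and "program" and "programname" are the same placeholders used in the text.

```python
import shlex

def emboss_command(program, **options):
    """Return an argv list: the program name, then '-name value' pairs."""
    argv = [program]
    for name, value in options.items():
        argv.append(f"-{name}")
        # True marks a switch that takes no value, such as -help or -opt.
        if value is not True:
            argv.append(str(value))
    return argv

# "program" and "programname" are placeholders, as in the text above.
print(shlex.join(emboss_command("program", format=2)))
print(shlex.join(emboss_command("programname", help=True)))
print(shlex.join(emboss_command("programname", opt=True)))
print(shlex.join(["tfm", "programname"]))
```

Each printed line is a command that could be typed at the UNIX prompt once "programname" is replaced by the name of a real EMBOSS program.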


Useful Themes in EMBOSS

Many EMBOSS programs have functionality in common. They all understand the same sequence formats, output formats and feature formats. The following are descriptions of some of the common themes in EMBOSS.

Uniform Sequence Addresses

The Uniform Sequence Address, or USA, is a standard way of naming sequences that is used by all EMBOSS applications.

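A USA is basically built from up to three parts: an optional sequence format, a file or database name, and an optional entry name.

A format is followed by "::", while a database name is followed by ":". The two separators allow, for example, "embl" and "pir" to be both database names and sequence formats.

In addition, EMBOSS allows the command line to define the format and the entry name separately, so that only the filename is required in the USA itself.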
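
To make the role of the two separators concrete, here is a minimal Python sketch that splits a USA into its parts. It is not EMBOSS code and does not cover every form EMBOSS accepts; it assumes only that a format is followed by "::" and that an entry name follows a single ":". The names USA and parse_usa, the file name and the entry name "hsfau" are illustrative.

```python
from typing import NamedTuple, Optional

class USA(NamedTuple):
    fmt: Optional[str]    # sequence format, e.g. "embl"; None if not given
    source: str           # file name or database name
    entry: Optional[str]  # entry name within the source; None if not given

def parse_usa(usa: str) -> USA:
    """Split a USA on '::' (format) first, then on ':' (entry)."""
    fmt = None
    rest = usa
    if "::" in usa:
        fmt, rest = usa.split("::", 1)
    source, sep, entry = rest.partition(":")
    return USA(fmt or None, source, entry if sep else None)

print(parse_usa("embl::myfile.seq"))  # "embl" is a format here
print(parse_usa("embl:hsfau"))        # "embl" is a database name here
```

The sketch cannot tell a file name from a database name. That decision depends on which databases are configured at a given site, which is outside what a string split can know.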

Sequence Formats

You can specify the format to use on input by giving the format name with two colons before the file holding your sequences. For example:

embl::myfile.seq

The format is not required. When reading in a sequence, EMBOSS will guess the sequence format by trying all known formats until one succeeds.
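
The idea of guessing a format by trying each known format until one succeeds can be sketched as follows. This is a toy illustration of the technique, not the EMBOSS implementation: the two readers (a simplified fasta reader and a plain-sequence reader) and the function guess_format are assumptions made for the example.

```python
def read_fasta(text):
    """Toy fasta reader: a '>' header line followed by sequence lines."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not fasta")
    header = lines[0][1:].strip()
    name = header.split()[0] if header else ""
    return name, "".join(line.strip() for line in lines[1:])

def read_plain(text):
    """Toy plain reader: nothing but sequence letters and whitespace."""
    seq = "".join(text.split())
    if not seq or not seq.isalpha():
        raise ValueError("not a plain sequence")
    return "", seq

# Readers are tried in this order; the first one that succeeds wins.
READERS = [("fasta", read_fasta), ("plain", read_plain)]

def guess_format(text, readers=READERS):
    """Return (format_name, (name, sequence)) for the first reader that works."""
    for fmt, reader in readers:
        try:
            return fmt, reader(text)
        except ValueError:
            continue
    raise ValueError("no known format matched")

print(guess_format(">seq1 demo\nACGT\nTTGA\n"))
print(guess_format("ACGT\nTTGA\n"))
```

The order of the readers matters in a scheme like this: a permissive reader placed early would claim input that a stricter reader later in the list should have handled.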

When writing out a sequence, EMBOSS will use fasta format by default. You can specify another format to use, for example:

gcg::myresults.seq
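
Putting the input and output examples together, the following Python sketch builds a command that reads myfile.seq as embl format and writes myresults.seq in gcg format. It assumes the use of seqret, the EMBOSS program for reading and rewriting sequences, which is not named in the text above; the helper names convert_command and run_if_available are hypothetical.

```python
import shutil
import subprocess

def convert_command(in_format, in_file, out_format, out_file):
    """Build a seqret command that reads in_file and writes out_file."""
    return ["seqret", f"{in_format}::{in_file}", f"{out_format}::{out_file}"]

def run_if_available(argv):
    """Run argv only if the program is on PATH; return True if it ran."""
    if shutil.which(argv[0]) is None:
        return False
    subprocess.run(argv, check=True)
    return True

cmd = convert_command("embl", "myfile.seq", "gcg", "myresults.seq")
print(" ".join(cmd))
```

On a machine where EMBOSS is installed and myfile.seq exists, calling run_if_available(cmd) would carry out the conversion; elsewhere it returns False without running anything.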

Alignment Formats

When writing out an alignment between two or more sequences, EMBOSS uses a standard set of alignment formats.

Feature Formats

When reading or writing features associated with a sequence, EMBOSS uses a standard set of feature formats.

A feature file can either be in a standard sequence format that includes a feature table, or it can hold the features alone, without the associated sequence.

Report Formats

There are many ways in which the results of an analysis can be reported.

Many EMBOSS programs can output their results in a standard report format. You can change the report format used by putting '-rformat name' on the command line, where 'name' is the name of one of the standard report formats.


Reference for EMBOSS

Rice, P., Longden, I. and Bleasby, A.
"EMBOSS: The European Molecular Biology Open Software Suite"
Trends in Genetics, June 2000, vol. 16, no. 6, pp. 276-277