EMBOSS: User Documentation

Tutorial

Every document on EMBOSS seems to have a section titled "What is EMBOSS?" This gentle tutorial, originally by Val Curwen and others and now maintained by David Martin, is no exception. It leads you through using EMBOSS by showing you some of the more useful and ubiquitous programs in the package, and lets you investigate the properties and feel of the programs when run at the UNIX command line.

The PostScript and PDF versions of the tutorial are also available for download.

Running EMBOSS Programs

EMBOSS programs are run by typing their names at the UNIX prompt, or by using an interface.

Interfaces

There are many available interfaces.

Jemboss is our supported interface.

Command Line

Once EMBOSS has been set up for you and you are at the UNIX prompt, you can type the name of any EMBOSS program. What you type is called the command line.

The program will prompt you for any required information that you have not already given on the command line.

The EMBOSS command syntax follows normal UNIX command conventions: options start with a '-', for example "program -format 2".

If in doubt, type one of:

"programname -help" to get some help on the options
"programname -opt" to make the program prompt you for common options
"tfm programname" to get the full help on a program
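
As a rough illustration of the "-help" option described above, the sketch below wraps the call in a small shell function. The function name emboss_help is hypothetical and not part of EMBOSS, and seqret is used only as an example program name. The sketch assumes nothing beyond a POSIX shell, and it reports a missing program instead of failing obscurely.

```shell
# Hypothetical wrapper (not part of EMBOSS): show the brief option help
# for a program, or report that the program is not on PATH.
emboss_help() {
    if command -v "$1" >/dev/null 2>&1; then
        # Merge stderr into stdout so the help text is captured
        # whichever stream the program writes it to.
        "$1" -help 2>&1
    else
        echo "$1: not found on PATH (is EMBOSS set up?)" >&2
        return 1
    fi
}

# Assumption: seqret stands in for any EMBOSS program name.
emboss_help seqret || true
```

The same pattern would apply to "-opt" or to "tfm programname"; only the command line passed to the shell changes.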

Useful Themes in EMBOSS

Many EMBOSS programs have functionality in common. They all understand the same sequence formats, output formats and feature formats. The following are descriptions of some of the common themes in EMBOSS.

Uniform Sequence Addresses

The Uniform Sequence Address, or USA, is a standard sequence naming scheme used by all EMBOSS applications.

The USA syntax is basically one of:

"format::file"
"format::file:entry"
"dbname:entry"
"@listfile" (a file of file-names)

The "::" and ":" separators are kept distinct so that names such as "embl" and "pir" can be both database names and sequence formats.

In addition, EMBOSS allows the command line to separately define the format and the entry name, so that only the filename is required.
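
To make the USA shapes listed above concrete, here is a minimal shell sketch that classifies a string by its separators alone. The function usa_kind and the entry name "myentry" are hypothetical and exist only for this illustration. The function does not reproduce how EMBOSS itself resolves a USA, and it does not check that the named file, database or format exists.

```shell
# Hypothetical helper (not part of EMBOSS): classify a USA string by shape.
# It inspects only the "@", "::" and ":" markers; the order of the patterns
# matters, because "format::file:entry" also contains a single ":".
usa_kind() {
    case "$1" in
        @*)     echo "listfile" ;;
        *::*:*) echo "format::file:entry" ;;
        *::*)   echo "format::file" ;;
        *:*)    echo "dbname:entry" ;;
        *)      echo "file" ;;
    esac
}

usa_kind "embl::myfile.seq"           # format::file
usa_kind "embl::myfile.seq:myentry"   # format::file:entry
usa_kind "embl:myentry"               # dbname:entry
usa_kind "@mylist"                    # listfile
usa_kind "myfile.seq"                 # file
```

The last case covers a bare filename, where the format is either guessed or given separately on the command line.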

Sequence Formats

You can specify the format to use on input by giving the format name with two colons before the file holding your sequences. For example:

embl::myfile.seq

The format is not required. When reading in a sequence, EMBOSS will guess the sequence format by trying all known formats until one succeeds.

When writing out a sequence, EMBOSS will use fasta format by default. You can specify another format to use, for example:

gcg::myresults.seq
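
The following sketch shows how these format prefixes might be used with seqret, the EMBOSS program that reads sequences and writes them back out. It assumes seqret is installed and on your PATH. The file contents and the output name copy.seq are invented for the example, and the input is read as fasta rather than embl because the demonstration file is written in fasta format.

```shell
# Create a small FASTA file so the example is self-contained.
printf '>demo\nACGTACGTAC\n' > myfile.seq

# Assumption: seqret is on PATH; otherwise the conversion is skipped.
if command -v seqret >/dev/null 2>&1; then
    # Name both formats explicitly: read as fasta, write as gcg.
    seqret -sequence fasta::myfile.seq -outseq gcg::myresults.seq
    # Give no formats: the input format is guessed and the
    # output is written in the default (fasta) format.
    seqret -sequence myfile.seq -outseq copy.seq
else
    echo "seqret not found; skipping the conversion" >&2
fi
```

Naming the input format explicitly is mainly useful when guessing could pick the wrong format; for well-formed files the second form is usually enough.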
