EMBOSS Documentation For Users
Contents
EMBOSS Frequently Asked Questions
Frequently Asked Questions (FAQ): also distributed with EMBOSS as the file "FAQ" in the top-level directory.
You might be wondering why it's called EMBOSS.
EMBOSS Tutorial
This gentle tutorial, originally written by Val Curwen and others and now maintained by David Martin, leads you through using EMBOSS. It shows you some of the more useful and widely used programs in the package and lets you explore how the programs behave when run at the UNIX command line.
Introductory Bioinformatics Course
This introductory course, written by Lisa Mullan (EBI), takes a hands-on approach to bioinformatics utility software for computing novices. It provides an overview of available software, discusses some of the ideas behind the approaches and tackles some common tasks. EMBOSS is used quite extensively in the course, which is why we provide it here. The course has many practical examples, which follow typical mini projects so that their relevance is apparent.
Running EMBOSS applications
Graphical User Interfaces
There are many available interfaces.
Jemboss is our supported interface.
Once EMBOSS is installed and configured, type the name of an application at a UNIX prompt (the command line) to run that application. You will be prompted automatically for any required values that you have not already given on the command line.
The EMBOSS command syntax follows normal UNIX command conventions. For example, options are specified with a leading '-', e.g. "program -format 2".
If in doubt, type:
programname -help
to get some help on the options, or
programname -opt
to force the application to prompt you for values for all available options, including those that are not normally prompted for and for which default values are used, or
tfm programname
to get the full help on a program.
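The three help invocations above can also be assembled from a script. The following is a minimal Python sketch that only builds and prints the command lines; it does not run EMBOSS. The function name `help_commands` is hypothetical, and `seqret` is used purely as an example application name.

```python
import shlex


def help_commands(program: str) -> dict:
    """Return the three help-related command lines for one application.

    Only the '-help' and '-opt' options and the 'tfm' program described
    in the text are used; nothing is executed here.
    """
    return {
        "brief help on options": [program, "-help"],
        "prompt for all options": [program, "-opt"],
        "full help": ["tfm", program],
    }


if __name__ == "__main__":
    # 'seqret' is only an example application name (an assumption).
    for purpose, argv in help_commands("seqret").items():
        print(f"{purpose}: {shlex.join(argv)}")
```

Each value is an argument list, so it could be passed to `subprocess.run` on a machine where EMBOSS is installed.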
Useful themes in EMBOSS
Many EMBOSS applications have functionality in common. For instance, the applications support the same sorts of sequence formats, output formats and feature formats. The following are descriptions of some of the common themes in EMBOSS.
The Uniform Sequence Address, or USA, is a standard sequence naming scheme used by all EMBOSS applications.
The USA syntax has the following types:
- "file"
- "format::file"
- "format::file:entry"
- "dbname:entry"
- "@listfile"
Here "format" is the database format of a file ("file") you have provided and "entry" is the database entry code. Alternatively, an entry can be retrieved from an installed database named "dbname". "listfile" is the name of a file which itself contains a list of file names.
The "::" and ":" syntax is to allow, for example, "embl" and "pir" to be both database names and sequence formats.
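To illustrate how the "::" and ":" separators keep the forms apart, here is a minimal Python sketch that classifies a string into one of the five USA types listed above. It is not EMBOSS's own parser: the names `USA` and `parse_usa` are hypothetical, the entry code `ENTRY1` is a placeholder, and any USA forms beyond those five are not handled.

```python
from typing import NamedTuple, Optional


class USA(NamedTuple):
    """One parsed address; fields that do not apply stay None."""
    kind: str
    fmt: Optional[str] = None
    file: Optional[str] = None
    dbname: Optional[str] = None
    entry: Optional[str] = None


def parse_usa(usa: str) -> USA:
    """Classify a string as one of the five USA types listed in the text."""
    if usa.startswith("@"):
        # "@listfile": a file that itself lists file names
        return USA(kind="@listfile", file=usa[1:])
    if "::" in usa:
        # "::" always introduces a format, never a database name
        fmt, rest = usa.split("::", 1)
        if ":" in rest:
            file, entry = rest.split(":", 1)
            return USA(kind="format::file:entry", fmt=fmt, file=file, entry=entry)
        return USA(kind="format::file", fmt=fmt, file=rest)
    if ":" in usa:
        # a single ":" with no "::" means database name and entry
        dbname, entry = usa.split(":", 1)
        return USA(kind="dbname:entry", dbname=dbname, entry=entry)
    return USA(kind="file", file=usa)


if __name__ == "__main__":
    # 'ENTRY1' and 'mylist' are placeholders, not real entries or files.
    for text in ("myfile.seq", "embl::myfile.seq",
                 "embl::myfile.seq:ENTRY1", "embl:ENTRY1", "@mylist"):
        print(text, "->", parse_usa(text))
```

Because "::" is checked before ":", the string "embl::myfile.seq" is read as a file in embl format, while "embl:ENTRY1" is read as an entry in a database named embl.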
Many different sequence formats are supported.
You can specify the format of your input file as follows:
embl::myfile.seq
The format is not required, however. When reading in a sequence, EMBOSS will guess the sequence format by trying all known formats until one succeeds.
When writing out a sequence, EMBOSS will use fasta format by default. You can specify another format to use, for example:
gcg::myresults.seq
When writing out an alignment between two or more sequences, EMBOSS uses a standard set of alignment formats.
When reading or writing features associated with a sequence, a standard set of feature formats is used.
Features can be written in a standard sequence format that includes a feature table, or output to a separate file without the associated sequence.
Many EMBOSS programs can output their results in a standard report format. You can change the report format used by putting '-rformat name' on the command line, where 'name' is the name of one of the standard report formats.
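As a small illustration of the '-rformat name' option, the following Python sketch appends it to an argument list. The helper name `with_report_format` is hypothetical, `programname` is a placeholder, and `gff` is assumed here to be one of the standard report format names; check the report formats documentation for the names your installation accepts.

```python
def with_report_format(argv, name):
    """Return a copy of argv with '-rformat name' appended.

    The original list is left unchanged, and the name is not validated
    against the formats that EMBOSS actually supports.
    """
    if not name:
        raise ValueError("report format name must not be empty")
    return [*argv, "-rformat", name]


if __name__ == "__main__":
    # 'programname' is a placeholder; 'gff' is an assumed format name.
    print(" ".join(with_report_format(["programname"], "gff")))
```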
How to cite EMBOSS
EMBOSS: The European Molecular Biology Open Software Suite (2000)
Rice, P., Longden, I. and Bleasby, A.
Trends in Genetics 16(6), pp. 276-277