EMBOSS Documentation For Administrators

Administrators Guide to setting up EMBOSS

You should read the EMBOSS Administrators Guide by David Martin, Peter Rice and Alan Bleasby.
(It can also be downloaded in PostScript or PDF format.)

In case of difficulty in downloading the file, it's also available from ftp://emboss.open-bio.org/pub/EMBOSS/doc/.

There is also a short guide in German that shows how to install Kaptain and EMBOSS.

Sequence databases

EMBOSS provides extensive database support, with a variety of ways to index and access the databases. For example, EMBL entries can be read from:

Original EMBL flatfiles using the CD-ROM or Staden indices
Original EMBL flatfiles using SRS indices
GCG database format using SRS indices
A query to the EBI web server
A query to any SRS web server
(possibly in the future) a query to an EBI CORBA server

The setting up of databases is covered in detail in the Admin Guide.

Example databases

If you wish to gain experience in setting up various styles of databases under EMBOSS, you will find some small example databases included in EMBOSS in the test directory after you unpack the release.

The EMBOSS developers use them to test database indexing and sequence reading.

See directories:

test/data    (emrod (DNA) and swnew (protein) are in blast format)
test/embl    (*.dat for EMBL format, *.ref and *.seq for gcg format)
test/pir     (*.ref and *.seq for nbrf format)
test/swiss   (*.dat for swissprot format, 1 file)
test/swnew   (*.dat for swissprot format, 3 files)
test/wormpep (wormpep is in fasta and blast format)

If you use the emboss/emboss.default.template file (included in the distribution) to create your own emboss.default file, change the definition of emboss_tempdata at the top to point to your test directory and you can use the test databases as "tembl", "tsw" and so on. The databases contain the sequences that are used in the usage examples for the applications (see the web pages, or run the "tfm" program to see the documentation).
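As an illustration, the relevant part of an emboss.default file might look like the fragment below. This is a sketch modelled on the style of the template, not a copy of it: the path /opt/EMBOSS/test is a placeholder for your own test directory, and the attributes shown for each database (type, method, format, comment) are assumptions that should be checked against the template shipped with your release.

```
# Placeholder path: point this at the test directory of your unpacked release
SET emboss_tempdata /opt/EMBOSS/test

# EMBL test database (nucleotide), assumed to use EMBL CD-ROM style indices
DB tembl [ type: N dir: $emboss_tempdata/embl
   method: emblcd format: embl
   comment: "EMBL test data in native format" ]

# SwissProt test database (protein), assumed to use EMBL CD-ROM style indices
DB tsw [ type: P dir: $emboss_tempdata/swiss
   method: emblcd format: swiss
   comment: "SwissProt test data in native format" ]
```

Once the file is in place, running the "showdb" program should list tembl and tsw among the available databases.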

You can also reindex these files yourself to test the dbi* programs and to test writing your own DB definitions for emboss.default.
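A reindexing run could be scripted along the following lines. This is only a sketch: the helper name dbiflat_command and the path /opt/EMBOSS/test/embl are hypothetical, and the qualifiers used (-dbname, -idformat, -directory, -filenames, -auto) are the ones we believe dbiflat accepts, so check them against "dbiflat -help" on your installation before relying on them. The script only builds and prints the command; it does not run it.

```python
import shlex


def dbiflat_command(dbname, directory, filenames="*.dat", idformat="EMBL"):
    """Return the argument list for a dbiflat run; nothing is executed here."""
    return [
        "dbiflat",
        "-dbname", dbname,        # database name, as used in emboss.default
        "-idformat", idformat,    # entry format of the flatfiles
        "-directory", directory,  # where the flatfiles live (placeholder below)
        "-filenames", filenames,  # wildcard selecting the files to index
        "-auto",                  # suppress interactive prompts
    ]


# Hypothetical location of the EMBL test data from the unpacked release
cmd = dbiflat_command("tembl", "/opt/EMBOSS/test/embl")
print(shlex.join(cmd))
```

After checking the printed command, the same list can be passed to subprocess.run(cmd, check=True) to perform the indexing.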

Pre-indexed databases

Don Gilbert (Indiana University) is making EMBOSS-format databanks of the recent GenBank DNA databank, plus non-redundant EMBL, GenPept, PIR and SwissProt, available on a trial basis for public use. You can fetch these data from the IUBio Archive:

ftp://iubio.bio.indiana.edu/biomirror-gcg/

 Mar  6 22:36 Readme
 May 17 12:40 emboss.default.gz
 May 18 02:19 gcgdbconfigure
 May 18 02:19 gcgembl      (release 70, non-redundant w/ genbank)
 May 18 02:18 gcggenbank1  (core genbank, release 129)
 May 18 22:37 gcggenbank2  (EST, GSS of release 129)
 May 17 22:13 gcggenpept   (release 129)
 May 17 20:38 gcgpir       (release 71)
 May 17 20:32 gcgswissprot (release 40)

These files are gzip-compressed, but should otherwise drop into an EMBOSS system after minor editing of the paths in the emboss.default file. EMBOSS indices are included with each data set (total size about 60 GB uncompressed; 20 GB compressed).

This is a trial to see if those of you who support EMBOSS want such a set of data + indices. Let Don Gilbert know if you find it useful.
