If you have difficulty downloading the file, it is also available from ftp://emboss.open-bio.org/pub/EMBOSS/doc/.
There is also a short guide in German that shows how to install Kaptain and EMBOSS.
Setting up databases is covered in detail in the Admin Guide.
The EMBOSS developers use the test databases to test database indexing and sequence reading.
See directories:
test/data (emrod (DNA) and swnew (protein) are in blast format)
test/embl (*.dat for EMBL format, *.ref and *.seq for gcg format)
test/pir (*.ref and *.seq for nbrf format)
test/swiss (*.dat for swissprot format, 1 file)
test/swnew (*.dat for swissprot format, 3 files)
test/wormpep (wormpep is in fasta and blast format)
If you use the emboss/emboss.default.template file (included in the distribution) to create your own emboss.default file, change the definition of emboss_tempdata at the top to point to your test directory and you can use the test databases as "tembl", "tsw" and so on. The databases contain the sequences that are used in the usage examples for the applications (see the web pages, or run the "tfm" program to see the documentation).
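The emboss_tempdata change described above can be scripted. The following is a minimal sketch, not part of EMBOSS: it assumes the definition is an uncommented line of the form "<keyword> emboss_tempdata <value>" (for example "SET emboss_tempdata ..."), and the function name set_tempdata is hypothetical. Check your own copy of the template before relying on it.

```python
import re

# Assumption: the template defines the variable on one uncommented line,
# "<keyword> emboss_tempdata <value>". The keyword is kept as found.
_TEMPDATA = re.compile(r"^\s*(\w+)\s+emboss_tempdata\b", re.IGNORECASE)


def set_tempdata(template_text, test_dir):
    """Return template_text with the first emboss_tempdata definition
    rewritten to point at test_dir. Raises ValueError if none is found."""
    out = []
    replaced = False
    for line in template_text.splitlines():
        match = _TEMPDATA.match(line)
        if match and not replaced:
            # Keep the original keyword, swap only the value.
            out.append(f"{match.group(1)} emboss_tempdata {test_dir}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise ValueError("no emboss_tempdata definition found")
    return "\n".join(out) + "\n"
```

To use it, read emboss/emboss.default.template, pass its text and the full path of your test directory to set_tempdata, and write the result to your own emboss.default file. Every other line of the template is passed through unchanged.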
You can also reindex these files yourself to test the dbi* programs and to test writing your own DB definitions for emboss.default.
ftp://iubio.bio.indiana.edu/biomirror-gcg/
Mar  6 22:36  Readme
May 17 12:40  emboss.default.gz
May 18 02:19  gcgdbconfigure
May 18 02:19  gcgembl       (release 70, non-redundant w/ genbank)
May 18 02:18  gcggenbank1   (core genbank, release 129)
May 18 22:37  gcggenbank2   (est,gss of rel 129)
May 17 22:13  gcggenpept    (release 129)
May 17 20:38  gcgpir        (release 71)
May 17 20:32  gcgswissprot  (release 40)
These are gzip-compressed, but should otherwise drop into an EMBOSS system after minor editing of the emboss.default file path. Each data set includes its EMBOSS package indices (total size about 60 GB uncompressed; 20 GB compressed).
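Once the files have been downloaded, they need to be decompressed before use. The following is a minimal sketch using only the Python standard library. It assumes the files end in ".gz" and sit under a single local directory; the function name gunzip_tree is hypothetical. Make sure about 60 GB of free disk space is available before running it on the full set.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_tree(root):
    """Decompress every *.gz file under root, writing each result next to
    the compressed file. Existing targets are left alone. Returns the
    list of files written."""
    written = []
    for gz_path in sorted(Path(root).rglob("*.gz")):
        target = gz_path.with_suffix("")  # drops only the trailing ".gz"
        if target.exists():
            continue  # never overwrite an already decompressed file
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams, so large files are fine
        written.append(target)
    return written
```

The compressed files are kept in place, so the function can be re-run safely after an interrupted download. Remove the ".gz" files yourself once you have checked the results.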
This is a trial to see if those of you who support EMBOSS want such a set of data + indices. Let Don Gilbert know if you find it useful.