
Indexing and configuring flatfile databases

Flatfile databases are plain text files in a defined format, such as those released by EMBL, Swiss-Prot and so on. The EMBOSS program dbiflat generates EMBLCD indices that can be used for all types of database access. dbiflat can process databases in EMBL, SWISSPROT and GENBANK format. Pseudo-EMBL format databases that do not have unique ID and AC entries may cause dbiflat to behave unpredictably and should be avoided.

dbiflat (and the EMBLCD access method) requires the databases to be uncompressed. The examples given here will not probe the deeper secrets of dbiflat (for which the reader is referred to the documentation, or failing that the source code) but will show a typical installation for a common database.
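If the database files were downloaded in compressed form, they have to be unpacked before indexing. The following is a minimal sketch only: it assumes the files are gzip-compressed and named *.dat.gz (the naming is an assumption, not something stated here), and the function name uncompress_flatfiles is hypothetical.

```shell
# Sketch: uncompress gzip-compressed flatfiles in a database directory.
# Assumes files are named *.dat.gz; adjust the pattern to your download.
uncompress_flatfiles() {
    dir="${1:-.}"
    for f in "$dir"/*.dat.gz; do
        # Skip the unexpanded pattern when no file matches
        [ -e "$f" ] || continue
        gunzip "$f"
    done
}
```

Calling uncompress_flatfiles with the path to the database directory leaves plain .dat files in place of the compressed ones. Files that are already uncompressed are not touched.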

We assume that EMBOSS has been installed and works. This can be tested with the command

wossname -auto
which should list all the programs available.

In this example we will index and configure the EMBL database for use with EMBOSS.

First download and unpack the EMBL database. This will require a considerable amount of disk space. If you do not have sufficient space available then just download a subset of the database.

Use cd to move to the directory in which you have unpacked EMBL. The directory listing should look something like this when you run ls:

% ls
est_fun.dat
est_hum1.dat
est_hum10.dat
.
Output truncated
.
syn.dat
unc.dat
vrl.dat
vrt.dat

Run dbiflat to create the EMBLCD indices.

% dbiflat

Index a flat file database
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
Entry format [SWISS]: EMBL   
Database name: embl
Database directory [.]: 
Wildcard database filename [*.dat]: 
Release number [0.0]: 63.0
Index date [00/00/00]: 31/07/00

dbiflat should happily chug away for some considerable time (up to a few hours depending on the speed of your machine) and will generate (eventually) the following index files:

% ls
acnum.hit
acnum.trg
division.lkp
entrynam.idx
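Because indexing can take hours, it may be worth confirming that all four index files listed above were actually written before moving on. This is a sketch under that assumption; the function name check_emblcd_indices is hypothetical, and only the file names come from the listing above.

```shell
# Sketch: report any of the expected EMBLCD index files that are
# missing or empty in the given directory (default: current directory).
check_emblcd_indices() {
    dir="${1:-.}"
    missing=0
    for f in acnum.hit acnum.trg division.lkp entrynam.idx; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    # Returns 0 when all files are present and non-empty, 1 otherwise
    return "$missing"
}
```

Run it in the database directory once dbiflat has finished. A non-zero return status suggests the indexing run did not complete.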

Now we create an entry in the EMBOSS configuration files to access the database. It is probably a good idea to try new database definitions in your local configuration file first.

Put the following entry in your .embossrc:

DB embl [
   type: N
   method: emblcd
   format: embl
   dir: $emboss_db_dir/embl
   file: "*.dat"
   release: "63.0"
   comment: "EMBL release 63.0"
]

You will need to have predefined $emboss_db_dir using a directive such as

set emboss_db_dir /path_to_databases

somewhere in your emboss.default or .embossrc.
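To see how the variable definition and the database entry fit together, the two pieces can be written to a configuration file in one go. This is only a sketch: the function name write_embl_entry is hypothetical, the path passed to it is whatever directory holds your databases, and the entry itself is copied from the example above.

```shell
# Sketch: append the emboss_db_dir definition and the embl DB entry
# to a configuration file (for example a copy of your .embossrc).
write_embl_entry() {
    rcfile="$1"   # configuration file to append to
    dbroot="$2"   # directory that contains the embl subdirectory
    {
        printf 'set emboss_db_dir %s\n\n' "$dbroot"
        # Quoted heredoc so the shell leaves $emboss_db_dir untouched
        cat <<'EOF'
DB embl [
   type: N
   method: emblcd
   format: embl
   dir: $emboss_db_dir/embl
   file: "*.dat"
   release: "63.0"
   comment: "EMBL release 63.0"
]
EOF
    } >> "$rcfile"
}
```

Writing to a scratch file first and inspecting the result before copying it into .embossrc avoids damaging a working configuration.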

Save .embossrc and try showdb. You should see a line that looks like:

% showdb
.. output deleted
embl          N    OK  OK  OK  EMBL release 63.0
.. output deleted


Peter Rice 2007-04-26