Flatfile databases are plain text files in a defined format, such as those released by EMBL, Swiss-Prot and so on. The EMBOSS program dbiflat generates EMBLCD indices that can be used for all types of database access. dbiflat can process databases in EMBL, SWISSPROT and GENBANK format. Pseudo-EMBL format databases that do not have unique ID and AC entries may cause dbiflat to do mysterious things and should be avoided.
dbiflat (and the EMBLCD access method) requires the databases to be uncompressed. The examples given here will not probe the deeper secrets of dbiflat (for which the reader is referred to the documentation, or failing that the source code) but will show a typical installation for a common database.
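Since duplicate ID entries are a known source of trouble, it can be worth checking a flatfile before indexing it. The following is a minimal sketch, not part of EMBOSS: the function name find_duplicate_ids is hypothetical, and it assumes that every entry begins with a line of the form "ID   name ..." and that a trailing semicolon after the name is not part of the name. It checks ID lines only, not AC lines.

```shell
# List entry names that occur on more than one ID line.
# Assumption: each entry starts with "ID   <name> ..."; a trailing ";"
# directly after the name is stripped before comparing.
find_duplicate_ids() {
    awk '$1 == "ID" { sub(/;$/, "", $2); print $2 }' "$@" | sort | uniq -d
}

# Example: find_duplicate_ids *.dat
```

If the function prints nothing, no entry name was seen twice in the files given.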
We assume that EMBOSS has been installed and works. This can be tested with the command
wossname -auto
which should list all the programs available.
In this example we will index and configure the EMBL database for use with EMBOSS.
First download and unpack the EMBL database. This will require a considerable amount of disk space. If you do not have sufficient space available then just download a subset of the database.
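Because the indexer needs uncompressed files, the unpacking step might be scripted along the following lines. This is only a sketch: the function name unpack_flatfiles is hypothetical, and it assumes the downloaded files are gzip-compressed and named *.dat.gz, which may not match your download.

```shell
# Uncompress every gzip-compressed flatfile in a directory (default: current).
# Assumption: the files are named *.dat.gz; adjust the pattern otherwise.
unpack_flatfiles() {
    dir=${1:-.}
    for f in "$dir"/*.dat.gz; do
        [ -e "$f" ] || continue   # the pattern matched nothing
        gunzip "$f"               # replaces foo.dat.gz with foo.dat
    done
}

# Example: unpack_flatfiles /path_to_databases/embl
```

Running df -k on the target directory beforehand gives a rough idea of whether enough disk space is available.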
cd to the directory in which you have unpacked EMBL. A listing should look something like this:
% ls
est_fun.dat est_hum1.dat est_hum10.dat
. Output truncated .
syn.dat unc.dat vrl.dat vrt.dat
Run dbiflat to create the EMBLCD indices.
% dbiflat
Index a flat file database
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
Entry format [SWISS]: EMBL
Database name: embl
Database directory [.]:
Wildcard database filename [*.dat]:
Release number [0.0]: 63.0
Index date [00/00/00]: 31/07/00
dbiflat should happily chug away for some considerable time (up to a few hours depending on the speed of your machine) and will generate (eventually) the following index files:
% ls
acnum.hit acnum.trg division.lkp entrynam.idx
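After a long indexing run it can be convenient to confirm that all four index files were actually written. The sketch below is an illustration only: the function name check_emblcd_indices is hypothetical, and treating an empty file as a failure is an assumption rather than documented dbiflat behaviour.

```shell
# Return 0 if all four index files exist and are non-empty in the given
# directory (default: current); otherwise report each problem on stderr.
check_emblcd_indices() {
    dir=${1:-.}
    status=0
    for f in acnum.hit acnum.trg division.lkp entrynam.idx; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}

# Example: check_emblcd_indices . && echo "indices present"
```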
Now we create an entry in the EMBOSS configuration files to access the database. It is probably a good idea to try new database definitions in your local configuration file first.
Put the following entry in your local configuration file:
DB embl [
   type: N
   method: emblcd
   format: embl
   dir: $emboss_db_dir/embl
   file: "*.dat"
   release: "63.0"
   comment: "EMBL release 63.0"
]
You will need to have predefined $emboss_db_dir using a directive such as
set emboss_db_dir /path_to_databases
somewhere in your .embossrc. Then try showdb. You should see a line that looks like:
% showdb
.. output deleted
embl N OK OK OK EMBL release 63.0
.. output deleted
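For scripted checks, the showdb line can be tested rather than read by eye. The following sketch assumes the column layout shown above (database name, type, then three access columns); other EMBOSS releases may print different columns, so treat the field positions as an assumption. The function name db_is_ok is hypothetical.

```shell
# Read showdb output on stdin and succeed if the named database is listed
# with all three access columns reported as OK.
# Assumption: columns are name, type, then three OK/- access flags.
db_is_ok() {
    awk -v db="$1" '
        $1 == db && $3 == "OK" && $4 == "OK" && $5 == "OK" { found = 1 }
        END { exit !found }
    '
}

# Example: showdb | db_is_ok embl && echo "embl is configured"
```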