
Indexing and configuring BLAST databases

BLAST format databases are generated for efficient homology searching using the BLAST programs. To avoid keeping redundant copies of the same data, EMBOSS provides a mechanism for accessing these databases directly.

BLAST format databases are those generated using the tools distributed with NCBI-BLAST or with WU-BLAST.

To index a single BLAST database, move to the directory containing your BLAST format databases and run dbiblast:

Index a BLAST database
Database name: blastsw
Database directory [.]: 
database base filename [blastsw]: 
Release number [0.0]: 
Index date [00/00/00]: 
         N : nucleic
         P : protein
         ? : unknown
Sequence type [unknown]: p
         1 : wublast and setdb/pressdb
         2 : formatdb
         0 : unknown
Blast index version [unknown]: 2

The program will chug along for a while and will then generate the EMBLCD index files for the BLAST format database.

The following entry (or one like it that is more appropriate to your particular installation) should be put in your .embossrc:

DB blastsw [
   type: P
   method: blast
   format: ncbi
   dir: $emboss_db_dir/blastsw
   file: "blastsw"
   release: "38.9"
   comment: "BLAST format Swissprot"
]

showdb should show your newly configured database.

Because of the way BLAST works, many sites group their BLAST databases in the same directory. You can index these in situ with dbiblast, but this may require some extra steps if your databases are not of the same type, because generating subsequent index files will overwrite those that already exist. To avoid overwriting index files, you can index many databases with one set of index files, or you can use the indexdir options to place the indices in a different directory.

There are two requirements for indexing several databases together in one index. The first is that the databases are the same type (protein/nucleic acid) and generated with the same tool (pressdb or formatdb); the second is that all the ID and accession numbers in the combined databases are unique.
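The uniqueness requirement can be checked before indexing. The following is a minimal sketch, not part of EMBOSS. It assumes you still have the FASTA source files the BLAST databases were built from, and that the first whitespace-delimited token of each header line is the identifier. The function names are hypothetical, and the identifiers dbiblast actually extracts may differ depending on the header style.

```python
from collections import defaultdict


def fasta_ids(path):
    """Yield the first whitespace-delimited token of each FASTA header line.

    Assumption: this token is the entry identifier.
    """
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    yield fields[0]


def duplicate_ids(paths):
    """Map each identifier seen more than once to the files it occurs in.

    A file is listed once per occurrence, so a repeat within a single
    file is reported as well.
    """
    seen = defaultdict(list)
    for path in paths:
        for identifier in fasta_ids(path):
            seen[identifier].append(str(path))
    return {ident: files for ident, files in seen.items() if len(files) > 1}
```

An empty result from duplicate_ids suggests that the sources can share one set of index files, provided they also meet the first requirement.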

Run dbiblast as before but specify all the databases you wish to be included when prompted for the database filename.

Index a BLAST database
Database name: alldbs
Database directory [.]: 
database base filename [alldbs]: dbone dbtwo dbthree dbfour 
Release number [0.0]: 
Index date [00/00/00]: 
         N : nucleic
         P : protein
         ? : unknown
Sequence type [unknown]: p
         1 : wublast and setdb/pressdb
         2 : formatdb
         0 : unknown
Blast index version [unknown]: 2

These can then be configured as described in section 3.2.5 above by using the 'file:' and 'exclude:' tags as appropriate.

When you have databases of different types, generated with different programs, or where the ID/accession numbers are duplicated between databases, the preferred strategy is probably to keep the source data for the individual databases in separate directories and index them there.

Alternatively you can place the index files in a separate directory. This requires that you run dbiblast with the -indexdirectory option and set the indexdir: tag in the database configuration to point to the directory holding the index files. The example below illustrates database configuration using the indexdir options.

% dbiblast -indexdir=/databases/indices/mydb
Index a BLAST database
Database name: mydb
Database directory [.]: 
database base filename [mydb]: 
Release number [0.0]: 
Index date [00/00/00]: 
         N : nucleic
         P : protein
         ? : unknown
Sequence type [unknown]: p
         1 : wublast and setdb/pressdb
         2 : formatdb
         0 : unknown
Blast index version [unknown]: 2

The corresponding entry in ~/.embossrc (or emboss.default) would look like:

DB mydb [
   type: P
   method: blast
   format: ncbi
   dir: $emboss_db_dir/blastsw
   indexdir: /databases/indices/mydb
   file: mydb
   release: "1.0"
   comment: "My BLAST DB with an index in a different directory"
]

Again, multiple indices cannot coexist in the same directory, so care should be taken when using the indexdir options that an existing database index is not overwritten.
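One way to guard against an accidental overwrite is to look for existing index files before running dbiblast. The sketch below is an illustration only and is not part of EMBOSS. The list of EMBLCD index file names is an assumption, so compare it with what dbiblast actually writes on your installation. The function name is hypothetical.

```python
from pathlib import Path

# Assumed names of EMBLCD index files; verify against your own index directory.
EMBLCD_INDEX_FILES = ("division.lkp", "entrynam.idx", "acnum.trg", "acnum.hit")


def existing_index_files(index_dir):
    """Return the assumed EMBLCD index file names already present in index_dir."""
    directory = Path(index_dir)
    return [name for name in EMBLCD_INDEX_FILES if (directory / name).is_file()]
```

Calling existing_index_files("/databases/indices/mydb") and getting a non-empty list back would indicate that the directory already holds an index, so a different index directory should be chosen.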


Peter Rice 2007-04-26