
Indexing and configuring GCG format databases

EMBOSS can access GCG-formatted databases directly, so sites that still use GCG alongside the flatfiles need not keep multiple copies of the same database in different formats. EMBOSS creates EMBLCD-like indices for GCG format databases using the program dbigcg, which runs in much the same way as dbiflat. You will need the GCG format .seq and .header files in order to create an EMBLCD indexed database.
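
Before indexing, it can be worth confirming that every .seq file has a matching .header file. The following is a minimal Python sketch of such a check; it is not part of EMBOSS. The function name find_unpaired is hypothetical, and the sketch assumes that paired files share a base name (for example em_est1.seq and em_est1.header).

```python
from pathlib import Path


def find_unpaired(directory="."):
    """Return (.seq files lacking a .header, .header files lacking a .seq).

    Assumes paired files share a base name, e.g. em_est1.seq / em_est1.header.
    """
    root = Path(directory)
    seq = {p.stem for p in root.glob("*.seq")}
    header = {p.stem for p in root.glob("*.header")}
    missing_header = sorted(f"{name}.seq" for name in seq - header)
    missing_seq = sorted(f"{name}.header" for name in header - seq)
    return missing_header, missing_seq


if __name__ == "__main__":
    no_header, no_seq = find_unpaired(".")
    for name in no_header:
        print(f"{name}: no matching .header file")
    for name in no_seq:
        print(f"{name}: no matching .seq file")
```

Run from the GCG database directory, the script prints nothing when every file is paired.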

Move to the GCG database directory containing your data and run dbigcg:

Index a GCG formatted database
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
       PIR : NBRF
Entry format [EMBL]: 
Database name: embl
Database directory [.]: 
Wildcard database filename [*.seq]: 
Release number [0.0]: 63.0
Index date [00/00/00]: 31/07/00

The program will chug along for a while and will then generate the EMBLCD index files for the GCG format database.

When dbigcg prompts for the entry format (Entry format [EMBL]:), enter the format the database was in before you ran embltogcg or a similar program to generate the GCG databases.

The following entry should be put in your .embossrc file:

DB gcgembl [
   type: N
   method: gcg
   format: embl
   dir: $emboss_db_dir/embl
   file: "*.dat"
   release: "63.0"
   comment: "EMBL release 63.0"
]

showdb should show your newly configured database.

You can configure subsets of the databases in the same way as for the original format databases, described in section 3.2.5 above. One difference from dbiflat indexing is that both the .seq and .header files are listed in the division.lkp file. The file: and exclude: directives should therefore omit the file extension, for example exclude: */em_est* rather than exclude: */em_est*.seq.
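
To see why the extension has to be dropped, the sketch below uses Python's fnmatch module as a stand-in for EMBOSS's own wildcard matching, which may differ in detail. The file names are hypothetical examples of what division.lkp might list, and the helper name excluded is illustrative only.

```python
from fnmatch import fnmatchcase

# Hypothetical division.lkp entries: each division has a .seq and a .header file.
listed = [
    "./em_est1.seq",
    "./em_est1.header",
    "./em_hum1.seq",
    "./em_hum1.header",
]


def excluded(pattern, names):
    """Return the names that the given wildcard pattern matches."""
    return [n for n in names if fnmatchcase(n, pattern)]


# Without the extension, both files of the EST division are matched.
print(excluded("*/em_est*", listed))

# With .seq appended, the .header file is not matched.
print(excluded("*/em_est*.seq", listed))
```

Under these assumptions, the pattern with the .seq extension leaves the .header file of the EST division unmatched, which is the situation the extension-free form avoids.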


Peter Rice 2007-04-26