EMBOSS can read GCG formatted databases directly, so sites that still use GCG alongside the flatfiles do not need to keep multiple copies of the same databases in different formats. EMBOSS creates EMBLCD-like indices for the GCG format databases using the program dbigcg, which runs in much the same way as dbiflat. You will need the GCG format .seq and .header files in order to create an EMBLCD indexed database.
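Since both files are needed for every division, it may be worth checking that the pairs are complete before indexing. The following is a minimal sketch in Python and is not part of EMBOSS: the function name `find_unpaired_seq_files` is hypothetical, and it assumes that each .header file shares the base name of its .seq file (for example em_est1.seq and em_est1.header).

```python
from pathlib import Path


def find_unpaired_seq_files(directory="."):
    """Return sorted names of .seq files with no same-named .header file."""
    base = Path(directory)
    missing = []
    for seq_file in sorted(base.glob("*.seq")):
        # Assumption: em_est1.seq is paired with em_est1.header.
        if not seq_file.with_suffix(".header").exists():
            missing.append(seq_file.name)
    return missing


if __name__ == "__main__":
    # Report every .seq file in the current directory that lacks a header.
    for name in find_unpaired_seq_files("."):
        print(f"no .header file for {name}")
```

Run from the GCG database directory, the script prints nothing when every .seq file has a partner.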
Move to the GCG database directory containing your data and run dbigcg:
    Index a GCG formatted database
          EMBL : EMBL
         SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
            GB : Genbank, DDBJ
           PIR : NBRF
    Entry format [EMBL]:
    Database name: embl
    Database directory [.]:
    Wildcard database filename [*.seq]:
    Release number [0.0]: 63.0
    Index date [00/00/00]: 31/07/00
The program will chug along for a while and will then generate the EMBLCD index files for the GCG format database.
When dbigcg prompts for the entry format (Entry format [EMBL]:), enter the format the database was in originally, that is, before you ran embltogcg or a similar program to generate the GCG databases.
The following entry should be put in your .embossrc:

    DB gcgembl [
        type: N
        method: gcg
        format: embl
        dir: $emboss_db_dir/embl
        file: "*.dat"
        release: "63.0"
        comment: "EMBL release 63.0"
    ]
showdb should show your newly configured database.
You can configure subsets of the databases in the same way as for the original format databases, described in section 3.2.5 above. One difference from dbiflat indexing is that both the .seq and .header files are listed in the division.lkp file. The file: and exclude: directives should therefore be of the form exclude: */em_est* instead of just */em_est*.seq.
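The effect of the two pattern forms can be illustrated with a short Python sketch. It rests on several assumptions: Python's `fnmatchcase` stands in for EMBOSS's own wildcard matching, which may differ in detail; the file names in `LISTED_FILES` are hypothetical examples of division.lkp entries; and the helper `excluded` is invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical division.lkp entries: one EST division and one other division.
LISTED_FILES = [
    "embl/em_est1.seq",
    "embl/em_est1.header",
    "embl/em_fun.seq",
    "embl/em_fun.header",
]


def excluded(pattern, names=LISTED_FILES):
    """Return the listed names that a wildcard pattern matches."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # The short form matches both files of the EST division.
    print(excluded("*/em_est*"))
    # The .seq form matches only the sequence file and misses the header.
    print(excluded("*/em_est*.seq"))
```

Under these assumptions, the first pattern selects both em_est1 files, while the second selects only em_est1.seq, leaving the .header file unmatched.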