EMBOSS: Project Meeting (Aug 14th 2000)
The EMBOSS implementation of reading and indexing NCBI format sequences needed addressing. The current method uses the last two barred ("|"-separated) fields as accession number and ID if appropriate. If they are not appropriate, it examines the rest of the line to try to work out what format the database was originally in. Alan reported that this was prone to error and meant that the parsing of several NCBI databases was not covered by this approach. On Val's suggestion, Alan will contact NCBI to get clarification of the format.
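As a rough illustration of the "last two barred fields" rule described above, here is a minimal Python sketch. It is not the EMBOSS code (which is written in C). The function name, the field order (accession first, then ID) and the test for what counts as "appropriate" (at least three fields, neither of the last two empty) are all assumptions made for this example.

```python
def last_two_barred_fields(header):
    """Return (accession, ident) from the last two '|'-separated fields.

    Hypothetical helper, not part of EMBOSS. Returns None when the rule
    does not apply, which is where a fallback guess at the original
    database format would have to take over.
    """
    line = header.lstrip(">")
    # Only the first whitespace-delimited token carries the barred fields;
    # the rest of the line is free-text description.
    token = line.split(None, 1)[0] if line.strip() else ""
    fields = token.split("|")
    # Assumed test for "appropriate": enough fields, and neither is empty.
    if len(fields) < 3:
        return None
    accession, ident = fields[-2], fields[-1]
    if not accession or not ident:
        return None
    return accession, ident


if __name__ == "__main__":
    print(last_two_barred_fields(">gi|129295|sp|P01013|OVAX_CHICK GENE X PROTEIN"))
    print(last_two_barred_fields(">gi|12345|gb|U00001| example with empty last field"))
```

The second header shows the kind of line where the simple rule gives no answer, which is the case Alan reported as error-prone.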
A dbifasta program has been written. FASTA indexing has therefore been removed from the dbiflat program. dbifasta covers more variations on the FASTA format, but there is still a question mark over what to do with NCBI format. Gary has added an embWordMatchMin function to the nucleus library.
A new application, dotpath, displays a set of non-overlapping areas thereby allowing detection of SNPs.
A script from Gary, edithtml.pl, has been written to help produce documentation in the standard format.
It was noted that David Martin has produced a draft EMBOSS administrator's guide. This is a valuable document but will need revision for NCBI format changes.
Next meeting to be held on Monday 21st August at HGMP.