EMBOSS: Project Meeting (Mon 16th February 09)
Minutes of the meeting of 19th January 2009 are here.
Mahmut proposed changing garnier to produce fewer short predictions as the current output displays poorly in a DAS client. A possible improvement is described by Garnier and Robson in Methods in Enzymology, available at http://dx.doi.org/10.1016/S0076-6879(96)66034-0
Peter has updated the internal molecular weight values used to calculate SwissProt SQ records. The updated values are from the Expasy website and differ slightly at the 5th decimal place: a very small difference, but enough to occasionally change a protein molecular weight. The internal calculation now uses double precision. It was agreed that the data file Emolwt.dat should be updated to use the Expasy molecular weights so that all molecular weight calculations are consistent.
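To illustrate why a change at the 5th decimal place matters, here is a minimal sketch of a molecular weight sum. The residue masses, the water mass and the function name are illustrative assumptions; they are not the Expasy values or the contents of Emolwt.dat, and this is not the EMBOSS code.

```python
# Illustrative only: these residue masses are placeholders, not the values
# from the Expasy website or from Emolwt.dat.
WATER = 18.01524  # assumed average mass of one water molecule

OLD_TABLE = {"A": 71.07880, "G": 57.05190}  # hypothetical "old" values
NEW_TABLE = {"A": 71.07884, "G": 57.05194}  # differ at the 5th decimal place


def molecular_weight(sequence, table):
    """Sum residue masses (Python floats are double precision) plus one water."""
    return sum(table[residue] for residue in sequence) + WATER


sequence = "AG" * 250  # hypothetical 500-residue protein
old = molecular_weight(sequence, OLD_TABLE)
new = molecular_weight(sequence, NEW_TABLE)

# The per-residue difference accumulates over the length of the sequence.
print(f"old={old:.2f} new={new:.2f} diff={new - old:.5f}")
```

With 500 residues the per-residue difference accumulates to about 0.02, which is enough to alter a weight reported to two decimal places.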
Peter has updated the hardcoded EMBOSS version number to "6.x" for mEMBOSS. This code is used in embossversion and in writing SOURCE headers in HTTP server requests.
Peter has implemented DAS outputs for sequences (DAS for nucleotide, DASSEQUENCE for protein and nucleotide) and features (DASGFF format) for use in EMBOSS DAS services. Features are the most useful DAS output.
DASGFF requires a sequence ontology (or BioSapiens ontology) tag for protein features. Peter has updated the Efeatures definitions for proteins to use GFF3 sequence ontology codes as internal identifiers, and to use GFF3 as the principal definitions for all protein features. All SwissProt feature types (36 in the current SwissProt release) are also defined with the closest possible match to the sequence ontology. Where there is no exact match, an EMBOSS internal type is defined using the closest SO code and the original feature type as a suffix. For SwissProt output this is converted back to the SwissProt feature type. For GFF3 output the internal type is an alias for the closest (more general) SO term.
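The mapping described above could be sketched as follows. All identifiers, term names, feature keys and the "_" separator are hypothetical placeholders chosen for illustration; they are not the actual Efeatures entries or real SO codes.

```python
# Hypothetical table: feature key -> (closest SO id, SO term name, exact match?)
# None of these identifiers are real SO codes or real Efeatures entries.
FEATURE_TABLE = {
    "EXACT_KEY": ("SO:9999901", "so_exact_term", True),
    "LOOSE_KEY": ("SO:9999902", "so_general_term", False),
}


def internal_type(swiss_key):
    """Return the internal type: the SO id, plus the original key as a
    suffix when there is no exact SO match (separator is an assumption)."""
    so_id, _term, exact = FEATURE_TABLE[swiss_key]
    return so_id if exact else f"{so_id}_{swiss_key}"


# Reverse lookup used when writing SwissProt output.
TO_SWISSPROT = {internal_type(key): key for key in FEATURE_TABLE}

# SO id -> term name, used when writing GFF3 output.
SO_TERMS = {so_id: term for so_id, term, _exact in FEATURE_TABLE.values()}


def to_swissprot(internal):
    """Convert an internal type back to the original SwissProt feature key."""
    return TO_SWISSPROT[internal]


def to_gff3(internal):
    """Treat the internal type as an alias for its closest SO term."""
    so_id = internal.split("_", 1)[0]
    return SO_TERMS[so_id]


loose = internal_type("LOOSE_KEY")
print(loose, to_swissprot(loose), to_gff3(loose))
```

Keeping the original feature type in the suffix is what lets the SwissProt round trip stay lossless while GFF3 output falls back to the more general term.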
Writing XML outputs required a change to the function interface for writing features. Instead of passing the output file object, the functions now need the feature output object, both to use the feature statistics and to allow a new call to a cleanup function when the feature output is closed. For the first feature, DASGFF output writes the XML file header, and when closing the file the closing XML tags are written.
Sequence output in DAS and DASSEQUENCE formats uses the same count and cleanup checking.
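The count and cleanup pattern can be sketched as below. The class, its method names and the simplified XML elements are assumptions for illustration only; they do not reproduce the EMBOSS C interface or the full DASGFF document structure. Writing nothing at all when no feature was output is also an assumption of this sketch.

```python
import io
from xml.sax.saxutils import escape, quoteattr


class FeatureOutput:
    """Hypothetical stand-in for a feature output object with statistics."""

    def __init__(self, stream):
        self.stream = stream
        self.count = 0        # number of features written so far
        self.closed = False

    def write_feature(self, feature_id, so_term):
        # The XML header is written only once, before the first feature.
        if self.count == 0:
            self.stream.write('<?xml version="1.0"?>\n<DASGFF>\n<GFF>\n')
        self.stream.write(
            f"<FEATURE id={quoteattr(feature_id)}>"
            f"<TYPE>{escape(so_term)}</TYPE></FEATURE>\n"
        )
        self.count += 1

    def close(self):
        # Cleanup: write the closing tags once, and only if a header exists.
        if self.closed:
            return
        if self.count > 0:
            self.stream.write("</GFF>\n</DASGFF>\n")
        self.closed = True


buffer = io.StringIO()
output = FeatureOutput(buffer)
output.write_feature("f1", "so_general_term")
output.write_feature("f2", "so_exact_term")
output.close()
print(buffer.getvalue())
```

A plain file object could not carry the feature count or trigger the closing tags, which is why the writer needs the output object rather than the file.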
Mahmut has fixed a BioPerl problem reported by Martin.
Peter suggested annotating all inputs, outputs and parameters using terms from the proposed EMBRACE datatypes and algorithms ontology. Jon has generated terms from the known types and other definitions in ACD files. Mapping back should be a simple test for the consistency of the prototype ontology.
The administrator's guide needs some information on mEMBOSS, possibly only a brief section.
Alan will work through the user's guide next.
Jon has a new technical contact for XML document handling at the publishers.
No new items