EMBOSS: Project Meeting (Monday 16th February 2009)

Attendees

EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag
Sanger:
Visitors:
Apologies:

1. Minutes of the last meeting

The 2nd February meeting was cancelled due to snowfall having its usual effects on the roads around Cambridge.

Minutes of the meeting of 19th January 2009 are here.

2. Software Development

2.1 Applications

Alan has updated the code of iep to remove global variables.

Mahmut proposed changing garnier to produce fewer short predictions as the current output displays poorly in a DAS client. A possible improvement is described by Garnier and Robson in Methods in Enzymology, available at http://dx.doi.org/10.1016/S0076-6879(96)66034-0

Peter has updated the internal molecular weight values used to calculate Swissprot SQ records. The updated values are from the Expasy website and differ slightly at the 5th decimal place. The difference is very small, but enough to occasionally change a protein molecular weight. The internal calculation is now double precision. It was agreed that the data file Emolwt.dat should be updated to use the Expasy molecular weights so that all molecular weight calculations are consistent.
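To illustrate why a change at the 5th decimal place can matter, here is a minimal sketch in Python (not the EMBOSS C code). The residue masses are illustrative placeholders rather than the contents of Emolwt.dat, and the function names are hypothetical. It sums residue masses in double precision, adds one water, and shows how a per-residue shift of 0.00001 accumulates over a long sequence.

```python
import math

# Illustrative average residue masses in daltons. These are placeholder
# values for the sketch, not the contents of Emolwt.dat.
RESIDUE_MASS = {"G": 57.0519, "A": 71.0788, "S": 87.0782}
WATER_MASS = 18.01524  # one water is added for the free termini


def molecular_weight(sequence, residue_mass=RESIDUE_MASS, water=WATER_MASS):
    """Sum residue masses in double precision and add one water."""
    return math.fsum(residue_mass[r] for r in sequence) + water


def shifted(table, delta):
    """Return a copy of the table with every mass changed by delta."""
    return {residue: mass + delta for residue, mass in table.items()}


if __name__ == "__main__":
    seq = "GASGA" * 100  # hypothetical 500-residue protein
    old = molecular_weight(seq)
    new = molecular_weight(seq, shifted(RESIDUE_MASS, 1e-5))
    print(f"{old:.5f} {new:.5f} difference {new - old:.5f}")
```

For a 500-residue sequence the shift adds up to about 0.005, which is enough to change a reported weight if it is rounded near a boundary.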

2.2 Libraries

Alan has fixed a reported problem with emma under Windows and built a new mEMBOSS bundle.

Peter has updated the hardcoded EMBOSS version number to "6.x" for mEMBOSS. This code is used in embossversion and in writing SOURCE headers in HTTP server requests.

Peter has implemented DAS outputs for sequences (DAS for nucleotide, DASSEQUENCE for protein and nucleotide) and features (DASGFF format) for use in EMBOSS DAS services. Features are the most useful DAS output.

DASGFF requires a sequence ontology (or BioSapiens ontology) tag for protein features. Peter has updated the Efeatures definitions for proteins to use GFF3 sequence ontology codes as internal identifiers, and to use GFF3 as the principal definitions for all protein features. All SwissProt feature types (36 in the current SwissProt release) are also defined with the closest possible match to the sequence ontology. Where there is no exact match, an EMBOSS internal type is defined using the closest SO code and the original feature type as a suffix. For SwissProt output this is converted back to the SwissProt feature type. For GFF3 output the internal type is an alias for the closest (more general) SO term.
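The mapping scheme can be sketched roughly as follows, in Python rather than the EMBOSS C code. Everything here is an assumption made for illustration: the table entries, the SO codes assigned to each type, the "_" separator and the function names are hypothetical and are not taken from the Efeatures files.

```python
# Hypothetical table: SwissProt feature type -> (closest SO code, exact match?).
# The codes and pairings are placeholders, not the real Efeatures definitions.
CLOSEST_SO = {
    "DOMAIN": ("SO:0000417", True),
    "NON_STD": ("SO:0000001", False),
}
SEPARATOR = "_"  # assumed separator between SO code and suffix

# Reverse lookup for types that have an exact SO match.
EXACT_BY_SO = {so: name for name, (so, exact) in CLOSEST_SO.items() if exact}


def internal_type(swissprot_type):
    """Internal identifier: the SO code, plus the original type as a
    suffix when there is no exact SO match."""
    so_code, exact = CLOSEST_SO[swissprot_type]
    return so_code if exact else so_code + SEPARATOR + swissprot_type


def gff3_type(internal):
    """For GFF3 output the suffix is dropped, leaving the closest SO term."""
    return internal.split(SEPARATOR, 1)[0]


def swissprot_type(internal):
    """For SwissProt output the original feature type is recovered."""
    if SEPARATOR in internal:
        return internal.split(SEPARATOR, 1)[1]
    return EXACT_BY_SO[internal]


if __name__ == "__main__":
    for name in CLOSEST_SO:
        ident = internal_type(name)
        print(name, ident, gff3_type(ident), swissprot_type(ident))
```

Splitting only on the first separator keeps feature types that themselves contain the separator intact when converting back.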

Writing XML outputs required a change to the function interface for writing features. Instead of being passed the output file object, the functions now need the feature output object, so that they can use the feature statistics and so that a new cleanup function can be called when the feature output is closed. For the first feature, DASGFF output writes the XML file header, and when closing the file the closing XML tags are written.
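The idea of an output object that tracks a feature count and has a cleanup step can be sketched as below. This is a Python illustration only, not the EMBOSS C interface: the class and method names are hypothetical, and the XML layout is deliberately simplified and is not the full DASGFF document structure.

```python
import io
from xml.sax.saxutils import quoteattr


class FeatureOutput:
    """Hypothetical wrapper around an output stream that counts the
    features written and emits closing tags on cleanup."""

    def __init__(self, stream):
        self.stream = stream
        self.count = 0
        self.closed = False

    def write_feature(self, feature_id, so_type, start, end):
        # The header is written once, just before the first feature.
        if self.count == 0:
            self.stream.write('<?xml version="1.0"?>\n<DASGFF>\n')
        self.count += 1
        self.stream.write(
            f"  <FEATURE id={quoteattr(feature_id)} "
            f"type={quoteattr(so_type)} "
            f'start="{int(start)}" end="{int(end)}"/>\n'
        )

    def close(self):
        # Cleanup: closing tags are only needed if a header was written.
        if self.closed:
            return
        if self.count > 0:
            self.stream.write("</DASGFF>\n")
        self.closed = True


if __name__ == "__main__":
    out = io.StringIO()
    features = FeatureOutput(out)
    features.write_feature("f1", "SO:0000417", 1, 10)
    features.write_feature("f2", "SO:0000001", 20, 35)
    features.close()
    print(out.getvalue(), end="")
```

Passing the wrapper rather than the raw stream is what lets the writer know whether a header has already been emitted and whether closing tags are due.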

Sequence output in DAS and DASSEQUENCE formats uses the same count and cleanup checking.

2.3 SoapLab

Mahmut has checked in the code for SoapLab typed services. A few EBI-specific features need to be corrected. Fixes have been made for client-side concurrency. Mahmut has developed five DAS services using calls to EMBOSS applications to generate annotation using DASGFF report format. The applications have been tested using Dasty.

Mahmut has fixed a BioPerl problem reported by Martin.

2.4 Other

Mahmut suggested testing mEMBOSS under PowerShell on Windows XP. This shell is required by SQLexpress. We know of no users trying it.

Peter suggested annotating all inputs, outputs and parameters using terms from the proposed EMBRACE datatypes and algorithms ontology. Jon has generated terms from the known types and other definitions in ACD files. Mapping back should be a simple test for the consistency of the prototype ontology.

3. Administration

Alan has updated Fedora on the EMBOSS machines. The emboss.org address is expiring in a few months. Alan will try to move the registration to a more stable DNS provider.

4. Documentation and Training

4.1 Books

Alan has thoroughly revised the Administrator's guide. Some extra sections are needed from Peter.

The administrator's guide needs some information on mEMBOSS, possibly only a brief section.

Alan will work through the user's guide next.

Jon has a new technical contact for XML document handling at the publishers.

4.2 Training

The Madrid course now has a proposed date in May. We need to avoid clashes with the EMBRACE AGM and a possible meeting in Canada for Peter.

5. User queries and answers

Peter has worked through the outstanding tracker items on the bug list and cleared them.

No new items.

6. AOB

None.

7. Date Of Next Meeting

The next meeting is on Monday 2nd March.