EMBOSS: Project Meeting (Mon 5th January 09)


Attendees

EBI: Peter Rice, Alan Bleasby, Mahmut Uludag
Sanger:
Visitors:
Apologies: Jon Ison

1. Minutes of the last meeting

Minutes of the meeting of 8th December 2008 are here.

2. Software Development

2.1 Applications

Peter reported an interesting set of requests from a user of revseq. The user asked why the case of the original GenBank sequence was not preserved. This is because, when EMBL and GenBank changed case some years ago, we added a conversion to upper case in the EMBL and GenBank parsers. It was agreed that this should be removed for the next release. The user also asked whether the output filename could be derived from the input filename rather than the input sequence ID. Peter proposed adding an alternative output filename default in ACD processing, with a command line option and an environment variable to toggle between the standard and alternative behaviours. Some interfaces could benefit from the ability to define the output filename prefix. Thirdly, the user asked whether revseq could update the description of the output sequence so that it is clear that it is a reverse-complement of the original. It was agreed that this could be done (and noted that the added tag needs to be removed if the sequence is reverse-complemented again).
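The agreed handling of the description tag could behave roughly as follows. This is only an illustrative sketch in Python, not the revseq code (EMBOSS itself is written in C). The tag wording "(reverse-complement)" and the function names are assumptions, since the meeting did not fix them. The sketch also preserves the case of the input, in line with the first request.

```python
# Hypothetical tag text: the actual wording was not decided at the meeting.
RC_TAG = " (reverse-complement)"

# Complement table for plain DNA; case is preserved and any other
# character (for example N) is passed through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse-complement of a DNA sequence, keeping its case."""
    return seq.translate(_COMPLEMENT)[::-1]


def toggle_tag(description: str) -> str:
    """Add the tag, or remove it if the description already ends with it."""
    if description.endswith(RC_TAG):
        return description[: -len(RC_TAG)]
    return description + RC_TAG


def revseq_sketch(seq: str, description: str) -> tuple[str, str]:
    """Reverse-complement a sequence and update its description to match."""
    return reverse_complement(seq), toggle_tag(description)
```

Applying revseq_sketch twice returns both the original sequence and the original description, which is the behaviour noted above for a sequence that is reverse-complemented again.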

Peter is also working on improvements, suggested by a user, to the display of translated sequences by showseq and sixpack. Sequence ranges (exons) now display in the correct reading frame (the current behaviour is to force them into frame 1). Presentation of three-letter amino acid codes in the reverse direction was fixed. Some tests remain before committing the new code.

2.2 Libraries

Peter has been working through outstanding bug reports in the trackers on SourceForge.

For phylogenetic applications (PHYLIPNEW) reading distance matrix files failed for some formats written by other applications. Distance matrix input now works for multiple matrices in square, upper-triangular and lower-triangular formats.
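As an illustration of the three layouts, the sketch below reads a single PHYLIP-style distance matrix and expands it to a full symmetric matrix. It is written in Python for brevity and is not the PHYLIPNEW library code. It assumes one matrix per input, each row on a single line, names without embedded spaces, and triangular forms that omit the diagonal; real files may wrap rows or use fixed-width name fields, which this sketch does not handle. The function name is hypothetical.

```python
def read_distance_matrix(text: str):
    """Parse one distance matrix; return (names, full square matrix)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])  # first line holds the number of taxa

    names, rows = [], []
    for ln in lines[1 : n + 1]:
        fields = ln.split()
        names.append(fields[0])
        rows.append([float(x) for x in fields[1:]])

    # The layout is inferred from how many values each row carries.
    counts = [len(r) for r in rows]
    dist = [[0.0] * n for _ in range(n)]

    if counts == [n] * n:
        # Square: every row is complete.
        dist = [row[:] for row in rows]
    elif counts == list(range(n)):
        # Lower-triangular: row i holds distances to taxa 0..i-1.
        for i in range(n):
            for j in range(i):
                dist[i][j] = dist[j][i] = rows[i][j]
    elif counts == [n - 1 - i for i in range(n)]:
        # Upper-triangular: row i holds distances to taxa i+1..n-1.
        for i in range(n):
            for k, value in enumerate(rows[i]):
                j = i + 1 + k
                dist[i][j] = dist[j][i] = value
    else:
        raise ValueError("unrecognised distance matrix layout")

    return names, dist
```

Inferring the layout from the row lengths avoids needing a separate option for each format, at the cost of rejecting files whose rows are wrapped across lines.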

Various problems in Stockholm format (used by HMMER and PFAM) have been identified and resolved.

2.3 SoapLab

Typed services were using the JAX-WS stack. Mahmut now has typed services sharing the same web application as Axis services and deployed on the development server. Improvements to the XSD and WSDL definitions include short application descriptions in the WSDL file. Most methods are now implemented, except 'waitfor' and 'getresults' for partial results. The current namespace in the result XML is broken but it is easy to fix.

Documentation is being updated with a new introduction, a description of EMBRACE-compliance, and a description of other SOAP services for EMBOSS (Jemboss, WsEmboss).

A beta release of the typed services will be made available soon.

Peter has added an acdxsd utility with stubs to generate XSD sections for each input and output datatype and other qualifiers. Sequences can use a general include. Values need to be determined for required qualifiers. For outputs we will assume SoapLab will enforce a standard format. acdxsd will be SoapLab-specific unless other users request alternatives. A new output format of DASGFF could be used for features and reports. We need to decide on a similar format for sequence and alignment outputs from SoapLab.

3. Administration

Alan has set up a new EMBOSS wiki at Open Bio. He will work through the administrator documentation. It is based on MediaWiki. Candidate pages for the wiki include proposed new applications and features, and a detailed set of GCG replacement applications.

4. Documentation and Training

4.1 Books

Alan has been proofreading the latest drafts.

Peter has been working on autogenerating text sections.

4.2 Training

The dates for the proposed Madrid course are not yet known, but it will probably not take place before spring.

5. User queries and answers

See above for discussion of revseq features.

6. AOB

None

7. Date Of Next Meeting

The next meeting is on Monday 19th January.