EMBOSS: Project Meeting (Mon 12th October 09)


Attendees

EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag
Visitors:
Apologies:

1. Minutes of the last meeting

Minutes of the meeting of 5th October 2009 are here.

2. Maintenance etc.

2.1 Applications

Mahmut has reimplemented the alignment code for needle using the compass array to avoid backtracing. This uses more memory but runs faster. We already recommend the alternative application stretcher if needle runs out of memory.
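As an illustration only, the general idea of recording a direction for each cell while the score matrix is filled can be sketched as below. This is not the needle code: the function name, the scoring values and the simple linear gap penalty are all assumptions made for the example (needle itself uses separate gap open and extension penalties).

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment that stores one direction per cell (hypothetical sketch)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # compass[i][j] records how cell (i, j) was reached:
    # "D" = diagonal, "U" = up (gap in b), "L" = left (gap in a)
    compass = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
        compass[i][0] = "U"
    for j in range(1, m + 1):
        score[0][j] = j * gap
        compass[0][j] = "L"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            compass[i][j] = "D" if best == diag else ("U" if best == up else "L")
    # Follow the stored directions from the bottom-right corner;
    # no scores are re-examined at this stage.
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        move = compass[i][j]
        if move == "D":
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif move == "U":
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

The extra matrix is the memory cost mentioned above: one stored direction per cell, in exchange for a path that can be read off without recomputing which neighbour produced each score.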

The new code is implemented as needleall for short-read inputs, where constructing the alignment output objects takes a third of the run time. The alignments could be made optional, with a log file containing only scores or a list of matches reported by default to improve performance, but 'align' output is useful as users may wish to see the alignments to check results. A further 10% of run time is spent clearing and reallocating the newly introduced matrices. This could easily be avoided by initialising the new matrices outside the main loop and resizing them as needed.
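A minimal sketch of the suggested change, assuming a flat buffer that is created once and only grown when a larger sequence pair arrives. The class and function names are hypothetical and do not correspond to anything in the EMBOSS source.

```python
class MatrixPool:
    """Hypothetical helper: one flat buffer reused across many alignments."""

    def __init__(self):
        self.cells = []
        self.allocations = 0  # counts how often the buffer had to grow

    def reserve(self, rows, cols):
        needed = rows * cols
        if needed > len(self.cells):
            # Grow only when the current buffer is too small.
            self.cells = [0] * needed
            self.allocations += 1
        return self.cells


def count_allocations(query_lengths, target_length):
    """Simulate the main loop over queries and report buffer allocations."""
    pool = MatrixPool()  # created once, outside the main loop
    for qlen in query_lengths:
        cells = pool.reserve(qlen + 1, target_length + 1)
        # ... fill `cells` for this query/target pair here ...
    return pool.allocations
```

For example, `count_allocations([36, 36, 50, 40], 100)` grows the buffer for the first query and again for the 50-base query, and reuses it for the other two. In a standard dynamic-programming fill each interior cell is written before it is read, so it may be enough to reset only the boundary row and column between alignments; whether that holds for the new matrices would need checking against the actual code.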

Mahmut will commit data files and a QA test for needleall.

Jon has committed acdrelations and associated data files with EDAM terms for all outputs, and revised the relations attributes in ACD files.

2.2 Libraries

Peter has committed the updated PCRE code (PCRE version 7.9) and the relocated ajacd and ajseqdb code.

Alan has tested the reorganized library code on both Fedora and Windows.

Alan has cleaned up compiler warnings on Windows in the library and application code. The bundlewin has been modified to use the reorganized AJAX directories, and has a list of executables to ignore, as dbxreport and dbxstat are not yet supported by the committed indexing library code.

Mahmut reported on analysis of the alignment code. Kevin Karplus's Bioinformatics Course Page proposes an additional "double_gap" penalty for switching from a gap in one sequence to a gap in the other. It was decided to ignore this (regard the additional penalty as zero) unless a user requests it.
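To make the effect of such a penalty concrete, here is a score-only sketch of a three-state affine-gap alignment in which a direct switch between the two gap states costs the gap open penalty plus an extra double_gap term. The function name, the parameter values and the exact way double_gap enters the recurrence are assumptions for this example; they are not taken from the course page or from the EMBOSS code.

```python
NEG = float("-inf")


def affine_score(a, b, match=2, mismatch=-1, gap_open=-4, gap_extend=-1,
                 double_gap=0):
    """Best global score with affine gaps and an optional gap-switch penalty."""
    n, m = len(a), len(b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned to b[j-1]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned to a gap
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]  # b[j-1] aligned to a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Switching straight from one gap state to the other pays
            # gap_open plus the assumed extra double_gap term.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open + double_gap)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open + double_gap)
    return max(M[n][m], X[n][m], Y[n][m])
```

With double_gap left at zero, as decided above, two adjacent gaps in opposite sequences cost the same as two independent gap openings. A negative value only changes the result when a mismatch is more expensive than opening two gaps, which is unusual with typical scoring matrices.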

Current DOM library code is based on domc. Alan will archive the latest version in case we have an issue with future releases.

2.3 Other

Mahmut has updated the Jemboss makefiles, relocating the ant "package" call so that the jemboss jar file is prepared before it needs to be copied. Class files are deleted from the distribution tarball to make it cleaner and smaller. Make and make install now work cleanly, fixing a problem reported by the Debian team.

3. New developments

Peter has committed a new version of ajsql with all documentation warnings corrected. The ensembl library code had around 6000 warnings. Some 1500 have been corrected by working through conflicts between the documentation headers and the function prototypes, and by renaming the first level of each function to be a single name (one capital letter) matching the datatype name. There are similar issues at lower levels, for example GetXxxxx functions where the last part of the name is an Ensembl attribute that should be converted to a single leading capital letter. These need converting by hand to check the interpretation of the name is correct.

Peter has also found that in the ensembl code some datatypes can be safely passed as read-only 'const' objects, but others may in some cases be modified. Often this involves creating a reference-counted copy which has to update the reference count and therefore is updating the object. These should be reviewed. One solution would be to allow a new code for read-only reference objects to avoid the error message. Peter will check the Ensembl test applications for renaming of datatypes and functions.

Jon reported on EDAM developments. UCL have produced a list of term names needed for CATH service annotation. WhatIf annotation so far uses only terms in the PDBML data schema. These will be implemented for now as a best guess at the terms WhatIf needs.

Jon is looking into OWL conversions of OBO data to provide additional validation of the design rules. Any extensions made in OWL must be convertible without loss to OBO.

EDAM will follow the principles of a "standard upper ontology" such as SUMO, which includes restrictions on mixing semantic types.

Jon will designate the first EDAM release as "0.5 alpha".

Jon has passed details of EMBOSS internal data structures to Matus for inclusion in the design of BioXSD.

4. Administration

Alan has a reply from Michael Schuster requesting an account on the CVS server.

Alan has reinstalled XP on the remaining old workstation so all 3 now run the same version.

Peter reported that the new Thunderbird 3b4 release is causing serious problems. He will avoid using it until an update becomes available.

Jon's dual monitors are working well after a recent GNOME update.

For BioCatalogue support, Perl scripts for service tests need to check the version of XML::Compile (a recent modification by Hamish). Submission to the EMBRACE registry requires installation of a python module (possibly an older version of the module).

5. Documentation and Training

5.1 Books

Jon had a reply from the publishers, who are happy with the Word document conversion from DocBook.

A new deadline of 24th December was agreed for the final version of the book text.

Additional book tasks are to design a logo (Jon will work on some ideas) and to design a book cover (Peter will work on variations of the "fridge magnet" theme).

5.2 Website

Web pages will be redesigned from the book text. Jon will work on a front page design.

6. User queries and answers

A user has contributed a patch to water to align a sequence stream with a sequence set. Peter will test and reply.

7. AOB

Peter reported interest from a recent Grid Computing meeting in developing grid-based EMBOSS services with updated databases.

Mahmut and Peter will attend an EBI workshop on next generation sequencing in 2 weeks.

Mahmut will attend next week's EBI course on access to ensembl genomes, genotype data and other EBI resources.

8. Date Of Next Meeting

Peter is away next week at a workshop. The next meeting will be on Monday 26th October.