EMBOSS: Project Meeting (Mon 12th October 09)
In the needleall implementation, for short read inputs, constructing the alignment output objects takes a third of the run time. To improve performance, the alignments could be made optional, with the default output being a log file containing only scores or a list of matches. However, 'align' output is useful because users may wish to see the alignments to check results. A further 10% of run time is spent clearing and reallocating the newly introduced matrices. This could easily be avoided by initialising the new matrices outside the main loop and resizing them as needed.
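The idea of initialising the matrices once and resizing them as needed can be sketched as follows. This is an illustration only, not the needleall code: the class ScoreMatrix, the functions nw_score and all_against_all, and the simple linear-gap scoring values are all hypothetical names and assumptions chosen to keep the example short.

```python
class ScoreMatrix:
    """Flat score buffer that is created once and only grows when needed."""

    def __init__(self):
        self.cells = []
        self.allocations = 0  # counts how often the buffer had to grow

    def resize(self, rows, cols):
        needed = rows * cols
        if needed > len(self.cells):
            self.cells.extend([0] * (needed - len(self.cells)))
            self.allocations += 1
        # No clearing here: the fill below overwrites every cell it reads.


def nw_score(a, b, matrix, match=1, mismatch=-1, gap=2):
    """Global alignment score with a linear gap penalty (assumed values)."""
    rows, cols = len(a) + 1, len(b) + 1
    matrix.resize(rows, cols)
    cells = matrix.cells
    for j in range(cols):
        cells[j] = -gap * j
    for i in range(1, rows):
        cells[i * cols] = -gap * i
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diag = cells[(i - 1) * cols + (j - 1)] + sub
            up = cells[(i - 1) * cols + j] - gap
            left = cells[i * cols + (j - 1)] - gap
            cells[i * cols + j] = max(diag, up, left)
    return cells[rows * cols - 1]


def all_against_all(queries, targets):
    matrix = ScoreMatrix()  # initialised once, outside the main loop
    scores = [[nw_score(q, t, matrix) for t in targets] for q in queries]
    return scores, matrix.allocations
```

Because every cell is written before it is read, the buffer does not need to be cleared between sequence pairs, and it is only reallocated when a pair needs more cells than any earlier pair.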
Mahmut will commit data files and a QA test for needleall.
Jon has committed acdrelations and associated data files with EDAM terms for all outputs, and revised the relations attributes in ACD files.
Alan has tested the reorganized library code on both Fedora and Windows.
Alan has cleaned up compiler warnings on Windows in the library and application code. The bundlewin has been modified to use the reorganized AJAX directories, and has a list of executables to ignore as dbxreport and dbxstat are not yet supported by the committed indexing library code.
Mahmut reported on analysis of the alignment code. Kevin Karplus's Bioinformatics Course Page proposes an additional "double_gap" penalty for switching from a gap in one sequence to a gap in the other. It was decided to ignore this (regard the additional penalty as zero) unless a user requests it.
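For reference, the effect of a "double_gap" penalty can be sketched with a three-state affine-gap recurrence. This is a minimal illustration under stated assumptions, not the EMBOSS alignment code: the function name global_align_score, the match/mismatch scoring and the default penalty values are hypothetical, and gap_open is taken to be the cost of the first gap position. Setting double_gap to zero gives the behaviour agreed above.

```python
NEG_INF = float("-inf")


def global_align_score(a, b, match=1, mismatch=-1,
                       gap_open=10, gap_extend=1, double_gap=0):
    """Global alignment score with affine gaps and an optional extra
    penalty for switching directly from a gap in one sequence to a gap
    in the other."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap in b;
    # Y: b[j-1] against a gap in a.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                sub = match if a[i - 1] == b[j - 1] else mismatch
                M[i][j] = sub + max(M[i - 1][j - 1],
                                    X[i - 1][j - 1],
                                    Y[i - 1][j - 1])
            if i > 0:
                X[i][j] = max(M[i - 1][j] - gap_open,
                              X[i - 1][j] - gap_extend,
                              # switch from a gap in a to a gap in b
                              Y[i - 1][j] - gap_open - double_gap)
            if j > 0:
                Y[i][j] = max(M[i][j - 1] - gap_open,
                              Y[i][j - 1] - gap_extend,
                              # switch from a gap in b to a gap in a
                              X[i][j - 1] - gap_open - double_gap)
    return max(M[n][m], X[n][m], Y[n][m])
```

With a zero double_gap the two adjacent gaps are scored as two independent gap openings; a positive value makes that pattern less attractive than a mismatch in borderline cases.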
Current DOM library code is based on domc. Alan will archive the latest version in case we have an issue with future releases.
Peter has also found that in the ensembl code some datatypes can be safely passed as read-only 'const' objects, but others may in some cases be modified. Often this involves creating a reference-counted copy which has to update the reference count and therefore is updating the object. These should be reviewed. One solution would be to allow a new code for read-only reference objects to avoid the error message. Peter will check the Ensembl test applications for renaming of datatypes and functions.
Jon reported on EDAM developments. UCL have produced a list of term names needed for CATH service annotation. WhatIf annotation so far uses only terms in the PDBML data schema. These will be implemented for now as a best guess at the terms WhatIf needs.
Jon is looking into OWL conversions of OBO data to provide additional validation of the design rules. Any extensions made in OWL must be convertible without loss to OBO.
EDAM will follow the principles of a "standard upper ontology" such as SUMO, which includes restrictions on mixing semantic types.
Jon will designate the first EDAM release as "0.5 alpha".
Jon has passed details of EMBOSS internal data structures to Matus for inclusion in the design of BioXSD.
Alan has reinstalled XP on the remaining old workstation so all 3 now run the same version.
Peter reported that the new Thunderbird 3b4 release is causing serious problems. He will avoid using it until an update becomes available.
Jon's dual monitors are working well after a recent GNOME update.
For BioCatalogue support, Perl scripts for service tests need to check the version of XML::Compile (a recent modification by Hamish). Submission to the EMBRACE registry requires installation of a python module (possibly an older version of the module).
A new deadline of 24th December was agreed for the final version of the book text.
Additional book tasks are to design a logo (Jon will work on some ideas) and to design a book cover (Peter will work on variations of the "fridge magnet" theme).
Mahmut and Peter will attend an EBI workshop on next generation sequencing in 2 weeks.
Mahmut will attend next week's EBI course on access to Ensembl genomes, genotype data and other EBI resources.