EMBOSS: Project Meeting (Mon 10th December 07)
Alan has found and fixed the dbxflat indexing bug that appears only when EMBL release files are processed in a particular order. The problem was in sorting file pointers, and it affected only some entries in very large input files (over 2 Gbytes). A fix will be provided. If indexing succeeded, the index is good, so no reindexing is required.
EMBL 93 is due out this week. The corrected dbxflat will be tested on the new EMBL release.
Peter is working through the differences between PHYLIP 3.6 beta and the latest PHYLIP 3.67 to bring the EMBASSY package up to date. Many of the changes to 3.61 were already included in the current PHYLIPNEW release. The code has to be converted rather than wrapped, because the interactive menu system is not easy to control.
Alan is working through the differences between ViennaRNA 1.6.4 and the new 1.6.5 release. The code has to be converted rather than wrapped, because interactive and batch use give different results.
Peter has improved the performance of file input and string handling. File input now makes fewer string-handling calls, and the string library source code uses more macros and fewer calls to other functions. These changes were prompted by gprof profiling of dbxflat runs.
Peter has fixed a bug in SwissProt output format that produced files which could not be read in as protein. The cause was a change to the ID line in the new SwissProt syntax, which the sequence reading functions did not recognize. They therefore fell back to EMBL format and marked the sequence as nucleotide.
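As an illustration of this failure mode only, the Python sketch below guesses an entry's format from its ID line. It is not the EMBOSS sequence reading code. It assumes the old SwissProt ID syntax (for example `ID   EPO_HUMAN      STANDARD;      PRT;   193 AA.`), the newer UniProtKB syntax (for example `ID   CYC_HUMAN               Reviewed;         105 AA.`) and an EMBL ID line ending in `BP.`. The function name `guess_format` and the `accept_new_syntax` switch are hypothetical; switching the latter off mimics a reader that knows only the old syntax and so falls back to EMBL/nucleotide.

```python
import re

# Hypothetical patterns for illustration; not taken from EMBOSS.
SWISSPROT_OLD = re.compile(r"^ID   \S+\s+\S+;\s+PRT;\s+\d+ AA\.$")
SWISSPROT_NEW = re.compile(r"^ID   \S+\s+(Reviewed|Unreviewed);\s+\d+ AA\.$")
EMBL_ID = re.compile(r"^ID   .*\d+ BP\.$")


def guess_format(id_line, accept_new_syntax=True):
    """Return (format, sequence type) guessed from a flat-file ID line."""
    line = id_line.rstrip()
    if SWISSPROT_OLD.match(line):
        return ("swissprot", "protein")
    if accept_new_syntax and SWISSPROT_NEW.match(line):
        return ("swissprot", "protein")
    if EMBL_ID.match(line):
        return ("embl", "nucleotide")
    # Unrecognised ID line: default to EMBL, which marks the entry as
    # nucleotide (the behaviour described for the unpatched reader).
    return ("embl", "nucleotide")
```

With `accept_new_syntax=False`, a new-style SwissProt ID line is misclassified as EMBL nucleotide, which reproduces the symptom described above.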
Mahmut has fixed a CLASSPATH problem in the Taverna SoapLab plugin. For now the fix hardcodes the path; a proper fix will be made in Taverna.
Mahmut has shown that remote debugging of Java processes can be easier than debugging through Eclipse for Taverna and SoapLab issues.
Alan and Peter will prepare a patch containing the bug fixes for psiphi, dbxflat and the SwissProt sequence format.
Jon, Alan and Peter had a meeting with the book publishers (Cambridge University Press) last week.
Jon has completed the draft for the books and has started the validation and editing stage.
Jon plans to give the other book authors tutorials on using and editing the DocBook XML sources.
No new queries outstanding.
The UniProt team asked about EMBOSS handling of long lines in sequence databases. The latest release of UniProtKB includes some lines that are longer than 255 characters. Apparently GCG's "embltogcg" application fails on these. Peter confirmed that EMBOSS handles them without problems, and suggested providing a simple script to split long database lines for GCG users, as they cannot expect a fix from Accelrys.
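A minimal sketch of such a splitting script follows, written in Python as a filter from standard input to standard output. It assumes that every line starts with a 5-character prefix (the 2-character line code plus 3 spaces, such as `DE   `) and that a long line can be continued on further lines carrying the same prefix. It also assumes, based only on the 255-character figure above, that lines of at most 255 characters are acceptable to embltogcg. Line types with fixed column layouts (such as FT feature lines) may need their own rules, which this sketch does not attempt. The names `split_line` and `MAX_LEN` are hypothetical.

```python
import sys
import textwrap

# Assumed maximum line length accepted by the downstream tool.
MAX_LEN = 255
# Assumed prefix: 2-character line code followed by 3 spaces, e.g. "DE   ".
PREFIX_LEN = 5


def split_line(line, max_len=MAX_LEN):
    """Split one flat-file line into lines no longer than max_len.

    Continuation lines repeat the original line-code prefix. Text is
    broken at spaces where possible; over-long words are broken too.
    """
    line = line.rstrip("\n")
    if len(line) <= max_len:
        return [line]
    prefix, body = line[:PREFIX_LEN], line[PREFIX_LEN:]
    chunks = textwrap.wrap(body, width=max_len - len(prefix),
                           break_on_hyphens=False)
    return [prefix + chunk for chunk in chunks]


def main():
    for line in sys.stdin:
        for out in split_line(line):
            sys.stdout.write(out + "\n")


if __name__ == "__main__":
    main()
```

Usage, assuming the script is saved as `splitlong.py`: `python splitlong.py < uniprot.dat > uniprot_split.dat`. Wrapping at spaces keeps words intact, so rejoining the continuation lines with single spaces recovers the original text of a free-text line.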
December 24th is Christmas Eve. The next meeting is on Monday 7th January.