EMBOSS: Project Meeting (Mon 5th March 2007)
Peter has found and fixed a problem in the phylipnew program frestdist, which failed under valgrind. The native phylip application restdist showed the same problem under valgrind. A fix was found in the phylip 3.66 code and copied into the EMBASSY program. The remaining 3.66 changes should be merged into phylipnew some time before the 5.0.0 release in the summer. Note that the program output has changed, so both programs were previously giving incorrect results for this particular input.
Peter reported that the AJAX and NUCLEUS library code is fully updated, apart from one minor problem in the ajStrFromFloat function in the current CVS code. This was noted by Guy Bottu and must be corrected.
Peter has updated the EFUNC and EDATA databases, both on the public SRS server and internally, so that getz can be used to check for unused functions.
Mahmut is working on a JSP page to list the databases defined for EMBOSS applications running under the SoapLab server. The servlet calls showdb and uses its HTML output.
Error pages have been improved, although it is not easy to link exactly to the specific error. The standard "unable to read sequence" error produces six lines of help text with links to the database page (see above), the USA syntax, and documentation of the actual service on the EMBRACE wiki. Deploying this needs a server restart, which is not possible at present because the services are continuously in use.
Jon reported that a user in Oxford has requested web service access to some domainatrix applications that are not yet included in SoapLab. The issue is that these use the "directory" and "outdirectory" ACD types. For SoapLab it makes no sense to simply pass a directory name, because the client's directory does not exist on the server. Mahmut will investigate passing a set of files and storing them in a new named directory, so that the directory name can be passed to the application. Similarly, output could be returned from a new output directory. Peter recommended including the qualifier name in the directory name to avoid conflicts. An alternative is to change the applications to read a list of filenames, but this would upset all current users.
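The staging approach discussed above might look roughly like the following sketch. It is written in Python only to show the idea: the function names (`stage_input_directory`, `new_output_directory`, `collect_output_directory`), the base job directory, and the example qualifier names are hypothetical, not SoapLab code. Following Peter's suggestion, each directory name starts with the qualifier name, and `tempfile.mkdtemp` adds a unique suffix so that two qualifiers in one job cannot collide.

```python
import os
import tempfile


def stage_input_directory(qualifier: str, files: dict[str, bytes], base_dir: str) -> str:
    """Write uploaded files into a new directory named after the ACD qualifier.

    Hypothetical helper: returns the directory path to pass to the
    application as the value of `qualifier`.
    """
    # Validate every name first so nothing is created for a bad request;
    # only plain file names are accepted (no path separators or "..").
    for name in files:
        if name in ("", ".", "..") or os.path.basename(name) != name:
            raise ValueError(f"invalid file name: {name!r}")
    # mkdtemp creates a unique directory; the qualifier prefix keeps the
    # directories of different qualifiers in the same job apart.
    path = tempfile.mkdtemp(prefix=f"{qualifier}_", dir=base_dir)
    for name, data in files.items():
        with open(os.path.join(path, name), "wb") as handle:
            handle.write(data)
    return path


def new_output_directory(qualifier: str, base_dir: str) -> str:
    """Create an empty, uniquely named directory for an outdirectory qualifier."""
    return tempfile.mkdtemp(prefix=f"{qualifier}_", dir=base_dir)


def collect_output_directory(path: str) -> dict[str, bytes]:
    """Read every regular file in an output directory, keyed by file name."""
    results = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            with open(full, "rb") as handle:
                results[name] = handle.read()
    return results
```

The service would stage each "directory" input, create a directory for each "outdirectory" output, run the application with those paths, and return the collected output files to the client.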
Shaun has modified the existing Jemboss parser to iterate over all ACD files in a directory and to generate a separate schema for each file, using the qualifier names and the attributes of each instance. Each application has a separate namespace, for example emboss/applications/needle. Peter suggested adding separate nucleotide and protein namespaces for applications such as needle that can read any sequence type, especially those that reference $(acdprotein) in their ACD file, as they behave differently on protein input.
Once the attributes are captured, a Java bean will be generated automatically to deserialise an input object containing the user inputs and produce a clean EMBOSS command line.
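The mapping from user inputs to a command line described in the two paragraphs above can be sketched as follows. This is Python rather than the generated Java bean, and it only illustrates the idea: the function name `build_command_line` and the use of a plain set of qualifier names in place of the per-application schema are assumptions. Boolean values use the EMBOSS convention of `-name` to set a boolean qualifier and `-noname` to unset it.

```python
def build_command_line(application: str, qualifiers: set[str],
                       inputs: dict[str, object]) -> list[str]:
    """Turn user inputs keyed by ACD qualifier name into an argument list.

    Hypothetical helper: `qualifiers` stands in for the names captured
    from the application's ACD-derived schema.
    """
    argv = [application]
    for name, value in inputs.items():
        # Refuse anything the schema does not define, so arbitrary options
        # cannot be injected through the web service.
        if name not in qualifiers:
            raise ValueError(f"{application} has no qualifier {name!r}")
        if isinstance(value, bool):
            # EMBOSS boolean qualifiers: -name sets, -noname unsets.
            argv.append(f"-{name}" if value else f"-no{name}")
        else:
            argv.extend([f"-{name}", str(value)])
    return argv
```

For example, `build_command_line("needle", {"asequence", "bsequence", "gapopen"}, {"asequence": "a.fasta", "bsequence": "b.fasta", "gapopen": 10.0})` returns `['needle', '-asequence', 'a.fasta', '-bsequence', 'b.fasta', '-gapopen', '10.0']` (the file names are placeholders). Returning a list rather than a single string lets a server run the command without a shell, for instance with `subprocess.run(argv)`, which avoids quoting problems.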
We can make release 4.1.0 this week. Peter will confirm once the QA and memory tests all pass.
Alan has tested compilation with Microsoft Visual C++ and bundled a test distribution of 4.1.0. The issue in release 4.0.0 where the user path was overwritten is believed to be fixed. This was a problem in the third-party setenv utility (version 1.0), for which a new release is available. We were unable to reproduce the original problem, partly because nobody volunteered to experiment with their own Windows installation.
The build uses VC++ 2005, which requires the run-time libraries (RTLs) for users who have not already installed them with some other package. For release 4.0.0, users were pointed to Microsoft to download the RTLs. It was agreed that for 4.1.0 we can provide the current RTL installation and point users to Microsoft for the latest release. Using the InnoSetup installation package to manage the RTLs could be difficult to support.
Jon has completed the markup of the admin chapter. Style guidelines are available in the EMBOSSDOC guide, and further style information can be gleaned by comparing the existing documents. The most important issues are the style and table definitions, where a subset of DocBook is used.
Jon has also converted the EMBOSS tutorial. Some updating is needed; for example, the XLRHODOP entry is now only available by accession number.
Peter has updated some of the standard documentation pages on the website to fix issues raised by EBI External Services, who have been thoroughly reviewing the applications and documentation. The PIR sequence format example now shows a protein sequence. Documentation of the PIR format (and the NBRF database format) was no longer available on the web; a copy was provided by John Garavelli and is now included in the EMBOSS pages. These pages will also need to be updated in the books, and a scan of recently modified pages will find them. This will be a good test of how easily the book text can be updated automatically.
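A scan for recently modified pages, as mentioned above, could be as simple as the following sketch. It assumes the website pages live in a local directory tree as `.html` files; `recently_modified` is a hypothetical helper, not an existing EMBOSS script.

```python
import os


def recently_modified(root: str, since: float, suffixes=(".html",)) -> list[str]:
    """List files under `root` (as relative paths) whose name ends with one of
    `suffixes` and whose modification time is at or after the Unix time `since`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            full = os.path.join(dirpath, name)
            # getmtime returns the last modification time in seconds since the epoch.
            if os.path.getmtime(full) >= since:
                found.append(os.path.relpath(full, root))
    return sorted(found)
```

For example, `recently_modified("docs", time.time() - 14 * 86400)` would list the pages changed in the last two weeks, where `docs` is a placeholder for the local copy of the website.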
Peter has proposed helping Cambridge University with their user training.
Jon noted that we have no user training specialist now that Lisa has left. We can discuss this with her replacement at EBI (Vicky Schneider) when she arrives.
Jon reported on the status of Mike Hurley's code for structure alignment, which uses a two-step dynamic programming algorithm to compare patterns of physical residue-residue contacts.
The algorithm runs relatively slowly and has only been used on small test cases. It is a possible candidate for testing EMBOSS on high-performance computers or grids.
The next meeting is on Monday 19th March.