EMBOSS: Project Meeting (Mon 23rd November 09)


EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag

1. Minutes of the last meeting

Minutes of the meeting of 9th November 2009 are here.

2. Maintenance etc.

2.1 Applications

Peter is cleaning up the cirdna and lindna graphical applications (see below).

Mahmut is looking further into vectorstrip extensions for adaptor contamination of Illumina sequence data.

2.2 Libraries

Peter has removed the four extra plplot functions from the ajgraph.c source file. One set the title of an X11 or win3 window and has been replaced by a plplot API function. A second returned the internal filename used; device settings added to the ajgraph.c internal definitions now allow the ajgraph.c code to derive all the output filenames that plplot will have written. The other two returned character height and text string width in "world coordinates", the coordinates used for plotting. Unfortunately this conversion is not available in the native plplot API, so these have been replaced by functions returning the height and width in millimetres. The two applications using these functions, cirdna and lindna, need to be rescaled to plot in millimetre units.

Peter noted that the GIF driver was failing to create multiple plots. This will be fixed in CVS and in the next release. No users have reported this problem; it is reasonable to assume they are still using PNG output, which does correctly create multiple files. The setting currently relies on a modified plplot, but should use the plplot API call plsfam.

The current mEMBOSS build uses plplot static library code. Later versions of plplot (5.9.5 is the most recent) can be tested by modifications to configure.in and Makefile.am.

Mahmut has looked into pattern matching in embpat.c and is testing the Rabin-Karp algorithm for use with multiple patterns. This is a useful option for Illumina adaptor detection.
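As an illustration of the multiple-pattern idea only, here is a minimal Python sketch of Rabin-Karp with a rolling hash. It is not the embpat.c code. The function name, the hash parameters and the short test sequences are all made up for this example, and the sketch assumes every pattern has the same length.

```python
def rabin_karp_multi(text, patterns, base=256, mod=1_000_000_007):
    """Return (position, pattern) hits for a set of equal-length patterns.

    Hypothetical helper for illustration; base and mod are arbitrary choices.
    """
    patterns = set(patterns)
    if not patterns:
        return []
    lengths = {len(p) for p in patterns}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all patterns share one length")
    m = lengths.pop()
    if m == 0 or m > len(text):
        return []

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % mod
        return h

    # Map each pattern hash to the patterns that produce it.
    by_hash = {}
    for p in patterns:
        by_hash.setdefault(full_hash(p), []).append(p)

    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    h = full_hash(text[:m])
    hits = []
    for i in range(len(text) - m + 1):
        if h in by_hash:
            window = text[i:i + m]
            # Compare the actual strings to rule out hash collisions.
            for p in by_hash[h]:
                if p == window:
                    hits.append((i, p))
        if i + m < len(text):
            # Roll the hash: drop text[i], append text[i + m].
            h = ((h - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits


if __name__ == "__main__":
    # Made-up sequences, not real adaptor data.
    print(rabin_karp_multi("ACGTACGTTAGC", ["ACGT", "TAGC"]))
```

The text is scanned once whatever the number of patterns, because each window hash is looked up in a dictionary rather than compared against every pattern in turn.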

2.3 SoapLab

Mahmut has applied fixes to the emboss4 services. A recent large number of hits in a short period triggered a known bug that is already fixed in the latest services. A second fix was applied to report only relative directory paths.

Loaded classes continue to increase. Checking with the Eclipse memory analyser suggested a memory leak as a result of using the File.deleteOnExit method. This method was not intended to be used by servers. SoapLab was writing the temporary files in a tomcat temporary directory. This has now been moved to the SoapLab job directories. The fix was tested on emboss4 services and will be applied to the current server.

2.4 Other

Alan has tested a mEMBOSS build on Windows 7. There was a problem running this build under Vista on emboss2 as the manifest files required the vc90 files from Alan's machine rather than the new Windows 7 download of Visual Studio Express. The cause was a new requirement for a .NET installation when using the DLLs supplied by Microsoft. The older DLL files had no such dependency.

Peter has built a patch for the extractfeat problems. Further testing is needed to check FASTQ formats work as documented in the latest paper. Mahmut will check for other bugs to be fixed in this patch.

3. New developments

3.1 BioMart access

Alan had a meeting with Syed Haider to discuss BioMart access from EMBOSS. SQL access is considered obsolete by BioMart. The recommended access route is through the REST service API to download sufficient detail to derive a set of databases, fields and attributes. No new library code is needed. Expat will still be required for interpretation of XML, and zlib will be needed in next generation sequence input for BAM file formats.

The BioMart server can send its results in batches. We cannot allow a timeout while merely waiting for the server. There is an html switch to send a completion flag at the end of the results.
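To illustrate why a completion flag helps, here is a small Python sketch that decides whether a batched result arrived in full by checking for a final marker line. The marker text `[success]` and the helper name are assumptions made for this example, not details recorded in these minutes. The sketch works on an already downloaded string and makes no network calls.

```python
COMPLETION_MARKER = "[success]"  # assumed marker text, for illustration only


def split_completed_result(body, marker=COMPLETION_MARKER):
    """Return (rows, complete) for a tab-delimited result body.

    complete is True only when the last line is the completion marker,
    in which case the marker is removed from the returned rows.
    """
    lines = body.rstrip("\n").split("\n") if body.strip() else []
    complete = bool(lines) and lines[-1].strip() == marker
    rows = lines[:-1] if complete else lines
    return rows, complete


if __name__ == "__main__":
    rows, complete = split_completed_result("geneA\t1\ngeneB\t2\n[success]\n")
    print(rows, complete)
```

A caller can treat a missing marker as a truncated transfer and retry, which separates a slow server from an incomplete result.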

Alan will see whether the existing HTTP access methods (Entrez, SRS and SRSWWW) can be used as a template for BioMart.

3.2 EDAM

Jon has updated EDAM to conform to the new documented standards with a simpler structure. Fields and tools are now well defined in the ontology. The current ontology has 1666 terms. A further cleanup may reduce this by 25%.

The ELIXIR survey databases have been assigned to a set of categories, and the Nucleic Acids Research categories have been merged in.

EDAM will be presented at the BioCatalogue annotation jamboree as a way to annotate databases and datatypes.

3.3 Other

Mahmut is looking into suffix trees and suffix arrays for string pattern matching in large sequences and large sequence sets. The bwa assembler code (successor to maq) uses many efficiency enhancements leading to complex code. Other code investigated includes the velvet assembler. There are also many code examples not connected to bioinformatics, especially in antivirus software for virus signature matching.
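For reference, the basic suffix array idea can be sketched in a few lines of Python. This is a naive construction that sorts whole suffixes, so it shows the data structure without any of the efficiency enhancements used in bwa or velvet. The function names and the test strings are made up for this example.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes in lexicographic order.

    Naive construction: simple, but far too slow for genome-scale input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, sa, pattern):
    """Return sorted start positions of pattern in text using suffix array sa."""
    m = len(pattern)

    # First suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "GATTACATTA"
    sa = build_suffix_array(text)
    print(find_all(text, sa, "TTA"))
```

Once the array is built, each query needs only two binary searches, which is what makes the structure attractive for searching many patterns against one large sequence set.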

4. Administration

Alan now has 2 more powerful machines available. Windows 7 has been installed and tested. It requires signed libraries. We can update the new machines through Dell but need to make sure any upgrade does not overwrite the master boot record. Peter will arrange this with the systems group, together with Microsoft Office installation on emboss6.

Alan will install Fedora 12 on all machines this week.

Alan asked about making /shared available on machines through the autofs "ghost" option. He will do this as part of the Fedora 12 upgrade.

Peter circulated a list of installed packages on /shared/software. Alan will look into setting up a web server (or perhaps a wiki) on emboss5 to document the /shared software and data installations.

5. Documentation and Training

5.1 Books

Alan will update the AJAX library structure in the admin manual.

6. User queries and answers

None new.

7. AOB

Peter summarised the next generation sequencing congress in London last week. A key requirement for users appears to be help with de novo assembly using multiple instruments. Most applications assume a single set of data. User needs vary enormously depending on the nature of the experiment.

The BioCatalogue annotation jamboree is next Friday in Manchester. Jon and Mahmut will attend.

8. Date Of Next Meeting

The next meeting will be on Monday 30th November.