EMBOSS: Project Meeting (Mon 23rd November 09)
1. Minutes of the last meeting
Minutes of the meeting of 9th November 2009 are
2. Maintenance etc.
Peter is cleaning up the cirdna and lindna graphical
applications (see below).
Mahmut is looking further into vectorstrip extensions
for adaptor contamination of Illumina sequence data.
Peter has removed the 4 extra plplot functions from the
ajgraph.c source file. One was a function to set the title of an X11
or win3 window, replaced by a plplot API function. A second was a
function to return the internal filename used. Device settings added
to the ajgraph.c internal definitions allow the ajgraph.c code to
derive all the output filenames that plplot will have written. The
other two functions returned character height and text string width in
"world coordinates", the coordinates used for plotting. Unfortunately
this is not a conversion available in the native plplot API. These
have been replaced by functions returning the height and width in
millimetres. The two applications using these functions, cirdna and
lindna, need to be rescaled to plot in millimetre units.
Peter noted that the GIF driver was failing to create multiple
plots. This will be fixed in CVS and in the next release. No users
have reported this problem - it is reasonable to assume they are still
using PNG output, which does correctly set multiple files. The setting
is currently using a modified plplot, but should use a plplot API call.
The current mEMBOSS build uses plplot static library code. Later
versions of plplot (5.9.5 is the most recent) can be tested by
modifications to configure.in and Makefile.am.
Mahmut has looked into pattern matching in embpat.c and
is testing the Rabin-Karp algorithm for use with multiple
patterns. This is a useful option for Illumina adaptor detection.
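A minimal sketch of the multiple-pattern Rabin-Karp idea, for illustration only. It is written in Python rather than as embpat.c code, and it assumes all patterns share one length. The function name, the hash base and modulus, and the example sequences are made up for this sketch and are not taken from EMBOSS or from real adaptor sets.

```python
def rabin_karp_multi(text, patterns, base=256, mod=1_000_000_007):
    """Return (position, pattern) hits for a set of equal-length patterns.

    Hypothetical helper: one rolling hash is kept for the current text
    window and looked up in a table of pattern hashes, so the text is
    scanned once however many patterns there are.
    """
    if not patterns:
        return []
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("this sketch assumes all patterns share one length")
    if m == 0 or m > len(text):
        return []

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % mod
        return h

    # Map each pattern hash to the patterns that produce it.
    table = {}
    for p in patterns:
        table.setdefault(full_hash(p), set()).add(p)

    high = pow(base, m - 1, mod)   # weight of the character leaving the window
    h = full_hash(text[:m])
    hits = []
    for i in range(len(text) - m + 1):
        if h in table:
            window = text[i:i + m]
            if window in table[h]:  # confirm, since hashes can collide
                hits.append((i, window))
        if i + m < len(text):
            # Roll the hash: drop text[i], append text[i + m].
            h = ((h - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits
```

Patterns of different lengths would need one hash table per length, or the patterns truncated to a common prefix length before the confirming comparison.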
Mahmut has applied fixes to the emboss4 services. A recent burst of
hits in a short period triggered a known bug that is already fixed in
the latest services. A second fix was applied to report only
relative directory paths.
Loaded classes continue to increase. Checking with the Eclipse memory
analyser suggested a memory leak as a result of using the
File.deleteOnExit method. This method was not intended to be used by
servers. SoapLab was writing the temporary files in a Tomcat temporary
directory. This has now been moved to the SoapLab job directories. The fix
was tested on emboss4 services and will be applied to the current
Alan has tested a mEMBOSS build on Windows 7. There was a
problem running this build under Vista on emboss2 as the manifest
files required the vc90 files from Alan's machine rather than the new
Windows 7 download of Visual Studio Express. The cause was a new
requirement for a .NET installation when using the DLLs supplied by
Microsoft. The older DLL files had no such dependency.
Peter has built a patch for the extractfeat problems. Further
testing is needed to check FASTQ formats work as documented in the
latest paper. Mahmut will check for other bugs to be fixed in
3. New developments
3.1 BioMart access
Alan had a meeting with Syed Haider to discuss BioMart access
from EMBOSS. SQL access is considered obsolete by BioMart. The
recommended access route is through the REST service API to download
sufficient detail to derive a set of databases, fields and
attributes. No new library code is needed. Expat will still be
required for interpretation of XML, and zlib will be needed in next
generation sequence input for BAM file formats.
The BioMart server can send its results in batches. We cannot allow a
timeout while merely waiting for the server. There is an html switch
to send a completion flag at the end of the results.
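A minimal sketch of how a client might use such a completion flag to tell a finished result set from one cut short by a timeout or a dropped connection. The marker text "[success]" on a final line is an assumption of this sketch, and the function name is hypothetical. No network access is shown.

```python
def split_completion_flag(body, marker="[success]"):
    """Split a result body into (rows, complete).

    Hypothetical helper: 'complete' is True only when the last line of
    the body is the assumed completion marker. The marker itself is
    dropped from the returned rows.
    """
    lines = body.rstrip("\n").split("\n")
    if lines and lines[-1].strip() == marker:
        return lines[:-1], True
    # No marker seen: treat whatever arrived as a partial result.
    return [line for line in lines if line], False
```

A caller would keep reading batches until the flag is seen and would discard or retry a result that ends without it.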
Alan will see whether the existing HTTP access methods (Entrez, SRS and
SRSWWW) can be used as a template for BioMart.
Jon has updated EDAM to conform to the new documented standards
with a simpler structure. Fields and tools are now well defined in
the ontology. The current ontology has 1666 terms. A further cleanup
may reduce this by 25%.
The ELIXIR survey databases have been assigned to a set of categories,
and the Nucleic Acids Research categories have been merged in.
EDAM will be presented at the BioCatalogue annotation jamboree as a way to
annotate databases and datatypes.
Mahmut is looking into suffix trees and suffix arrays for
string pattern matching in large sequences and large sequence sets.
The bwa assembler code (successor to maq) uses many efficiency
enhancements leading to complex code. Other code investigated includes
the velvet assembler. There are also many code examples not connected
to bioinformatics, especially in antivirus software for virus
Alan now has 2 more powerful machines available. Windows 7 has
been installed and tested. It requires signed libraries. We can update
the new machines through Dell but need to make sure any upgrade does
not overwrite the master boot record. Peter will arrange this with the
systems group, together with Microsoft Office installation on emboss6.
Alan will install Fedora 12 on all machines this week.
Alan asked about making /shared available on machines through
the autofs "ghost" option. He will do this as part of the Fedora 12
Peter circulated a list of installed packages on /shared/software.
Alan will look into setting up a web server (or perhaps a wiki)
on emboss5 to document the /shared software and data installations.
5. Documentation and Training
Alan will update the AJAX library structure in the admin manual.
6. User queries and answers
Peter summarised the next generation sequencing congress in
London last week. A key requirement for users appears to be help with
de novo assembly using multiple instruments. Most applications assume a
single set of data. User needs vary enormously depending on the nature
of the experiment.
The BioCatalogue annotation jamboree is next Friday in
Manchester. Jon and Mahmut will attend.
8. Date Of Next Meeting
The next meeting will be on Monday 30th November.