EMBOSS: Project Meeting (Mon 26th April 2010)


Attendees

EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag
Visitors:
Apologies:

1. Minutes of the last meeting

Minutes of the meeting of 19th April 2010 are here.

2. Maintenance etc.

2.1 Applications

Peter has modified showalign to display ticks and numbers relative to any selected reference sequence in the input alignment. The tick marks and numbering ignore any gaps within the sequence. Gaps at the beginning have 'v' and 'V' as minor and major tick marks, and are numbered from -1. Gaps after the end similarly are numbered from +1. The reference sequence name is case-insensitive, assuming only one sequence matches.
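
Note: the numbering rule above can be illustrated with a minimal Python sketch. This is not the showalign source. The function name, the gap characters and the direction of the negative numbering (-1 taken to be the column immediately before the first residue) are assumptions made for illustration only.

```python
def reference_numbering(ref_aligned, gap_chars="-."):
    """Return one number per alignment column, relative to the reference row.

    Residues are numbered 1..n. Gap columns inside the sequence get None
    (they are ignored by the numbering). Leading gap columns get negative
    numbers and trailing gap columns get positive numbers starting at 1.
    """
    is_residue = [c not in gap_chars for c in ref_aligned]
    if not any(is_residue):
        return [None] * len(ref_aligned)

    first = is_residue.index(True)
    last = len(is_residue) - 1 - is_residue[::-1].index(True)

    numbers = []
    position = 0
    for col, residue in enumerate(is_residue):
        if col < first:
            # Assumption: -1 is the column just before the first residue.
            numbers.append(col - first)
        elif col > last:
            # +1 is the column just after the last residue.
            numbers.append(col - last)
        elif residue:
            position += 1
            numbers.append(position)
        else:
            numbers.append(None)  # internal gap: no number
    return numbers


print(reference_numbering("--AC-GT--"))
```

In this sketch the trailing numbers restart at 1, so a display would need the explicit '+' sign (or distinct tick marks) to tell them apart from residue positions.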

Mahmut has checked the consistency of sequence set and sequence stream (seqall) usage in command lines. There is no standard ordering of these in ACD. Details are on the Wiki and in an email.

Mahmut is looking into sequence matching methods. It could be useful to add a -minscore option to other applications. When wordmatch writes alternative alignment formats it reports a comparison matrix, but there is no option to select the matrix used for scoring. A general matrix option may be useful. Peter will check whether gap penalties are similarly reported for wordmatch: although it only generates ungapped alignments, they may appear in some output format headers.

In checking on SSAHA, Mahmut noticed that it automatically aligns to both strands. We can consider adding an option to our alignment methods to try both strands for the best score.
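
Note: the idea of trying both strands and keeping the best score can be sketched as below. This is an illustration only, not SSAHA's or EMBOSS's method. The function names are hypothetical, and the scoring function (length of the longest common substring) is a toy stand-in for a real ungapped alignment score.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a plain DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring(a, b):
    """Toy ungapped score: length of the longest exact shared substring."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else 0)
            best = max(best, cur[-1])
        prev = cur
    return best


def best_strand(query, target, score_fn=longest_common_substring):
    """Score the query on both strands and return (strand, score) of the better."""
    forward = score_fn(query, target)
    reverse = score_fn(reverse_complement(query), target)
    if reverse > forward:
        return "-", reverse
    return "+", forward


print(best_strand("AAAC", "GGGTTTGG"))
```

Ties are resolved in favour of the forward strand here, which is an arbitrary choice for the sketch.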

Mahmut found a bug in supermatcher which misses some alignments by skipping one sequence position. The bug is simple to fix, but would also be covered by replacing the matching method by the Rabin-Karp algorithm.
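
Note: for reference, a generic textbook form of the Rabin-Karp rolling-hash search is sketched below. It is not the supermatcher code and not a proposed patch. The base and modulus are arbitrary illustrative values. The point of interest is that the rolling hash is updated for every start position, so no position is skipped.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all start positions of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, mod)  # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod

    hits = []
    for start in range(n - m + 1):
        # Confirm hash matches by direct comparison to rule out collisions.
        if p_hash == t_hash and text[start:start + m] == pattern:
            hits.append(start)
        if start < n - m:
            # Slide the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return hits


print(rabin_karp("ACGTACGTAC", "GTAC"))
```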

2.2 Libraries

Alan has produced a DLL for plplot to replace the static libraries for mEMBOSS. The DLL takes all the .c and .h files in plplot and adds in any .cpp or .h files in the same directory. A Visual Studio project is used to produce the DLL. The builds for other projects are altered to link to the DLL. As the DLL is built from the latest source, any changes should be immediately noticeable on Windows. Upgrading to a new plplot release in future should now be easier. A test mEMBOSS is available through Alan's personal web pages.

Alan has updated the Windows install to pass EPLPLOT_LIB as an environment variable. In Jemboss the properties file has "plplot" defined, but it is not clear whether this is used.

Peter has added "PDF" as a device for plplot. This required adding one more plplot source file, but also linking to an external library "libharu" which plplot uses to write PDF files. The libharu library is installed on the EMBOSS machines. Alan will update the configure procedures to test for libharu and define PLD_pdf if it is found.

Peter plans to test SVG graphics, but for this plplot uses Qt4, which is too large to consider providing separately. Another configure test will be needed. A Windows version is available. Alan noted that any static library is dependent on the Visual Studio version that created it. We also probably need to commit any header files needed for compilation of plplot.

Mahmut is working on SAM output format for alignments, aiming to reproduce the alignment output of SSAHA. Some of the SAM format fields can be ignored by leaving them empty.
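
Note: a minimal sketch of writing one SAM alignment line with the unused fields left at their placeholder values. It assumes that "empty" corresponds to the placeholders the SAM specification defines ('*' for string fields, 0 for integer positions) and uses the field names of current versions of the specification (RNEXT, PNEXT, TLEN). The function name and the example values are hypothetical and do not represent the EMBOSS implementation.

```python
def sam_record(qname, flag, rname, pos, mapq, cigar, seq,
               rnext="*", pnext=0, tlen=0, qual="*"):
    """Build one tab-separated SAM alignment line (11 mandatory fields).

    Mate-related fields and base qualities default to the placeholder
    values used when no information is available.
    """
    fields = [qname, flag, rname, pos, mapq, cigar,
              rnext, pnext, tlen, seq, qual]
    return "\t".join(str(f) for f in fields)


# Hypothetical single-end hit; MAPQ 255 means mapping quality is unavailable.
print(sam_record("read1", 0, "chr1", 100, 255, "8M", "ACGTACGT"))
```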

Alan reported that the fix for zlib library clashes in CVS is difficult to patch as this would require new distributions for each of the EMBASSY packages. Instructions will be provided to users experiencing problems to install with a prefix rather than using the default system location for the ezlib "zlib.h" file.

2.3 SoapLab

Mahmut is generating SAWSDL annotation in a parallel version of the service WSDL, and has a test server running.

3. New developments

3.1 BioMart access

Peter is working on new attributes for sequence database definitions to support BioMart access. There is an issue to resolve in providing text for entret to display. The plan to generate a FASTA format sequence is insufficient for this. A possible solution is to generate a novel name-value format with the first record showing the identifier and with some way to mark the sequence identifier and the end of the entry.

3.2 EDAM

Jon has completed a clean up of the data branch, including most of Matus's suggestions. It is now much easier to navigate, with fewer top level terms. We still need to decide where to put individual databases. The current priority is the re-annotation of ACD files using the term IDs and types to support SAWSDL generation in time for the workshop in Amsterdam next month.

4. Administration

Alan reported that EBI systems group are building a system to test the I/O and network bottlenecks of database indexing and large sequence file processing.

Alan reported that Fedora 13 is due for release in about 1 month's time.

5. Documentation and Training

5.1 Books

Peter looked into validation of URLs in the book text. There are two types of URL. Some appear in the book text and need checking. Others are links that will only appear in the HTML (website) version. For these we need to define where the web page will be as there are relative paths in the linked URLs. The book text is in a single file.
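
Note: the dependence of relative links on the final web page location can be illustrated with a small sketch. It assumes links appear as href="..." attributes, which may not match the markup of the book text, and the base URL is a made-up example. No network checks are performed.

```python
import re
from urllib.parse import urljoin, urlparse

# Assumption: links are written as href="..." attributes.
HREF = re.compile(r'href="([^"]+)"')


def classify_links(text, base_url):
    """Split links into absolute URLs and relative ones resolved against base_url."""
    absolute, resolved = [], []
    for link in HREF.findall(text):
        if urlparse(link).scheme:
            absolute.append(link)  # can be checked as-is
        else:
            resolved.append(urljoin(base_url, link))  # depends on page location
    return absolute, resolved


sample = ('<a href="http://emboss.sourceforge.net/">EMBOSS</a> '
          '<a href="../apps/showalign.html">showalign</a>')
# Hypothetical location of the page containing the links.
print(classify_links(sample, "http://example.org/book/ch1/index.html"))
```

Changing the base URL changes every resolved link, which is why the page location has to be fixed before the relative links can be validated.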

5.2 Documentation

None.

5.3 Training

None.

6. User queries and answers

All outstanding queries have been put on the SourceForge tracker.

7. AOB

Peter submitted an abstract for the BOSC meeting. He will also submit a late poster for ISMB.

8. Date Of Next Meeting

May 3rd is a public holiday. The next meeting will be on Monday 10th May.