EMBOSS: Project Meeting (Mon 26th April 2010)
Attendees
EBI:
Peter Rice,
Alan Bleasby,
Jon Ison,
Mahmut Uludag
Visitors:
Apologies:
1. Minutes of the last meeting
Minutes of the meeting of 19th April 2010 are
here.
2. Maintenance etc.
2.1 Applications
Peter has modified showalign to display ticks and
numbers relative to any selected reference sequence in the input
alignment. The tick marks and numbering ignore any gaps within the
sequence. Gaps at the beginning have 'v' and 'V' as minor and major
tick marks, and are numbered from -1. Gaps after the end similarly are
numbered from +1. The reference sequence name is case-insensitive,
assuming only one sequence matches.
Mahmut has checked the consistency of sequence set and sequence
stream (seqall) usage in command lines. There is no standard ordering
of these in ACD. Details are on the Wiki and in an email.
Mahmut is looking into sequence matching methods. It could be
useful to add a -minscore option to other
applications. With alternative alignment formats, wordmatch
reports a comparison matrix, but there is no option to select the matrix
it will use for scoring. A general matrix option may be
useful. Peter will check whether gap penalties are similarly
reported for wordmatch: although it only generates ungapped
alignments, they may appear in some output format headers.
In checking on SSAHA, Mahmut noticed that it automatically
aligns to both strands. We can consider adding an option to our
alignment methods to try both strands for the best score.
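As a rough illustration of the "try both strands" idea, the following Python sketch scores a query against a target in both orientations and keeps the better one. The function names and the toy ungapped scoring function are assumptions made for this example; they are not EMBOSS or SSAHA code.

```python
# Complement table for unambiguous DNA bases; other characters pass through.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def ungapped_score(query, target):
    """Toy score (an assumption for this sketch): the largest number of
    identical positions over all ungapped offsets of query against target."""
    best = 0
    for offset in range(-(len(query) - 1), len(target)):
        matches = 0
        for i, base in enumerate(query):
            j = offset + i
            if 0 <= j < len(target) and target[j] == base:
                matches += 1
        best = max(best, matches)
    return best


def best_strand(query, target, score=ungapped_score):
    """Score the query on both strands and return (score, strand).

    Ties are reported as the forward strand.
    """
    forward = score(query, target)
    reverse = score(reverse_complement(query), target)
    if reverse > forward:
        return reverse, "-"
    return forward, "+"
```

Any real scoring function could be passed in place of the toy one through the score argument.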
Mahmut found a bug in supermatcher which misses some
alignments by skipping one sequence position. The bug is simple to
fix, but would also be covered by replacing the matching method with the
Rabin-Karp algorithm.
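For reference, a minimal textbook sketch of Rabin-Karp matching in Python is given below. It is not the supermatcher code; the function name and the base and modulus values are assumptions chosen for the example. The rolling hash is updated at every start position, so no position is skipped.

```python
def rabin_karp(pattern, text, base=256, modulus=1_000_000_007):
    """Return all start positions where pattern occurs in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, modulus)

    # Hash the pattern and the first window of the text.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus

    hits = []
    for start in range(n - m + 1):
        # Confirm hash matches by direct comparison to rule out collisions.
        if p_hash == t_hash and text[start:start + m] == pattern:
            hits.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % modulus
    return hits
```

Overlapping occurrences are reported as well, since the window advances by one position at a time.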
2.2 Libraries
Alan has produced a DLL for plplot to replace the static
libraries for mEMBOSS. The DLL takes all the .c and .h files in plplot
and adds in any .cpp or .h files in the same directory. A Visual
Studio project is used to produce the DLL. The builds for other
projects are altered to link to the DLL. As the DLL is built from the
latest source, any changes should be immediately noticeable on
Windows. Upgrading to a new plplot release in future should now be
easier. A test mEMBOSS is available through Alan's personal web pages.
Alan has updated the Windows install to pass EPLPLOT_LIB as an
environment variable. In Jemboss the properties file has "plplot"
defined, but it is not clear whether this is used.
Peter has added "PDF" as a device for plplot. This required
adding one more plplot source file, but also linking to an external
library "libharu" which plplot uses to write PDF files. The libharu
library is installed on the EMBOSS machines. Alan will update
the configure procedures to test for libharu and define PLD_pdf if it
is found.
Peter plans to test SVG graphics, but for this plplot uses Qt4
which is too large to consider providing separately. Another configure
test will be needed. A Windows version is available. Alan noted
that any static library is dependent on the Visual Studio version that
created it. We also probably need to commit any header files needed
for compilation of plplot.
Mahmut is working on SAM output format for alignments,
aiming to reproduce the alignment output of SSAHA. Some of the SAM
format fields can be ignored by leaving them empty.
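As a sketch of how unused fields can be handled, the Python function below writes one SAM alignment line with the eleven mandatory tab-separated columns. It assumes the placeholder conventions of the SAM specification, where an unavailable text field is written as "*" and an unavailable numeric field as 0 (255 for mapping quality). The function name and its defaults are hypothetical and are not part of EMBOSS.

```python
def sam_line(qname, rname, pos, cigar, seq,
             flag=0, mapq=255, rnext="*", pnext=0, tlen=0, qual="*"):
    """Format one SAM alignment record.

    Column order: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL.
    Fields left at their defaults carry the "not available" placeholders.
    """
    fields = [qname, flag, rname, pos, mapq, cigar,
              rnext, pnext, tlen, seq, qual]
    return "\t".join(str(field) for field in fields)


# Example record: a forward-strand match with no mate and no qualities.
record = sam_line("read1", "chr1", 100, "8M", "ACGTACGT")
```

A reverse-strand alignment would set flag=16 and store the reverse-complemented read in seq.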
Alan reported that the fix for zlib library clashes in
CVS is difficult to patch as this would require new distributions for
each of the EMBASSY packages. Users experiencing problems will be
given instructions to install with a prefix rather than using the
default system location for the ezlib "zlib.h" file.
2.3 SoapLab
Mahmut is generating SAWSDL annotation in a parallel version of
the service WSDL, and has a test server running.
3. New developments
3.1 BioMart access
Peter is working on new attributes for sequence database
definitions to support BioMart access. There is an issue to resolve in
providing text for entret to display. The plan to generate a
FASTA format sequence is insufficient for this. A possible solution is
to generate a novel name-value format with the first record showing the
identifier and with some way to mark the sequence identifier and the
end of the entry.
3.2 EDAM
Jon has completed a clean up of the data branch, including most
of Matus's suggestions. It is now much easier to navigate, with fewer
top level terms. We still need to decide where to put individual
databases. The current priority is the re-annotation of ACD files
using the term IDs and types to support SAWSDL generation in time for
the workshop in Amsterdam next month.
4. Administration
Alan reported that the EBI systems group are building a system to
test the I/O and network bottlenecks of database indexing and large sequence
file processing.
Alan reported that Fedora 13 is due for release in about 1 month's time.
5. Documentation and Training
5.1 Books
Peter looked into validation of URLs in the book text. There
are two types of URL. Some appear in the book text and need
checking. Others are links that will only appear in the HTML (website)
version. For these we need to define where the web page will be as
there are relative paths in the linked URLs. The book text is in a
single file.
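The point about relative paths can be sketched in Python: absolute URLs can be checked as they stand, while relative links can only be resolved once the location of the web page is fixed. The function name and the base URL below are assumptions for illustration only; the real website location is still to be defined.

```python
from urllib.parse import urljoin, urlparse


def resolve_links(links, base_url):
    """Map each link to the absolute URL that would need checking.

    Links with a scheme (http, ftp, ...) are kept unchanged; relative
    links are resolved against base_url, the assumed page location.
    """
    resolved = {}
    for link in links:
        if urlparse(link).scheme:
            resolved[link] = link
        else:
            resolved[link] = urljoin(base_url, link)
    return resolved


# Hypothetical page location, used only for this example.
links = resolve_links(
    ["http://emboss.sourceforge.net/", "../apps/showalign.html"],
    "http://www.example.org/book/ch1/index.html",
)
```

The resolved URLs could then be passed to whatever checker is used for the URLs that appear in the book text.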
5.2 Documentation
None.
5.3 Training
None.
6. User queries and answers
All outstanding queries have been put on the SourceForge tracker.
7. AOB
Peter submitted an abstract for the BOSC meeting. He will also
submit a late poster for ISMB.
8. Date Of Next Meeting
May 3rd is a public holiday. The next meeting will be on Monday 10th May.