EMBOSS: Project Meeting (Mon 10th May 10)
Mahmut noted that supermatcher's -filter option applies to the sequence stream input, and suggested that this should be standardized as the first input.
Mahmut has further tested needle. A bug in the alignment traceback has been fixed, and a QA test has been added for it. The application also now runs faster because debug code has been disabled.
Mahmut has validated stretcher, whose output may not agree with the full global alignment from needle. Further tests are needed. The algorithm works in linear space and is documented as having a problem with gap penalties when gaps cross segment boundaries. The version of align in the FASTA2 distribution can be used to check that the output is consistent with the original version.
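The consistency check described here (a linear-space aligner should reach the same optimal score as a full-matrix global aligner) can be sketched in Python. This is a minimal illustration only, assuming a simple match/mismatch score and a linear gap penalty. It uses Hirschberg's divide-and-conquer method rather than stretcher's actual implementation with affine gaps, and the scores and function names are hypothetical. With a linear gap penalty, a gap spanning the split point costs exactly the sum of its two halves, so the segment boundary problem mentioned for stretcher cannot arise in this simplified sketch.

```python
# Sketch: compare a linear-space (Hirschberg) global alignment with a
# full-matrix Needleman-Wunsch score. Hypothetical scoring scheme:
# match/mismatch plus a LINEAR gap penalty (not needle's or stretcher's
# affine gap penalties or substitution matrices).

MATCH, MISMATCH, GAP = 2, -1, -2


def sub(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def nw_score_full(x: str, y: str) -> int:
    """Optimal global score using the full (len(x)+1) x (len(y)+1) matrix."""
    n, m = len(x), len(y)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * GAP
    for j in range(1, m + 1):
        f[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),
                          f[i - 1][j] + GAP,
                          f[i][j - 1] + GAP)
    return f[n][m]


def nw_last_row(x: str, y: str) -> list[int]:
    """Last row of the score matrix, keeping only two rows (linear space)."""
    prev = [j * GAP for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * GAP] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = max(prev[j - 1] + sub(x[i - 1], y[j - 1]),
                         prev[j] + GAP,
                         cur[j - 1] + GAP)
        prev = cur
    return prev


def hirschberg(x: str, y: str) -> tuple[str, str]:
    """Optimal global alignment in linear space by divide and conquer."""
    if not x:
        return "-" * len(y), y
    if not y:
        return x, "-" * len(x)
    if len(x) == 1:
        # Put the single residue opposite its best partner in y, unless
        # gapping it out entirely scores higher.
        j = max(range(len(y)), key=lambda k: sub(x, y[k]))
        if sub(x, y[j]) >= 2 * GAP:
            return "-" * j + x + "-" * (len(y) - j - 1), y
        return x + "-" * len(y), "-" + y
    mid = len(x) // 2
    left = nw_last_row(x[:mid], y)
    right = nw_last_row(x[mid:][::-1], y[::-1])
    # Split y where the forward and reverse half-scores sum to the optimum.
    k = max(range(len(y) + 1), key=lambda j: left[j] + right[len(y) - j])
    ax1, ay1 = hirschberg(x[:mid], y[:k])
    ax2, ay2 = hirschberg(x[mid:], y[k:])
    return ax1 + ax2, ay1 + ay2


def alignment_score(ax: str, ay: str) -> int:
    """Score an alignment column by column with the same scheme."""
    return sum(GAP if "-" in (a, b) else sub(a, b) for a, b in zip(ax, ay))


if __name__ == "__main__":
    x, y = "GATTACA", "GCATGCT"
    ax, ay = hirschberg(x, y)
    print(ax)
    print(ay)
    # The linear-space result must score the same as the full-matrix optimum.
    assert alignment_score(ax, ay) == nw_score_full(x, y)
```

The same pattern (score the linear-space alignment and compare it with an independent full-matrix result) is what a comparison of stretcher against needle or the FASTA2 align program amounts to, although those programs use affine gaps, where the boundary handling is harder.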
SVG output needs no special library; the SVG device simply writes the SVG file directly.
Peter will update ajgraph.c with the default output sizes so that applications can scale plots on the new devices. These sizes have to be set explicitly because plplot uses them differently in different device drivers. The value for SVG will probably be a dummy.
For Windows, Alan has bundled libhpdf as a DLL, which is also committed to CVS and included in the mEMBOSS build.
When the DOS files were saved with CVSNT, their end-of-line characters were incorrectly converted to Mac format.
There was discussion of the possible ways to test Jemboss. Some general tests are needed, for example menus, launching clustalw through emma, and graphical output. Mahmut can set up JUnit tests to be launched from Ant. We also need tests of the GUI boxes.
Mahmut has added PDF and SVG graphics output options. Output can be viewed using the Java Desktop API, but this is only available from Java 1.6. One option is to test the Java version at run time, use Desktop on 1.6, and provide a user-defined solution for earlier versions.
Michael now has access to an Intel compiler at Sanger, which reports unused static variables from ajdefine.h. There are also type mismatches between integers and longs on 64-bit systems, for example in the ajstr string functions. Alan will review the Intel compiler output.
One compiler warning, 'parameter order undefined', is C++-specific and, according to Intel, should be turned off.
Michael has found a memory leak in ajStrNewS when called with an empty string.
Because this is a sequence database definition, and a Mart can hold many sequence attributes and species, many possible sequence databases could be defined.
Peter plans to simplify BioMart access by defining a "server", where the server name is the start of a USA and the rest of the USA could specify the Mart (species), sequence field, and identifier filter.
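To make the proposed scheme concrete, the following minimal Python sketch splits such a USA into its parts. The notes do not give the actual syntax, so the form server:mart:field:identifier, the colon delimiter, the class and function names, and the example values are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class MartQuery:
    """Hypothetical components of a BioMart 'server' USA."""
    server: str      # server name: the start of the USA
    mart: str        # Mart (species)
    field: str       # sequence field
    identifier: str  # identifier filter


def parse_mart_usa(usa: str) -> MartQuery:
    """Split a hypothetical 'server:mart:field:identifier' USA.

    The delimiter and ordering are assumptions for illustration only.
    """
    parts = usa.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected server:mart:field:identifier, got {usa!r}")
    return MartQuery(*parts)


if __name__ == "__main__":
    # Placeholder names, not real server, Mart or field names.
    print(parse_mart_usa("martsrv:hsapiens:peptide:ID123"))
```

Keeping the server name as the fixed first component means a single server definition can cover every Mart and sequence field, instead of one database definition per combination.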
Mart database definitions can include a "filter" attribute which is an additional BioMart query to, for example, limit the database to a single chromosome within an organism.
Further edits and additions are planned. All ACD files for EMBOSS have been re-annotated to the latest EDAM version. For the data definitions, only 130 qualifiers had no specific annotation and had to use a tool-specific parameter term.
Sequence types have two terms: one for the type (gapped, etc., based on the EMBOSS sequence types) and one for "raw sequence", "sequence record", "sequence set" or "sequence stream". These are nested terms in EDAM. There is also a separate set of feature table terms.
Terms for data formats are now in a separate "syntax" branch, to be used for file formats and later for XML formats.
Jon has discussed EDAM with the BioCatalogue team at EBI. There are clear applications in tagging, ontology markup, and replacing existing service categories. This can start once the cleaned EDAM version is announced.
Jon will announce the new EDAM later today. The documentation will need revision.
Peter will make inquiries on pricing.
Alan noted that Fedora 13 is due for release on May 18th. No noticeable changes are anticipated. NFS4 is now the default. There were some beta release issues with opening NFS ports.
Jon will send details of software licenses needed for further EDAM work.
Peter will start planning for the first Scientific Advisory Board meeting.