EMBOSS: Project Meeting (Mon 28th September 09)
Peter will look into possible utility applications for managing short read data files. These include detecting the compatible formats of a fastq file (directly, not by reading using ajSeqRead) and sorting of SAM and BAM format data.
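As a rough illustration of detecting compatible fastq formats directly from the file, the sketch below scans the quality lines and reports which quality encodings are consistent with the characters seen. It assumes simple four-line records with no line wrapping, uses the commonly cited ASCII ranges for the Sanger, Solexa and Illumina variants, and the function name and format labels are chosen for illustration only. It is not the planned EMBOSS utility.

```python
def compatible_fastq_encodings(lines):
    """Return the quality encodings consistent with every quality character seen.

    Assumes unwrapped four-line records: header, sequence, '+' line, quality.
    """
    # Commonly cited ASCII ranges (lowest, highest) for each variant.
    encodings = {
        "fastq-sanger": (33, 126),    # Phred scores, offset 33
        "fastq-solexa": (59, 126),    # Solexa scores from -5, offset 64
        "fastq-illumina": (64, 126),  # Phred scores, offset 64
    }
    lo, hi = 127, 0
    for i, line in enumerate(lines):
        if i % 4 == 3:  # fourth line of each record holds the qualities
            codes = [ord(c) for c in line.rstrip("\r\n")]
            if codes:
                lo = min(lo, min(codes))
                hi = max(hi, max(codes))
    if lo > hi:  # no quality characters were found
        return []
    return [name for name, (emin, emax) in encodings.items()
            if emin <= lo and hi <= emax]


# A record with '!' (ASCII 33) can only be Sanger-style.
record = ["@read1\n", "ACGT\n", "+\n", "!IIh\n"]
print(compatible_fastq_encodings(record))
```

A file whose qualities all fall in the upper range stays compatible with all three variants, which is why the result is a list and not a single answer.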
Mahmut is looking into extending supermatcher for use in matching short reads and adaptors and for improved efficiency.
The related BAM format needs an investigation into the code needed to read "BGZF" data - presumably blocked gzip files.
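For orientation, the sketch below shows one way BGZF blocks can be read, based on the layout given in the SAM/BAM specification: each block is a gzip member whose extra field carries a "BC" subfield holding the total block size minus one. The function names are hypothetical and the writer exists only to produce test data. This is not EMBOSS code.

```python
import struct
import zlib


def make_bgzf_block(payload):
    """Build one BGZF block from a bytes payload (helper for trying the reader)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no zlib header
    cdata = comp.compress(payload) + comp.flush()
    bsize = 12 + 6 + len(cdata) + 8 - 1  # total block size minus 1
    header = struct.pack("<BBBBIBBH", 31, 139, 8, 4, 0, 0, 255, 6)
    extra = struct.pack("<BBHH", 66, 67, 2, bsize)  # 'B', 'C', SLEN=2, BSIZE
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF, len(payload))
    return header + extra + cdata + trailer


def read_bgzf_blocks(data):
    """Yield the decompressed payload of each BGZF block in a bytes object."""
    offset = 0
    while offset < len(data):
        # Fixed 12-byte gzip header: ID1 ID2 CM FLG MTIME XFL OS XLEN
        id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack_from(
            "<BBBBIBBH", data, offset)
        if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
            raise ValueError("not a BGZF block at offset %d" % offset)
        # Walk the extra subfields looking for 'BC', which holds BSIZE.
        extra = data[offset + 12:offset + 12 + xlen]
        bsize, pos = None, 0
        while pos + 4 <= len(extra):
            si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
            if (si1, si2, slen) == (66, 67, 2):
                bsize = struct.unpack_from("<H", extra, pos + 4)[0]
            pos += 4 + slen
        if bsize is None:
            raise ValueError("gzip member without a BC subfield")
        block_len = bsize + 1
        cdata = data[offset + 12 + xlen:offset + block_len - 8]
        crc, isize = struct.unpack_from("<II", data, offset + block_len - 8)
        payload = zlib.decompress(cdata, -15)
        if len(payload) != isize or (zlib.crc32(payload) & 0xFFFFFFFF) != crc:
            raise ValueError("corrupt BGZF block at offset %d" % offset)
        yield payload
        offset += block_len


blocks = make_bgzf_block(b"hello ") + make_bgzf_block(b"world")
print(b"".join(read_bgzf_blocks(blocks)))
```

Because each block records its own length, a reader can skip from block to block without decompressing, which is what makes random access into a BAM file possible.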
Alan has reorganized the AJAX library, splitting it into:
pcre core graphics ensembl ajaxseq acd
The reorganization is complete for Unix systems, and for cygwin to build shared libraries by default. This was broken by the libtool 2 change in the last release - AM_LDFLAG was ignored.
Cygwin will complain if a shared library has circular dependencies.
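A quick way to catch circular dependencies before building is a depth-first search over the library dependency graph. The sketch below is a minimal version of that idea. The dependency table is hypothetical and does not describe the actual AJAX link order.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None if there is none.

    deps maps each library name to the list of libraries it links against.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(lib):
        state[lib] = GREY
        path.append(lib)
        for dep in deps.get(lib, ()):
            seen = state.get(dep, WHITE)
            if seen == GREY:  # dep is already on the current path: a cycle
                return path[path.index(dep):] + [dep]
            if seen == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[lib] = BLACK
        return None

    for lib in deps:
        if state.get(lib, WHITE) == WHITE:
            cycle = visit(lib)
            if cycle:
                return cycle
    return None


# Hypothetical layout, for illustration only.
layout = {
    "acd": ["ajaxseq", "core"],
    "ajaxseq": ["core"],
    "graphics": ["core"],
    "core": ["pcre"],
    "pcre": [],
}
print(find_cycle(layout))
```

If a function in a lower-level library started calling into a higher-level one, the corresponding entry would gain a back edge and the function would return the offending chain of libraries.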
Alan reported that libtool complains if a version value is less than the patch number. Vienna's version has been changed from 1.7.2 to 1.7.0 to work around this.
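The constraint can be checked ahead of a build. The sketch below assumes the dotted version was passed to libtool's -version-info option as current:revision:age, where libtool rejects an age greater than current. The minutes do not say how the version was passed, so that mapping is an assumption, and the function name is hypothetical.

```python
def version_info_is_valid(version_info):
    """Check a 'current:revision:age' string against libtool's age <= current rule.

    For simplicity this sketch requires all three fields to be present.
    """
    parts = version_info.split(":")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError("expected current:revision:age, got %r" % version_info)
    current, _revision, age = (int(p) for p in parts)
    return age <= current


for candidate in ("1:7:2", "1:7:0"):
    print(candidate, version_info_is_valid(candidate))
```

Under that assumption 1:7:2 fails because the age (2) exceeds current (1), while 1:7:0 passes.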
Alan has also modified the Windows library configuration. The bundlewin utility has been modified to handle the new source directories.
Peter will move the ACD functions to the acd/ directory, and the sequence reading/writing functions to the ajaxseq directory. Alan recommended testing on cygwin to catch compile and linking errors.
The knowntypes.standard file is used to match ACD data types to EDAM. Jon will try to match further output file types to describe their semantic content using EDAM terms.
Jon is adding the ELIXIR survey and Swissprot/EMBL cross reference databases to EDAM and checking their associated data types. The Nucleic Acids Research databases and servers issue is a further source of database and data type information.
Jon has reviewed the BioMOBY ontologies and those other ontologies that they reference. Some terms could be reused, or cross-referenced. Many others can be ignored as too specialised. Other EMBRACE partners have offered to help check BioMOBY.
Peter proposed the MIRIAM ontology as a further source of cross-references. Although it probably has no new databases listed, it is used by the minimal information standards and should be cross-referenced to link to EDAM. The Sequence Ontology (SO) is also worth checking although the terms are aimed at a finer detail (e.g. individual sequence feature types) than we generally need in EDAM.
Jon will check other available ontologies before the first release to make sure there are no major architectural issues in incorporating terms and cross-references.
EMBRACE partners are also looking into using EDAM to annotate BioXSD, an initiative in Bergen that is a new attempt to define an overall XML schema for biological data.
Jon will check with BioCatalogue on their supported recommendations for annotation of web services. Peter suggested contacting Paul Gordon for further recommendations.
Jon is interested in trying a paper or poster for a meeting in December.
Jon will write documentation describing the principles and guidelines for adding terms to EDAM, for example distinguishing a PDB record from a PDB file format, and differences between datatypes as categories or as true data types.
Peter will look into the procedures for linking EDAM to the OBO Foundry.
Peter and Jon will modify the EDAM validation scripts to ensure that existing term identifiers (numbers) remain fixed so that EDAM can be used by the other EMBRACE partners. The first release with guaranteed consistent term identifiers will be the next one (0.5).
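One simple form such a check could take is a comparison of the identifier-to-term mapping between two releases. The sketch below works on plain dictionaries, since the EDAM file format is not described here, and the identifiers and term names in the example are made up. It is an illustration, not the actual validation script.

```python
def check_identifier_stability(previous, current):
    """Report identifiers from the previous release that vanished or changed meaning.

    Both arguments map a term identifier to its term name. New identifiers in
    the current release are allowed and are not reported.
    """
    problems = []
    for term_id, name in sorted(previous.items()):
        if term_id not in current:
            problems.append("%s (%s) was removed" % (term_id, name))
        elif current[term_id] != name:
            # A rename may be legitimate, so this is flagged for review.
            problems.append("%s changed from %r to %r"
                            % (term_id, name, current[term_id]))
    return problems


# Made-up identifiers and names, for illustration only.
release_a = {"0001": "Sequence", "0002": "Alignment"}
release_b = {"0001": "Sequence", "0003": "Alignment", "0004": "Tree"}
for problem in check_identifier_stability(release_a, release_b):
    print(problem)
```

Here the term "Alignment" has been renumbered, so the old identifier is reported as removed, which is exactly the situation that would break annotations held by other partners.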
Jon would like a second monitor. We will try to arrange a pair of monitors from our existing group equipment.
Mahmut has experimented with callgrind, an extension to valgrind that provides coverage reports and a GUI using data from a valgrind run.
Peter proposed aiming for January 15th as the date for the next EMBOSS release. The original November date is too ambitious: some of the library, database and application work currently in progress could not be completed by then.
Peter will look into possible November dates for our first SAB meeting.