EMBOSS: Project Meeting (Mon 25th Jul 11)
Minutes of the meeting of 11th July 2011 are here.
Peter will update the web sites and announce to the Advisory Board, and to the emboss-dev list, once library documentation pages have been cleaned up.
When running the mEMBOSS QA tests, about 50 tests failed. Most failures were due to test databases, such as tdas, not being defined in the test environment.
SoapLab is running smoothly at the London Data Centres. A monitor service needed some tidying up. It runs via LSF and so takes time to return a result. The cluster load balancers need a faster response time so Mahmut is working on a "health check" servlet to test the availability of a node, and looking into Log4J.
Reading data could use the concept of screen resolution to limit the amount of data loaded into memory, with detail at the sequence level only needed for a short region of the reference. Reference sequences need start and end positions defined before the input is read, which may require a new ACD input type.
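The screen resolution idea can be illustrated with a small sketch. It is only an illustration under stated assumptions, not an EMBOSS design: the function names, the 1-based inclusive coordinates, and the one-base-per-pixel threshold are all hypothetical.

```python
def bases_per_pixel(start, end, screen_width_px):
    """Number of reference bases represented by one horizontal pixel.

    Assumes 1-based, inclusive start and end positions on the reference.
    """
    if end < start or screen_width_px <= 0:
        raise ValueError("need start <= end and a positive screen width")
    span = end - start + 1
    return span / screen_width_px


def detail_level(start, end, screen_width_px, max_bases_per_pixel=1.0):
    """Return 'sequence' when individual bases fit on screen, else 'summary'.

    The threshold of one base per pixel is an assumption for illustration.
    """
    if bases_per_pixel(start, end, screen_width_px) <= max_bases_per_pixel:
        return "sequence"
    return "summary"
```

With a 1000 pixel wide display, a 1000 base region would be read at sequence level, while a 1,000,000 base region would only need summary data.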
Many of the data formats use BAM style indexing to allow remote access (FTP or HTTP) to a small region of the input file.
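As a rough sketch of how remote access to a small region can work over HTTP, the request below asks for a byte range only. It assumes the first and last byte offsets have already been looked up in an index; the function name and the URL in the usage note are hypothetical, and the request is built but not sent.

```python
from urllib.request import Request


def region_request(url, first_byte, last_byte):
    """Build an HTTP request for an inclusive byte range of a remote file.

    The offsets are assumed to come from a previously loaded index.
    """
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("need 0 <= first_byte <= last_byte")
    # HTTP byte ranges are inclusive at both ends.
    return Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})
```

For example, `region_request("http://example.org/reads.bam", 1024, 2047)` would request 1024 bytes starting at offset 1024; passing the result to `urllib.request.urlopen` would fetch only that part of the file from a server that supports range requests.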
Michael suggested a chunked approach to reading sequences, which some genomes may need. One example is the Opossum genome, which is defined in CON files that refer to individual sequence entries.
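A chunked read over a sequence built from several entries might look like the sketch below. It is a minimal illustration only: the function name is hypothetical, and the individual entries are assumed to be available as open text handles in the order given by the CON file.

```python
import io


def read_in_chunks(segments, chunk_size):
    """Yield the joined sequence in fixed-size chunks.

    `segments` is an iterable of text file-like objects, one per
    sequence entry, read in order without loading them all at once.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buffer = ""
    for handle in segments:
        while True:
            # Read only what is needed to complete the current chunk.
            piece = handle.read(chunk_size - len(buffer))
            if not piece:
                break  # this entry is exhausted, move to the next one
            buffer += piece
            if len(buffer) == chunk_size:
                yield buffer
                buffer = ""
    if buffer:
        yield buffer  # final, possibly shorter, chunk
```

Usage with two in-memory entries standing in for real files: `list(read_in_chunks([io.StringIO("ACGTAC"), io.StringIO("GGT")], 4))`. Chunks may span the boundary between two entries, so the caller never holds more than one chunk in memory.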
Mahmut has looked at Mira assembly format as a way to load assembled read data.
Michael suggested that useful formats include VCF, which is compressed to hold a large number of variants, and dbSNP.
Jon proposed BioXSD, which is now becoming stable, as a new supported format. It would help if we can match BioXSD types to EMBOSS data structures.
Mahmut noted we should add further EDAM annotations to some of the databases defined in emboss.standard.
Alan noted there had been problems printing from the EMBOSS machines. These resolved themselves after some delay. Mahmut reported some problems using the updated Eclipse in Fedora 15. These were issues with Maven when working on SoapLab, and are now fixed.
Jon is ready to update the XML book source files. Alan will create a CVS branch to preserve the original source files for the first editions. Updates will be used to maintain the new web site pages.
Jon will document how to generate the HTML web pages from the books using XMLmind. Stylesheets may need some adjustment for table formats. Peter suggested a simple Perl post-processor as an alternative if style sheets are tricky to manage.
Peter needs to generate new sections for the new data types in the latest release. It should be possible to generate the library descriptions by incorporating additional book text into the source code "section" documentation blocks.
Alan is looking into a user report of a small conflict between some tcode results and the original paper.