EMBOSS: Project Meeting (Mon 25th Jul 11)
Minutes of the meeting of 11th July 2011 are here.
Peter will update the web sites and announce to the Advisory Board, and to the emboss-dev list, once library documentation pages have been cleaned up.
About 50 tests failed when running the mEMBOSS QA tests. Most failures were due to test databases, such as tdas, not being defined in the test environment.
SoapLab is running smoothly at the London Data Centres. A monitor service needed some tidying up. It runs via LSF and so takes time to return a result. The cluster load balancers need a faster response time, so Mahmut is working on a "health check" servlet to test the availability of a node, and is looking into Log4J.
Reading data could use the concept of screen resolution to limit the amount of data loaded into memory, with detail at the sequence level only needed for a short region of the reference. Reference sequences need start and end positions defined before the input is read, which may require a new ACD input type.
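The screen-resolution idea can be illustrated with a minimal sketch. This is not EMBOSS code: the function names and the per-base depth list are hypothetical, and it assumes one summary value per pixel column is enough when the region is wider than the display.

```python
def bases_per_pixel(region_start, region_end, screen_width_px):
    """Return how many reference bases map onto one pixel column.

    Positions are 1-based and inclusive (an assumption for this sketch).
    """
    length = region_end - region_start + 1
    # Ceiling division, never less than one base per pixel.
    return max(1, -(-length // screen_width_px))


def summarise_depth(depths, region_start, region_end, screen_width_px):
    """Collapse per-base values into one mean value per pixel column."""
    step = bases_per_pixel(region_start, region_end, screen_width_px)
    summary = []
    for i in range(0, len(depths), step):
        window = depths[i:i + step]
        summary.append(sum(window) / len(window))
    return summary
```

When `bases_per_pixel` returns 1, the region is short enough that sequence-level detail is worth loading; otherwise only the summary needs to be held in memory.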
Many of the data formats use BAM style indexing to allow remote access (FTP or HTTP) to a small region of the input file.
Michael suggested a chunked approach to reading sequences, which may be needed for some genomes, for example the Opossum genome, which is defined in CON files that refer to individual sequence entries.
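A chunked reader might look something like the sketch below. It is only an illustration of the general technique, not the EMBOSS implementation: the function names and chunk size are hypothetical, and it reads from any file-like object rather than resolving CON entries.

```python
import io


def read_in_chunks(handle, chunk_size=4):
    """Yield successive pieces of a sequence without loading it all at once."""
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            break
        yield chunk


def gc_count(handle, chunk_size=4):
    """Count G and C residues, holding only one chunk in memory at a time."""
    return sum(chunk.count("G") + chunk.count("C")
               for chunk in read_in_chunks(handle, chunk_size))


# Usage with an in-memory stand-in for a large sequence file.
chunks = list(read_in_chunks(io.StringIO("ACGTACGTAC"), 4))
```

Any per-residue calculation that can be accumulated chunk by chunk fits this pattern; operations that need context across chunk boundaries would need an overlap between chunks.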
Mahmut has looked at Mira assembly format as a way to load assembled read data.
Michael suggested that useful formats include VCF, which is compressed to hold a large number of variants, and dbSNP.
Jon proposed BioXSD, which is now becoming stable, as a new supported format. It would help if we can match BioXSD types to EMBOSS data structures.
Mahmut noted we should add further EDAM annotations to some of
the databases defined in emboss.standard.
Alan noted there had been problems printing from the EMBOSS
machines. These resolved themselves after some delay.
Mahmut reported some problems in using the updated Eclipse in
Fedora 15. These were issues with Maven when working on SoapLab. The
issues are now fixed.
Jon is ready to update the XML book source files. Alan
will create a CVS branch to preserve the original source files for
the first editions. Updates will be used to maintain the new web site
pages.
Jon will document how to generate the HTML web pages from the
books using XMLmind. Stylesheets may need some adjustment for table
formats. Peter suggested a simple Perl post-processor as an
alternative if style sheets are tricky to manage.
Peter needs to generate new sections for the new data types in
the latest release. It should be possible to generate the library
descriptions by incorporating additional book text into the source
code "section" documentation blocks.
Alan is looking into a user report of a small conflict between
some tcode results and the original paper.
4. Administration
Alan reported that Linux release 3.0.0 has now been declared.
5. Documentation and Training
5.1 Web server
Alan reported that the new emboss.open-bio.org web site is available.
Peter has loaded the release 6.4.0 and latest CVS documentation
for applications and libraries.
Alan has contacted Open-Bio about enabling server-side includes
on the new web server.
5.2 Books
Peter reported the books were the highlight of the Cambridge
University Press stand at ISMB in Vienna.
6. User queries and answers
All done.
7. AOB
Alan reported a query from Tim Carver on handling of large
translation attributes in GFF3 format. Peter will investigate.
8. Date Of Next Meeting
The next EMBOSS meeting will be on Monday 1st August.