EMBOSS: Project Meeting (Mon 4th April 11)


EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster

1. Minutes of the last meeting

Minutes of the meeting of 28th March 2011 are here.

2. Maintenance etc.

2.1 Applications

Peter reported that applications extractfeat and coderet have been updated to support the new feature/sub-feature data structure (see below). Work is still needed on showfeat.

2.2 Libraries

Peter has implemented an ALIAS configuration for the emboss.standard and emboss.default files. It will also be implemented for server cachefiles to allow server-specific aliases. The alias name can be used in place of a real database name. Error messages helpfully report the original USA in most cases and so far do not need changing.
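
The alias lookup described above can be pictured as a simple name substitution. The following is a minimal Python sketch for illustration only, not the EMBOSS implementation: the alias table, the database names in it, and the resolve_usa function are all hypothetical, and it assumes a USA of the simple "database:entry" form.

```python
# Hypothetical alias table: alias name -> real database name.
# These names are placeholders, not entries from any EMBOSS configuration.
ALIASES = {"prot": "uniprot", "nuc": "embl"}


def resolve_usa(usa, aliases=ALIASES):
    """Return (resolved_usa, original_usa) for a 'database:entry' style USA.

    The original string is returned unchanged so that callers can quote it
    in error messages.
    """
    db, sep, entry = usa.partition(":")
    if sep and db in aliases:
        return aliases[db] + sep + entry, usa
    return usa, usa
```

Returning the original USA alongside the resolved one reflects the point that error messages should quote what the user typed rather than the substituted name.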

Peter has implemented the GFF3 "ID" and "Parent" tags as sub-features. Code to read EMBL and GFF3 now uses a sub-feature structure in an AjPFeature object. Other input and output formats have been updated.
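
The way "ID" and "Parent" tags define a feature hierarchy can be sketched as follows. This is an illustrative Python example, not the EMBOSS C code and not the AjPFeature layout: the function names and the dictionary representation of a feature are assumptions made for the sketch. It relies only on the GFF3 conventions that a feature line has nine tab-separated columns, that the ninth column holds semicolon-separated tag=value pairs, and that a Parent value may list several IDs separated by commas.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attributes column into a tag -> value dictionary."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            tag, value = field.split("=", 1)
            attrs[tag.strip()] = value.strip()
    return attrs


def build_hierarchy(gff3_lines):
    """Return (top_level, children) built from GFF3 feature lines.

    top_level is a list of features with no Parent tag; children maps a
    parent ID to the list of features naming it in their Parent tag.
    """
    top_level = []
    children = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs = parse_attributes(cols[8])
        feature = {
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "id": attrs.get("ID"),
        }
        parents = attrs.get("Parent")
        if parents:
            # A feature may list several parents, separated by commas.
            for parent_id in parents.split(","):
                children[parent_id].append(feature)
        else:
            top_level.append(feature)
    return top_level, children
```

For example, an mRNA line with ID=m1 followed by two exon lines with Parent=m1 would give one top-level feature and two entries under children["m1"].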

The poorly formatted GFF3 output from previous EMBOSS releases can still be read using the older parsers, but those parsers may not fully understand the new sub-feature structure.

Mahmut is looking for examples of DASGFF feature outputs. The dassources test application can return raw DAS results which can be searched for feature tags. The older DAS 1.5 standard used "Group" tags to define a feature hierarchy. Under DAS 1.6 this should be replaced by "Parent" and "Part" tags but no examples have been found to date from current DAS servers.

Mahmut asked whether failure to read a sequence or a feature table could be set to produce messages explaining the reason for the failure. Peter explained that in many cases code may be testing formats, or testing database retrieval in whichdb, and will need to suppress any error messages. We could set a feature or sequence internal error message for the last known error, but this will need to be done for each "return ajFalse" in the input code to avoid unwanted retrieval of an older message.
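
The "last known error" idea can be sketched as below. This is a Python illustration of the pattern only, not EMBOSS code: the InputStatus class, the function names and the two simplified format checks are hypothetical. The point it shows is that every failure path must set the message, and that the message is cleared at the start of each attempt, so a caller never retrieves a stale reason from an earlier read.

```python
class InputStatus:
    """Holds the most recent input error without printing it."""

    def __init__(self):
        self.last_error = ""

    def fail(self, message):
        """Record why a read failed and return False to the caller."""
        self.last_error = message
        return False

    def clear(self):
        self.last_error = ""


def read_as_format(text, fmt, status):
    """Try to read text in one format; record a reason on every failure path."""
    status.clear()  # avoid reporting a stale message from an earlier attempt
    if not text:
        return status.fail("input is empty")
    if fmt == "fasta" and not text.startswith(">"):
        return status.fail("fasta: first line does not start with '>'")
    if fmt == "embl" and not text.startswith("ID "):
        return status.fail("embl: first line does not start with 'ID'")
    if fmt not in ("fasta", "embl"):
        return status.fail("unknown format: " + fmt)
    return True


def read_any(text, formats, status):
    """Test each format silently; the caller decides whether to report."""
    for fmt in formats:
        if read_as_format(text, fmt, status):
            return fmt
    return None
```

Code that is only testing formats can ignore status.last_error, while an application that finally fails to read its input can choose to print it.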


Alan has been investigating ways to implement the QA testing for mEMBOSS. There are obvious differences under Windows. QA tests that preset environment variables will need to be modified, with a suggested field for variable settings that can be interpreted differently in Unix and Windows scripts. Pre- and post-processing commands that use Unix commands such as cp or rm will need either a Windows-specific version or a prefix with COPY or REMOVE that can be interpreted on each platform. The latter option was preferred. Relative paths need to be converted to absolute paths under Windows. Peter also noted that some of the regular expressions need to be modified to allow Windows output to pass (examples include reported commands and relative paths).
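
The preferred COPY/REMOVE prefix option can be sketched as follows. This is a Python illustration only (the real QA script, qatest.pl, is written in Perl), and the function name, the lookup table and the choice of the Windows "copy" and "del" commands are assumptions for the sketch rather than an agreed design.

```python
import os


def translate_command(command, platform=os.name):
    """Map a COPY/REMOVE prefixed command to a platform shell command.

    platform follows os.name: 'nt' means Windows, anything else is
    treated as Unix. Commands without a known prefix pass through.
    """
    prefix, _, rest = command.partition(" ")
    table = {
        "COPY": {"nt": "copy", "unix": "cp"},
        "REMOVE": {"nt": "del", "unix": "rm"},
    }
    if prefix not in table:
        return command
    key = "nt" if platform == "nt" else "unix"
    return table[prefix][key] + " " + rest
```

A single test definition could then carry a line such as "COPY a.txt b.txt" and have it interpreted as cp on Unix and copy on Windows. Path conversion, which the minutes also mention, is not covered by this sketch.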

Peter will try installing Perl under Windows to test modifications to the qatest.pl script. Alan has looked at "Strawberry Perl". Michael suggested "Active Perl" as an alternative, as it is the most used for Ensembl on Windows.

2.4 Other

Mahmut has updated SoapLab services for the London move and tested using the EMBASSY and EMBOSS QA tests. One application failed when using the URL access method for database retrieval but has been corrected by redefining the database using dbfetch. The services are running a very old EMBOSS version (2.9.0). The mwcontam service failed as SoapLab and EMBOSS use different separators for values. This will be fixed in SoapLab.

The most recent applications are now working, but as initial 'hacks'. The handling of new data types could be improved.

Mahmut has looked into the JUnit testing used by Artemis. The current JUnit test simply checks a basic application launch.

3. New developments

3.1 Access methods

3.1.1 BioMart

Alan continues to work on BioMart caches.

3.1.2 Ensembl

Michael continues to work on the updates to support Ensembl release 61.

3.2 EDAM

Jon and Matus Kalas attended a Software Ontology (SWO) meeting last week in Manchester. The scope for SWO was agreed, and will include EDAM data and operations terms, plus new terms required by SWO itself. The SWO ontology will be maintained in OWL.

Jon is exploring ways for EDAM to be used by the BioCatalogue project.

Jon and Matus are considering publication options for EDAM.

The next release of EDAM will include the concept of "core datatypes" to distinguish datatype-related information from general parameters. The release will also include a separate branch (name space) for identifiers.

4. Administration

Alan reported that the Open-Bio anonymous CVS server is rebuilt and now available again. The rsync server needs to be announced to developers.

5. Documentation and Training

5.1 Books

5.2 Website

Peter has updated a few obviously outdated pages on the website, including the grant number.

Jon will put up a private copy of the new website, generated from the book texts, for testing. The link will be through his home page.

Peter noted the need to check the URLs referenced in the books to make sure they are available, either directly or as redirects, from the web server at emboss.open-bio.org.

6. User queries and answers

All done.

7. AOB

Mahmut attended a next-generation sequencing meeting in Cambridge last week.

Peter will contact the Advisory Board with an update on progress.

8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 11th April.