EMBOSS: Project Meeting (Mon 28th March 11)


Attendees

EBI: Peter Rice, Alan Bleasby, Jon Ison, Michael Schuster
Visitors:
Apologies: Mahmut Uludag

1. Minutes of the last meeting

There was no meeting on 21st March 2011.

Minutes of the meeting of 14th March 2011 are here.

2. Maintenance etc.

2.1 Applications

Alan has fixed a crash in restrict when the list of enzyme names was empty. The main fix is to silently return no results in embPatRestrictMatch, with an additional fix in ajStrParseCount to correctly return zero rather than '1' for an empty or NULL input string.
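The corrected counting behaviour can be illustrated with a minimal sketch. This is not the ajStrParseCount source; it is a Python model of the rule described above, and the function name parse_count and the default delimiter set are assumptions made for illustration.

```python
from typing import Optional


def parse_count(text: Optional[str], delimiters: str = " \t\n,") -> int:
    """Count delimiter-separated tokens in text.

    Hypothetical model of the fixed behaviour: an empty or missing
    string contains zero tokens, not one.
    """
    if not text:
        return 0  # empty or None input: nothing to count

    count = 0
    in_token = False
    for ch in text:
        if ch in delimiters:
            in_token = False
        elif not in_token:
            # First character of a new token
            in_token = True
            count += 1
    return count
```

With a count of zero the caller has no enzyme names to search for and can return an empty result instead of failing.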

Alan has updated primer3 to report the primer length.

Jon suggested reviewing the EMBASSY packages and updating them for the release. HMMER 3.0 is an obvious example. The interface appears similar to the latest HMMER2 wrapping, though the internals are very different.

Peter noted that ViennaRNA has been updated. Other packages should also be checked for more recent versions.

Jon noted that the next Clustal release may need a rewrite of emma or a new EMBASSY package if the interface has significantly changed.

2.2 Libraries

Peter has implemented reference counting for AjPTable and AjPList objects. The reference count is decremented on deletion, and the object is physically deleted when the count reaches zero. Functions to clear or reset objects issue warning messages if the reference count is set. Applications should take care to ensure they have a unique copy before calling these functions. Alan fixed some remaining memory leaks in ajdom.c.
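The reference-counting scheme can be pictured with a small model. This is a sketch in Python rather than the EMBOSS C code; the class and method names are hypothetical, and it assumes that "reference count is set" means more than one holder shares the object.

```python
import warnings


class RefCountedTable:
    """Hypothetical model of a reference-counted table."""

    def __init__(self):
        self._data = {}
        self._refs = 1  # the creator holds the first reference

    def new_ref(self):
        """Share the table with another holder."""
        self._refs += 1
        return self

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def delete(self) -> bool:
        """Drop one reference; return True only when storage is released."""
        self._refs -= 1
        if self._refs > 0:
            return False  # other holders remain, keep the contents
        self._data = None
        return True

    def clear(self):
        """Empty the table, warning if it is still shared (assumed rule)."""
        if self._refs > 1:
            warnings.warn("clearing a table that is still shared")
        self._data.clear()
```

A caller that needs to clear or reset a table without affecting other holders would take its own copy first, which is the precaution recommended above.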

Peter is implementing multi-location features (e.g. joins in EMBL) as sub-features. This will simplify feature sorting, and the reporting of hierarchical feature outputs such as DAS. By removing the need for the CHILD flag, we may be able to avoid using the feature flags special tag in the great majority of practical cases. Peter will test for compliance with the GFF3 features format. EMBOSS uses the same syntax as GFF2 for tag values. These should be unquoted, with commas (sadly) and some other values escaped. Only a few extra functions are needed. GFF3 also uses specially named tags Parent, etc. (each with an upper case first letter). These will be implemented as a separate set of tags in the feature object, because storing them as 'note="*Parent..."' is complicated by the case-sensitive naming. We may use this syntax to preserve them in EMBL/GenBank format.
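The escaping needed for GFF3 tag values can be sketched as follows. This is an illustration based on the published GFF3 rules for the attributes column (percent-encoding of ';', '=', '&', ',', '%' and control characters), not the planned EMBOSS functions; the name escape_gff3_value is hypothetical.

```python
# Characters with a reserved meaning in the GFF3 attributes column,
# plus '%' itself so that decoding is unambiguous.
RESERVED = ";=&,%"


def escape_gff3_value(value: str) -> str:
    """Percent-encode reserved and control characters in a tag value."""
    out = []
    for ch in value:
        code = ord(ch)
        if ch in RESERVED or code < 0x20 or code == 0x7F:
            out.append("%{:02X}".format(code))
        else:
            out.append(ch)  # spaces and other printable text pass through
    return "".join(out)
```

A comma inside a value, for example, is written as %2C, because an unescaped comma separates multiple values of the same tag.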

2.3 mEMBOSS

Alan has recently rebuilt mEMBOSS. The latest build uses 64-bit values for ajlongs. Any code changes affecting the EMBOSS build before the release should be highlighted as soon as possible.

Peter suggested implementing a mEMBOSS version of the QA tests. This should be only for development, and will require a Windows Perl installation and modifications to the qatest.pl script to handle Windows directory paths in configuration, commands, and testing output file contents.

2.4 Other

Michael suggested updating the Java configuration for developers to test for Java and for Ant. Alan noted that the release configuration differs from the developer configuration. A comparison with other packages found no standard way to test for Java releases.

Alan is considering adding pkg.m4 to the m4 macro directory to test for the pkg-config utility. This macro file has been stable for some years. It will cope with standard systems, but modified systems that install EMBOSS and pkg-config in different locations may have problems. This is needed for axis2c at present but may soon have more general uses.

Michael suggested making EMBOSS applications into a callable library. Peter will investigate. The coding effort is not large, as this would be an interface that calls the application. Most of the effort would be in providing an interface to the application options before launch, either by building the command line or by passing the options in some other (new) form.
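The "command line built" variant could look roughly like the sketch below. It is only an illustration of the idea, not a proposed design: the function build_command and its parameters are hypothetical. It relies on the usual EMBOSS conventions of "-qualifier value", "-noqualifier" for a false boolean, and "-auto" to suppress prompting.

```python
from typing import Dict, List, Union

OptionValue = Union[str, int, float, bool]


def build_command(app: str, options: Dict[str, OptionValue],
                  auto: bool = True) -> List[str]:
    """Turn a dictionary of options into an argument list for one application."""
    argv = [app]
    for name, value in options.items():
        if isinstance(value, bool):
            # Boolean qualifiers are switched on by name, off with a "no" prefix
            argv.append("-" + name if value else "-no" + name)
        else:
            argv.extend(["-" + name, str(value)])
    if auto:
        argv.append("-auto")  # never prompt when called from a program
    return argv
```

The resulting list could be handed to a process launcher; a library interface taking options in some other form would replace this step entirely.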

3. New developments

3.1 Access methods

3.1.1 BioMart
Alan has implemented, but not yet committed, a form of caching for BioMart registry and server information. Users or sites can have a cache directory with bit flags to control the creation of information about a server, and whether XML or tab-separated information is held. When accessing data through BioMart the library code can look in the cache, read directly from the server, or check the cache and then, if that fails, read directly from the server.
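The three access modes can be sketched with bit flags as below. This is a Python model, not the uncommitted library code: the flag names, the fetch function and the dictionary standing in for the cache directory are all assumptions for illustration.

```python
from enum import IntFlag
from typing import Callable, Dict, Optional


class CacheMode(IntFlag):
    """Hypothetical bit flags selecting where information may come from."""
    USE_CACHE = 1
    USE_SERVER = 2


def fetch(key: str, cache: Dict[str, str],
          read_server: Callable[[str], str],
          mode: CacheMode) -> Optional[str]:
    """Return information for key according to the access mode."""
    # Cache first, when permitted and present
    if mode & CacheMode.USE_CACHE and key in cache:
        return cache[key]
    # Otherwise read directly from the server, when permitted
    if mode & CacheMode.USE_SERVER:
        return read_server(key)
    return None  # cache-only mode and nothing cached
```

Combining both flags gives the "check the cache, then read directly" behaviour; either flag alone gives one of the other two modes.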

A new attribute cachedirectory has been added to the server attributes in ajnam and to the ajNamServerDetails function.

The cache directory could be in a standard location (for example a server-name subdirectory of the user's .embossdata directory) or an environment variable could be used. This is yet to be decided.

Michael is using ajNam calls to read server and database attributes and is generating the server cache file automatically in a showensembl test application without needing cache directories.

Michael suggested checking for ETag values in HTTP headers to find server or data version numbers. These are available for Ensembl but are not provided by BioMart. Alan will ask whether the BioMart team will implement them.

Michael noted that the BioMart at OICR will be updated soon. Alan is expecting a future switch from tab-separated to XML files and has a message in the code to notify when the tab-separated option disappears.

3.1.2 Ensembl
Michael has committed updates to the Ensembl access methods to support Ensembl 61, with a few updates to genetic variation still required. Most of the changes are to support circular sequence slices, where the end position can be greater than the sequence length to wrap around past the start.
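The wrap-around convention can be illustrated with a short sketch. This is not Ensembl API code; it assumes 1-based inclusive coordinates and a slice no longer than the sequence, and the function name circular_slice is hypothetical.

```python
def circular_slice(sequence: str, start: int, end: int) -> str:
    """Extract start..end from a circular sequence (1-based, inclusive).

    An end position beyond the sequence length wraps past the origin.
    """
    length = len(sequence)
    if not 1 <= start <= length:
        raise ValueError("start must lie within the sequence")
    if end < start or end - start + 1 > length:
        raise ValueError("slice must cover between 1 and length positions")

    if end <= length:
        return sequence[start - 1:end]  # ordinary linear slice
    # Tail of the sequence followed by the wrapped-around head
    return sequence[start - 1:] + sequence[:end - length]
```

For a 10-base sequence, positions 9 to 12 therefore cover bases 9, 10, 1 and 2, which is the region an EMBL feature table would express as a join.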

Peter plans to make similar changes to EMBOSS libraries to support circular sequences. There has been no user demand so this was a low priority. The EMBL feature table uses joins for these cases, but a syntax allowing the Ensembl style would be easy to implement.

Michael will soon start the addition of Ensembl 61 features, which should be relatively straightforward.

3.2 EDAM

Jon will be attending a Software Ontology (SWO) workshop in Manchester this week. EDAM release beta_12 will be made after the workshop. The major change is the merging of the resource and topic branches into a simplified topic branch. An identifier branch will be introduced in the beta_13 release.

4. Administration

Peter has received feedback from the BBR Fund committee on our first annual report, submitted in October. The comments are good and all suggested improvements are in hand, for example updating the website and providing training courses, which were both waiting for publication of the books.

5. Documentation and Training

5.1 Books

Peter and Alan were contacted by CUP about the books, suggesting a May publication date.

6. User queries and answers

Jon will post some new queries later today.

One user has asked whether EMBOSS could be split into multiple distributions by application domain. This was rejected as it would most probably result in many partial installations. We could in principle make binary distributions of a limited set of individual applications.

7. AOB

Peter will start writing the next grant proposal for core EMBOSS support and development.

BOSC, Bio-Ontologies and ISMB/ECCB in Vienna will be major events this year. Peter and Jon will be attending.

8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 4th April.