EMBOSS: Project Meeting (Monday 14th Mar 2011)

Attendees

EBI: Peter Rice, Mahmut Uludag, Jon Ison, Michael Schuster
Visitors:
Apologies: Alan Bleasby

1. Minutes of the last meeting

Minutes of the meeting of 7th March 2011 are here.

2. Maintenance etc.

2.1 Applications

Peter has fixed the einverted bug and will now make a patch for this and some sequence format fixes.

2.2 Libraries

Peter described a code cleanup of AjPTable objects.

Functions to provide table operations for query processing required new code to resize the hash table for an AjPTable object. This has been implemented as ajTableResize. This function is also called when a table grows beyond a reasonable number of entries for a given hash table size. The code involves saving the keys and value pointers to arrays, resizing the hash array, and simply repopulating the table (reusing the previous linked list nodes).
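The resize step described above can be sketched roughly as follows. This is a minimal illustration and not the ajTableResize code: every type and function name (DemoTable, demoTableResize and so on) is hypothetical, keys are assumed to be C strings, and the sketch saves the node pointers themselves rather than separate key and value arrays.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the EMBOSS AjPTable types. */
typedef struct DemoNode {
    const char *key;
    void *value;
    struct DemoNode *next;
} DemoNode;

typedef struct DemoTable {
    DemoNode **buckets;   /* array of linked-list heads */
    size_t size;          /* number of buckets */
    size_t length;        /* number of stored entries */
} DemoTable;

/* Simple string hash (djb2), reduced to the bucket count. */
static size_t demoHash(const char *key, size_t size)
{
    size_t h = 5381;
    while (*key)
        h = h * 33 + (unsigned char) *key++;
    return h % size;
}

DemoTable *demoTableNew(size_t size)
{
    DemoTable *t = malloc(sizeof *t);
    if (!t) return NULL;
    t->buckets = calloc(size, sizeof *t->buckets);
    if (!t->buckets) { free(t); return NULL; }
    t->size = size;
    t->length = 0;
    return t;
}

/* Insert at the head of the chain; duplicate keys are not checked here. */
int demoTablePut(DemoTable *t, const char *key, void *value)
{
    DemoNode *n = malloc(sizeof *n);
    size_t i;
    if (!n) return 0;
    i = demoHash(key, t->size);
    n->key = key;
    n->value = value;
    n->next = t->buckets[i];
    t->buckets[i] = n;
    t->length++;
    return 1;
}

void *demoTableGet(const DemoTable *t, const char *key)
{
    DemoNode *n;
    for (n = t->buckets[demoHash(key, t->size)]; n; n = n->next)
        if (strcmp(n->key, key) == 0) return n->value;
    return NULL;
}

/* Resize: save the existing nodes, swap in a new bucket array,
   then relink the same nodes under the new bucket count. */
int demoTableResize(DemoTable *t, size_t newsize)
{
    DemoNode **saved, **fresh, *n;
    size_t i, k = 0;
    if (newsize == 0) return 0;
    saved = malloc((t->length ? t->length : 1) * sizeof *saved);
    fresh = calloc(newsize, sizeof *fresh);
    if (!saved || !fresh) { free(saved); free(fresh); return 0; }
    for (i = 0; i < t->size; i++)
        for (n = t->buckets[i]; n; n = n->next)
            saved[k++] = n;
    free(t->buckets);
    t->buckets = fresh;
    t->size = newsize;
    for (i = 0; i < k; i++) {
        size_t j = demoHash(saved[i]->key, newsize);
        saved[i]->next = fresh[j];
        fresh[j] = saved[i];
    }
    free(saved);
    return 1;
}
```

Because the nodes are reused, no key or value is copied and lookups return the same pointers before and after the resize.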

Table merge operations can now be implemented as simple operations. Where two tables have the same hash and compare functions, and the same hash table size, keys in the same hash position can be compared. Matching keys can be moved to the start of each linked list, leaving a known list of matching and non-matching keys in each table. These can then be simply processed to leave the resulting merged table and the remaining key/value pairs.
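One way to picture the matching step is the sketch below. It is an assumption-laden illustration, not the planned EMBOSS code: the names (DemoTable, demoChainPartition, demoTableMatch) are hypothetical, keys are C strings, both tables share a fixed bucket count and hash function, and keys are assumed to be unique within a table.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_SIZE 8   /* both tables use the same bucket count */

/* Hypothetical types for illustration; not the EMBOSS AjPTable types. */
typedef struct DemoNode {
    const char *key;
    struct DemoNode *next;
} DemoNode;

typedef struct DemoTable {
    DemoNode *buckets[DEMO_SIZE];
} DemoTable;

/* Shared hash function (djb2), so equal keys land in equal positions. */
static size_t demoHash(const char *key)
{
    size_t h = 5381;
    while (*key)
        h = h * 33 + (unsigned char) *key++;
    return h % DEMO_SIZE;
}

int demoTablePut(DemoTable *t, const char *key)
{
    DemoNode *n = malloc(sizeof *n);
    size_t i = demoHash(key);
    if (!n) return 0;
    n->key = key;
    n->next = t->buckets[i];
    t->buckets[i] = n;
    return 1;
}

/* Does this chain hold the key? */
static int demoChainHas(const DemoNode *chain, const char *key)
{
    for (; chain; chain = chain->next)
        if (strcmp(chain->key, key) == 0) return 1;
    return 0;
}

/* Reorder one chain so that nodes whose key also occurs in 'other'
   come first. Returns the number of matching nodes. */
size_t demoChainPartition(DemoNode **head, const DemoNode *other)
{
    DemoNode *match = NULL, **mtail = &match;
    DemoNode *rest = NULL, **rtail = &rest;
    DemoNode *n, *next;
    size_t count = 0;
    for (n = *head; n; n = next) {
        next = n->next;
        n->next = NULL;
        if (demoChainHas(other, n->key)) {
            *mtail = n; mtail = &n->next; count++;
        } else {
            *rtail = n; rtail = &n->next;
        }
    }
    *mtail = rest;     /* append the non-matching nodes after the matches */
    *head = match;
    return count;
}

/* Only chains at the same hash position need to be compared. */
size_t demoTableMatch(DemoTable *a, DemoTable *b)
{
    size_t i, total = 0;
    for (i = 0; i < DEMO_SIZE; i++) {
        total += demoChainPartition(&a->buckets[i], b->buckets[i]);
        demoChainPartition(&b->buckets[i], a->buckets[i]);
    }
    return total;
}
```

After demoTableMatch, the first nodes of each chain are the shared keys and the rest are unique to that table, so an intersection, union or difference could be produced by walking each chain once.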

Cleanup of the remaining keys and values would require further code to delete both keys and values and to free table nodes. Mahmut suggested adding destructor functions to the table definition. This very neatly solves the problem by allowing an ajTableDel function to be implemented to clean up any table for which functions are defined.

Peter and Mahmut are working on function names to define or set the destructor functions for tables with standard and user-defined key and value types.
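The destructor idea could look something like the sketch below. The function names were still under discussion at the meeting, so every name here (DemoTable, demoTableSetDestroy, demoTableDel and the helpers) is hypothetical, and the table is reduced to a single chain with no hashing to keep the focus on cleanup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not the EMBOSS AjPTable API. */
typedef void (*DemoDestroy)(void **item);

typedef struct DemoNode {
    void *key;
    void *value;
    struct DemoNode *next;
} DemoNode;

typedef struct DemoTable {
    DemoNode *head;        /* one chain only; hashing omitted for brevity */
    DemoDestroy keydel;    /* called on each key at deletion, may be NULL */
    DemoDestroy valdel;    /* called on each value at deletion, may be NULL */
} DemoTable;

int demoFreed = 0;         /* counts destructor calls, for demonstration */

/* Example destructor: frees the item and clears the caller's pointer. */
void demoCountedFree(void **item)
{
    free(*item);
    *item = NULL;
    demoFreed++;
}

/* Helper to make heap-allocated strings for the example. */
char *demoStrCopy(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy) strcpy(copy, s);
    return copy;
}

DemoTable *demoTableNew(void)
{
    DemoTable *t = malloc(sizeof *t);
    if (!t) return NULL;
    t->head = NULL;
    t->keydel = NULL;
    t->valdel = NULL;
    return t;
}

/* Register the functions that know how to delete keys and values. */
void demoTableSetDestroy(DemoTable *t, DemoDestroy keydel, DemoDestroy valdel)
{
    t->keydel = keydel;
    t->valdel = valdel;
}

int demoTablePut(DemoTable *t, void *key, void *value)
{
    DemoNode *n = malloc(sizeof *n);
    if (!n) return 0;
    n->key = key;
    n->value = value;
    n->next = t->head;
    t->head = n;
    return 1;
}

/* Delete the table: run the registered destructors, free the nodes,
   free the table and clear the caller's pointer. */
void demoTableDel(DemoTable **pt)
{
    DemoNode *n, *next;
    if (!pt || !*pt) return;
    for (n = (*pt)->head; n; n = next) {
        next = n->next;
        if ((*pt)->keydel) (*pt)->keydel(&n->key);
        if ((*pt)->valdel) (*pt)->valdel(&n->value);
        free(n);
    }
    free(*pt);
    *pt = NULL;
}
```

With the destructors stored in the table, the caller no longer needs to know the key and value types at deletion time; a table of borrowed pointers simply leaves both functions as NULL.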

Michael proposed also adding a reference count to the AjPTable object so that copying tables for objects in the Ensembl API could simply increment the reference count. This requires only one new copy constructor for any table key type, and a test of the reference count when a table is deleted.
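A reference-counted object of this kind might be sketched as below. The structure and function names (DemoTable, demoTableNewRef, demoTableDel) and the placeholder payload are hypothetical, chosen only to show the increment on copy and the test on delete.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object; not the EMBOSS AjPTable type. */
typedef struct DemoTable {
    unsigned int use;   /* number of owners sharing this object */
    int *data;          /* placeholder for the real table contents */
} DemoTable;

DemoTable *demoTableNew(void)
{
    DemoTable *t = malloc(sizeof *t);
    if (!t) return NULL;
    t->use = 1;         /* the creator holds the first reference */
    t->data = NULL;
    return t;
}

/* "Copy" by sharing: bump the count and hand back the same object. */
DemoTable *demoTableNewRef(DemoTable *t)
{
    if (t) t->use++;
    return t;
}

/* Drop one reference; free the object only when the last owner lets go.
   The caller's pointer is always cleared. */
void demoTableDel(DemoTable **pt)
{
    DemoTable *t;
    if (!pt || !*pt) return;
    t = *pt;
    *pt = NULL;
    if (--t->use > 0) return;
    free(t->data);
    free(t);
}
```

Sharing is only safe while no owner modifies the table behind the others' backs, which is the usual trade-off of reference counting against a deep copy.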

Michael also proposed a reference count and a data destructor function for AjPList objects.

Peter will implement all these suggestions as soon as possible.

Mahmut is looking into 'bigwig' and 'bigbed' formats.

Alan has fixed a memory leak in DOM parsing when handling doctype metadata. A few minor memory leaks remain in ajdom.

2.3 Other

Michael proposed extending the documentation of data types to allow '@cc' definitions.

Mahmut is testing SoapLab services on the new EBI London Data Centres.

A bug in SoapLab on tomcat 7 has been fixed.

3. New developments

3.1 Database configuration files

Peter has implemented and committed the database and server multiple attributes and the new 'field:' attribute. Documentation follows the attribute value (a taxon id or a list of field names) and is separated from it by '!'. For example:

    field: "sv SeqVersion ! Sequence version or GI number"

The spaces around the delimiter are required. This may be changed to simply stop processing at the delimiter, as parsing only happens when the field (or other) attribute is used for a database. The configuration files are read as lists of unparsed strings.
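Splitting such an attribute at the delimiter could be sketched as follows. This is an illustration only, not the committed EMBOSS parser: DemoField and demoFieldSplit are hypothetical names, the buffer sizes are arbitrary, and the sketch follows the current rule that the delimiter must be surrounded by spaces.

```c
#include <string.h>

/* Hypothetical holder for a parsed attribute; sizes are arbitrary. */
typedef struct DemoField {
    char value[128];   /* text before the delimiter */
    char doc[128];     /* documentation after the delimiter, may be empty */
} DemoField;

/* Split an attribute string at the first " ! " (spaces required).
   Returns 1 if documentation was found, 0 if there was no delimiter.
   Over-long parts are truncated to fit the buffers. */
int demoFieldSplit(const char *attr, DemoField *out)
{
    const char *sep = strstr(attr, " ! ");
    size_t n = sep ? (size_t) (sep - attr) : strlen(attr);

    if (n >= sizeof out->value)
        n = sizeof out->value - 1;
    memcpy(out->value, attr, n);
    out->value[n] = '\0';
    out->doc[0] = '\0';
    if (!sep) return 0;

    strncpy(out->doc, sep + 3, sizeof out->doc - 1);
    out->doc[sizeof out->doc - 1] = '\0';
    return 1;
}
```

Applied to the example attribute, the value part would be "sv SeqVersion" and the documentation part "Sequence version or GI number". A string with '!' but no surrounding spaces is treated as having no documentation.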

3.2 Access methods

Michael continues to revise the Ensembl API code to handle circular slices and their features, which were new in Ensembl 60. Other Ensembl commitments will have priority.

Constants in the Ensembl API code are no longer defined by macros, making debugging easier.

The Ensembl API is tested by an application which exercises the major functions and writes a FASTA sequence file which is compared to previous versions.

Mahmut has removed an unused function from EB-eye access.

Mahmut is moving the handling of database identifier, return, filter and accession attributes from AjPSeqin (sequence specific) to AjPQuery (general use as part of AjPTextin).

Mahmut suggested moving SQL access from sequence to text access so that it is usable for features and other data types, for example for CHADO access.

3.3 EDAM

Jon reported on the status of the next EDAM release. EDAM beta 12 is in preparation. Matus Kalas has added many new format terms. The topic and resource branches had too many overlaps and are being merged into a single topic branch. The 'edamres' lines in DRCAT and the 'edamres' database/server attribute names should become 'edamtpc'.

Changes to the data branch and a possible identifier branch will be discussed after this release.

3.4 DRCAT

Jon will commit the latest DRCAT updates later today.

All query elements now have EDAM annotation. Some are quite general and may be extended later. Where necessary, new terms were added to EDAM.

3.5 Other

4. Administration

None

5. Documentation and Training

Michael asked for new documentation headers to be allowed for 'const' and 'conststatic' descriptions of data values in the Ensembl API code. Peter will implement these to reduce the current level of warning messages generated.

6. User queries and answers

All done.

7. AOB

None.

8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 21st March. Peter will be at a distributed computing meeting.