EMBOSS: Project Meeting (Mon 17th January 2011)


Attendees

EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag
Visitors:
Apologies: Michael Schuster

1. Minutes of the last meeting

Minutes of the meeting of 10th January 2011 are here.

2. Maintenance etc.

2.1 Applications

None.

2.2 Libraries

Mahmut reported memory leaks in the DOM parser when processing comments at the start of an XML file.

Mahmut will move the code for the wsdbfetch access method to ajtextdb.c to make it available to all datatypes.

Mahmut proposed discarding the gsoap library code as EMBOSS has the required functionality without calling this library.

Mahmut will move SOAP access code from the ajax/ajaxdb directory to ajax/core to make it generally available. HTTP access code was moved to ajax/core a few months ago.

Alan has updated ajnam.c to meet the EMBOSS coding standards.

3. New developments

3.1 EMBOSS configuration

Alan is writing code to generate a server cachefile for BioMart. It may be helpful to add a "cachedirectory:" attribute to hold further information.

Mahmut is writing code to generate a server cachefile for dassources. These data sources can answer sequence or feature queries, and results can be obtained directly via HTTP instead of downloading a dasgff file as at present. This raises some issues for servers that need both "sequence" and "feature" types.
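As an illustration of the direct HTTP queries mentioned above, the following is a minimal sketch of how sequence and feature requests might be built, assuming the server follows the DAS 1.x URL convention (PREFIX/das/SOURCE/COMMAND?segment=REF:START,STOP). The server address and source name are hypothetical, and the function das_url is not part of EMBOSS.

```python
from urllib.parse import quote


def das_url(prefix, source, command, ref, start, stop):
    """Build a DAS 1.x style query URL for one segment.

    prefix  : server prefix ending in /das (hypothetical in this sketch)
    source  : data source name on that server
    command : "sequence" or "features"
    """
    if command not in ("sequence", "features"):
        raise ValueError("command must be 'sequence' or 'features'")
    segment = f"{quote(str(ref))}:{int(start)},{int(stop)}"
    return f"{prefix.rstrip('/')}/{source}/{command}?segment={segment}"


if __name__ == "__main__":
    # Hypothetical server and source, for illustration only.
    for command in ("sequence", "features"):
        print(das_url("http://das.example.org/das", "example_source",
                      command, "chr1", 1000, 2000))
```

A source that supports both commands yields two different URLs for the same segment, which is one way to see why a single cachefile entry may need both the "sequence" and "feature" types.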

Peter will check on server configuration error messages, e.g. "method not recognised". Some error messages have been cleaned up so that they are only reported once, so some return codes may need to be tested and new messages added.

Peter will write applications to describe servers (showserver) to assist developers writing cachefile code. Other applications will be needed to describe individual databases and servers in full detail.

3.2 DBX index files

Peter outlined a proposal to compress dbx index files. The data in each index page starts at the first byte, and the unused space at the end of the page is left untouched. By inspecting each page type it may be possible to identify the end of the data and to pack the pages in the index by moving them up. All page references will need to be identified and altered to the new page offsets. It will also be necessary to uncompress index files so that the index updating code can be used. Peter will consider the implications and report back next week.
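The packing idea can be sketched as follows. This is only an illustration under stated assumptions, not the dbx file format: the page size, the header (used byte count and number of page references) and the 8-byte reference fields are all hypothetical, and the real code would have to work out the end of the data from each page type.

```python
import struct

PAGE_SIZE = 64                 # toy page size; real index pages are larger
HEADER = struct.Struct("<II")  # hypothetical header: used bytes, reference count
REF = struct.Struct("<Q")      # hypothetical 8-byte page offset


def make_page(refs, payload):
    """Build one fixed-size toy page: header, references, payload, zero padding."""
    used = HEADER.size + REF.size * len(refs) + len(payload)
    assert used <= PAGE_SIZE
    body = HEADER.pack(used, len(refs))
    body += b"".join(REF.pack(r) for r in refs) + payload
    return body.ljust(PAGE_SIZE, b"\0")


def _split(data, fixed):
    """Yield (offset, used part of page) for a fixed-size or a packed file."""
    pos = 0
    while pos < len(data):
        used, _ = HEADER.unpack_from(data, pos)
        yield pos, data[pos:pos + used]
        pos += PAGE_SIZE if fixed else used


def _remap(page, mapping):
    """Return a copy of the page with every page reference translated."""
    _, nrefs = HEADER.unpack_from(page, 0)
    out = bytearray(page)
    for i in range(nrefs):
        at = HEADER.size + i * REF.size
        (old,) = REF.unpack_from(page, at)
        REF.pack_into(out, at, mapping[old])
    return bytes(out)


def pack(data):
    """Move pages up to close the unused gaps and rewrite the page offsets."""
    pages = list(_split(data, fixed=True))
    mapping, new = {}, 0
    for old, page in pages:
        mapping[old] = new
        new += len(page)
    return b"".join(_remap(page, mapping) for _, page in pages)


def unpack(packed):
    """Restore fixed-size pages so that updating code could be used again."""
    pages = list(_split(packed, fixed=False))
    mapping = {old: i * PAGE_SIZE for i, (old, _) in enumerate(pages)}
    return b"".join(_remap(page, mapping).ljust(PAGE_SIZE, b"\0")
                    for _, page in pages)
```

In this toy layout a root page referring to two leaf pages at offsets 64 and 128 packs into a file where the leaves sit directly after the used part of the root. Unpacking pads each page with zero bytes, so the round trip reproduces the original only if the unused space was zero-filled to begin with.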

3.3 Text data

Peter has added "text" and "textout" as new ACD datatypes. These allow the entry text to be returned from any database that does not have a type-specific parser. It will be especially useful in combination with definitions in DRCAT.

A new application, textget, returns the text of a type: "text" entry.

4. Administration

4.1 Open-Bio

The large binary index files have been deleted as the CVS server cannot cope.

Alan suggested that large files could be served by some other download mechanism, perhaps rsync from the FTP server. Developers could then be given a simple script to update data and index files in addition to running "cvs update".
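A minimal sketch of what such an update script might look like is given below. The rsync source and local directory are hypothetical placeholders, as no location has been agreed, and by default the script only prints the commands it would run.

```python
import subprocess


def update_commands(rsync_source, data_dir):
    """Return the commands for a source update followed by a data update."""
    return [
        # Update the source tree: -d adds new directories, -P prunes empty ones.
        ["cvs", "update", "-d", "-P"],
        # Fetch the large data and index files outside CVS.
        ["rsync", "-av", "--partial", rsync_source, data_dir],
    ]


def run_update(rsync_source, data_dir, dry_run=True):
    """Print the commands, or run them and stop at the first failure."""
    commands = update_commands(rsync_source, data_dir)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical rsync location and local directory, for illustration only.
    run_update("rsync://ftp.example.org/emboss-data/", "data/")
```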

5. Documentation and Training

5.1 Books

Alan has sent his amendments to the Admin book. Peter and Jon will review their books this week.

5.2 Other

Jon is writing an EDAM paper.

6. User queries and answers

All done.

7. AOB

Peter suggested ISMB "Technology Track" demos in Vienna for EMBOSS and EDAM. The EDAM talk by Matus in Boston in 2010 was well received.

8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 24th January.