EMBOSS: Project Meeting (Monday 11th Oct 2010)

Attendees

EBI: Peter Rice, Mahmut Uludag, Michael Schuster
Visitors:
Apologies: Alan Bleasby, Jon Ison

1. Minutes of the last meeting

Minutes of the meeting of 4th October 2010 are here.

2. Maintenance etc.

2.1 Applications

None.

2.2 Libraries

2.3 Other

The server.srs database definitions for the EBI SRS server now include the various UniRef databases clustering UniProt protein sequences by percent identity.

3. New developments

3.1 Axis2C

Mahmut has been testing Axis2C as an alternative way to implement the SOAP protocol in EMBOSS. There were problems in compiling the original source release on the EMBOSS machines, but it worked on other EBI machines.

With the binary release of Axis2C there were problems in linking to EMBOSS.

Axis2C is very large compared to gsoap which only needed one source file and one header file in EMBOSS, but we may be able to use an existing Axis2C installation.

Axis stub generation can be done at install time if we use stubs with dependencies on axis2java.

Mahmut is waiting for a reply from the author of gsoap.

Mahmut also looked at the ajdom source code. Documentation is limited because ajdom has not been used by EMBOSS release code, although the BioMart code uses it internally. The function names comply with the W3C DOM specification, which is still in active use, for example in Internet Explorer 9. However, it is not clear how to query tags in an XML file that has been read and parsed.
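For reference, the W3C DOM normally handles tag queries with getElementsByTagName. The sketch below uses Python's xml.dom.minidom, which follows the same specification, purely as an illustration. Whether ajdom offers an equivalent call has not been checked here, and the XML content and element names are invented for the example.

```python
from xml.dom.minidom import parseString

# Invented sample document; element and attribute names are illustrative only.
XML_TEXT = """<registry>
  <dataset name="hsapiens_gene_ensembl"/>
  <dataset name="mmusculus_gene_ensembl"/>
</registry>"""


def tag_attribute_values(xml_text, tag, attribute):
    """Return one attribute value per element with the given tag name."""
    document = parseString(xml_text)
    # getElementsByTagName is defined by the W3C DOM Core specification.
    elements = document.getElementsByTagName(tag)
    return [element.getAttribute(attribute) for element in elements]


names = tag_attribute_values(XML_TEXT, "dataset", "name")
```

If ajdom follows the specification as closely as its function names suggest, a similarly named call may be the place to start looking.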

3.2 Data access methods

Peter discussed the possibility of defining an identifier prefix as an attribute of a DB definition so that it could be prepended to an incomplete identifier in a query (e.g. Ensembl identifiers with missing zeroes added). This was judged not to be generally useful, as identifiers are usually copied from elsewhere and the shortened form is a non-standard way to represent them.
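To make the proposal concrete, here is a minimal sketch of how such a prefix might be applied. The function name and the `prefix` and `digits` parameters are hypothetical stand-ins for DB definition attributes, not existing EMBOSS settings. The example assumes the Ensembl gene convention of "ENSG" followed by 11 digits.

```python
def complete_identifier(partial, prefix, digits):
    """Prepend a prefix and zero-pad the numeric part to a fixed width.

    'prefix' and 'digits' are hypothetical DB definition attributes.
    """
    # Accept identifiers that already carry the prefix.
    if partial.startswith(prefix):
        partial = partial[len(prefix):]
    if not partial.isdigit():
        raise ValueError("expected a numeric identifier, got %r" % partial)
    if len(partial) > digits:
        raise ValueError("identifier %r is longer than %d digits" % (partial, digits))
    return prefix + partial.zfill(digits)


full_id = complete_identifier("139618", prefix="ENSG", digits=11)
```

The sketch also shows the limitation raised in the discussion: the result depends on the user typing a non-standard shortened form in the first place.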

Mahmut has improved DAS feature reading using XPath. Results are output in DASGFF format to test for loss of information. The parser is simple and does not yet handle multiple notes tags.
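As an illustration of the XPath approach, and not of the EMBOSS C code itself, the sketch below reads features from a small DASGFF-style document with Python's xml.etree.ElementTree, which supports a subset of XPath. The sample data is invented, and collecting every NOTE element of a feature is shown as one possible way to handle repeated notes.

```python
import xml.etree.ElementTree as ET

# Invented DASGFF-style sample; values are illustrative only.
DASGFF_TEXT = """<DASGFF>
  <GFF>
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="feat1" label="Feature 1">
        <TYPE id="exon">exon</TYPE>
        <START>1100</START>
        <END>1200</END>
        <NOTE>first note</NOTE>
        <NOTE>second note</NOTE>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def read_features(xml_text):
    """Return one dictionary per FEATURE element, keeping all of its notes."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.findall("./GFF/SEGMENT"):
        for feature in segment.findall("./FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": feature.findtext("./TYPE"),
                "start": int(feature.findtext("./START")),
                "end": int(feature.findtext("./END")),
                # findall returns every matching child, so repeated notes survive.
                "notes": [note.text for note in feature.findall("./NOTE")],
            })
    return features


features = read_features(DASGFF_TEXT)
```

Writing the dictionaries back out and comparing them with the input would be one way to run the loss-of-information test mentioned above.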

Michael suggested possibly converting DAS parsing outputs into Ensembl objects using the location as a reference.

The parser fails to define a feature ID which is needed for output.

For now Michael is only dumping sequences from Ensembl, but he wants to use the Ensembl mapping functions to retrieve feature information as well (repeats, genetic variation, transcripts and exons). The ensslice objects can be used to create EMBL features.

Ensembl feature types and tags will need to be mapped to the EMBOSS internals. Peter can turn a list of feature types and tags into appropriate Efeatures.ensembl and Etags.ensembl configuration files.
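The mapping could be pictured as a simple lookup with a fallback. The sketch below is only that: the entries are hypothetical examples that pair Ensembl-style type names with EMBL feature keys, not the contents of any agreed Efeatures.ensembl file, and the function name is invented.

```python
# Hypothetical mapping for illustration; the real entries would come from
# the Efeatures.ensembl configuration file once it has been drawn up.
ENSEMBL_TO_EMBL_FEATURE = {
    "exon": "exon",
    "transcript": "mRNA",
    "repeat": "repeat_region",
    "variation": "variation",
}


def map_feature_type(ensembl_type, default="misc_feature"):
    """Return the EMBL feature key for an Ensembl type, or a fallback key."""
    return ENSEMBL_TO_EMBL_FEATURE.get(ensembl_type, default)


mapped = [map_feature_type(name) for name in ("repeat", "transcript", "unknown")]
```

A fallback key keeps unmapped types visible in the output instead of dropping them silently, which makes gaps in the list easy to spot.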

3.3 EDAM

None. Jon is away this week.

3.4 Data types

Peter has reorganised the source code for each datatype into a set of files. In ajax/core there are:

object definitions: ajwxyzdata.h
general datatype object handling code: ajwxyz.c/h
input and output: ajwxyzread.c/h, ajwxyzwrite.c/h

In ajax/ajaxdb there are:

datatype-specific access methods in ajwxyzdb.c/h

and data access can also use any text-based access method in ajax/ajaxdb/ajtextdb.c if the database has an appropriate datatype-specific format defined.
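The naming scheme above can be summarised as a small helper that lists the expected files for a datatype. This is an illustrative sketch only: the function is hypothetical, and "wxyz" is the placeholder used in these minutes, not a real datatype.

```python
def datatype_source_files(name):
    """List the source files expected for a datatype under the layout above."""
    core = "ajax/core"
    ajaxdb = "ajax/ajaxdb"
    files = ["%s/aj%sdata.h" % (core, name)]  # object definitions (header only)
    # General handling, input and output each have a .c and a .h file.
    for suffix in ("", "read", "write"):
        for extension in ("c", "h"):
            files.append("%s/aj%s%s.%s" % (core, name, suffix, extension))
    # Datatype-specific access methods live in ajax/ajaxdb.
    for extension in ("c", "h"):
        files.append("%s/aj%sdb.%s" % (ajaxdb, name, extension))
    return files


wxyz_files = datatype_source_files("wxyz")
```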

4. Administration

4.1 Advisory Board

The first EMBOSS Scientific Advisory Board meeting is scheduled for Friday 5th November.

4.2 Interim report

BBSRC require an interim report on progress in the first part of the funding period. Peter will make a first draft.

5. Documentation and training

None.

6. User queries and answers

All done.

7. AOB

None.

8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 18th October. Peter and Mahmut will be away.