EMBOSS: Project Meeting (Mon 11th October 2010)
Attendees
EBI:
Peter Rice,
Mahmut Uludag,
Michael Schuster
Visitors:
Apologies:
Alan Bleasby,
Jon Ison
1. Minutes of the last meeting
Minutes of the meeting of 4th October 2010 are
here.
2. Maintenance etc.
2.1 Applications
None.
2.2 Libraries
2.3 Other
The server.srs database definitions for the EBI SRS server now include
the various UniRef databases clustering UniProt protein sequences by
percent identity.
3. New developments
3.1 Axis2C
Mahmut has been testing Axis2C as an alternative way to
implement the SOAP protocol in EMBOSS. There were problems in
compiling the original source release on the EMBOSS machines, but it
worked on other EBI machines.
With the binary release of Axis2C there were problems in linking to EMBOSS.
Axis2C is very large compared to gSOAP, which only needed one
source file and one header file in EMBOSS, but we may be able to use
an existing Axis2C installation.
Axis stub generation can be done at install time if we use stubs with dependencies on axis2java.
Mahmut is waiting for a reply from the author of gSOAP.
Mahmut also looked at the ajdom source code. There is
limited documentation as it has not been used by EMBOSS release code,
although it is used by BioMart code internally. The function names are
compliant with the W3C DOM specification. The specification is still in
active use, for example in Internet Explorer 9, but it is not clear how
to query tags in an XML file that has been read and parsed.
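For reference, the W3C DOM Core specification defines getElementsByTagName on documents and elements as the standard way to query tags after parsing. The sketch below shows that call using Python's xml.dom.minidom rather than ajdom; whether ajdom provides an equivalent function is not confirmed here, and the helper name tag_texts and the sample XML are made up for illustration.

```python
from xml.dom.minidom import parseString


def tag_texts(xml_text, tag_name):
    """Return the text content of every element named tag_name."""
    doc = parseString(xml_text)
    texts = []
    # getElementsByTagName is the W3C DOM query for elements by tag name.
    for element in doc.getElementsByTagName(tag_name):
        # Join the element's direct text-node children.
        parts = [node.data for node in element.childNodes
                 if node.nodeType == node.TEXT_NODE]
        texts.append("".join(parts).strip())
    return texts


# Hypothetical sample document, not taken from EMBOSS or BioMart.
SAMPLE = ("<entries><entry><name>alpha</name></entry>"
          "<entry><name>beta</name></entry></entries>")

if __name__ == "__main__":
    print(tag_texts(SAMPLE, "name"))
```

If ajdom follows the W3C naming, the corresponding function would be the natural place to start, but that needs checking against the source.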
3.2 Data access methods
Peter discussed the possibility of defining an identifier
prefix as an attribute of a DB definition so that it could be
prepended to an incomplete identifier in a query (e.g. Ensembl
identifiers with missing zeroes added). This was not considered
generally useful, as identifiers are usually copied from somewhere else
and an abbreviated form is a non-standard way to represent them.
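To make the discussed idea concrete, here is a minimal sketch of what expanding an incomplete identifier could look like. The function name expand_identifier and its prefix and width parameters are hypothetical and do not correspond to any existing EMBOSS database attribute; the width of 11 digits matches current Ensembl gene identifiers such as ENSG00000139618.

```python
def expand_identifier(query_id, prefix, width):
    """Prepend prefix to a bare numeric identifier, zero-padded to width digits.

    Identifiers that already carry the prefix, or that are not purely
    numeric, are returned unchanged.
    """
    if query_id.startswith(prefix):
        return query_id
    if not query_id.isdigit():
        return query_id
    return prefix + query_id.zfill(width)


if __name__ == "__main__":
    # Hypothetical prefix and width values for an Ensembl gene database.
    print(expand_identifier("139618", "ENSG", 11))
```

The sketch also shows why the idea is fragile: the prefix and width differ between identifier types, so they would have to be configured per database.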
Mahmut has improved DAS feature reading using XPath. Results
are output in DASGFF format to test for loss of information. The
parser is simple and does not yet handle multiple note tags.
Michael suggested possibly converting DAS parsing outputs into
Ensembl objects using the location as a reference.
The parser fails to define a feature ID which is needed for output.
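As an illustration of the two open points (multiple note tags and the feature ID), the following sketch reads a DASGFF document with the XPath subset supported by Python's xml.etree.ElementTree. It is not the EMBOSS parser; the element and attribute names follow the DAS 1.x features response as far as we know it, and the sample document and the function name read_das_features are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample in the shape of a DAS features response.
SAMPLE_DASGFF = """<DASGFF>
  <GFF href="http://example.org/das/source/features">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="feat1" label="Example feature">
        <TYPE id="exon">exon</TYPE>
        <METHOD id="curated">curated</METHOD>
        <START>1100</START>
        <END>1200</END>
        <ORIENTATION>+</ORIENTATION>
        <NOTE>first note</NOTE>
        <NOTE>second note</NOTE>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def read_das_features(xml_text):
    """Return one dict per FEATURE element, keeping its id and all notes."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.findall("./GFF/SEGMENT"):
        for feature in segment.findall("./FEATURE"):
            features.append({
                "segment": segment.get("id"),
                # The feature ID is an attribute of FEATURE.
                "id": feature.get("id"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "strand": feature.findtext("ORIENTATION"),
                # findall collects every NOTE child, not just the first.
                "notes": [note.text for note in feature.findall("NOTE")],
            })
    return features


if __name__ == "__main__":
    for feature in read_das_features(SAMPLE_DASGFF):
        print(feature)
```

Collecting notes as a list keeps every note available when the feature is written back out, which is the property the round-trip test is meant to check.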
For now Michael is dumping sequences from Ensembl, but he wants to
use the Ensembl mapping functions to also retrieve feature information
(repeats, genetic variation, transcripts and exons). The ensslice objects
can be used to create EMBL features.
Ensembl feature types and tags will need to be mapped to the EMBOSS
internals. Peter can compile the feature types and tags into
appropriate Efeatures.ensembl and Etags.ensembl configuration files.
3.3 EDAM
None. Jon is away this week.
3.4 Data types
Peter has reorganised the source code for each datatype into a set of
files. In ajax/core there are:
- object definitions ajwxyzdata.h
- general datatype object handling code ajwxyz.c/h
- input and output ajwxyzread.c/h and ajwxyzwrite.c/h
In ajax/ajaxdb there are:
- datatype-specific access methods in ajwxyzdb.c/h
and data access can also use any text-based access method in
ajax/ajaxdb/ajtextdb.c if the database has an appropriate
datatype-specific format defined.
4. Administration
4.1 Advisory Board
The first EMBOSS Scientific Advisory Board meeting is scheduled for
Friday 5th November.
4.2 Interim report
BBSRC require an interim report on progress in the first part of the
funding period. Peter will make a first draft.
5. Documentation and training
None.
6. User queries and answers
All done.
7. AOB
None.
8. Date Of Next Meeting
The next EMBOSS meeting will be on Monday 18th October. Peter and
Mahmut will be away.