EMBOSS: Project Meeting (Mon 7th March 2011)


Attendees

EBI: Peter Rice, Mahmut Uludag, Jon Ison, Michael Schuster
Visitors:
Apologies: Alan Bleasby

1. Minutes of the last meeting

There were no meetings on 21st or 28th February 2011 as Peter was on vacation.

The most recent minutes are those of the meeting of 14th February 2011.

2. Maintenance etc.

2.1 Applications

Peter has identified the cause of the bug in einverted which reports low-quality alignments in rare cases at low threshold settings. He hopes to have a fix tested by the end of the week.

2.2 Libraries

Mahmut reported a user request to exploit additional content in SAM/BAM formats.

Mahmut will investigate the bigWig and bigBed formats for large-scale data used by genome browsers.

2.3 Other

Mahmut is working on the SoapLab server environment for porting services to the new London Data Centres. EMBOSS 6.3.1 is deployed on the staging server, using the latest SoapLab version including modifications to exclude debugging output from the log files.

3. New developments

3.1 Database configuration files

Peter has extended the database and server fields by adding 5 new fields to match the definitions in DRCAT. These are:

  taxon    The NCBI taxon identifier
  edamdat  The EDAM datatype term
  edamfmt  The EDAM data format term
  edamid   The EDAM identifier term
  edamres  The EDAM resource term

In discussions, Peter proposed extending the database and server configuration parser and internals to allow multiple definitions of certain fields. In addition to the above fields, a new 'field:' attribute would be very useful to define individual query fields, superseding the current 'fields:' attribute, which simply lists the available fields. Each 'field:' attribute could include alternative field names, which are common in SRS and possibly used by other access methods.

As each attribute will define a single item, there is then scope to add a documentation string after some delimiter which could be ignored internally but used to document the taxon name, EDAM term name, or field description.
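
The proposal above can be illustrated with a minimal sketch in Python (EMBOSS itself is written in C, so this is not the planned implementation). It assumes simplified one-attribute-per-line input rather than the real EMBOSS configuration syntax, a hypothetical '!' delimiter before the documentation string (the minutes do not name one), and a hypothetical list of attributes that may be repeated.

```python
from collections import defaultdict

# Hypothetical delimiter between a value and its documentation string;
# the minutes do not say which character would be chosen.
DOC_DELIMITER = "!"

# Attributes assumed, for this sketch only, to be allowed more than once.
MULTI_VALUED = {"taxon", "edamdat", "edamfmt", "edamid", "edamres", "field"}


def parse_attributes(lines):
    """Parse 'name: "value ! doc"' lines into {name: [(value, doc), ...]}."""
    parsed = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, rest = line.partition(":")
        if not sep:
            raise ValueError(f"not an attribute line: {raw!r}")
        name = name.strip()
        if name in parsed and name not in MULTI_VALUED:
            raise ValueError(f"attribute {name!r} may only be defined once")
        # Drop optional surrounding quotes, then split off the documentation.
        body = rest.strip().strip('"')
        value, _, doc = body.partition(DOC_DELIMITER)
        parsed[name].append((value.strip(), doc.strip()))
    return dict(parsed)


example = [
    'format: "embl"',
    'taxon: "9606 ! Homo sapiens"',
    'field: "acc accession ! Accession number"',
    'field: "des description ! Description line"',
]
attrs = parse_attributes(example)
```

Here the documentation string is kept alongside each value so it can be ignored internally but shown to users; the alternative field names in a 'field:' value could be obtained by splitting the value on whitespace.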

3.2 Access methods

Mahmut has updated the EB-eye cachefile generation code to distinguish searchable and retrievable fields.

Mahmut has updated the DAS cachefile generation to include the taxon identifier of the coordinate system as a 'taxon:' attribute.

3.3 EDAM

Jon reported on the recent EDAM workshop in Amsterdam. EDAM may be supplemented by terms from ChEBI, GO and SO. Use of EDAM will be easier if the EDAM: prefix is routinely used when referring to EDAM terms.

Jon has added a regular expression to define the syntax of identifier terms. The level of detail can go very deep, for example down to species-specific identifiers in Ensembl. EMBOSS needs a source of these, but perhaps it is too detailed for EDAM. The regular expressions were taken from MIRIAM. Further databases have been added to DRCAT, and their identifier terms added to EDAM.

Peter suggested formally defining the regular expressions as "perl-compatible" which fits the library used by EMBOSS and avoids confusion over the various regular expression standards.
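
As an illustration of how such identifier expressions might be used, here is a minimal Python sketch. The two patterns are illustrative assumptions written in the MIRIAM style, not the expressions stored in EDAM, and Python's re module is close to but not identical to the Perl-compatible syntax, so anything beyond simple patterns would need checking against the library EMBOSS uses.

```python
import re

# Illustrative patterns only (assumptions for this sketch); the real
# expressions would be taken from the EDAM identifier terms.
ID_PATTERNS = {
    "GO term": r"^GO:\d{7}$",
    "NCBI taxon": r"^\d+$",
}

# Compile once so repeated lookups do not re-parse the expressions.
COMPILED = {name: re.compile(pattern) for name, pattern in ID_PATTERNS.items()}


def matching_types(identifier):
    """Return the names of all identifier types whose pattern matches."""
    # fullmatch requires the whole string to match, so a trailing
    # newline or extra characters are rejected.
    return [name for name, rx in COMPILED.items() if rx.fullmatch(identifier)]


go_hits = matching_types("GO:0008150")
taxon_hits = matching_types("9606")
no_hits = matching_types("GO:815")
```

An identifier may match more than one pattern, which is why the function returns a list rather than a single type.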

Jon reported that Matus Kalas has added many new EDAM format terms, covering XML formats in detail.

Peter will review the EDAM format terms and add them to the EMBOSS internal lists of input and output formats.

Jon reported that EDAM terms have been cleaned to refer to one core type (e.g. sequence, sequence alignment and sequence features) with one 'official' standard format.

Jon and Peter discussed the possibility of splitting EDAM identifier terms into a separate name space. This can wait for a future EDAM update.

Jon outlined recent developments in the BioNemus WSDL editor which may use EDAM, but which needs additional syntax information.

3.4 DRCAT

Jon has added missing EDAM annotations to DRCAT. The DRCAT update will be completed next week.

Peter has added 'Taxon' lines to DRCAT. Most were easy to assign. The only ones in need of checking are those for oncology databases, which were assumed to be Human although there was no clear indication in the DRCAT entry annotation.

EBI External Services have asked Jon to include PubMed IDs in the DRCAT entry. These can be easily added at least for data resources in the NAR special issues.

3.5 Other

At the DAS workshop held last week at EBI, Peter and Mahmut gave a talk on the status of DAS client implementation in EMBOSS as an access method for sequences and features.

4. Administration

Michael would like to update the EMBOSS Wiki. Unfortunately, his unused account was one of those disabled after recent spamming attempts by newly created users.

4.1 Open-Bio

Open-Bio is running out of space for the EMBOSS anonymous CVS server.

There have also been recent problems with availability of the EMBOSS FTP server.

5. Documentation and Training

5.1 Books

Jon will contact CUP to request return of the typesetter's annotations for the developers' guide so that corrections can be included in the source code annotation.

6. User queries and answers

All done.

7. AOB

Jon will go to a SWO (software ontology) workshop in Manchester at the end of March to discuss collaborations between SWO and EDAM.

8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 14th March.