EMBOSS: Project Meeting (Mon 13th December 2010)


Attendees

EBI: Peter Rice, Jon Ison, Mahmut Uludag, Michael Schuster
Visitors:
Apologies: Alan Bleasby

1. Minutes of the last meeting

Most team members were away for the past 2 weeks so the meetings were cancelled.

Minutes of the meeting of 22nd November 2010 are here.

2. Maintenance etc.

2.1 Applications

Mahmut is looking at an issue in vectorstrip where the algorithm looks for full matches but misses some expected hits which could be found by additionally checking partial results.

2.2 Libraries

Mahmut found that in needle and supermatcher an indel after the first base was not supported by the local alignment algorithm. This is now fixed and the QA tests have been updated.

Mahmut is looking into using BAM format and indexes for assemblies, which will involve code to manipulate CIGAR strings.
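As an illustration of the kind of CIGAR handling this would involve, here is a minimal Python sketch. It is not EMBOSS code: the function names are hypothetical, and the only thing taken from outside these minutes is the set of CIGAR operation letters defined in the SAM specification. It splits a CIGAR string into operations and derives how many reference and query bases the alignment covers.

```python
import re

# Operation letters defined by the SAM specification.
_CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance along the reference / along the query sequence.
CONSUMES_REF = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) tuples."""
    if cigar == "*":  # SAM uses "*" when no CIGAR is available
        return []
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def reference_span(ops):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in ops if op in CONSUMES_REF)


def query_length(ops):
    """Number of query bases accounted for, including soft clips."""
    return sum(n for n, op in ops if op in CONSUMES_QUERY)
```

For example, "3S5M2I4M1D6M" parses to six operations, spans 16 reference bases and accounts for 20 query bases.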

Mahmut is checking code in embpat which may be redundant.

2.3 Other

3. New developments

3.1 EMBOSS configuration

Peter discussed cache file handling for database servers. A cache file can provide definitions of databases for a server. If the cache file is not found, it may be possible to query the server directly to identify the available databases.
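The fallback described above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the function name, the one-name-per-line cache layout and the query_server callable are hypothetical and do not describe the actual EMBOSS cache file format or server interface.

```python
def list_databases(cache_path, query_server):
    """Return database names from the cache file, else ask the server.

    cache_path   -- path to a hypothetical cache file, one name per line
    query_server -- hypothetical callable returning an iterable of names
    """
    try:
        with open(cache_path) as handle:
            # Skip blank lines and "#" comment lines in the assumed layout.
            return [
                line.strip()
                for line in handle
                if line.strip() and not line.startswith("#")
            ]
    except FileNotFoundError:
        # No cache file: fall back to querying the server directly.
        return list(query_server())
```

The server is only contacted when the cache file is missing, so a normal run with a cache in place needs no network access.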

Peter has committed an emboss.standard file which will contain definitions of variables, databases, servers and resources for all EMBOSS installations. The initial version includes definitions for the EDAM and DRCAT databases, and a server plus cache file for major databases on the EBI SRS server.

New applications have been committed to query EDAM by name space, and to make general queries of ontology data.

New applications have been committed to query EDAM terms and to then look up matching terms in the DRCAT data catalogue.

New applications have been committed to look up EDAM terms and to then query ACD files for matching EDAM relations attributes. The ACD parsing in embgroup has been updated to search for relations.

3.2 Ensembl access

Michael will continue working on the update from Ensembl 59 to 60 over December. He hopes to have the changes in place in time for a January EMBOSS release.

3.3 Other access methods

Mahmut has wsdbfetch access working, with tests for access to lists of databases and to data from UniProt using a dbalias. Sequence formats (e.g. swissprot) can also be processed as feature formats.

Mahmut is working on access to ebeye web services. These are still changing. EBI External Services do not have a clear need for fully-featured web services with metadata. There will soon be a release with the domain hierarchy as separate queries, and better XML with discrete queries.

For DAS features, Mahmut has extended the dassources test application to query feature data. The parsing has been changed from libexpat to the DOM (ajdom) parser with new functions added by Alan. The DOM parser always reads from a file, so HTTP query results are saved to a file. The sequence USA function has been extended to support features. When output is rewritten as DASGFF format XML, the results have some differences from the original DAS data.

Mahmut reported that EMBOSS features do not map completely to DASGFF. Peter suggested remapping exons as a list of sub-features so that sorting would be simpler, but according to Mahmut there are some more basic issues involved.

Peter noted that the new "S4" search services from EBI will have DAS access, but are still under development.

Mahmut is also looking into the CHADO databases FLYBASE (also in Ensembl) and GENEDB (also in Ensembl Genomes). Michael noted the databases are collaborating so it should be possible to get data via Ensembl. In CHADO, features may have sequence data and would need major reprocessing.

3.4 EDAM

Jon attended a semantic web conference in Berlin and gave a workshop on EDAM and the EMBRACE technology recommendations, and on BioXSD with Matus Kalas. There were up to 50 attendees from broad backgrounds including computer science and biology.

Several new users came forward and offered to contribute to EDAM development, some in other areas related to bioinformatics.

Jon hopes to add some machine-readable definitions of where EDAM coverage ends, e.g. EDAM operations go as far as feature detection in sequences, but finer-grain detail could be annotated using the Sequence Ontology (SO).

Some potential users wanted a micro-ontology but there is no desire to maintain one. EDAM is happy to include specific terms if no other ontology defines them and they are well maintained. As a result, EDAM may grow or shrink in certain specialised areas.

There was a predoc student talk on using the EDAM annotation of EMBOSS and other tools for the automation of workflow composition, with a good prototype that reads sequence data and produces an alignment. There is a clear need to distinguish core input and output types and other more general parameters.

4. Administration

None.

5. Documentation and Training

5.1 Books

6. User queries and answers

All done.

7. AOB

8. Date Of Next Meeting

Team members will be on vacation in December. January 3rd is a public holiday.

The next EMBOSS meeting will be on Monday 10th January.