EMBOSS: Project Meeting (Mon 13th December 10)
Minutes of the meeting of 22nd November 2010 are here.
Mahmut is looking into using BAM format and indexes for assemblies, which will involve code to manipulate CIGAR strings.
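As a rough illustration of the kind of CIGAR manipulation involved, the sketch below splits a CIGAR string into its operations and counts how many reference and read bases it spans. It is written in Python for brevity and is not EMBOSS code; the function names are hypothetical, and the operation letters follow the SAM specification.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
# CIGAR operations that consume query (read) bases.
QUERY_CONSUMING = set("MIS=X")

_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) tuples."""
    if cigar == "*":  # SAM uses "*" when no alignment is recorded
        return []
    ops = _CIGAR_RE.findall(cigar)
    # Reject input containing anything the pattern did not account for.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in ops]


def reference_length(cigar):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


def query_length(cigar):
    """Number of read bases described by the CIGAR string."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_CONSUMING)
```

For example, `reference_length("8M2I4M1D3M")` counts the M and D operations only, while `query_length` counts M and I.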
Mahmut is checking code in embpat which may be redundant.
Peter has committed an emboss.standard file which will contain definitions of variables, databases, servers and resources for all EMBOSS installations. The initial version includes definitions for the EDAM and DRCAT databases, and a server plus cache file for major databases on the EBI SRS server.
New applications have been committed to query EDAM by name space, and to make general queries of ontology data.
New applications have been committed to query EDAM terms and to then look up matching terms in the DRCAT data catalogue.
New applications have been committed to look up EDAM terms and to then query ACD files for matching EDAM relations attributes. The ACD parsing in embgroup has been updated to search for relations.
Mahmut is working on access to ebeye web services. These are still changing. EBI External Services do not have a clear need for fully-featured web services with metadata. There will soon be a release with the domain hierarchy as separate queries, and better XML with discrete queries.
For DAS features, Mahmut has extended the dassources test application to query feature data. The parsing has been changed from libexpat to the DOM (ajdom) parser with new functions added by Alan. The DOM parser always reads from a file, so HTTP query results are saved to a file. The sequence USA function has been extended to support features. When output is rewritten as DASGFF format XML, the results have some differences from the original DAS data.
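The save-to-file-then-parse pattern described above can be sketched as follows. This is an illustration in Python using the standard `xml.dom.minidom` parser, not the ajdom code; the sample document, the `example.org` URL and the helper names are assumptions made for the example, and a real query would write the HTTP response body to the file instead of the embedded sample.

```python
import os
import tempfile
from xml.dom import minidom

# Hypothetical, heavily trimmed DASGFF response used in place of a real HTTP result.
SAMPLE = """<?xml version="1.0"?>
<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features">
    <SEGMENT id="chr1" start="1" stop="1000">
      <FEATURE id="f1" label="demo feature">
        <TYPE id="exon">exon</TYPE>
        <START>100</START>
        <END>200</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""


def save_response(body):
    """Write a query result to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(body)
    return path


def _text(parent, tag):
    """Return the text content of the first child element named `tag`."""
    node = parent.getElementsByTagName(tag)[0]
    return "".join(
        child.data for child in node.childNodes if child.nodeType == child.TEXT_NODE
    ).strip()


def read_features(path):
    """Parse a saved DASGFF file with a DOM parser and list its features."""
    doc = minidom.parse(path)
    features = []
    for segment in doc.getElementsByTagName("SEGMENT"):
        for feature in segment.getElementsByTagName("FEATURE"):
            features.append({
                "segment": segment.getAttribute("id"),
                "id": feature.getAttribute("id"),
                "type": feature.getElementsByTagName("TYPE")[0].getAttribute("id"),
                "start": int(_text(feature, "START")),
                "end": int(_text(feature, "END")),
                "strand": _text(feature, "ORIENTATION"),
            })
    return features
```

A caller would run `path = save_response(body)` followed by `read_features(path)`, removing the temporary file afterwards.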
Peter noted that the new "S4" search services from EBI will have DAS access, but are still under development.
Mahmut is also looking into the CHADO databases FLYBASE (also in Ensembl) and GENEDB (also in Ensembl Genomes). Michael noted the databases are collaborating so it should be possible to get data via Ensembl. In CHADO, features may have sequence data and need major reprocessing efforts.
Several new users came forward and offered to contribute to EDAM development, some in other areas related to bioinformatics.
Jon hopes to add some machine-readable definitions of where EDAM coverage ends. For example, EDAM operations go as far as feature detection in sequences, but finer-grained detail could be annotated using the Sequence Ontology (SO).
Some potential users wanted a micro-ontology but there is no desire to maintain one. EDAM is happy to include specific terms if no other ontology defines them and they are well maintained. As a result, EDAM may grow or shrink in certain specialised areas.
There was a predoc student talk on using the EDAM annotation of EMBOSS and other tools for the automation of workflow composition, with a good prototype that reads sequence data and produces an alignment. There is a clear need to distinguish core input and output types and other more general parameters.
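To illustrate why core input and output types matter for automated composition, here is a minimal sketch that chains tools by matching one tool's core output type to the next tool's core input type. It is not the prototype presented in the talk; the tool names and type labels are hypothetical, and real annotations would use EDAM terms rather than plain strings.

```python
from collections import deque

# Hypothetical tool catalogue: name -> (core input type, core output type).
# Other, more general parameters are deliberately left out of the matching.
TOOLS = {
    "fetch_sequences": ("identifier list", "sequence set"),
    "align_sequences": ("sequence set", "alignment"),
    "build_tree": ("alignment", "phylogenetic tree"),
}


def compose(tools, have, want):
    """Breadth-first search for the shortest tool chain from `have` to `want`.

    Returns a list of tool names, or None if no chain exists.
    """
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        data_type, chain = queue.popleft()
        if data_type == want:
            return chain
        for name, (input_type, output_type) in tools.items():
            if input_type == data_type and output_type not in seen:
                seen.add(output_type)
                queue.append((output_type, chain + [name]))
    return None
```

Calling `compose(TOOLS, "sequence set", "alignment")` yields a one-step chain, mirroring the sequence-to-alignment case mentioned above. If general parameters were treated as inputs on the same footing as core data types, the search would have many spurious matches to consider.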
The next EMBOSS meeting will be on Monday 10th January.