EMBOSS: Project Meeting (Mon 13th December 10)
Minutes of the meeting of 22nd November 2010 are here.
Mahmut is looking into using BAM format and indexes for assemblies, which will involve code to manipulate CIGAR strings.
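As an illustration of the kind of CIGAR manipulation involved, here is a minimal Python sketch (not EMBOSS code, and not the approach Mahmut is taking). It splits a CIGAR string into operations and computes how many reference and read bases the alignment covers, using the operation codes defined in the SAM specification. The function names parse_cigar, reference_span and query_length are hypothetical.

```python
import re

# CIGAR operations that advance along the reference or the read (SAM spec).
CONSUMES_REFERENCE = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs.

    Hypothetical helper for illustration; "*" means no CIGAR is available.
    """
    if cigar == "*":
        return []
    ops = []
    pos = 0
    for match in _CIGAR_OP.finditer(cigar):
        if match.start() != pos:
            raise ValueError("invalid CIGAR string: %r" % cigar)
        ops.append((int(match.group(1)), match.group(2)))
        pos = match.end()
    if pos != len(cigar):
        raise ValueError("invalid CIGAR string: %r" % cigar)
    return ops


def reference_span(ops):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in ops if op in CONSUMES_REFERENCE)


def query_length(ops):
    """Number of read bases described by the CIGAR (soft clips included)."""
    return sum(n for n, op in ops if op in CONSUMES_QUERY)


ops = parse_cigar("3S10M2I5D4M")
print(ops, reference_span(ops), query_length(ops))
```

The reference span is what an index lookup needs: together with the alignment start position it gives the end coordinate of the read on the assembly.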
Mahmut is checking code in embpat which may be redundant.
Peter has committed an emboss.standard file which will contain definitions of variables, databases, servers and resources for all EMBOSS installations. The initial version includes definitions for the EDAM and DRCAT databases, and a server plus cache file for major databases on the EBI SRS server.
New applications have been committed to query EDAM by name space, and to make general queries of ontology data.
New applications have been committed to query EDAM terms and to then look up matching terms in the DRCAT data catalogue.
New applications have been committed to look up EDAM terms and to then query ACD files for matching EDAM relations attributes. The ACD parsing in embgroup has been updated to search for relations.
Mahmut is working on access to ebeye web services. These are still changing. EBI External Services do not have a clear need for fully-featured web services with metadata. There will soon be a release with the domain hierarchy as separate queries, and better XML with discrete queries.
For DAS features, Mahmut has extended the dassources test application to query feature data. The parsing has been changed from libexpat to the DOM (ajdom) parser with new functions added by Alan. The DOM parser always reads from a file, so HTTP query results are saved to a file. The sequence USA function has been extended to support features. When output is rewritten as DASGFF format XML, the results have some differences from the original DAS data.
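The save-to-file-then-parse pattern described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses Python's xml.dom.minidom rather than ajdom, the helper names are hypothetical, and the sample document is a hand-written fragment in the style of a DASGFF features response, not real server output.

```python
import os
import tempfile
from xml.dom import minidom

# Hand-written sample in the style of a DAS features response (assumption).
SAMPLE = """<?xml version="1.0"?>
<DASGFF>
  <GFF>
    <SEGMENT id="chr1" start="1" stop="1000">
      <FEATURE id="feat1" label="example">
        <TYPE id="exon">exon</TYPE>
        <START>100</START>
        <END>250</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""


def save_response(xml_text):
    """Write a query result to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(xml_text)
    return path


def _child_text(node, tag):
    """Text of the first descendant element named tag, or None."""
    found = node.getElementsByTagName(tag)
    if not found or found[0].firstChild is None:
        return None
    return found[0].firstChild.data.strip()


def _as_int(text):
    return int(text) if text else None


def read_features(path):
    """Parse the saved file with a DOM parser and list its features."""
    doc = minidom.parse(path)
    features = []
    for segment in doc.getElementsByTagName("SEGMENT"):
        for feature in segment.getElementsByTagName("FEATURE"):
            features.append({
                "segment": segment.getAttribute("id"),
                "id": feature.getAttribute("id"),
                "type": _child_text(feature, "TYPE"),
                "start": _as_int(_child_text(feature, "START")),
                "end": _as_int(_child_text(feature, "END")),
            })
    return features


path = save_response(SAMPLE)
try:
    print(read_features(path))
finally:
    os.remove(path)
```

Going through a file costs an extra write, but it keeps the parser interface simple and leaves a copy of the response on disk for comparison with rewritten output.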
Peter noted that the new "S4" search services from EBI will
have DAS access, but are still under development.
Mahmut is also looking into the CHADO databases FLYBASE (also in
Ensembl) and GENEDB (also in Ensembl Genomes). Michael noted that the
databases are collaborating, so it should be possible to get the data
via Ensembl. In CHADO, features may have sequence data and may need
major reprocessing.
Several new users came forward and offered to contribute to EDAM
development, some in other areas related to bioinformatics.
Jon hopes to add some machine-readable definitions of where
EDAM coverage ends. For example, EDAM operations go as far as feature
detection in sequences, but finer-grained detail could be annotated
using the Sequence Ontology (SO).
Some potential users wanted a micro-ontology but there is no desire to
maintain one. EDAM is happy to include specific terms if no other
ontology defines them and they are well maintained. As a result, EDAM
may grow or shrink in certain specialised areas.
There was a predoc student talk on using the EDAM annotation of EMBOSS
and other tools for the automation of workflow composition, with a
good prototype that reads sequence data and produces an
alignment. There is a clear need to distinguish core input and output
types and other more general parameters.
The next EMBOSS meeting will be on Monday 10th January.
3.4 EDAM
Jon attended a semantic web conference in Berlin and gave a
workshop on EDAM and the EMBRACE technology recommendations, and on
BioXSD with Matus Kalas. There were up to 50 attendees from broad
backgrounds including computer science and biology.
4. Administration
None.
5. Documentation and Training
5.1 Books
6. User queries and answers
All done.
7. AOB
8. Date Of Next Meeting
Team members will be on vacation in December. January 3rd is a public
holiday.