EMBOSS: Project Meeting (Mon 7th March 11)
Minutes of the meeting of 14th February 2011 are here. There were no meetings in the past 2 weeks as Peter was on vacation.
Mahmut will investigate the bigwig and bigbed formats for large-scale data used by genome browsers.
In discussion, Peter proposed extending the database and server configuration parser and internals to allow multiple definitions of certain fields. In addition to the above fields, a new 'field:' attribute would be very useful for defining individual query fields, superseding the current 'fields:' attribute, which simply lists the available fields. Each 'field:' attribute could include alternative field names, which are common in SRS and possibly used by other access methods.
As each attribute will define a single item, there is then scope to add a documentation string after some delimiter which could be ignored internally but used to document the taxon name, EDAM term name, or field description.
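A minimal sketch of how repeated attributes with a trailing documentation string might be parsed is given here for illustration only. It is written in Python rather than in the EMBOSS C internals, and the attribute syntax, the '|' delimiter, the set of repeatable attribute names and the function name parse_attributes are all assumptions, since the meeting did not fix any of them.

```python
import re
from collections import defaultdict

# Attributes allowed to appear more than once (assumed set for this sketch).
MULTI_VALUED = {"field", "taxon", "edam"}

# Hypothetical delimiter between the value and its documentation string.
DOC_DELIMITER = "|"

# Matches name: "value" pairs in a definition body (assumed syntax).
ATTR_RE = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_attributes(text):
    """Return (single, multi): single-valued attributes as a dict of
    strings, repeatable attributes as a dict of lists of items."""
    single = {}
    multi = defaultdict(list)
    for name, raw in ATTR_RE.findall(text):
        if name not in MULTI_VALUED:
            single[name] = raw
            continue
        # Text after the delimiter is documentation only.
        value, _, doc = raw.partition(DOC_DELIMITER)
        names = value.split()
        if not names:
            continue  # nothing defined; skip an empty attribute
        multi[name].append({
            "name": names[0],       # primary name or identifier
            "aliases": names[1:],   # alternative field names
            "doc": doc.strip(),     # ignored internally, kept for docs
        })
    return single, dict(multi)


example = (
    'type: "N" '
    'taxon: "9606 | Homo sapiens" '
    'field: "id | Entry identifier" '
    'field: "org orgname | Organism name"'
)
single, multi = parse_attributes(example)
```

Because each repeated attribute defines exactly one item, the documentation string can be dropped by the parser without affecting queries, while tools that generate documentation can still read it.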
Mahmut has updated the DAS cachefile generation to include the taxon identifier of the coordinate system as a 'taxon:' attribute.
Jon has added a regular expression to define the syntax of identifier terms. This can go very deep, for example species-specific identifiers in Ensembl. EMBOSS needs a source of these, but perhaps it is too detailed for EDAM. The regular expressions were taken from MIRIAM. Further databases have been added to DRCAT, and their identifier terms added to EDAM.
Peter suggested formally defining the regular expressions as "perl-compatible" which fits the library used by EMBOSS and avoids confusion over the various regular expression standards.
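As an illustration of checking identifiers against such patterns, a short sketch follows. It uses Python's re module, which accepts much of the Perl-style syntax but is not the PCRE library itself. The term names and patterns are hypothetical placeholders and are not taken from EDAM or MIRIAM.

```python
import re

# Hypothetical identifier terms and patterns, for illustration only.
ID_PATTERNS = {
    "Example accession": r"^[A-Z]\d{5}$",
    "Example numeric ID": r"^\d+$",
}


def matches_identifier(term, identifier):
    """Return True if the identifier fits the pattern stored for the term."""
    pattern = ID_PATTERNS.get(term)
    if pattern is None:
        raise KeyError("no pattern defined for term: " + term)
    return re.search(pattern, identifier) is not None
```

Stating one regular expression dialect for all stored patterns means a pattern validated in one tool should behave the same way in another that uses the same dialect.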
Jon reported that Matus Kalas has added many new EDAM format terms, covering XML formats in detail.
Peter will review the EDAM format terms and add them to the EMBOSS internal lists of input and output formats.
Jon reported that EDAM terms have been cleaned up so that each refers to one core type (e.g. sequence, sequence alignment or sequence features) with one 'official' standard format.
Jon and Peter discussed the possibility of splitting EDAM identifier terms into a separate name space. This can wait for a future EDAM update.
Jon outlined recent developments in the BioNemus WSDL editor which may use EDAM, but which needs additional syntax information.
Peter has added 'Taxon' lines to DRCAT. Most were easy to assign. The only ones that need checking are those for oncology databases, which were assumed to be Human although the DRCAT entry annotation gave no clear indication.
EBI External Services have asked Jon to include PubMed IDs in the DRCAT entries. These can easily be added, at least for data resources described in the NAR special issues.
There have also been recent problems with availability of the EMBOSS FTP server.