EMBOSS: Project Meeting (Mon 15th February 2010)


EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag

1. Minutes of the last meeting

Minutes of the meeting of 25th January 2010 are here.

2. Maintenance etc.

2.1 Applications

Alan suggested reviewing the processing of minimum: and maximum: values for integers and floats in ACD. Where one or both are calculated, the maximum may become lower than the minimum. In such cases ACD was treating the maximum as the only allowed value. In response to a bug report where the maximum was zero, a new attribute, trueminimum:, has been added to use the minimum value instead.

Ideally, the onus should be on the developer to resolve these issues when the ACD file is first written. Peter proposed that ACD should first check whether either range value is calculated and, if so, require the behaviour to be defined, either through trueminimum: or through some attribute that makes ACD fail if the value is out of range. Peter pointed out that a failure message is hard to derive automatically, as calculated values depend on other ACD qualifier values. The proposed solution is to define a message to be issued if the range check fails.
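The range behaviour discussed in this item can be sketched as follows. This is a minimal illustration in Python, not the ACD implementation: the function names and the failmessage argument are hypothetical, and only minimum, maximum and trueminimum correspond to attributes named in these minutes.

```python
from typing import Optional, Tuple


def resolve_range(minimum: float, maximum: float,
                  trueminimum: bool = False) -> Tuple[float, float]:
    """Return the (low, high) limits that are actually enforced.

    Consistent limits are returned unchanged. If a calculated maximum
    falls below the minimum, the range collapses to one allowed value:
    the maximum by default, or the minimum when trueminimum is set.
    """
    if maximum >= minimum:
        return (minimum, maximum)
    only = minimum if trueminimum else maximum
    return (only, only)


def check_value(value: float, minimum: float, maximum: float,
                trueminimum: bool = False,
                failmessage: Optional[str] = None) -> float:
    """Return value if it is in range, otherwise raise ValueError.

    failmessage is a hypothetical developer-supplied message, standing
    in for the proposal to define the text issued when a range fails.
    """
    low, high = resolve_range(minimum, maximum, trueminimum)
    if low <= value <= high:
        return value
    raise ValueError(failmessage or
                     f"value {value} is outside the range {low} to {high}")
```

With minimum 5 and a calculated maximum of 0, resolve_range(5, 0) allows only 0, while resolve_range(5, 0, trueminimum=True) allows only 5.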

2.2 Libraries

Alan has written a token parsing function that allows quotes to wrap multiple tokens. Peter will look into naming conventions, and consider merging the quote handling with the existing token parsing functions.

There is also a need for these functions to process single delimiters, for example in tab-separated values where missing values can be represented by consecutive tabs. These are used in BioMart output, and in SAM/BAM sequence data formats.
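The two parsing behaviours described in this item can be illustrated with a short Python sketch. It is not the EMBOSS library code, and the function names split_quoted and split_single are hypothetical. The first function treats a run of delimiters as one separator and keeps quoted text together; the second splits on every single delimiter so that empty fields survive.

```python
from typing import List


def split_quoted(text: str, delimiters: str = " \t",
                 quotes: str = "\"'") -> List[str]:
    """Split on runs of delimiters, keeping quoted sections as one token."""
    tokens: List[str] = []
    current: List[str] = []
    in_token = False
    quote = ""  # the quote character currently open, if any
    for ch in text:
        if quote:
            if ch == quote:
                quote = ""          # closing quote: leave quoted mode
            else:
                current.append(ch)  # delimiters inside quotes are kept
        elif ch in quotes:
            quote = ch
            in_token = True         # an empty quoted string is still a token
        elif ch in delimiters:
            if in_token:
                tokens.append("".join(current))
                current = []
                in_token = False
        else:
            current.append(ch)
            in_token = True
    if in_token:
        tokens.append("".join(current))
    return tokens


def split_single(text: str, delimiter: str = "\t") -> List[str]:
    """Split on every single delimiter so that empty fields are preserved."""
    return text.split(delimiter)
```

For a tab-separated line with a missing middle value, split_single returns an empty string in that position, whereas split_quoted would silently drop the field.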

2.3 Other

Alan and Mahmut have looked into the issue of spaces in qualifier values (e.g. wossname keywords) which fail in Jemboss as the command line is being automatically split at white space.

Alan has rewritten jembossctl to accept matching quotes around single values. This will address the problem for authorized servers where a username and password are required.

Mahmut has a related fix that rewrites the command line as a string array to address the issue for local unauthorized servers.

Both solutions will be implemented.
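The difference between the two approaches can be shown with a small Python sketch (Jemboss itself is Java, so this only illustrates the idea). The wossname command line is illustrative, and the helper names are hypothetical. Splitting at white space breaks a quoted value apart, quote-aware splitting keeps it whole, and passing the arguments as an array avoids splitting altogether.

```python
import shlex
import subprocess
import sys
from typing import List


def naive_split(command: str) -> List[str]:
    """Split at white space only, as in the failing case."""
    return command.split()


def quote_aware_split(command: str) -> List[str]:
    """Split at white space but honour matching quotes around a value."""
    return shlex.split(command)


def child_sees(args: List[str]) -> List[str]:
    """Pass args to a child process as an array and return what it received.

    The child is a Python one-liner that prints its own arguments, so no
    shell or string splitting is involved at any point.
    """
    script = "import sys; print('\\n'.join(sys.argv[1:]))"
    done = subprocess.run([sys.executable, "-c", script, *args],
                          capture_output=True, text=True, check=True)
    return done.stdout.splitlines()


command = 'wossname -search "protein structure" -auto'
print(naive_split(command))        # the quoted value is split in two
print(quote_aware_split(command))  # the quoted value stays together
print(child_sees(["-search", "protein structure"]))
```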

3. New developments

3.1 BioMart access

Alan has example applications to pass query parameters, generate query XML and report results from a BioMart query. The application needs a mart host, path and port plus a mart registry host, path and port. Usually the registry and the host addresses are the same. The application builds a RESTful GET query using the dataset name, a list of comma-separated attributes to be returned and a query as an XML string.
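As a rough sketch of how such a GET query could be assembled (in Python, not the EMBOSS code): the host, path, dataset, attribute and filter names below are placeholders, and the XML element and attribute names follow the commonly documented BioMart query layout, which should be checked against the target server before use.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional
from urllib.parse import urlencode


def build_query_xml(dataset: str, attributes: str,
                    filters: Optional[Dict[str, str]] = None) -> str:
    """Build a query XML string for one dataset.

    attributes is a comma-separated list of attribute names to return.
    The Query/Dataset/Filter/Attribute layout is an assumption here.
    """
    query = ET.Element("Query", {"virtualSchemaName": "default",
                                 "formatter": "TSV",
                                 "header": "0",
                                 "uniqueRows": "0"})
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes.split(","):
        ET.SubElement(ds, "Attribute", {"name": name.strip()})
    return ET.tostring(query, encoding="unicode")


def build_query_url(host: str, port: int, path: str, query_xml: str) -> str:
    """Combine mart host, port and path with the URL-encoded query XML."""
    return f"http://{host}:{port}{path}?" + urlencode({"query": query_xml})


# Placeholder server and names, for illustration only.
xml_text = build_query_xml("hsapiens_gene_ensembl",
                           "ensembl_gene_id,chromosome_name",
                           {"chromosome_name": "Y"})
print(build_query_url("mart.example.org", 80, "/biomart/martservice", xml_text))
```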

To define a BioMart database we can use the URL for the host, port and path. The dataset name can be a dbalias attribute.

Queries can specify the BioMart software version to avoid future incompatibilities.

The results are usually tab separated values. Some attributes containing sequence data can be returned as FASTA. It is unclear how the header information is formatted.

Queries can be verified, can include tab-delimited column labels, return a count (number of matches), select unique results, and add a time stamp (and a [success] tag at the end).
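A client could use the column labels and the trailing [success] tag to check that a result arrived complete. The Python sketch below assumes the labels come as the first line and the tag as the last line of the response. The function name and the sample identifiers are made up for illustration.

```python
from typing import List, Optional, Tuple


def parse_mart_response(text: str, has_header: bool = True,
                        has_stamp: bool = True
                        ) -> Tuple[List[str], List[List[str]], Optional[bool]]:
    """Split a tab-separated response into labels, rows and a completion flag.

    The flag is None when no stamp was requested, True when the final
    line is "[success]", and False when that line is missing.
    """
    lines = text.splitlines()
    complete: Optional[bool] = None
    if has_stamp:
        complete = bool(lines) and lines[-1].strip() == "[success]"
        if complete:
            lines = lines[:-1]  # the stamp is not a data row
    labels = lines[0].split("\t") if has_header and lines else []
    body = lines[1:] if has_header else lines
    # Split on single tabs so that missing values stay as empty strings.
    rows = [line.split("\t") for line in body if line]
    return labels, rows, complete


sample = "Gene ID\tChromosome\nGENE0001\tY\nGENE0002\t\n[success]\n"
labels, rows, complete = parse_mart_response(sample)
print(labels, rows, complete)
```

A response that ends without the tag is reported with complete set to False, which lets the caller treat it as truncated.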

Filters can include a list of values. Implementing this will require an extension to the EMBOSS USA syntax.

Database definitions will need a way to define the attributes to be returned. Peter will propose an extension to the database definition.

3.2 Database servers

The BioMart code can also be used to create a BioMart server query to identify datasets, filter terms and attributes. Peter will compare the BioMart information with other potential database servers (e.g. SRS, MRS, DAS, ENSEMBL, WsDbFetch, Entrez).

Jon will look into servers needed for the list of cross-referenced databases and other databases included in EDAM.

3.3 EDAM

Jon has completed the basic structure of concepts and relations. A number of other resources have been examined for data importing and new cross-references. This should complete the initial content. Resources include BioMOBY, SO, ONDEX, EBI services, Nucleic Acids Research's categories of web servers and databases, DAS, BioPAX, GO, MAP, MIRIAM, BioRDF, WhatIf, PDB-ML and PSI-MI.

BioMOBY has a large number of datatypes defined. These can be clustered into a few hundred, most of which were already in EDAM. The remainder were added with cross-references to BioMOBY.

Following discussions with other ontology experts at EBI, there are plans to make EDAM compatible with a new ontology covering tools, algorithms, data formats and some types.

The ONDEX project at Rothamsted is interested in expanding EDAM to cover their definitions of relationships between entities.

Two EMBRACE workshops are planned, in April and June, where further discussions will take place.

Jon is looking into packages for ontology management, including Protégé and one commercial product.

4. Administration

Alan reported on the Open Bio move.

Alan has reminded the systems group about a testbed for database indexing over the network. No reply yet.

Alan has reminded Apple that we would like an extension of the machine loan, and that we are awaiting more information about the pickup of the machines to be returned. Also no reply yet.

5. Documentation and Training

5.1 Books

Jon has been in contact with the publishers. There is some final revision needed before sending them the text, which will go one book at a time starting with the Administrator's Guide. This needs some updating in the databases chapter. The web pages have an updated Word copy.

Jon has made minor edits to correct invalid XML. In the interfaces and links sections, Pise has been superseded by Mobyle and Galaxy has been added.

Stylesheets are now relatively simple to manage, with various presentational issues resolved.

A new mock homepage is available for review.

6. User queries and answers

None new.

7. AOB


8. Date Of Next Meeting

The next meeting will be on Monday 22nd February.