EMBOSS: Project Meeting (Mon 6th Jun 11)


Attendees

EBI: Peter Rice, Jon Ison, Mahmut Uludag, Michael Schuster
Visitors:
Apologies: Alan Bleasby

1. Minutes of the last meeting

There was no meeting last week (public holiday).

Minutes of the meeting of 23rd May 2011 are here.

2. Maintenance etc.

2.1 Applications

Peter has added applications to report cross-references for sequence objects.

seqxref reports the cross-references.

seqxrefget adds an extra step to identify the database type and generate a retrieval command using entret, textget, etc.

A possible extension to these applications is to add SO (Sequence Ontology) references for the feature types.

2.2 Libraries

Peter has added new objects and ACD types for URL and variation.

The URL datatype is for reporting URLs from DRCAT that do not resolve to readable text when called from within EMBOSS.

The variation datatype is to support reading variation data from Ensembl or perhaps from other file formats. Michael suggested dbSNP entries, though their format is subject to change.

Mahmut has made GFF3 output more flexible. Tag name validation is not required, so EMBOSS can read any tag and write it on output. Tags for more limited formats (for example, EMBL) can be converted on output but are stored internally with their original names.

The extra semicolon at the end of the tag field has been removed. GFF validation utilities objected to the last tag ending with a semicolon.
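The tag field change can be illustrated with a minimal sketch. This is not the EMBOSS code; the function name and the tag values are hypothetical, and GFF3 escaping of reserved characters is left out for brevity. The point is only that the separator goes between tags and not after the last one.

```python
def format_gff3_attributes(tags):
    """Join (name, value) pairs into a GFF3 tag field.

    The ';' is used as a separator only, so the last tag
    is not followed by a semicolon.
    """
    return ";".join(f"{name}={value}" for name, value in tags)


# Hypothetical tags, including one that is not a standard GFF3 tag name
attrs = format_gff3_attributes(
    [("ID", "gene0001"), ("Name", "abc1"), ("custom_tag", "anything")]
)
print(attrs)
```

No validation of tag names is done here, in line with the decision to pass any tag through to the output.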

2.3 mEMBOSS

Peter has extended the version number for non-Windows installations by adding ".0" to the reported version. This is to allow mEMBOSS and EMBOSS to report consistent version numbers in QA testing where mEMBOSS versions end with a build number for the distribution.
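The effect on version comparison in QA can be sketched as follows. This is an illustration only; the function name, the four-field layout and the version strings are assumptions, not the actual EMBOSS or mEMBOSS values.

```python
def pad_version(version, parts=4):
    """Pad a dotted version string with '.0' fields up to `parts` fields.

    A version that already has `parts` fields (for example one ending
    with a build number) is returned unchanged.
    """
    fields = version.split(".")
    fields += ["0"] * (parts - len(fields))
    return ".".join(fields)


# Hypothetical version strings
print(pad_version("1.2.3"))    # three fields, padded with ".0"
print(pad_version("1.2.3.7"))  # already ends with a build number
```

With both reports in the same shape, the QA tests can compare the version strings field by field.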

Peter noted that the offset syntax filename%1234 is not accepted in mEMBOSS. It is rarely used, but should be supported by an alternative syntax. Suggestions are welcome.
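For reference, the existing syntax can be read as a filename followed by '%' and a numeric offset. A minimal sketch of splitting such a specification is given below. The function name is hypothetical, this is not the EMBOSS parser, and no alternative syntax is proposed here.

```python
def split_offset(spec):
    """Split 'filename%1234' into ('filename', 1234).

    A specification without a trailing '%<digits>' part is returned
    unchanged with an offset of 0.
    """
    name, sep, tail = spec.rpartition("%")
    if sep and name and tail.isdecimal():
        return name, int(tail)
    return spec, 0


print(split_offset("filename%1234"))
print(split_offset("filename"))
```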

2.4 SoapLab

3. New developments

3.1 Access methods

Mahmut has succeeded in reading features from CHADO for a chromosome region using SQL access and native EMBOSS structures. Access is in 3 stages, via transcript and location.

Peter has revised the query language for lists of identifiers. These lists now need a delimiter, either '|' (OR) or the equivalent ',', to simplify parsing. Spaces made testing for operators tricky, because spaces are also allowed in the query syntax for keywords and other searches (where an underscore is accepted as an equivalent).
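A minimal sketch of splitting a delimited identifier list is shown below. It covers only the two delimiters described above, not the full query language; the function name and the identifiers are hypothetical.

```python
import re


def split_id_list(query):
    """Split an identifier list on '|' or ',' (both meaning OR).

    Whitespace around each identifier is dropped and empty items
    are ignored.
    """
    return [item.strip() for item in re.split(r"[|,]", query) if item.strip()]


# Hypothetical identifiers
print(split_id_list("id_one|id_two, id_three"))
```

Because the delimiter is explicit, a space inside an item no longer has to be tested as a possible operator.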

Peter has updated EMBOSS index access to store matches by their file number and offset rather than by identifier. Accession number searches were storing the accession number, but searches by 'id' and by secondary fields were storing the primary ID. By storing the AjPBtid as the table key, the accession and id queries can be safely combined.
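The reason a common key helps can be sketched with plain sets. The (file number, offset) tuples below stand in for the stored key and are invented values; the AjPBtid structure itself is not modelled.

```python
# Hits from two searches, each keyed by (file number, offset).
# All values are hypothetical.
acc_hits = {(0, 1024), (1, 2048)}  # matches from an accession search
id_hits = {(1, 2048), (2, 0)}      # matches from an id search

# Both searches use the same key, so an entry found by both
# appears once in the union and is kept by the intersection.
either = acc_hits | id_hits
both = acc_hits & id_hits

print(sorted(either))
print(sorted(both))
```

If one search stored accession numbers and the other stored primary IDs, the same entry would appear under two different keys and could not be matched up this way.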

Michael is working on Ensembl 62 updates, especially to variation data.

Michael commented that the applications which generate server cache files need a standard naming and user interface. Peter will make a suggestion at the next meeting.

Ensembl access needs improvement to make use of field names and the new query language operators.

3.2 New applications

Alan has added EMBASSY packages for Clustal Omega (clustalomega) and the beta version of Vienna (Vienna2). Peter has populated these with documentation and QA tests.

Peter has resolved some 300 compiler warnings for Vienna2 to give clean build results.

The list of new applications on the Wiki has been updated.

3.3 EDAM

Jon is completing the definition of relations in EDAM. Missing relations added include has_input and has_output relations for operations, and is_format_of and is_identifier_of relations for data.

OBO relations are transitive - relations are inherited by all descendants of a term.
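Inheritance of relations by descendants can be sketched as a walk up the is_a hierarchy. The term names and relations below are hypothetical examples, not taken from EDAM, and each term has a single parent to keep the sketch short.

```python
# child -> parent (is_a); hypothetical terms
IS_A = {
    "Pairwise alignment": "Alignment",
    "Alignment": "Operation",
}

# Relations defined directly on each term; hypothetical values
RELATIONS = {
    "Alignment": {("has_input", "Sequence")},
    "Pairwise alignment": {("has_output", "Alignment report")},
}


def inherited_relations(term):
    """Collect relations defined on a term and on all of its ancestors."""
    found = set()
    while term is not None:
        found |= RELATIONS.get(term, set())
        term = IS_A.get(term)
    return found


print(sorted(inherited_relations("Pairwise alignment")))
```

A relation therefore only needs to be defined once, at the most general term to which it applies.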

Final checks are being made using OBO-Edit.

Once committed, Peter will check the relations in all ACD files.

3.4 DRCAT

Jon will need to update DRCAT for changed or obsolete EDAM terms.

4. Administration

Peter is preparing for the release by adding new QA tests.

mEMBOSS is now included in the standard QA testing script, for both the Visual Studio build and the installed mEMBOSS. embossversion is used to find the install and testing directories.

Peter has added new files to the .cvsignore lists.

5. Documentation and Training

None.

6. User queries and answers

All done.

7. AOB

None.

8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 13th June.