EMBOSS: Project Meeting (Mon 14th February 11)


Attendees

EBI: Peter Rice, Mahmut Uludag, Jon Ison, Michael Schuster
Visitors:
Apologies: Alan Bleasby

1. Minutes of the last meeting

Minutes of the meeting of 7th February 2011 are here.

2. Maintenance etc.

2.1 Applications

Peter has fixed a bug in showdb, which was reporting taxonomy databases twice. The code has been rewritten to iterate generically through the database types.

Peter will look into a request for applications to allow a -bothstrands option to automatically process both strands of an input sequence. This will require changes to the application logic on a per-program basis.
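As a rough illustration of what a -bothstrands option would do, the Python sketch below runs an arbitrary per-sequence analysis on the input and on its reverse complement. The helper names (reverse_complement, process_both_strands) are hypothetical and not part of EMBOSS; as noted above, the real change would live in each application's own logic.

```python
# Complement table for upper-case IUPAC nucleotide codes.
# S and W are their own complements, so they are left unchanged.
_COMPLEMENT = str.maketrans("ACGTRYKMBDHVN", "TGCAYRMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def process_both_strands(seq, analyse):
    """Hypothetical wrapper: run analyse() on the forward strand and on the
    reverse strand (the reverse complement) and return both results."""
    return {
        "forward": analyse(seq),
        "reverse": analyse(reverse_complement(seq)),
    }


if __name__ == "__main__":
    # Count ATG start codons on each strand.
    print(process_both_strands("ATGCCCATG", lambda s: s.count("ATG")))
```

Keeping the analysis as a passed-in function mirrors the point in the minutes that each program would need its own strand-handling logic; the wrapper only supplies the second strand.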

Peter will look into handling small word sizes in diffseq.

2.2 Libraries

Peter proposed extending code for handling AjPTable objects to automatically merge tables with common key and value structures. This can be made very efficient by first resizing the tables to have the same hash array size. Table merging will greatly simplify the code to handle the new query language operations.
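A minimal sketch of why equal hash array sizes help, assuming a simple separate-chaining table rather than the actual AjPTable implementation: once both tables use the same number of buckets and the same hash function, a key's bucket index is identical in both, so the merge can walk the tables bucket by bucket without rehashing the destination. The class and function names are hypothetical, and the rule that existing destination values win on duplicate keys is an assumption.

```python
class ChainedTable:
    """Hypothetical separate-chaining hash table (not AjPTable)."""

    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def _index(self, key):
        # Bucket index depends only on the key's hash and the bucket count.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default

    def resize(self, nbuckets):
        # Rehash every entry into a new bucket array of the given size.
        entries = [item for bucket in self.buckets for item in bucket]
        self.buckets = [[] for _ in range(nbuckets)]
        for k, v in entries:
            self.buckets[self._index(k)].append((k, v))

    def __len__(self):
        return sum(len(bucket) for bucket in self.buckets)


def merge_into(dest, src):
    """Merge src into dest. src is resized in place to dest's bucket count,
    after which bucket i of src can only hold keys that belong in bucket i
    of dest. Assumption: on duplicate keys the dest value is kept."""
    if len(src.buckets) != len(dest.buckets):
        src.resize(len(dest.buckets))
    for i, src_bucket in enumerate(src.buckets):
        dest_bucket = dest.buckets[i]
        existing = {k for k, _ in dest_bucket}
        for k, v in src_bucket:
            if k not in existing:
                dest_bucket.append((k, v))
```

The only rehashing cost is the one-off resize of the source table; each key is then compared only against the short chain in its matching destination bucket.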

Peter will look into a user request from EBI External Services to support the fastm sequence format variant which stores sets of short protein sequence fragments.

Peter will look into extending SAM and BAM formats to support features.

2.3 Jemboss

Mahmut will look into a user query on whether string values from ACD in Jemboss should be converted to numeric data. It is likely that the current handling is correct.

2.4 Other

Peter is collecting fixes for a patch release.

3. New developments

3.1 EMBOSS configuration

Peter and Mahmut will attend a DAS workshop next month and give a presentation on the DAS client implementation in EMBOSS.

3.2 Ensembl access

Michael reported that the Ensembl registry code now has datatypes for core, variation, etc., and needs adaptors defined for each datatype.

Database adaptors use internal object caches, with stable identifiers and several aliases for each database.

To select the correct organism it would help to have a set of patterns for the organism-specific Ensembl identifiers. The aim is to allow 'ensembl:id' to automatically detect a suitable database to match the ID.
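A sketch of the kind of pattern table this would need, assuming the usual Ensembl stable identifier layout of 'ENS', an optional three-letter species code (absent for human), a feature type letter and eleven digits, with an optional version suffix. The prefix list is an illustrative subset, and the species names returned are placeholders rather than the database names the EMBOSS registry code uses.

```python
import re

# Illustrative subset of Ensembl species codes (not exhaustive).
# Human stable identifiers carry no species code.
SPECIES_CODES = {
    "": "homo_sapiens",
    "MUS": "mus_musculus",
    "RNO": "rattus_norvegicus",
    "DAR": "danio_rerio",
}

# Subset of feature type letters used in stable identifiers.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "translation", "E": "exon"}

# ENS + optional species code + feature letter + 11 digits + optional .version
STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d{11})(?:\.\d+)?$")


def guess_species(identifier):
    """Return (species, feature type) for a recognised identifier, else None."""
    match = STABLE_ID.match(identifier)
    if match is None:
        return None
    species = SPECIES_CODES.get(match.group(1) or "")
    if species is None:
        return None
    return species, FEATURE_TYPES[match.group(2)]


if __name__ == "__main__":
    for query in ("ENSG00000139618", "ENSMUSG00000017167", "P12345"):
        print(query, guess_species(query))
```

An 'ensembl:id' lookup could then pick the organism's database from the returned species, falling back to a search across databases when no pattern matches.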

3.4 EDAM

Jon will attend an EDAM/BioXSD workshop in Amsterdam with Matus. New format terms have been added. The workshop will consider adding regular expressions for values associated with data identifier terms.

3.5 DRCAT

Peter has added 'Taxon' records for all entries, giving the NCBI taxid and name for the most general taxon covered by the data resource. General resources are classified as '1 all'.

Peter has renamed the 'tax-nam' field to 'tax-tax', reusing the field name most commonly used by SRS servers to describe the taxon name. The index now covers the scientific name, GenBank name and common name.

4. Administration

Peter noted the brief report from the EMBL SAC review of EBI services.

5. Documentation and Training

5.1 Books

Jon has sent in the corrections to the Developer's Guide, and updated the XML source files for this and the Administrator's Guide.

Peter will send the User's Guide corrections today.

5.2 Website

Peter noted recent spam posted to the EMBOSS wiki, and asked for help in monitoring recent changes and removing any further spam.

6. User queries and answers

Jon noted a user query about circular maps drawn by cirdna, where the feature labels are hard to read. Peter will investigate.

7. AOB

Peter reported on the recent DebianMed package developers meeting in Germany.

Peter will go to the meeting of a new COST consortium on next-generation sequence analysis in March.

8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 21st February. Peter will be away.