EMBOSS: Project Meeting (Mon 7th September 09)


Attendees

EBI: Peter Rice, Jon Ison, Alan Bleasby, Mahmut Uludag
Visitors:
Apologies:

1. Minutes of the last meeting

Minutes of the meeting of 24th August 2009 are here.

2. Maintenance etc.

2.1 Applications

Mahmut is looking at end-weighted gaps for needle. It was agreed that an -endweight option is needed, together with separate -endopen and -endextend options.
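As an illustration of how the three options might interact, the sketch below scores an alignment that has already been computed. It is not the needle implementation. The following are assumptions made for the example: end gaps are free unless endweight is set, a gap run of length k costs open + (k - 1) * extend, the default penalty values are placeholders, and a simple match/mismatch score stands in for a substitution matrix.

```python
def gap_runs(seq):
    """Yield (start, length) for each run of '-' in a gapped sequence."""
    i, n = 0, len(seq)
    while i < n:
        if seq[i] == "-":
            j = i
            while j < n and seq[j] == "-":
                j += 1
            yield i, j - i
            i = j
        else:
            i += 1


def score_alignment(a, b, match=1.0, mismatch=-1.0,
                    gapopen=10.0, gapextend=0.5,
                    endweight=False, endopen=10.0, endextend=0.5):
    """Score two equal-length gapped strings (hypothetical helper).

    Gap runs touching either end of the alignment are treated as end
    gaps: free when endweight is False, otherwise charged with
    endopen/endextend. All other runs use gapopen/gapextend.
    """
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    n = len(a)
    score = 0.0
    # Substitution part: only columns with a residue in both sequences.
    for x, y in zip(a, b):
        if x != "-" and y != "-":
            score += match if x == y else mismatch
    # Gap part: one penalty per run, chosen by whether it is an end gap.
    for seq in (a, b):
        for start, length in gap_runs(seq):
            at_end = start == 0 or start + length == n
            if at_end:
                if endweight:
                    score -= endopen + (length - 1) * endextend
            else:
                score -= gapopen + (length - 1) * gapextend
    return score
```

With these assumed semantics, score_alignment("--ACGT", "TTACGT") ignores the leading gap, while passing endweight=True charges it using the end penalties rather than the internal ones.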

2.2 Libraries

Peter has updated the warning and error messages for fastq-sanger format to include examples of bad data provided by Peter Cock of BioPython. Some of the examples had incomplete final entries. This should be a general error message in EMBOSS but to date has been ignored. All sequence formats count the number of lines processed. When a known format has processed at least one line without success this is now an automatic error. Note that some pathological cases can still pass, for example a FASTA header record (or partial record) can be valid if the sequence length is allowed to be zero.
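The line-counting check can be sketched as follows. This is a simplified illustration, not the EMBOSS parser: it assumes strict four-line FASTQ records with no line wrapping, and the names read_fastq and FastqError are hypothetical. The point is that once at least one line of a known format has been consumed, running out of input mid-record is reported as an error with a line number instead of being silently ignored.

```python
class FastqError(ValueError):
    """Raised for malformed or truncated FASTQ input (hypothetical name)."""


def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from an iterable of lines.

    Assumes four lines per record. Raises FastqError, quoting the line
    number, on a bad record or an incomplete final entry.
    """
    record = []
    lineno = 0
    for lineno, line in enumerate(lines, start=1):
        record.append(line.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        first = lineno - 3  # line number of this record's header
        if not header.startswith("@"):
            raise FastqError(f"line {first}: expected '@' header")
        if not plus.startswith("+"):
            raise FastqError(f"line {first + 2}: expected '+' separator")
        if len(seq) != len(qual):
            raise FastqError(
                f"line {lineno}: quality length {len(qual)} "
                f"does not match sequence length {len(seq)}")
        yield header[1:], seq, qual
        record = []
    # Lines were read but did not complete a record: report, do not ignore.
    if record:
        raise FastqError(
            f"line {lineno}: incomplete final entry "
            f"({len(record)} of 4 lines read)")
```

Note that a record with an empty sequence line and an empty quality line still passes this check, which mirrors the zero-length pathological case mentioned above.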

Peter has added new file output functions to write strings and characters without first copying to an internal buffer. This saves a significant proportion of the time needed to write a large short read sequence file. The functions that write newlines need careful testing on other systems as we write binary files and need to manage the newline formats ourselves. Alan requested that any system-specific functions needed should go into ajsys.c.
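The newline issue can be shown with a small Python analogue. This is not the EMBOSS C code and the function name is hypothetical. It writes each piece of a record straight to a binary stream rather than first assembling a formatted line, and because a binary stream performs no newline translation, the line terminator has to be supplied explicitly by the caller.

```python
import io

# Assumed default; a port to another system might pass b"\r\n" instead.
DEFAULT_NEWLINE = b"\n"


def write_fasta_record(out, name, seq, newline=DEFAULT_NEWLINE):
    """Write one FASTA record to a binary stream piece by piece.

    name and seq are bytes objects and are passed to write() as they
    are, with no intermediate formatted string. Nothing here translates
    newlines, so the terminator is whatever the caller supplies.
    """
    out.write(b">")
    out.write(name)
    out.write(newline)
    out.write(seq)
    out.write(newline)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_fasta_record(buf, b"read1", b"ACGT")
    print(buf.getvalue())
```

Keeping the terminator as a single parameter is one way to confine the system-specific choice to one place, in the spirit of collecting such code in ajsys.c.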

3. New developments

Jon has added relations: tags for all application blocks in ACD files. There is also a table of the appropriate EDAM types for input and output data types, with dependencies on other attributes (sequence type, number of sequences in an alignment, etc.) which can be used to direct checks by acdvalid.

About 50 new EDAM terms were needed, e.g. for missing functions to match the existing EDAM fields.

Some further cleanup of EDAM was completed at the same time.

Jon has added 80 potential applications to the wiki to cover EDAM functions that were not used in annotating the existing ACD files.

Jon has committed the ACD files for applications that will handle lists of databases (finddb, idtell, seqxrefall, showdball).

Michael Schuster has contributed an updated code library for SQL access to Ensembl servers. Alan will review the code.

Peter noted that there is a BioRuby interface to BioMart that can be used as a template for access of BioMart resources by EMBOSS.

Peter has a description of Roche SFF next generation sequencing data file formats from Peter Cock. These will be added to the Wiki.

Peter is looking into the Open Bio project OBDA, which standardizes the way projects access indexed flatfiles, remote servers and BioSQL data resources. It may be interesting for BioSQL. Details will be added to the Wiki.

Mahmut is looking into new applications for trimming next generation sequence data. Possibilities include adding pattern matching methods to vectorstrip.

Peter will add wiki notes on fast string matching methods for genome-scale string searches that were described at ISMB in Stockholm. The searches were extremely fast - a few seconds for substring matches.

4. Administration

Peter is still waiting for workstation delivery news. Our sales contact assures us that they will be delivered before the new date of 21st September - having failed to arrive by the original date of 28th August.

Alan and Peter will have a meeting with the EBI systems group later today to discuss server configurations and pricing.

When the new systems arrive, the old workstations will remain in service running Windows XP. Alan will request new static addresses so that we have the option of installing the new workstations under new names. Two of the old mimservers are dead so their numbers could be reassigned.

Alan reported that one of the backup drives is showing problems and the drive on emboss5 should be replaced. Peter will order a replacement, and other sundry items (3 SATA cables and 2 replacement optical mice).

Alan will contact OBF through their system tickets to restart the copying of CVS commits to the public server.

5. Documentation and Training

5.1 Books

Guy has sent an update to his wEMBOSS chapter.

Peter has decided to refactor all function names in the ajgraph source file because the original naming (from before release 1.0.0) does not make sense in the context of the books. Functions with similar ajGraph names may handle AjPGraph objects, handle plplot data objects, or simply change or use the plplot internals.

5.2 Website

None.

6. User queries and answers

Alan reported that in mEMBOSS the output of prettyplot is not correctly centred in the boxes.

7. AOB

Peter will be at a BBSRC meeting this week.

Jon and Peter will go to the EMBRACE ontologies meeting in Amsterdam in mid-September.

Peter will be at a large data meeting in Beijing in October.

8. Date Of Next Meeting

The next meeting will be on Monday 14th September.