EMBOSS: Project Meeting (Mon 10th August 09)


EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag

1. Minutes of the last meeting

Minutes of the meeting of 3rd August 2009 are here.

2. Maintenance etc.

2.1 Applications

The Debian folk have pointed out that the PHYLIPNEW package accidentally includes a COPYING file which describes the GPL. This needs to be replaced by a description of the original PHYLIP licence conditions for code and documentation. The other EMBASSY packages need to be checked.

2.2 Libraries

Peter reported discussions with the other Open-Bio projects on FASTQ format. An outstanding issue is whether to allow zero length sequences after trimming of adaptors and low quality bases. As FASTQ files may be paired (where the data were originally paired-end reads) it will be necessary to keep such zero length sequences so that the two files stay in step. This will require a change to the current requirement for at least 1 base in a sequence. Applications will need to define a minimum sequence length which may default to zero or 1, whichever causes the least disruption to existing ACD files.

Peter reported from the GMOD meeting. In discussions with the samtools author Heng Li it was agreed that EMBOSS would concentrate on interconversion of alignment formats to/from SAM and BAM formats. This is a much requested extension to samtools that EMBOSS can most appropriately cover.
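The FASTQ pairing issue above can be illustrated with a minimal sketch. It is not EMBOSS code: the function names, the quality threshold of 20, the Phred+33 offset and the example reads are all assumptions made for illustration. It shows that a minimum length of zero keeps the two files of a pair in step, while a minimum length of 1 applied to each file separately does not.

```python
def trim_low_quality_tail(seq, qual, min_quality=20, offset=33):
    """Remove bases from the 3' end while their quality is below min_quality."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def trim_reads(reads, min_length=0, min_quality=20):
    """Trim each (name, seq, qual) read; drop reads shorter than min_length.

    With min_length=0 (the hypothetical default) nothing is dropped,
    so record N of one file still matches record N of its mate file.
    """
    trimmed = []
    for name, seq, qual in reads:
        seq, qual = trim_low_quality_tail(seq, qual, min_quality)
        if len(seq) >= min_length:
            trimmed.append((name, seq, qual))
    return trimmed


# Invented example data: 'I' is quality 40 and '!' is quality 0 in Phred+33.
forward = [("r1/1", "ACGT", "IIII"), ("r2/1", "ACGT", "!!!!"), ("r3/1", "ACGT", "II!!")]
reverse = [("r1/2", "TTGA", "IIII"), ("r2/2", "TTGA", "IIII"), ("r3/2", "TTGA", "IIII")]

in_step = trim_reads(forward, min_length=0)      # r2/1 survives as an empty read
out_of_step = trim_reads(forward, min_length=1)  # r2/1 is lost, r3/1 now lines up with r2/2

for fwd, rev in zip(in_step, trim_reads(reverse, min_length=0)):
    print(fwd[0], len(fwd[1]), rev[0], len(rev[1]))
```

An application that needs at least one base could still set its own minimum length of 1, provided it treats both mates of a pair together.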

Peter described the SAM tab delimited format and the BAM binary version. BAM is increasingly used to store unaligned short read data because it compresses well, and BAM files can be indexed and accessed efficiently, including remotely through the ftp and http protocols supported by samtools.
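As an illustration of the tab delimited layout, the sketch below splits one SAM alignment line into its 11 mandatory columns. The column names follow the current SAM specification and may differ from the names in use at the time of this meeting; the helper names and the example read are invented. It is a reading aid only, not a proposal for the EMBOSS implementation.

```python
from collections import namedtuple

# The 11 mandatory SAM columns, in order; optional tags follow them.
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]
SamRecord = namedtuple("SamRecord", SAM_FIELDS + ["tags"])


def parse_sam_line(line):
    """Split one alignment line (not an '@' header line) into named fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("SAM alignment lines need at least 11 columns")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = cols[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual, cols[11:])


def is_unmapped(record):
    """Bit 0x4 of FLAG marks a read that has no alignment."""
    return bool(record.flag & 0x4)


# An invented unaligned read: '*' and 0 fill the alignment columns.
record = parse_sam_line("read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n")
print(record.qname, record.seq, is_unmapped(record))
```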

2.3 SoapLab

Mahmut reported on a problem with SoapLab services. A workaround has been found. A fix will be applied this week.

The job timeout is not working when jobs are submitted through LSF. Terminating a job when the timeout is reached returns a "completed" job status instead of "terminated". Mahmut will fix this before extending the run time.

Mahmut has updated the Taverna plugin for SoapLab to support Taverna 1.7.2. The version update is necessary because Taverna's use of Raven requires plugin versions to match the Taverna release number.

Mahmut will also modify the SoapLab plugin to support Taverna 2.0. Discussions with Taverna support have resolved an issue with logging error messages (logging only works in the lib directory). The plugin API is similar to the Taverna 1.7.2 interface.

3. New developments

Peter described how new data access methods could read data from the Generic Model Organism Database (GMOD) project databases, which all use the Chado schema and are typically stored in PostgreSQL. Like Ensembl, Chado databases have an API and a standard schema that we can access across many resources.

Peter proposed an extension to the DB definitions in emboss.default to define a SERVER with an access method that could return all databases and their formats. Applications could query any server to find the database names and formats that it can access.

The USA syntax will require an extension to access a database from a known server. Peter proposed a syntax of "dbname@server:" as the database prefix. The emboss.default file would only need a definition of the server.
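A minimal sketch of how the proposed prefix might be split is given below. It is an assumption about one possible reading of the proposal, not the EMBOSS parser: it handles only the simple forms "dbname:entry" and "dbname@server:entry", ignores the other parts of the USA syntax (formats, file names, ranges), and the function name, database name, server name and entry name are hypothetical.

```python
def parse_db_prefix(usa):
    """Split 'dbname:entry' or 'dbname@server:entry' into its parts.

    Returns (dbname, server, entry); server is None when no '@server'
    part is present, meaning the database is defined locally.
    """
    prefix, colon, entry = usa.partition(":")
    if not colon:
        raise ValueError("no database prefix in %r" % usa)
    dbname, at, server = prefix.partition("@")
    if not dbname or (at and not server):
        raise ValueError("malformed database prefix in %r" % usa)
    return dbname, (server if at else None), entry


# Hypothetical names, for illustration only.
print(parse_db_prefix("genes@myserver:ABC123"))
print(parse_db_prefix("genes:ABC123"))
```

With this reading, only "myserver" would need a definition in emboss.default; the database name "genes" and its format would be discovered by querying the server.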

4. Administration

Peter confirmed that the workstations are now on order from Dell. It has taken 7 weeks to get the order processed. Delivery is expected within the next 2 weeks.

Peter suggested creating shared directories on either the current RAID server or the proposed new server. Alan pointed out that the new server may be accessible externally, so the internal RAID server is the better place. Directories will be created for databases and for software. The software directory can be included in users' paths once packages have been installed.

5. Documentation and Training

5.1 Books

Alan has continued working through the Developer's Guide. A careful check is needed through the text to confirm that obsolete function names have been replaced by the latest names. Comments with an 'ajb:' prefix have been added where changes are definitely required.

Peter has run spell checking on the Word versions and updated all except the last part of the Developer's manual. All changes have been committed so the spell check run will be completed this afternoon. Peter has the accepted words in the CUSTOM.DIC Word dictionary file on his laptop.

Jon has created a new file IndexTerms.txt with proposed indexable terms and their synonyms. These need to be reviewed and any unwanted terms deleted before the list is supplied to the publishers.

Jon has checked section titles, corrected capitalisation and cleaned up cases where the chapter and first section title were identical. Some sections have been reordered and some paragraph tags have been replaced by section tags. "NB" in the text has been replaced by note tags. When reviewing the text, additional blocks can be created for "caution", "important", "note" and "tip". These will be highlighted in the final book format. Some informal tables have been replaced with formally defined table blocks.

All files need to be checked for consistent use of the 6.1.0 version number.

Jon has removed the NUCLEUS chapter from the Developer's Guide as there is a lack of detailed material.

Jon has generated complete User and Developer Reference Manuals using the new master files.

Peter proposed a "fridge magnet" motif for the book covers, using EMBOSS application names with appropriate juxtapositions (perhaps "seqret syco"). This could be used together with any graphics design ideas.

5.2 Website

Peter updated the developer EFUNC and EDATA and the release EFUNCREL and EFUNCDATA databases in SRS.

6. User queries and answers

Peter has discussed phylip formats with Joe Felsenstein, the author of PHYLIP. We can invent a new format, but will have to take care that all the EMBOSS phylip code can handle long names.

EMBOSS will be able to interpret any phylip file by retrying with each possible format in turn: on input, each phylip format falls back to the others. The first line of the file defines the number of sequences and their length, so any misinterpretation of names gives a result that fails this check and the next format can be tried.
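The retry idea can be sketched as follows. This is an illustration under simplifying assumptions, not the EMBOSS phylip code: it covers only files with one sequence per line, it compares just two name conventions (a fixed 10 character name field and a name ending at the first space), and all function names and example data are invented.

```python
def parse_strict(lines):
    """Classic layout: name in the first 10 columns, sequence after it."""
    return [(line[:10].strip(), line[10:].replace(" ", "")) for line in lines]


def parse_relaxed(lines):
    """Long-name layout: name is the first word, sequence is the rest."""
    records = []
    for line in lines:
        parts = line.split(None, 1)
        seq = parts[1].replace(" ", "") if len(parts) == 2 else ""
        records.append((parts[0], seq))
    return records


def read_phylip(text):
    """Try each parser in turn; accept the first that matches the header."""
    lines = [line for line in text.splitlines() if line.strip()]
    nseq, length = (int(value) for value in lines[0].split())
    for label, parser in (("strict", parse_strict), ("relaxed", parse_relaxed)):
        records = parser(lines[1:])
        # A misread name changes the apparent sequence length, so the
        # header counts expose the wrong interpretation.
        if len(records) == nseq and all(len(seq) == length for _, seq in records):
            return label, records
    raise ValueError("no phylip variant matched the header counts")


long_names = "2 8\nlong_sequence_name_1 ACGTACGT\nlong_sequence_name_2 ACGTTCGT\n"
short_names = "2 8\n" + "seq1".ljust(10) + "ACGTACGT\n" + "seq2".ljust(10) + "ACGTTCGT\n"
print(read_phylip(long_names)[0])
print(read_phylip(short_names)[0])
```

In the long-name example the strict parser absorbs part of each name into the sequence, the lengths no longer match the header, and the relaxed parser is tried next.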

7. AOB

Peter attended the GMOD meeting in Oxford last week. Details of discussions are given in earlier items.

8. Date Of Next Meeting

The next meeting will be on Monday 17th August.