EMBOSS: Project Meeting (Monday 20th August 2007)

Attendees

EBI: Peter Rice, Alan Bleasby, Mahmut Uludag
Sanger:
Visitors:
Apologies: Jon Ison, Martin Senger, Shaun McGlinchey, Rodrigo Lopez, Tim Carver

1. Minutes of the last meeting

Minutes of the meeting of 6th August 2007 are here.

2. Software Development

2.1 Applications

Alan has added jaspextract to add JASPAR transcription factor site data to a local EMBOSS installation, and jaspscan to find and report matches.

Currently jaspextract simply checks that a JASPAR distribution is correct and copies the files unchanged.

The jaspscan application implements the search methods available through the JASPAR web pages.

Alan has also added dbxstats, a "make check" application that reports statistics from dbx indices, for users in EBI external services.

Peter is expecting a contribution of a Smith-Waterman database search application.

2.2 Libraries

Peter has reviewed the report output and internals following reports of problems when "-sreverse" is used on the command line to reverse complement the input sequence(s) in dreg and fuzznuc. All report formats now report the positions correctly and consistently. In some cases the positions were stored incorrectly, but this was "fixed" only at the point where the report format for those applications was produced, so the error surfaced as strange problems when alternative report formats were selected.

The standard now is that all feature format output (EMBL, SWISS and GFF) must be correct, to show consistent internal storage, and that the report must consistently show the start less than the end (as required by GFF). For reverse nucleotide sequences a new "Strand" column has been added and a "-rstrandshow" qualifier provided. By default this is true for nucleotide sequences.
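The reporting convention above (start before end, with the direction carried in a separate strand value) can be illustrated with a minimal Python sketch. This is not the EMBOSS library code. The names Hit and to_forward_coords are hypothetical, and the mapping assumes 1-based positions that are converted from the reverse-complemented sequence back to forward-strand coordinates.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    start: int   # 1-based, always <= end
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def to_forward_coords(start: int, end: int, seq_len: int, reversed_input: bool) -> Hit:
    """Return a hit with start <= end and the direction held in 'strand'.

    Assumption: (start, end) are 1-based positions on the sequence as it was
    searched. If the input was reverse complemented, they are mapped back to
    the forward strand using p -> seq_len - p + 1.
    """
    if start > end:
        start, end = end, start
    if not reversed_input:
        return Hit(start, end, "+")
    return Hit(seq_len - end + 1, seq_len - start + 1, "-")


# A match at 3..5 on the reverse complement of a 10-base sequence
print(to_forward_coords(3, 5, 10, reversed_input=True))
```

Keeping the stored coordinates in one orientation and recording the strand separately means every output format can be produced from the same internal values.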

Peter has added PDB files as formats for sequence data. The files have sequence data in two places. The ATOM records are the residues in the structure, and the SEQRES records are the residues in the original sequence. In some cases they may not agree. Sequence format "pdb" reads the ATOM records and is tested for by default. The SEQRES records are read by sequence format "pdbseq" which must be explicitly requested.
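As an illustration of why the two record types can disagree, the following Python sketch reads both from the fixed-column PDB layout. It is not the EMBOSS reader. The function names are hypothetical, only the 20 standard amino acids are translated, and HETATM records, alternate locations and multiple models are ignored.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequence(lines, chain):
    """Sequence from SEQRES records (the original sequence)."""
    residues = []
    for line in lines:
        # Column 12 holds the chain ID; residue names start at column 20.
        if line.startswith("SEQRES") and line[11:12] == chain:
            residues.extend(line[19:70].split())
    return "".join(THREE_TO_ONE.get(r, "X") for r in residues)


def atom_sequence(lines, chain):
    """Sequence from ATOM records (residues present in the structure)."""
    out = []
    last_key = None
    for line in lines:
        # Column 22 holds the chain ID; columns 23-27 hold the residue
        # number plus insertion code; columns 18-20 hold the residue name.
        if line.startswith("ATOM  ") and line[21:22] == chain:
            key = line[22:27]
            if key != last_key:
                out.append(THREE_TO_ONE.get(line[17:20], "X"))
                last_key = key
    return "".join(out)


# Hypothetical fragment: the first residue has no coordinates.
SAMPLE = [
    "SEQRES   1 A    3  MET ALA GLY",
    "ATOM      1  N   ALA A   2",
    "ATOM      2  CA  ALA A   2",
    "ATOM      3  N   GLY A   3",
]

print(seqres_sequence(SAMPLE, "A"))
print(atom_sequence(SAMPLE, "A"))
```

In this made-up fragment the SEQRES records list three residues while the ATOM records cover only the last two, which is the kind of disagreement the two sequence formats expose.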

Alan and Peter have updated the reading of "CCF" format protein structure files to read both the new and old records so that previous inputs continue to work for the psiphi program.

Mahmut has been looking into XML output options for reports, features and sequences. He has installed BlueJay from source and is looking at compatible XML formats we can use to exchange data.

2.3 SoapLab

Mahmut has fixed SoapLab server issues which appeared when the server restarted.

EMBOSS 5.0.0 services all passed QA testing except for the psiphi problem described above.

Old EMBOSS 2.8.0 services produce different results in a few cases. Most differences are insignificant. Peter will investigate. A few applications (cusp and emma) have unknown qualifiers and no longer support the exact 2.8.0 command line.

3. Administration

3.1 Release 5.0.0

Alan is investigating a reported problem with launching primer3 on Windows.

4. Documentation and Training

4.1 Books

Jon, Alan and Peter had a lunch meeting with Cambridge University Press to discuss progress on the three books and to update the main author list.

5. User queries and answers

profit should work only with frequency input. It accepts profiles but is unable to handle gap penalties.

prophet has been run by some users as a profile database search. It would be better to add a new version with a threshold value that only reports "hits" rather than reporting the best match for every input sequence. This will need a new name.
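The difference between the two reporting modes can be sketched in Python as follows. This is only an illustration of the proposal, not the design of the new application. The function names, the score values and the threshold are all hypothetical.

```python
def best_match_per_sequence(best_scores):
    """Report one line for every input sequence, whatever its score."""
    return list(best_scores.items())


def hits_above_threshold(best_scores, threshold):
    """Report only sequences whose best score reaches the threshold."""
    return [(seq_id, score) for seq_id, score in best_scores.items()
            if score >= threshold]


# Hypothetical best profile scores for three database entries
scores = {"seqA": 42.0, "seqB": 7.5, "seqC": 18.0}

print(best_match_per_sequence(scores))
print(hits_above_threshold(scores, threshold=15.0))
```

With a threshold, low-scoring entries such as seqB drop out of the report, which is what a database search needs.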