EMBOSS: Project Meeting (Sep 28th 1999)


Sanger Centre: Peter Rice, Ian Longden, Alessandro Guffanti
HGMP: Alan Bleasby, Gary Williams, Mark Faller, Sinead O'Leary

1. Matters Arising

The meeting was postponed by one day because of other commitments.

2. General progress on release 0.0.4

C++ compilers object to EMBOSS code that uses "this" and "bool" as identifiers, because both are reserved words in C++. The problem can be avoided by renaming them to "ajthis" and "ajbool", which will be done in the near future.

David Mathog at Caltech has AJAX working with DEC C. The warnings in ajfmt.c have been fixed, but the changes proposed for DEC C break the Henry Spencer regular expression library. David is now working on the NUCLEUS code.

Peter and Ian are looking into MSE (from Will Gilbert at UNH) as a possible editor for EMBOSS.

3. Beta Release

The EMBnet node in Israel (INN at the Weizmann Institute) has provided a report on its experiences with the beta release. Some of the issues (program names, long qualifier names) were already addressed during the EMBOSS workshop. The points raised, with our responses, were:
  1. Program names do not always indicate what the programs do. However, utilities such as wossname can find the program you need, and programs are arranged in groups to help users choose.
  2. Qualifier names are long and cumbersome. They can be abbreviated as long as they remain unambiguous. The "-help" output needs to be changed to show the shortest possible names.
  3. Short descriptions would be useful. They are present, unless "-auto" is used.
  4. The expected input is not always clear, e.g. whether one or more sequences are required. We need to review the prompts for sequences, sets and streams.
  5. Information on how to specify graphics devices is needed. This is still under review, but does need to be cleared up.
  6. Programs do not always offer a choice for various parameters, or even state the default. These parameters should be "optional", and the defaults should appear in the "-help" output where possible.
  7. Output files do not always include a summary of the parameters used. They should, and summaries will be added.
  8. Some programs report hits in reverse order, starting at the end of the sequence. This will be fixed. When output is based on features, the hits can be sorted automatically.
  9. Programs that find no hits should not simply produce an empty output file. This will be fixed.
  10. DNA alignment programs need to recognise ambiguity codes. We therefore need a more complete default comparison matrix than the NCBI BLAST one.
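Point 2 depends on finding the shortest unambiguous abbreviation of each qualifier, both for the "-help" output and when reading the command line. A minimal C sketch of one way to do this follows. The function names `shortest_unique_prefix` and `match_qualifier` are hypothetical (not part of AJAX), and the rule that an exact match wins over longer names sharing the same prefix is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: length of the shortest prefix of names[i] that no
 * other qualifier in names[] shares. If names[i] is itself a prefix of a
 * longer name, the full length is returned (assumes an exact match wins). */
size_t shortest_unique_prefix(const char *const names[], size_t n, size_t i)
{
    size_t len = strlen(names[i]);

    for (size_t k = 1; k <= len; k++) {
        int clash = 0;
        for (size_t j = 0; j < n && !clash; j++)
            if (j != i && strncmp(names[i], names[j], k) == 0)
                clash = 1;
        if (!clash)
            return k;
    }
    return len;
}

/* Hypothetical lookup: index of the qualifier the user meant by abbrev,
 * or -1 if nothing matches or the abbreviation is ambiguous. */
int match_qualifier(const char *const names[], size_t n, const char *abbrev)
{
    size_t alen = strlen(abbrev);
    int found = -1;

    assert(alen > 0);

    /* First pass: an exact match is always accepted. */
    for (size_t j = 0; j < n; j++)
        if (strcmp(names[j], abbrev) == 0)
            return (int)j;

    /* Second pass: accept a prefix only if exactly one name has it. */
    for (size_t j = 0; j < n; j++) {
        if (strncmp(names[j], abbrev, alen) == 0) {
            if (found >= 0)
                return -1;          /* ambiguous abbreviation */
            found = (int)j;
        }
    }
    return found;
}
```

For an illustrative qualifier set of sequence, sformat, outfile, window and wordsize, the shortest unambiguous forms would be "se", "sf", "o", "wi" and "wo", and "-w" on its own would be rejected as ambiguous.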
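For point 10, one common way to handle IUB ambiguity codes is to represent each code as a bit set of the bases it may stand for. The sketch below scores a pair of codes as the average base score over every pair of bases they could represent. The function names are hypothetical, the +5/-4 scores in the usage are assumed for illustration only, and this averaging rule is one possible choice, not a decision on the EMBOSS default matrix.

```c
#include <assert.h>
#include <ctype.h>

/* One bit per unambiguous base. */
enum { BASE_A = 1, BASE_C = 2, BASE_G = 4, BASE_T = 8 };

/* Map an IUB nucleotide code to the set of bases it may represent.
 * U is treated as T; unknown characters give 0. */
static unsigned nuc_mask(char c)
{
    switch (toupper((unsigned char)c)) {
    case 'A': return BASE_A;
    case 'C': return BASE_C;
    case 'G': return BASE_G;
    case 'T': case 'U': return BASE_T;
    case 'R': return BASE_A | BASE_G;
    case 'Y': return BASE_C | BASE_T;
    case 'S': return BASE_C | BASE_G;
    case 'W': return BASE_A | BASE_T;
    case 'K': return BASE_G | BASE_T;
    case 'M': return BASE_A | BASE_C;
    case 'B': return BASE_C | BASE_G | BASE_T;
    case 'D': return BASE_A | BASE_G | BASE_T;
    case 'H': return BASE_A | BASE_C | BASE_T;
    case 'V': return BASE_A | BASE_C | BASE_G;
    case 'N': return BASE_A | BASE_C | BASE_G | BASE_T;
    default:  return 0;
    }
}

/* Average score over all base pairs that x and y could stand for.
 * match/mismatch are the scores for two unambiguous bases. */
double ambig_score(char x, char y, int match, int mismatch)
{
    unsigned mx = nuc_mask(x), my = nuc_mask(y);
    int total = 0, pairs = 0;

    if (mx == 0 || my == 0)
        return mismatch;            /* unknown symbol: treat as mismatch */

    for (unsigned bx = BASE_A; bx <= BASE_T; bx <<= 1) {
        if (!(mx & bx))
            continue;
        for (unsigned by = BASE_A; by <= BASE_T; by <<= 1) {
            if (my & by) {
                total += (bx == by) ? match : mismatch;
                pairs++;
            }
        }
    }
    assert(pairs > 0);              /* both masks are non-empty here */
    return (double)total / pairs;
}
```

With +5/-4, A against R scores 0.5 and N against N scores -1.75, so partial matches still earn some credit while fully ambiguous positions are penalised.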

4. EMBnet Workshop

The Workshop discussion document is available on the EMBOSS web pages.

A summary of additional points raised is also available. These included:

  1. Input formats could include RSF, ABI trace files, SCF version 3 files (and earlier versions), and the MASE editor format.
  2. USAs (Uniform Sequence Addresses) should be able to include URLs. This may need a little preprocessing to ensure that "http:" is correctly recognised, rather than being read as a database name.
  3. Gap characters need to be converted correctly on output; for example, PHYLIP format should use "-" as the gap character. Internally, "-" was agreed to be the most useful gap character, and input should be converted to use it.
  4. Among sequence database formats, there was considerable support for GCG .ref and .seq files, with emblcd/staden indexing, and for blast2 databases.
  5. Incremental updating of databases would be useful but there was no clear agreement on how best to achieve it.
  6. BLAST scores in GFF feature tables should be stored in the tag-value fields, because more than one score is useful and GFF has only a single score column.
  7. A text output format for features is needed. There was no preference expressed on the format.
  8. EMBL/Swissprot feature table keys and qualifiers are a suitable controlled vocabulary for use in EMBOSS.
  9. An interesting suggestion was to provide defaults in a project file, either for all applications (in an emboss.default or .embossrc file) or for a specific application (for example in a defaults.program file). The syntax could be:
                  OPTION program.qualifier "value"
  10. A specific filename extension for output files would be useful for post-processing.
  11. NCBI's Vibrant interface was suggested as a possible GUI and graphics engine.
  12. There is the possibility of integrating a simple editor from the public domain.
  13. There were offers to help with documentation, and discussion of a formal "EMBOSS Documentation" project.
  14. Documentation should include training material with EMBOSS applications as examples. Where EMBOSS has no application to cover a particular example, one should be developed for completeness.
  15. For Linux systems, an RPM distribution would be very useful.
  16. When the new installation procedure is ready, it will be announced together with a request for beta testers to fill in a simple registration form, so that we can find out which platforms are most used.
  17. It would be useful to have a database query for local data files (for example CUTG in SRS for codon usage).
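Point 3 amounts to two passes over each sequence: normalise any gap symbol to "-" on input, then translate "-" to the target format's gap character on output. A minimal C sketch follows. The set of input gap symbols (".", "~", "-") and the function names are assumptions for illustration, not an agreed EMBOSS list.

```c
#include <assert.h>
#include <string.h>

/* Assumed set of gap symbols accepted on input; illustrative only. */
static const char INPUT_GAPS[] = ".~-";

/* Convert every recognised input gap symbol to the internal gap, '-'. */
void gaps_to_internal(char *seq)
{
    assert(seq != NULL);
    for (char *p = seq; *p != '\0'; p++)
        if (strchr(INPUT_GAPS, *p) != NULL)
            *p = '-';
}

/* Convert the internal gap '-' to the gap character of the output format. */
void gaps_to_output(char *seq, char outgap)
{
    assert(seq != NULL);
    for (char *p = seq; *p != '\0'; p++)
        if (*p == '-')
            *p = outgap;
}
```

For PHYLIP output the second pass is simply gaps_to_output(seq, '-'), which leaves the internal form unchanged; a format using "." would pass '.' instead.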
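The defaults-file syntax suggested in point 9 is simple enough to parse with sscanf. A minimal C sketch, assuming one OPTION per line, a quoted non-empty value without embedded quotes, and hypothetical names (`DefaultOption`, `parse_default`); the program and qualifier in the usage (water, gapopen) are only examples.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of the proposed defaults file. */
typedef struct {
    char program[64];
    char qualifier[64];
    char value[256];
} DefaultOption;

/* Parse a line of the form:  OPTION program.qualifier "value"
 * Returns 1 on success, 0 if the line does not match. */
int parse_default(const char *line, DefaultOption *opt)
{
    char keyword[8];
    int end = -1;   /* set by %n only if the closing quote was matched */
    int count;

    assert(line != NULL && opt != NULL);
    count = sscanf(line, "%7s %63[^. \t].%63[^ \t\"] \"%255[^\"]\"%n",
                   keyword, opt->program, opt->qualifier, opt->value, &end);
    return count == 4 && end >= 0 && strcmp(keyword, "OPTION") == 0;
}
```

Parsing `OPTION water.gapopen "10.0"` fills program "water", qualifier "gapopen" and value "10.0"; an unquoted value or a misspelt keyword is rejected. The same parser would serve both the global and the application-specific files; which one takes precedence is left open here.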

5. Any Other Business

6. Next meeting

Next meeting Monday 4th October, 11:00am, usual place.

Peter Rice, Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton Hall, Cambridge, CB10 1SA, UK.