EMBOSS: Project Meeting (Sep 28th 1999)
Attendees
Sanger Centre:
Peter Rice,
Ian Longden, Alessandro Guffanti
HGMP:
Alan Bleasby, Gary Williams, Mark Faller, Sinead O'Leary
EBI:
1. Matters Arising
The meeting was postponed for 1 day because of other commitments.
2. General progress on release 0.0.4
Problems with "this" and "bool" as reserved C++ tokens, which C++
compilers object to in EMBOSS code, can be avoided by renaming them to
"ajthis" and "ajbool". This will be done in the near future.
David Mathog at Caltech has AJAX working with DEC C. The warnings in
ajfmt.c are fixed, but the Henry Spencer library is broken by the
DEC C proposed changes. David is now working on NUCLEUS code.
Peter and Ian are looking into a possible editor (MSE, from Will
Gilbert at UNH) for EMBOSS.
3. Beta Release
The EMBnet node in Israel (INN at the Weizmann Institute) have provided
a report on their experiences with the beta release. Some of the issues
(program names, long qualifier names) were already addressed during the
EMBOSS workshop.
- Program names do not always imply what they do, but there
are utilities like wossname to find the program you need, and program
groups for them to use.
- Qualifier names are long and cumbersome, but can be shortened
as long as they remain unambiguous. The "-help" output needs to be
changed to show the shortest possible names.
- Short descriptions would be useful. They are present, unless
"-auto" is used.
- The input format is not clear, e.g. one or more sequences.
We need to review prompts for sequences, sets and streams.
- Information on how to specify graphics devices is needed.
This is still under review, but does need to be cleared up.
- Programs do not always give a choice (or even state the default)
of various parameters. These should be "optional" and the defaults
should appear where possible in the "-help" output.
- Output files do not always have a summary of the parameters.
They should. These will be added.
- Some programs show hits in reverse order, starting at the end of
the sequence. This will be fixed. When output is based on features,
they can be automatically sorted.
- Programs which produce no hits should not produce just an
empty output file. This will be fixed.
- DNA alignment programs need to recognize ambiguity codes, so
we need a more complete default comparison matrix than the NCBI BLAST
one.
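
The point above about shortening qualifier names while they remain
unambiguous, and showing the shortest possible names in "-help", can be
illustrated with a minimal Python sketch. This is not EMBOSS code: the
function names and the qualifier list are hypothetical and chosen only for
illustration, and the real command-line parser may resolve abbreviations
differently.

```python
def shortest_unique_prefixes(names):
    """Map each qualifier name to its shortest prefix that no other name shares."""
    result = {}
    for name in names:
        others = [n for n in names if n != name]
        # Grow the prefix until no other qualifier starts with it.
        for length in range(1, len(name) + 1):
            prefix = name[:length]
            if not any(other.startswith(prefix) for other in others):
                result[name] = prefix
                break
        else:
            # The full name is itself a prefix of another qualifier,
            # so it cannot be shortened at all.
            result[name] = name
    return result


def resolve_qualifier(abbrev, names):
    """Return the single qualifier matching abbrev, or raise if none or several do."""
    if abbrev in names:
        return abbrev  # an exact match always wins
    matches = [n for n in names if n.startswith(abbrev)]
    if len(matches) != 1:
        raise ValueError(f"'{abbrev}' matches {len(matches)} qualifiers: {matches}")
    return matches[0]


# Hypothetical qualifier names, used only for illustration.
QUALIFIERS = ["gapopen", "gapextend", "outfile", "sequence"]
```

With this list, "-gapo" and "-gape" would be accepted, while "-gap" would
be rejected as ambiguous. A "-help" listing could print the computed
prefixes next to the full names.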
4. EMBnet Workshop
The Workshop discussion document is available
on the EMBOSS web pages.
A summary of additional points raised is also available. These included:
- Input formats could include RSF, ABI trace files, SCF 3 files and
previous versions, and MASE editor format.
- USAs should include URLs. This may need a little preprocessing to ensure
"http:" is correctly recognised.
- Gap characters need to be correctly changed before output, for
example PHYLIP format should use "-" as a gap character. Internally,
"-" was agreed as the most useful gap character and input should be
converted to use this.
- Among sequence database formats, there was considerable support
for GCG .ref and .seq files, with emblcd/staden indexing, and for
blast2 databases.
- Incremental updating of databases would be useful but there was no
clear agreement on how best to achieve it.
- Blast scores in GFF feature tables should be in the tag-value fields as
more than one score is useful.
- A text output format for features is needed. There was no
preference expressed on the format.
- EMBL/Swissprot feature table keys and qualifiers are a suitable
controlled vocabulary for use in EMBOSS.
- An interesting suggestion was to provide defaults in a project
file, either for all applications (in an emboss.default or .embossrc
file) or application specific (in a defaults.program file for
example). The syntax could be:
OPTION program.qualifier "value"
- A specific filename extension for output would be useful for post processing.
- NCBI's Vibrant interface was suggested as a possible GUI and graphics engine.
- There is the possibility of integrating a simple editor from the public domain.
- There were offers to help with documentation, and discussion of a
formal "EMBOSS Documentation" project.
- Documentation should include training material with EMBOSS
applications as examples. Where EMBOSS has no application to cover a
particular example, one should be developed for completeness.
- For Linux systems, an RPM distribution would be very useful.
- When the new installation procedure is ready, it will be announced
together with a request for beta testers to fill in a simple
registration form to find out what platforms are most used.
- It would be useful to have a database query for local data files
(for example CUTG in SRS for codon usage).
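
The project defaults file suggested above, with lines of the form
OPTION program.qualifier "value", could be read along the following lines.
This is only a sketch in Python under stated assumptions: the syntax was a
suggestion and not an agreed format, the function name, the program and
qualifier names in the sample, and the handling of blank lines and "#"
comments are all hypothetical.

```python
import shlex


def parse_defaults(text):
    """Parse lines of the form: OPTION program.qualifier "value".

    Returns a dict mapping (program, qualifier) to the value string.
    """
    defaults = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are skipped
        tokens = shlex.split(line)  # keeps a double-quoted value as one token
        if len(tokens) != 3 or tokens[0] != "OPTION" or "." not in tokens[1]:
            raise ValueError(
                f'line {lineno}: expected OPTION program.qualifier "value"'
            )
        program, qualifier = tokens[1].split(".", 1)
        defaults[(program, qualifier)] = tokens[2]
    return defaults


# Hypothetical file contents, used only for illustration.
SAMPLE = '''# project defaults
OPTION align.gapopen "10.0"
OPTION report.outfile "my results.txt"
'''
```

Calling parse_defaults(SAMPLE) gives a lookup keyed by program and
qualifier, which an application could consult before falling back to its
built-in default.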
5. Any Other Business
6. Next meeting
Next meeting Monday 4th October, 11:00am, usual place.
Peter Rice, Informatics Division, The Sanger Centre,
Wellcome Trust Genome Campus, Hinxton Hall, Cambridge, CB10 1SA, UK.