EMBOSS: Project Meeting (Feb 28th 2000)


Sanger Centre: Peter Rice, Ian Longden, Richard Bruskiewich
HGMP: Alan Bleasby, Gary Williams, Val Curwen
Apologies: Mark Faller

1. Matters Arising

2. General progress on release 0.0.4

GNU sort seems to have different default behaviour from other sort implementations. As all versions do seem to accept the "-T" option, Peter will set a default for database indexing of "-T . -k 1,1", which should work on all systems and can be changed by the user if needed.

The "system" calls still need to be changed, and any calls to rm need to be replaced with "unlink".
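As an illustration of the rm-to-unlink change, here is a minimal sketch in C. The helper name remove_file is hypothetical and is not an EMBOSS library function; it only shows the POSIX unlink() call taking the place of a shell command.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of EMBOSS): delete one file with the
 * POSIX unlink() call instead of spawning a shell via system("rm ...").
 * Returns 0 on success, or -1 on failure after reporting the reason
 * on stderr. */
static int remove_file(const char *path)
{
    if (unlink(path) != 0) {
        fprintf(stderr, "cannot remove %s: %s\n", path, strerror(errno));
        return -1;
    }
    return 0;
}
```

Calling unlink directly avoids starting a shell and does not depend on how rm is installed or aliased on a given system.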

Peter has changed a few applications that wrote to stdout and stderr so that they now use ajUser, ajFmtPrintF to the output file, and in a few cases ajDebug.

HGMP databases need to be added to the documentation pages, and could be added to the example emboss.default template file.

The -acdpretty option should stop the program.

Gary would like an application to extract the full text of a database entry. Peter will modify sequence reading to store all input as text and echo it to a file in a new application "seqdoc", but with no guarantee of correctness. For example, a GCG version of an EMBL entry will have spaces in the feature table's ".." locations.

Peter has a solution to floating point rounding problems in alignment algorithms for large sequences.

Logging is now enabled at HGMP. Many users are running infoseq.

A random filename for temporary files would be useful. There is a simple one already in emma.
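One possible way to get a random temporary filename, sketched in C on the assumption that the POSIX mkstemp() call is available. The helper name make_temp_file and the "emboss_tmp_" prefix are hypothetical, and this is not the code already in emma.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not the emma code): create a uniquely named
 * temporary file in the current directory. On success the generated
 * name is left in "name" and an open file descriptor is returned.
 * Returns -1 if the buffer is too small or the file cannot be created. */
static int make_temp_file(char *name, size_t size)
{
    /* The trailing XXXXXX is replaced by mkstemp() with random characters. */
    static const char tmpl[] = "./emboss_tmp_XXXXXX";

    if (size < sizeof tmpl)
        return -1;
    memcpy(name, tmpl, sizeof tmpl);
    return mkstemp(name);
}
```

mkstemp creates and opens the file in one step, so two programs running at the same time cannot be given the same name. The caller closes the descriptor and unlinks the file when it is no longer needed.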

3. Graphics

Peter has a modified pepinfo that uses a standard xygraph object, and has modified the ajgraph and PLPLOT libraries so that all PLPLOT specific code will be in PLPLOT as new functions with the prefix "plx".

Some further testing is needed before committing these changes.

An alternative font for alphanumeric characters in graphics output would be useful.

4. Interfaces

PISE is installed at HGMP, as a test site for the distribution.

The email return of results needs to change. Results are stored in temporary disk space for up to 5 days.

Some database entry USAs are allowed, for example SWISSPROT:entry_name in needle, but wildcards are not allowed. One option is to add new wildcards to the USA syntax for use in PISE.

PISE now has an optional startup script to set up the user environment for EMBOSS.

5. Features

Ian and Peter have looked at Matt Pocock's embltogff conversion. It had problems with multiple accession numbers, and used the accession number as the sequence ID, which will not be the EMBOSS choice. It also ignored all feature qualifiers, though this may change.

For feature locations that are not simple start and end positions, new tag values to hold the location as text would be useful.

It was proposed to first implement an EMBOSS interpretation of GFF output, and then to post this for discussion to the GFF mailing list.

Gary is writing showfeat to display features. There are still some problems with missing feature tables.

6. Any Other Business

The Trends in Genetics article has been refereed. They would like a more general introduction and less technical detail. Peter will draft revisions.

7. Next meeting

Next meeting will be Monday 6th March 2000, 11:00am, usual place.