EMBOSS: Project Meeting (Mar 29th 1999)

Attendees

Sanger Centre: Peter Rice, Ian Longden, Richard Bruskiewich
HGMP: Alan Bleasby, Val Curwen, Thon de Boer, Mark Faller, Sinead O'Leary, Gary Williams
EBI:
Apologies: Rodrigo Lopez, Martin Senger

1. Matters Arising

Peter has reviewed the proposed ACD changes and will implement them after he returns from the ICGEB course in China. The changes will make ACD processing simpler and more efficient, but current ACD processing works well enough that it is better not to commit them just before he goes away.

Gary asked about cleaning up the ACD files. This can be done later for consistency. Most of the things that need to change are workarounds that have been superseded by ACD improvements. The workarounds should still work in the meantime.

Peter will implement the latest additions to the ACD graph processing. Ian has produced a list of all current graph options.

Alan continues to look into the CVS problems. The "-k" switch proposed last week made no difference. HGMP will try to set up a test machine for their client.

Peter will beta test SRS6 and check for compatibility with EMBOSS both in using getz for database retrieval and in generating Icarus code for application definitions from ACD files.

2. Library documentation

Library documentation is in two parts: the comments in the source code, which are used for the HTML library manual and the SRS EFUNC and EDATA databases, and Peter's general introduction with examples.

Peter proposed rearranging the source code files so that they are better organized and easier to maintain. The header files have "@data" sections for each data type, listing functions by category. These can be used to generate "@section" and "@subsection" divisions in the source files, with the function sources in order under each subsection.

The same organization can be used in the general introduction.

It was suggested that there should be some way to include a function in more than one section. Peter proposed an "@seealso" section to list such functions.

Alan suggested adding an index by function name. This can be added to each HTML page (one per source file) and to a general index page.
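
A minimal sketch of how such an index might be generated, assuming the function names for each source file have already been extracted from the comments. The file names, function names, one-page-per-source-file naming scheme and anchor format below are illustrative assumptions, not the agreed layout.

```python
def build_index(functions_by_file):
    """Return (function name, HTML page) pairs sorted by function name.

    functions_by_file maps a source file name to the function names
    documented in it. The page name is derived by swapping the file
    extension for ".html" (an assumed naming scheme).
    """
    entries = []
    for source_file, names in functions_by_file.items():
        page = source_file.rsplit(".", 1)[0] + ".html"
        for name in names:
            entries.append((name, page))
    # Case-insensitive sort so the general index reads alphabetically.
    return sorted(entries, key=lambda entry: entry[0].lower())


def render_index(entries):
    """Render the index as a simple HTML list linking to each function."""
    lines = ["<ul>"]
    for name, page in entries:
        lines.append(f'<li><a href="{page}#{name}">{name}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


# Placeholder input used only to illustrate the idea.
example = {
    "ajstr.c": ["ajStrNew", "ajStrDel"],
    "ajlist.c": ["ajListNew"],
}
general_index = build_index(example)
```

The same entries, filtered to a single page, could supply the per-file index, so both indexes would come from one pass over the sources.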

Peter requested high priority sections for the introduction. Lists and Tables were proposed by several people.

3. General progress on release 0.0.4

Alan reported that "make-all-static" is failing on his Linux machine.

Alan has added 5 new applications:

CPGREPORT:: CpG islands
CPGPLOT:: CpG islands
PEPWHEEL:: Helical axis representation of proteins
PEPNET:: Helical net representation of proteins
PEPCOIL:: Coiled-coil prediction for proteins

Gary is working on "compseq" for sequence composition.

Sinead is working on conversion of prosite to a regular expression format.
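
As an illustration of what such a conversion involves, the sketch below translates the common PROSITE pattern elements into a regular expression. It is written in Python for brevity and is not Sinead's implementation. It assumes only the basic syntax: "-" between elements, "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, "(n)" or "(n,m)" for repeats, and "<" / ">" as terminus anchors at the ends of the pattern. The function name is hypothetical.

```python
import re

# One pattern element with an optional repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+(?:,\d+)?)\))?")


def prosite_to_regex(pattern):
    """Convert a basic PROSITE-style pattern to a regular expression string."""
    pattern = pattern.strip().rstrip(".")  # drop the terminating period
    pieces = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, repeat = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        # Bracketed sets and single residues are already valid regex syntax.
        if repeat:
            core += "{" + repeat + "}"
        pieces.append(prefix + core + suffix)
    return "".join(pieces)


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
```

The result can be passed directly to `re.search` against a protein sequence. Less common constructs, such as an anchor placed inside a bracketed set, are not handled here.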

4. Requirements for release 1.0

A target date for Release 1.0 was discussed. July was proposed by Peter as a provisional date, as the ISMB meeting is in August.

Requirements before release 1.0 include:

Library documentation
Application documentation
ACD changes
A general user guide
An Administrator's guide, including database definitions
Some additional sequence formats for input and output
Additional database formats, which have not yet been thoroughly tested
A few key applications

Peter will make a list on a Web page for review.

5. GFF and Features

Richard asked about applications that could write GFF files for feature results. He could create a GFF object and GFF output quickly. GFF input may take longer.

A number of programs identify sequence features, for example "pepcoil" and "antigenic".
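
A minimal sketch of what writing one feature result as a GFF line could look like, written in Python for brevity. It assumes the tab-separated column order seqname, source, feature, start, end, score, strand, frame, with "." for unknown values. The function name, feature name and values are hypothetical and are not taken from any EMBOSS application.

```python
def gff_line(seqname, source, feature, start, end,
             score=None, strand=".", frame="."):
    """Format one sequence feature as a tab-separated GFF line."""
    if start > end:
        raise ValueError("start must not be greater than end")
    # A missing score is written as "." rather than as a number.
    score_text = "." if score is None else f"{score:.3f}"
    fields = [seqname, source, feature, str(start), str(end),
              score_text, strand, frame]
    return "\t".join(fields)


# Hypothetical coiled-coil prediction on residues 12 to 45 of SEQ1.
line = gff_line("SEQ1", "pepcoil", "coiled_coil", 12, 45, score=1.5)
```

Using the application name as the source column would let downstream tools tell which program produced each feature.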

Other applications read sequence ranges, for example "translate" and "cutseq".

Peter suggested an automatic conversion. If the sequence input and output code could read and write GFF or EMBL/Swissprot feature tables, then seqret would automatically convert features whenever both the input and output formats supported them.

6. Any other business

6.1 WWW interfaces

Peter listed the current options for WWW interfaces:

AppLab (Martin Senger, EBI) will use an XML metadata format.
SRS6 will use Icarus application definitions so that applications can be launched on the results of a query.
www2gcg (Marc Colet, Belgian EMBnet node) uses GCG config files and generates meta HTML which is converted to JavaScript.
SeqPup (Don Gilbert) is a Java interface that uses a modified form of GCG config files.
The China EMBnet node has a student working on WWW interfaces. Peter will visit them next week.

Peter hopes that all of these can be generated automatically from the ACD files.

AppLab was considered the most important of these.

7. Next meeting

Next Monday is a holiday, so the next meeting will be on the following Monday, 12th April, at the usual time and place.

Peter Rice, Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton Hall, Cambridge, CB10 1SA, UK.