EMBOSS: Project Meeting (Mar 8th 1999)


Attendees

Sanger Centre: Peter Rice, Ian Longden, Richard Bruskiewich
HGMP: Alan Bleasby, Val Curwen, Thon de Boer, Mark Faller, Sinead O'Leary, Gary Williams
EBI: Rob Andrews, Martin Senger
Apologies: Rodrigo Lopez, Ewan Birney

1. Matters Arising

Rob Andrews from the EBI is new to the meeting. Rob works on SRS at the EBI and will be representing Thure and the SRS project.

2. Progress on Release 0.0.4

HGMP have had problems with CVS. Alan will try to find a workaround while Peter works with Sanger systems support to find a solution.

Peter announced the new release on the emboss-dev mailing list last week. So far the only reply has been from Rodrigo who has been having problems installing at the EBI. Martin has just picked up a copy and will check whether he can get the install to work. Peter will add the download address to the main web pages when we can confirm that other sites are able to install without problems.

3. ACD extensions

Peter is still working on the ACD extensions discussed last week.

There was discussion about where EMBOSS data files should be located. The present search path is:

  1. The current directory
  2. Hidden directory .embossdata under the current directory
  3. The user's home directory
  4. Hidden directory .embossdata under the user's home directory
  5. The system's emboss/data directory

There were questions about whether two directories are needed at the current directory and home directory levels, and whether hidden directories were the best approach. It was agreed to leave the present search path for now and wait for user reactions. Gary proposed a utility to do a "which" search through the path and report the data file selected, and the locations of alternatives later in the path.
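Gary's proposed "which" utility could work roughly as follows. This is a minimal sketch, not EMBOSS code. The function name `which_data_file` and its `system_data_dir` and `exists` parameters are hypothetical, and the search order simply follows the five locations listed above. The first match is the file that would be used, and any later matches are reported as the alternatives it shadows.

```python
import os


def which_data_file(name, system_data_dir, cwd=None, home=None,
                    exists=os.path.isfile):
    """Hypothetical sketch: report which copy of a data file would be used.

    Returns (selected, alternatives): the first match in search order and
    any later matches that it shadows, or (None, []) if nothing is found.
    """
    cwd = os.getcwd() if cwd is None else cwd
    home = os.path.expanduser("~") if home is None else home
    # Search order as listed in the minutes; system_data_dir stands in for
    # the installation's emboss/data directory (its location is assumed).
    search_dirs = [
        cwd,
        os.path.join(cwd, ".embossdata"),
        home,
        os.path.join(home, ".embossdata"),
        system_data_dir,
    ]
    found = []
    seen = set()
    for directory in search_dirs:
        # Skip repeats, e.g. when the current directory is the home directory.
        key = os.path.normpath(directory)
        if key in seen:
            continue
        seen.add(key)
        candidate = os.path.join(directory, name)
        if exists(candidate):
            found.append(candidate)
    if not found:
        return None, []
    return found[0], found[1:]
```

The `exists` parameter defaults to a real file check but can be replaced by any predicate, which keeps the search order easy to test without creating files.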

Peter proposed extending the ACD attributes to include a "documentation" attribute for all data types, so that this could be used in building the command line syntax part of the application documentation. This would be optional at first, as no existing ACD file defines "doc:". There are issues of how to document booleans, as they would normally appear on the command line as the opposite of the ACD file default.
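One way to handle booleans when generating command line syntax is to show the switch a user would actually type, which is the negated form when the ACD default is true. The sketch below is hypothetical: the dictionary fields (`name`, `type`, `default`, `doc`) only loosely mirror ACD attributes, and the `-no` prefix for negated booleans is an assumed convention used for illustration.

```python
def syntax_line(qualifier):
    """Build one line of command line help from a qualifier description.

    `qualifier` is a plain dict standing in for a parsed ACD definition
    (hypothetical structure): name, type, optional default and doc.
    """
    name = qualifier["name"]
    doc = qualifier.get("doc", "")  # "doc:" is optional, so fall back to empty
    if qualifier["type"] == "boolean":
        # Document the switch the user would type: the opposite of the default.
        usage = f"-no{name}" if qualifier.get("default", False) else f"-{name}"
    else:
        usage = f"-{name} <{qualifier['type']}>"
    # Pad the usage column; strip trailing spaces when there is no doc text.
    return f"{usage:<20} {doc}".rstrip()
```

For example, a boolean "plot" that defaults to true would be listed as `-noplot`, while one that defaults to false would be listed as `-plot`.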

4. Documentation

An overview of the libraries is still urgently needed from Peter. Thon will work on the library documentation with Peter so that it can be integrated with his ACD document style.

Alan asked how often library functions were added, and how often the documentation should be printed. Peter proposed comparing the documentation each night in the SRS indexing job, and mailing differences to the emboss-dev list.

Thon has added the ACD expression extensions to the ACD document. He will add the other new features (-options and graph definitions) and make a new version.

Gary has provided documentation for 5 new applications which are included in the Applications web pages. These are cutseq, maskseq, pasteseq, revseq and transeq. The documentation style was agreed, and will be used for other applications.

Peter will try to build the command line syntax for all programs automatically from their ACD definitions (see above).

The application documentation includes a "See also" section of related applications. It was agreed to maintain this by hand for now, but to aim for an automated list. The heading should include one or more application types so that each application can be given defined classes (e.g. restriction mapping, pattern matching, database searching) and a list of all applications in these classes can be built to replace the current manual list. In all cases the description should be the brief "Function" description from the documentation, so that the list will be similar when rebuilt automatically. The same text should appear as the "documentation:" attribute for the applications in their ACD files.

Gary proposed "r2d2" as an application name to convert RNA to DNA. Peter pointed out that this needs no extra code. A version of seqret with "type: dna" as a sequence attribute will convert the input sequence automatically, though of course it would need to be copied to a new name to be active. The original seqret handles any sequence and must not be given a defined type.
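To illustrate why no extra code is needed: forcing a sequence to type DNA amounts to a simple residue substitution. The sketch below is a stand-in, not the seqret or AJAX code, and assumes that only U and u need changing.

```python
# Map uracil to thymine, keeping lower case as lower case.
RNA_TO_DNA = str.maketrans("Uu", "Tt")


def rna_to_dna(sequence):
    """Return the DNA form of an RNA sequence (U -> T, u -> t)."""
    return sequence.translate(RNA_TO_DNA)
```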

Rodrigo has not yet provided the Icarus code to update documentation over HTTP. Peter will chase this up.

5. New applications

Alan has implemented dan for DNA melting, with a "-plot" option to produce graphical output. The ajGraph functions need extensive work to meet the needs of dan, especially support for histograms and changes to the way plotting colours are set, which currently clashes with the timing of initialisation and plotting. Alan and Ian will work together on these functions.

Alan has added two new functions, ajSysUnlink (delete a file) and ajSysCanon (set terminal canonical mode).

Alan will next work on helixturnhelix, sigcleave and antigenic, 3 EGCG applications which use local data files.

Gary will work on further simple sequence manipulation applications. Some are already named in the "See also" lists for the ones added last week.

Sinead will work on comparing sequence motifs to a sequence database, and comparing a sequence to a pattern database such as prosite. Peter has implemented the POSIX regular expression library from Henry Spencer as ajPosreg with the same functions as ajReg. Peter has provided a Sanger perl script to convert prosite.dat pattern lines into regular expressions.
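Peter's Perl script is not reproduced here. The following is an independent sketch of the same idea, assuming the standard PROSITE pattern syntax: elements separated by '-', 'x' for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, '<' and '>' for N- and C-terminal anchors, and a trailing full stop. The output uses only constructs common to POSIX extended regular expressions and Python's re module. Edge cases such as '>' inside brackets are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE pattern to a POSIX extended regular expression."""
    pattern = pattern.strip().rstrip(".")  # patterns end with a full stop
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        residue, repeat = match.group(1), match.group(2)
        if residue == "x":
            residue = "."
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"
        # Single residues and [...] classes are already valid regex syntax.
        if repeat:
            residue += "{" + repeat + "}"
        parts.append(prefix + residue + suffix)
    return "".join(parts)
```

For example, the N-glycosylation pattern `N-{P}-[ST]-{P}.` becomes `N[^P][ST][^P]`, which could then be compiled by a POSIX matcher (such as the Henry Spencer library behind ajPosreg) or checked quickly with Python's re module.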

Thon is not working on applications at present, but will be updating the ACD documentation and working with Peter on library documentation.

Val is working on AJAX library functions for sequence output with additional derived information. Peter suggested linking this with AJAX functions to write text output as text, HTML, etc. For sequences, output could also be JavaScript as used by SRS applets.

Mark is working on clustal in two parts. "eclustal" is nearly complete. "clustree" needs graph library extensions.

Mark is also working on a general "pepinfo" application to calculate protein sequence properties. Peter will try to track down 2D gel calculations for isoelectric points which have some differences from the usual amino acid parameters.

Ian will drop his pattern matching work and concentrate on the ajGraph library with Alan.

Peter will be working on the ACD code and on library documentation.

6. Web Pages

The home page is getting long and has some out of date information. Peter will revise it to have most paragraphs moved down to a lower level and will check the content for accuracy.

7. Any other Business

Richard is working on restructuring acedb code, and expects to find code which could be useful in EMBOSS.

The acedb team are looking into graphics libraries that are portable to non-Unix systems, for example GTK. Peter expressed interest, especially for map displays which could be used for sequence features, but was also concerned about licensing issues because EMBOSS libraries need to be compatible with the GNU Library license and not tied to the full GPL.

Gary proposed changing "int" to "long" throughout the string library. It was felt that, as "int" is at least 32 bits on all currently supported platforms, this was not necessary. Implementing it would cause problems with any function returning a "long" to an application (for example legacy code) that needs "int".
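Both points can be illustrated from Python using ctypes as a stand-in for C. This is a sketch, not part of EMBOSS: the first function checks the platform's C "int" width, and the second shows what happens when a value too large for a 32-bit "int" is stored in one. ctypes performs no overflow checking, so the value wraps silently, which is the kind of truncation a mixed "int"/"long" interface can introduce.

```python
import ctypes


def int_bits():
    """Width in bits of the C compiler's int on this platform."""
    return ctypes.sizeof(ctypes.c_int) * 8


def to_c_int32(value):
    """Store a value in a 32-bit C int; ctypes wraps silently on overflow."""
    return ctypes.c_int32(value).value
```

On the platforms discussed, `int_bits()` should report at least 32, so lengths up to 2,147,483,647 fit in an "int". Passing 2**31 to `to_c_int32` returns -2147483648 rather than raising an error.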

Rob is working in the SRS team on linking metabolic pathway databases, but will pass on to David Kreil at the EBI a request to look into the conversion possibilities of ACD into SRS 6 application definitions in Icarus, and will check on any changes in SRS 6 that could affect the use of "getz" for database access.

Martin has just released the new version of AppLab and will make a comparison between ACD syntax and the definitions he plans for AppLab which will use XML files as meta definitions of applications.

Peter will work on adding options to "ajcompile" to automatically build XML for AppLab and Icarus for SRS from ACD files.

8. Next meeting

Next meeting Monday 15th March, usual time and place.
Peter Rice, Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton Hall, Cambridge, CB10 1SA, UK.