EMBOSS: Project Meeting (Sep 20th 1999)


Attendees

Sanger Centre: Peter Rice, Ian Longden, Richard Bruskiewich, Alessandro Guffanti
HGMP: Alan Bleasby, Gary Williams, Mark Faller, Sinead O'Leary
EBI:

1. Matters Arising

2. General progress on release 0.0.4

Some applications can generate multiple output files that users may want to combine, for example the hits from several "fuzznuc" runs. Alan reported that searching for multiple patterns in a single run would be inefficient. For complex post-processing (e.g. distance constraints between patterns) it was considered best to make separate runs with single patterns and then analyse the combined results. The current plan is to write GFF format feature output from each run and to combine these files. An example problem area is the prediction of Matrix Attachment Regions (MARs) in human genomic DNA.
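As an illustration of the "separate runs, then combine" plan, the following is a minimal Python sketch and not part of EMBOSS. It assumes each run writes tab-separated GFF feature lines (seqname, source, feature, start, end, score, strand, frame, optional attributes) with "#" comment lines. The function names merge_gff and pairs_within, and the idea of using the feature column to tell patterns apart, are hypothetical choices made for this example only.

```python
import sys


def merge_gff(texts):
    """Merge the contents of several GFF files into one sorted list of lines."""
    features = []
    for text in texts:
        for line in text.splitlines():
            # Skip blank lines and '#' comment/header lines.
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) < 8:
                raise ValueError("not a GFF feature line: %r" % line)
            # Sort key: sequence name, then numeric start and end.
            features.append((fields[0], int(fields[3]), int(fields[4]), line))
    features.sort()
    return [line for (_, _, _, line) in features]


def pairs_within(lines, feature_a, feature_b, max_gap):
    """Pair each feature_a hit with feature_b hits on the same sequence that
    start at or after the end of the feature_a hit, within max_gap bases."""
    parsed = [line.split("\t") for line in lines]
    hits = []
    for a in parsed:
        if a[2] != feature_a:
            continue
        for b in parsed:
            if b[2] != feature_b or b[0] != a[0]:
                continue
            gap = int(b[3]) - int(a[4])
            if 0 <= gap <= max_gap:
                hits.append(("\t".join(a), "\t".join(b)))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python merge_gff.py run1.gff run2.gff > combined.gff
    texts = []
    for path in sys.argv[1:]:
        with open(path) as handle:
            texts.append(handle.read())
    for line in merge_gff(texts):
        print(line)
```

The pairwise scan is quadratic in the number of hits, which is acceptable for a sketch but would need an index or a sorted sweep for genome-scale input.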

For GFF output, some examples are needed of how ACEDB currently expects to see the tag value fields, together with a list of the agreed vocabulary in the other fields.

Peter has started working on reading sequences from BLAST 1.4 and BLAST 2.0 indexed databases, with optional FASTA format files, NCBI index files and, if needed, EMBOSS index files for rapid ID and accession lookup.

Peter has permission from Bill Pearson to use the code in the FASTA source libraries for reading GCG sequence database .seq files. These will be added to the EMBOSS sequence database access methods. GCG index files will not be used because they are both proprietary and slow. Staden/EMBO-CD style index files will be used instead, with SRS as an alternative indexing method.

Gary is now working on the "showseq" application, which could grow rapidly.

Peter has updated all documentation for library functions in the source code, and this now appears on the HTML pages and in the EFUNC and EDATA SRS databases. Some source files are omitted from the index pages for NUCLEUS and AJAX. Peter will update these today.

Many applications still need documentation.

Gary proposed moving the "data" directory up one level. This was agreed, but may need some care if users are already updating files in the current location.

Alan has looked into linking EMBOSS with the Bioinformatics Template Library (BTL) from Birkbeck College London. BTL is written in C++, and EMBOSS fails to compile with a C++ compiler because, in emulating C++, we have used some C++ reserved words as identifiers. The most important are "this" and "bool", which will need to be renamed. Suggestions are welcome!
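One way to size the renaming job is to scan the C sources for identifiers that are reserved in C++. The following is a minimal Python sketch under stated assumptions: the keyword set is a subset of the C++ keywords that are ordinary identifiers in C89, the function name find_cpp_keywords is hypothetical, and only /* */ comments, string literals and character literals are skipped (no preprocessor handling).

```python
import re
import sys

# C++ keywords that are ordinary identifiers in C89 (a subset, not exhaustive).
CPP_ONLY_KEYWORDS = {
    "bool", "catch", "class", "delete", "explicit", "friend", "mutable",
    "namespace", "new", "operator", "private", "protected", "public",
    "template", "this", "throw", "try", "typename", "using", "virtual",
}

# Matches C comments, string literals and character literals so they can be blanked.
_SKIP = re.compile(r'/\*.*?\*/|"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'', re.DOTALL)
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _blank(match):
    # Replace everything except newlines so reported line numbers stay correct.
    return re.sub(r"[^\n]", " ", match.group(0))


def find_cpp_keywords(source):
    """Return (line_number, identifier) pairs, in source order, for C++ keywords
    used as identifiers in C source text."""
    cleaned = _SKIP.sub(_blank, source)
    found = []
    for number, line in enumerate(cleaned.splitlines(), start=1):
        for ident in _IDENT.findall(line):
            if ident in CPP_ONLY_KEYWORDS:
                found.append((number, ident))
    return found


if __name__ == "__main__":
    # Hypothetical usage: python find_cpp_keywords.py file1.c file2.h
    for path in sys.argv[1:]:
        with open(path) as handle:
            for number, ident in find_cpp_keywords(handle.read()):
                print("%s:%d: %s" % (path, number, ident))
```

A report of this kind only lists candidates; each rename would still need to be checked by hand, since the replacement names have not been decided.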

Hans Ullitz-Moeller (EMBnet Denmark) would like PNG output. PLPLOT may be able to produce PNG in the near future, but we do not have the source code for this yet.

Application "geecee" is no longer needed, and can be dropped from the makefiles.

3. Beta Release

Richard, Peter and Ian attempted a Windows NT build with Simon Kelley. PLPLOT was built but there were problems linking it because EMBOSS looked for the X11 libraries. Work on this will continue.

David Mathog at Caltech is also interested in Windows NT and VMS builds. He will be working on these.

To make Unix installation simpler, and to help with these ports, Ian is writing a true configure script and Makefile.am for PLPLOT. The PLPLOT library will be stored and distributed unpacked so that we can work on changes where needed, and pass them back to the PLPLOT maintainer(s).

Ian recommends a clean checkout with the CVS -P option (prune empty directories) to clean up old directories that are no longer needed.

4. Bruges Workshop

Peter has a draft schedule for Thursday's workshop which will be updated in the next few days. About 30 attendees are registered.

5. Any Other Business

Peter is planning to submit a paper on EMBOSS, and has been approached to write a short description of EMBOSS for one journal.

Articles for the CCP11 Newsletter and embnet.news are also planned.

6. Next meeting

Next meeting Monday 27th September, 11:00am, usual place.


Peter Rice, Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton Hall, Cambridge, CB10 1SA, UK.