EMBOSS: Project Meeting (Tuesday 16th August 2005)


EBI: Peter Rice, Jon Ison, Alan Bleasby
Sanger: Tim Carver

Apologies: Lisa Mullan

1. Minutes of the last meeting

Minutes of the meeting of 2nd August 2005 are here

2. Software Development

2.1 Commandline Logging

Peter has added functions to ACD to log the command line and to build additional commandline qualifiers to define the user's non-default responses to prompts. These are intended to be used in logging the commandline options used for database indexing with the dbi and dbx programs.

The commandline logs can also be reported in other output files that can accept commented headers, including alignment and report files.
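As a rough illustration of the idea (not the ACD code itself), the sketch below rebuilds a command line from the program name plus only those qualifiers whose values differ from their defaults, and formats it as a single commented header line. The Qualifier struct, the build_command_log function and the "#" comment prefix are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one qualifier: name, default and value used. */
typedef struct {
    const char *name;
    const char *defvalue;
    const char *value;
} Qualifier;

/*
 * Write "# <program> -name value ..." into buf, including only the
 * qualifiers whose value differs from the default.
 * Returns the number of qualifiers written, or -1 if buf is too small.
 */
int build_command_log(char *buf, size_t size, const char *program,
                      const Qualifier *quals, size_t nquals)
{
    size_t used;
    size_t i;
    int written = 0;
    int n;

    assert(buf != NULL && size > 0);

    n = snprintf(buf, size, "# %s", program);
    if (n < 0 || (size_t) n >= size)
        return -1;
    used = (size_t) n;

    for (i = 0; i < nquals; i++) {
        if (strcmp(quals[i].defvalue, quals[i].value) == 0)
            continue;               /* default response: nothing to log */

        n = snprintf(buf + used, size - used, " -%s %s",
                     quals[i].name, quals[i].value);
        if (n < 0 || (size_t) n >= size - used)
            return -1;
        used += (size_t) n;
        written++;
    }

    return written;
}
```

A line built this way could be written at the top of any output file that accepts commented headers, assuming that file format treats "#" as a comment marker.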

There was discussion of general logging, for example through the ajExit call. This may be needed for workflow provenance, using the same calls. It is not clear whether a commandline option should turn this on, or whether an environment variable is the best way.

2.2 Prettyplot update

Peter has updated prettyplot to fix some long-standing bugs/features. The postscript graphics had a blank extra page at the end which Jemboss was able to ignore. This is now fixed by changing the order of newpage functions. The example output had no boxes drawn - because the example data file had such low sequence weights that the plurality figure was too high for even perfect identities to be boxed. The amino acid residue colours assumed alphabetical order for the amino acids in the comparison matrix, which is not the case for most of the matrix files. This was corrected to check the residue numbers in the matrix. Nucleic acids are coloured by ABI base code.
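The residue colour fix can be pictured with the following sketch, which is not the prettyplot code. It assigns a colour to each matrix position by looking up the residue code actually stored at that position, instead of assuming that position i holds the i-th letter of the alphabet. The ResidueColour struct, the colours_by_matrix_order function and the colour numbers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_COLOUR 0   /* hypothetical colour for unlisted residues */

/* Hypothetical residue-to-colour pair; colour numbers are arbitrary. */
typedef struct {
    char residue;
    int colour;
} ResidueColour;

/*
 * Fill colours[i] with the colour of the residue found at position i of
 * the matrix's own residue ordering (codes). The caller must supply a
 * colours array with at least as many elements as codes has characters.
 */
void colours_by_matrix_order(const char *codes, const ResidueColour *table,
                             size_t ntable, int *colours)
{
    size_t i;
    size_t j;

    assert(codes != NULL && colours != NULL);

    for (i = 0; codes[i] != '\0'; i++) {
        colours[i] = DEFAULT_COLOUR;

        for (j = 0; j < ntable; j++) {
            if (table[j].residue == codes[i]) {
                colours[i] = table[j].colour;
                break;
            }
        }
    }
}
```

With a matrix whose residues are ordered "ARND...", position 1 receives the colour for R, whereas an alphabetical assumption would have given it the colour for the second letter of the alphabet.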

There was discussion of adding options (also to abiview) to control the residue colouring in some standard way.

One remaining issue is a failure to close boxes at the top and bottom when an alignment has to be split over multiple pages. Currently cutting and pasting the plot together will look right, so this is left to be fixed at a future date.

2.3 Database indexing

Alan has fixed a bug in uniprot indexing, and tested full runs on uniprot and embl. Two users who reported the problem have also tested the fix successfully.

When indexing is extended to deleting entries some code changes will be required to maintain a well balanced tree structure.

Retrieval speeds for the new indices are similar to those for dbi (emblcd format) indices. Indexing speeds are slower, but the potential for updatable indices will lead to faster indexing times overall for users.

2.4 Vienna RNA package

Alan has converted two programs from the Vienna RNA package as a new EMBASSY package. The inputs were interactive in the original programs and required entering an RNA sequence and a string of constraints with the same length as the sequence. For the initial conversion these will be read from an optional second input file.

Peter will look into adding a markup file to be associated with sequence input - it could for example contain protein secondary structure assignments for other programs and be a generally useful feature. Some sequence formats (the Vienna package has its own format) may be able to read both sequence and markup (Swissprot could contain secondary structure assignment for proteins). Other formats could read a second input file, as we do already for feature information from GFF files.

2.5 Postscript output files

The Vienna RNA and Phylip packages both produce postscript files using their own graphics routines. These are a problem for Jemboss, which lacks a Postscript viewer. Alan and Peter will look into the ease of converting both packages to use the planned new EMBOSS graphics library. As an interim measure, Jemboss may be able to call a conversion utility (which would need to be separately installed) so that a server or a standalone Jemboss could convert postscript output to PNG, JPEG or GIF.

3. Administration

3.1 EMBOSS Meeting Schedule

Rodrigo (EBI External Services) still cannot make the new meeting date/time. Peter will look for an alternative date/time that suits everyone. Tuesdays at 11am is a possibility.

3.2 Compilation output

Alan proposed modifying the compilation of EMBOSS to give a shorter report for each source file, making warning and error messages more obvious (especially for deprecated function calls). This would require installation of at least one GNU tool (autoconf to create a local config.h file). This approach may be (very) useful for developers, but probably not for the release unless a workaround is possible.

3.3 Documentation of proposed work

Jon proposed a general policy of documenting new work in advance, especially as a guide to new developers. This was agreed as appropriate. Peter offered to review the previous such documents, for example the C coding standards, and bring them up to date.

4. Documentation & Training

4.1 Source Code Documentation

Jon presented a list of proposed extensions to the source code documentation.

Function arguments will be separately documented as "inputs", "outputs" and "input/outputs". "Inputs" will include read-only, function and vararg arguments. "Outputs" will include write and delete arguments. "Input/Outputs" will be the remaining updated arguments.
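The grouping above could be expressed as a simple mapping, sketched here. The single-letter usage codes ('r' read-only, 'f' function, 'v' vararg, 'w' write, 'd' delete, 'u' update) are assumed for illustration, and the ArgCategory type and classify_argument function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_INPUT, ARG_OUTPUT, ARG_INPUT_OUTPUT } ArgCategory;

/* Map an assumed one-letter argument usage code to its documentation group. */
ArgCategory classify_argument(char usage)
{
    assert(usage != '\0');

    switch (usage) {
    case 'r':   /* read-only */
    case 'f':   /* function */
    case 'v':   /* vararg */
        return ARG_INPUT;
    case 'w':   /* write */
    case 'd':   /* delete */
        return ARG_OUTPUT;
    default:    /* 'u' (update) and any remaining codes */
        return ARG_INPUT_OUTPUT;
    }
}
```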

For each function, Jon will look to create a list of all error messages generated by the function itself. These can be used to generate a comprehensive list of all error messages, to generate test cases for each message, and to generate a list of all messages that a function can produce when called. The latter is complicated by the need to check messages produced by functions called within the function, so that higher-level functions can potentially have a very large number of messages.

Each function should have one or more simple coding examples (several functions could be combined in one example). Peter and Jon will look for ways to generate such test cases, initially for the string functions, using the existing source code validation script.

Function names should be consistent. For example there should be a prefix for each datatype, and standard naming for constructors (New) etc. with standard extensions to the name for specific variants. Alan proposed retaining the old names as macros with a compiler directive to flag them as deprecated whenever used. They could be removed completely in a future release, perhaps in 4.0.0.
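One possible way to keep an old name as a macro while flagging each use is sketched below. It assumes a GCC-compatible compiler that supports the deprecated function attribute; on other compilers the macro still works but no warning is given. The names exStrLen, exStringLength, ex_deprecated_name and EX_DEPRECATED are hypothetical and are not EMBOSS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Deprecation marker: assumed to be available on GCC-compatible compilers. */
#if defined(__GNUC__)
#define EX_DEPRECATED __attribute__((deprecated))
#else
#define EX_DEPRECATED
#endif

/* New, consistently named function: datatype prefix plus standard suffix. */
size_t exStrLen(const char *str)
{
    assert(str != NULL);
    return strlen(str);
}

/* Empty marker function; calling it is what triggers the warning. */
EX_DEPRECATED static inline void ex_deprecated_name(void)
{
}

/*
 * Old name retained as a macro. It expands to a call of the deprecated
 * marker followed by the new function, so every use compiles and runs
 * as before but is reported by the compiler.
 */
#define exStringLength(s) (ex_deprecated_name(), exStrLen(s))
```

Existing code calling exStringLength would continue to build, with a warning at each call site, until the macro is removed in a later release.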

Function arguments should have consistent naming. Peter proposed a list of recommended or deprecated names (for example names such as "end" that are hard to accurately detect in documentation, or names such as "n" that are meaningless). These would be applied to the global functions in each source file.

4.2 SourceForge Website

Peter has removed obsolete links (mainly to RFCGR/HGMP) from the emboss.sourceforge.net website. Tim has updated the Jemboss pages, removing references to the RFCGR server, and updating other sections.

Links to CCP11 will need to be removed as the only CCP11 pages currently on the web are the obsolete ones at Daresbury. CCP11 references can remain but be unlinked.

4.3 FAQ for Developers

Jon offered to add an FAQ list for new developers to guide them through the first stages of writing EMBOSS code and applications. Peter offered the workshop examples from recent EBI courses, and proposed combining this with planning for the programmers' book.

A similar FAQ could be prepared for system administrators.

5. User queries and answers

Jon reported on 4 outstanding user queries from the past 2 weeks. All have been resolved.

6. AOB

6.1 Hardware

Alan reported that the loan from IBM is due for renewal. A new machine is not yet available so the present machine may continue for a few months.

Apple have provided the new OSX release for Alan and Tim.

Peter has contacts from ISMB offering time and/or loan machines from two other companies. One has been in contact, the other needs to be followed up.

7. Date Of Next Meeting

Next meeting at 9.30 on Tuesday 30th August 2005.