EMBOSS: Project Meeting (Tuesday 16th August 2005)
Attendees
EBI:
Peter Rice,
Jon Ison,
Alan Bleasby
Lion:
Sanger:
Tim Carver
Visitors:
Apologies:
Lisa Mullan
1. Minutes of the last meeting
Minutes of the meeting of 2nd August 2005 are
here
2. Software Development
2.1 Commandline Logging
Peter has added functions to ACD to log the command line and to
build additional commandline qualifiers to define the user's
non-default responses to prompts. These are intended to be used in
logging the commandline options used for database indexing with the dbi
and dbx programs.
The commandline logs can also be reported in other output files that
can accept commented headers, including alignment and report files.
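As an illustration only, the idea of logging just the user's non-default
responses and writing them as a commented header could be sketched as
follows. This is not the ACD implementation; the program name, qualifier
names, default values and the "# " comment prefix are all hypothetical.

```python
def build_command_log(program, defaults, responses):
    """Rebuild a command line holding only the non-default responses."""
    parts = [program]
    for name, value in responses.items():
        # Skip any qualifier whose value matches its (hypothetical) default.
        if defaults.get(name) != value:
            parts.append(f"-{name} {value}")
    return " ".join(parts)


def as_comment_header(command_line, prefix="# "):
    """Format the logged command line as a commented header line."""
    return f"{prefix}Commandline: {command_line}"


# Hypothetical program and qualifiers, for illustration only.
defaults = {"gapopen": "10.0", "outfile": "example.out"}
responses = {"sequence": "input.fasta", "gapopen": "10.0", "outfile": "my.out"}

log_line = build_command_log("exampleprog", defaults, responses)
header = as_comment_header(log_line)
```

Here "gapopen" is left out of the log because the user accepted its default.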
There was discussion of general logging, for example through the ajExit
call. This may be needed for workflow provenance, using the same
calls. It is not clear whether a commandline option should turn this
on, or whether an environment variable is the best way.
2.2 Prettyplot update
Peter has updated prettyplot to fix some long-standing
bugs/features. The postscript graphics had a blank extra page at the
end which Jemboss was able to ignore. This is now fixed by changing
the order of newpage functions. The example output had no boxes drawn
- because the example data file had such low sequence weights that the
plurality figure was too high for even perfect identities to be
boxed. The amino acid residue colours assumed alphabetical order for
the amino acids in the comparison matrix, which is not the case for
most of the matrix files. This was corrected to check the residue
numbers in the matrix. Nucleic acids are coloured by ABI base code.
There was discussion of adding options (also to abiview) to control
the residue colouring in some standard way.
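The colouring bug described above can be sketched as follows. This is an
illustration, not the prettyplot code: the colour assignments and function
names are hypothetical, and the five residues are only the start of a typical
matrix file ordering.

```python
# Hypothetical colour assignments, keyed by residue letter.
COLOUR_OF = {"A": "black", "C": "yellow", "D": "red", "N": "green", "R": "blue"}


def colours_assuming_alphabetical(matrix_residues, colour_of):
    """Faulty approach: treat matrix position i as the i-th letter alphabetically."""
    return [colour_of[res] for res in sorted(matrix_residues)]


def colours_by_matrix_order(matrix_residues, colour_of):
    """Corrected approach: look up each colour by the residue actually at that position."""
    return [colour_of[res] for res in matrix_residues]


# Many matrix files start A, R, N, D, C rather than in alphabetical order.
matrix_residues = "ARNDC"

wrong = colours_assuming_alphabetical(matrix_residues, COLOUR_OF)
right = colours_by_matrix_order(matrix_residues, COLOUR_OF)
```

In the faulty table the second matrix position (R) receives the colour meant
for C; in the corrected table each position keeps its own residue's colour.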
One remaining issue is a failure to close boxes at the top and bottom
when an alignment has to be split over multiple pages. Currently
cutting and pasting the plot together will look right so this is left
to be fixed at a future date.
2.3 Database indexing
Alan has fixed a bug in uniprot indexing, and tested full runs
on uniprot and embl. Two users who reported the problem have also
tested the fix successfully.
When indexing is extended to deleting entries some code changes will
be required to maintain a well balanced tree structure.
Retrieval speeds for the new indices are similar to those for dbi
(emblcd format) indices. Indexing speeds are slower, but the
potential for updatable indices will lead to faster indexing times
overall for users.
2.4 Vienna RNA package
Alan has converted two programs from the Vienna RNA package as
a new EMBASSY package. The inputs were interactive in the original
programs and required entering an RNA sequence and a string of
constraints with the same length as the sequence. For the initial
conversion these will be read from an optional second input
file.
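The length requirement on the constraint string could be checked along these
lines. This is a minimal sketch, not the converted program: the function name
is hypothetical, and the example constraint characters are placeholders rather
than a statement of the Vienna constraint alphabet.

```python
def pair_sequence_with_constraints(sequence, constraints=None):
    """Pair each base with its constraint character.

    The constraints are optional, mirroring the optional second input file.
    When absent, every position is given None.
    """
    if constraints is None:
        return [(base, None) for base in sequence]
    if len(constraints) != len(sequence):
        raise ValueError(
            f"constraint string has length {len(constraints)}, "
            f"but the sequence has length {len(sequence)}"
        )
    return list(zip(sequence, constraints))


# Placeholder data for illustration only.
pairs = pair_sequence_with_constraints("GCAU", "(..)")
unconstrained = pair_sequence_with_constraints("GCAU")
```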
Peter will look into adding a markup file to be
associated with sequence input - it could for example contain protein
secondary structure assignments for other programs and be a generally
useful feature. Some sequence formats (the Vienna package has its own
format) may be able to read both sequence and markup (Swissprot could
contain secondary structure assignment for proteins). Other formats
could read a second input file, as we do already for feature
information from GFF files.
2.5 Postscript output files
The Vienna RNA and Phylip packages both produce postscript files using
their own graphics routines. These are a problem for Jemboss which is
lacking a Postscript viewer.
Alan and Peter will look into the ease of converting
both packages to use the planned new EMBOSS graphics library. As an
interim measure, Jemboss may be able to call a conversion utility
(which would need to be separately installed) so that a server or a
standalone Jemboss could convert postscript output to PNG, JPEG or
GIF.
3. Administration
3.1 EMBOSS Meeting Schedule
Rodrigo (EBI External Services) still cannot make the new
meeting date/time. Peter will look for an alternative date/time that
suits everyone. Tuesdays at 11am is a possibility.
3.2 Compilation output
Alan proposed modifying the compilation of EMBOSS to give a
shorter report for each source file, making warning and error messages
more obvious (especially for deprecated function calls). This would
require installation of at least one GNU tool (autoconf to create a
local config.h file). This approach may be (very) useful for developers, but
probably not for the release unless a workaround is possible.
3.3 Documentation of proposed work
Jon proposed a general policy of documenting new work in
advance, especially as a guide to new developers. This was agreed as
appropriate.
Peter offered to review the previous such documents, for
example the C coding standards, and bring them up to date.
4. Documentation & Training
4.1 Source Code Documentation
Jon presented a list of proposed extensions to the source code
documentation.
Function arguments will be separately documented as "inputs",
"outputs" and "input/outputs". "Inputs" will include read-only,
function and vararg arguments. "Outputs" will include write and delete
arguments. "Input/Outputs" will be the remaining updated arguments.
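The proposed grouping could be sketched as a simple lookup. The single-letter
codes used here (r, f, v, w, d, u) are an assumption made for illustration,
as are the function and argument names; only the three group names and the
kinds of argument in each come from the proposal above.

```python
# Assumed one-letter codes: r = read-only, f = function, v = vararg,
# w = write, d = delete, u = update.
GROUP_OF = {
    "r": "inputs",
    "f": "inputs",
    "v": "inputs",
    "w": "outputs",
    "d": "outputs",
    "u": "input/outputs",
}


def group_arguments(arguments):
    """Sort (name, code) pairs into the three documentation groups."""
    groups = {"inputs": [], "outputs": [], "input/outputs": []}
    for name, code in arguments:
        groups[GROUP_OF[code]].append(name)
    return groups


# Hypothetical argument list for one function.
grouped = group_arguments(
    [("source", "r"), ("callback", "f"), ("result", "w"), ("buffer", "u")]
)
```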
For each function, Jon will look to create a list of all error
messages generated by the function itself. These can be used to
generate a comprehensive list of all error messages, to generate test
cases for each message, and to generate a list of all messages that a
function can produce when called. The latter is complicated by the
need to check messages produced by functions called within the
function, so that higher-level functions can potentially have a very
large number of messages.
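One way to picture the last of these lists is as a walk over the call graph,
collecting each function's own messages along the way. The sketch below
assumes the per-function message lists and the call relationships are already
available as dictionaries; all function names and messages are hypothetical.

```python
def all_messages(function, own_messages, calls, _seen=None):
    """Collect messages from a function and everything it calls, directly or not."""
    seen = set() if _seen is None else _seen
    if function in seen:
        # Already visited: avoids infinite recursion on mutually recursive calls.
        return set()
    seen.add(function)
    messages = set(own_messages.get(function, ()))
    for callee in calls.get(function, ()):
        messages |= all_messages(callee, own_messages, calls, seen)
    return messages


# Hypothetical functions, messages and call graph.
own_messages = {
    "openFile": ["cannot open file"],
    "readEntry": ["bad entry format"],
    "indexDatabase": ["index directory missing"],
}
calls = {
    "indexDatabase": ["readEntry"],
    "readEntry": ["openFile"],
}

low_level = all_messages("openFile", own_messages, calls)
high_level = all_messages("indexDatabase", own_messages, calls)
```

The higher-level function accumulates every message beneath it, which is why
such lists can grow very large.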
Each function should have one or more simple coding examples (several
functions could be combined in one example). Peter and
Jon will look for ways to generate such test cases, initially
for the string functions, using the existing source code validation
script.
Function names should be consistent. For example there should be a
prefix for each datatype, and standard naming for constructors (New)
etc. with standard extensions to the name for specific
variants. Alan proposed retaining the old names as macros with
a compiler directive to flag them as deprecated whenever used. They
could be removed completely in a future release, perhaps in 4.0.0.
Function arguments should have consistent naming. Peter
proposed a list of recommended or deprecated names (for example names
such as "end" that are hard to accurately detect in documentation, or
names such as "n" that are meaningless). These would be applied to the
global functions in each source file.
4.2 SourceForge Website
Peter has removed obsolete links (mainly to RFCGR/HGMP) from
the emboss.sourceforge.net website. Tim has updated the Jemboss
pages, removing references to the RFCGR server, and updating other
sections.
Links to CCP11 will need to be removed as the only CCP11 pages
currently on the web are the obsolete ones at Daresbury. CCP11
references can remain but be unlinked.
4.3 FAQ for Developers
Jon offered to add an FAQ list for new developers to guide them
through the first stages of writing EMBOSS code and
applications. Peter offered the workshop examples from recent
EBI courses, and proposed combining this with planning for the
programmers' book.
A similar FAQ could be prepared for system administrators.
5. User queries and answers
Jon reported on 4 outstanding user queries from the past 2
weeks. All have been resolved.
6. AOB
6.1 Hardware
Alan reported that the loan from IBM is due for renewal. A new
machine is not yet available so the present machine may continue for a
few months.
Apple have provided the new OSX release for
Alan and Tim.
Peter has contacts from ISMB offering time and/or loan
machines from two other companies. One has been in contact, the other
needs to be followed up.
7. Date Of Next Meeting
Next meeting at 9.30 on Tuesday 30th August 2005.