EMBOSS: Project Meeting (Mon 17th Oct 11)
1. Minutes of the last meeting
The minutes of the meeting on 10th October 2011 are
2. Maintenance etc.
Jon will review the five domain EMBASSY packages to identify
applications which can be moved into the main EMBOSS package, and to
mark others used only by students at HGMP as obsolete.
Peter has reduced redundancy in the AJAX and NUCLEUS header file
includes. All references to ajax.h or emboss.h in the library source
files have been replaced by the specific set of include files that each
module requires.
A new include file, ajlib.h, includes ajdefine, ajarch, ajmem,
ajmess, ajfmt and ajstr, as these are used almost universally.
All header files have been tested in stand-alone compilation to confirm
that each one resolves all of its definitions independently.
Peter has fixed the ajListDrop function, as suggested last week.
Michael is moving preprocessor defines from ajdefine into the
relevant domain header files, converting them to enums to improve type
checking.
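As a minimal sketch of the difference, assuming a hypothetical sequence-type constant (the HYP/hyp names below are illustrative and not taken from ajdefine): a #define is an untyped integer, while an enum gives the compiler a named type. C enums are still integers, so the gain is mainly in compiler warnings, debugger output and self-documenting function signatures rather than strict type safety.

```c
/* Before: untyped constants in a shared header (hypothetical names).
** Any int is accepted wherever these are used. */
#define HYP_SEQTYPE_DNA     1
#define HYP_SEQTYPE_PROTEIN 2

/* After: an enum declared in the domain header that uses it. */
typedef enum HypOSeqType
{
    hypESeqTypeUnknown,         /* 0 */
    hypESeqTypeDna,             /* 1 */
    hypESeqTypeProtein          /* 2 */
} HypESeqType;

/* The parameter type now documents the valid values. With gcc, -Wall
** enables -Wswitch, which warns if a switch on an enum type omits
** any of its enumerators. */
const char* hypSeqTypeName(HypESeqType type)
{
    switch(type)
    {
        case hypESeqTypeUnknown:
            return "unknown";
        case hypESeqTypeDna:
            return "dna";
        case hypESeqTypeProtein:
            return "protein";
    }

    return "invalid";           /* C still lets other int values through */
}
```

Adding a new enumerator later makes every incomplete switch over the type show up as a warning, which is hard to achieve with plain defines.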
Peter has updated the embossdoc.pl script to parse the
new listNew function prototype. A kludge was needed for
the assert function, whose name is enclosed in parentheses to avoid a
conflict with the assert macro. It is not clear whether this function
is required for compilation.
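The parenthesised name works because of a standard C preprocessor rule: a function-like macro is only expanded when its name is immediately followed by an opening parenthesis. A prototype of the form `type (name)(args)` is presumably why a documentation parser expecting `type name(args)` needs special handling. The sketch below uses a hypothetical hypSquare macro and function rather than assert itself, since the C standard leaves suppressing the assert macro to reach a real function undefined.

```c
/* Hypothetical counter, incremented only by the macro version. */
static int hypMacroCalls = 0;

/* Function-like macro sharing its name with a real function. */
#define hypSquare(x) (hypMacroCalls++, (x) * (x))

/* Real function: the parenthesised name is not followed directly by
** '(', so the macro above is not expanded in this definition. */
int (hypSquare)(int x)
{
    return x * x;
}

int hypUseMacro(int x)
{
    return hypSquare(x);        /* expands the macro */
}

int hypUseFunction(int x)
{
    return (hypSquare)(x);      /* calls the real function */
}
```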
3. New developments
Mahmut has added nine test cases for BAM index output.
The code now uses AjPTable objects as hash tables,
and quicksort in place of the samtools ksort.
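samtools sorts index chunks (pairs of virtual file offsets) with its ksort macros. A sketch of the standard-library equivalent, assuming a hypothetical HypOChunk structure, is a qsort call with a comparison function. Note that the C standard does not require qsort to be a quicksort, and it is not a stable sort.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical index chunk: a pair of BGZF virtual file offsets. */
typedef struct HypSChunk
{
    uint64_t Beg;               /* chunk start offset */
    uint64_t End;               /* chunk end offset */
} HypOChunk;

/* Order chunks by start offset. Compare explicitly instead of
** subtracting, because a uint64_t difference does not fit in an int. */
static int hypChunkCompare(const void* a, const void* b)
{
    const HypOChunk* ca = (const HypOChunk*) a;
    const HypOChunk* cb = (const HypOChunk*) b;

    if(ca->Beg < cb->Beg)
        return -1;
    if(ca->Beg > cb->Beg)
        return 1;
    return 0;
}

/* Sort an array of n chunks in place. */
void hypChunksSort(HypOChunk* chunks, size_t n)
{
    qsort(chunks, n, sizeof(*chunks), hypChunkCompare);
}
```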
The code will be checked in once functions and datatypes have been
renamed to EMBOSS standards. Local datatypes can keep their samtools
names to simplify merging future samtools code updates. Some structures
are only declared by name in the public header file, with the full
structure definition restricted to the library source file; any
documentation conflicts for these can be resolved later. Eclipse makes
it easy to compare recent samtools code with the latest repository.
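Declaring a structure only by name in the public header is the usual C opaque-type pattern. A minimal sketch, with hypothetical HypSBamIndex names and the header and source parts shown in one file for brevity (in the library they would live in separate files):

```c
#include <stdlib.h>

/* ---- public header (hypothetical hypbamindex.h) ---- */

/* Only the structure name is visible here: callers can hold and pass
** pointers, but cannot see or depend on the fields. */
typedef struct HypSBamIndex* HypPBamIndex;

HypPBamIndex hypBamIndexNew(int nref);
int          hypBamIndexGetNref(HypPBamIndex idx);
void         hypBamIndexDel(HypPBamIndex* Pidx);

/* ---- library source file (hypothetical hypbamindex.c) ---- */

/* Full definition, private to the source file. Field names here could
** keep their samtools spelling without affecting the public API. */
struct HypSBamIndex
{
    int n_ref;                  /* number of reference sequences */
};

HypPBamIndex hypBamIndexNew(int nref)
{
    HypPBamIndex idx = malloc(sizeof(*idx));

    if(idx)
        idx->n_ref = nref;

    return idx;
}

int hypBamIndexGetNref(HypPBamIndex idx)
{
    return idx->n_ref;
}

/* Destructor taking the address of the pointer, so it can reset the
** caller's variable to NULL after freeing. */
void hypBamIndexDel(HypPBamIndex* Pidx)
{
    if(!Pidx || !*Pidx)
        return;

    free(*Pidx);
    *Pidx = NULL;
}
```

Because callers only ever see the pointer type, the private structure can track upstream samtools changes without touching the public header.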
The index code will go in a
new ajbamindex source file, with general functions added to the
existing ajseqbam sources.
3.2 EDAM and DRCAT
3.3 Remote input
Peter proposes adding a test to all input queries so that http
and ftp URLs can be used as input. An example would be reference
sequence data for assembly reading, which is often stored as a URL in a
BAM file. Such remote data needs to be in a text-based format for
automatic format detection to work; in general, reference sequences
are in FASTA format. Samtools includes code which can be used as a
model for FTP access.
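A rough sketch of the two checks involved, assuming hypothetical helper names: recognising an http or ftp URL in a query string, and a crude text-format test on the first data received. The scheme match is case-sensitive for simplicity, although URL schemes are case-insensitive in general.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if the query starts with an http:// or ftp:// scheme. */
int hypQueryIsUrl(const char* query)
{
    static const char* schemes[] = {"http://", "ftp://"};
    size_t i;

    for(i = 0; i < sizeof(schemes) / sizeof(schemes[0]); i++)
        if(strncmp(query, schemes[i], strlen(schemes[i])) == 0)
            return 1;

    return 0;
}

/* Crude text-format check: FASTA data starts with '>' after any
** leading whitespace. Binary data such as a BAM stream fails it. */
int hypTextLooksFasta(const char* buf)
{
    while(*buf && isspace((unsigned char) *buf))
        buf++;

    return *buf == '>';
}
```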
Mahmut noted that samtools looks for remote files to match a
BAM index, so it would be useful to preserve the URL of an open file
for use in generating a URL for a related file.
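As an illustration of deriving a related URL, assuming the common convention that the index of reads.bam is named reads.bam.bai (the default output name of samtools index), a hypothetical helper could build the index URL from the URL kept for the open BAM file:

```c
#include <stdlib.h>
#include <string.h>

/* Build the URL of a BAM index by appending ".bai" to the URL kept
** from the open BAM file. The caller frees the result; returns NULL
** if allocation fails. */
char* hypBamIndexUrl(const char* bamurl)
{
    static const char suffix[] = ".bai";
    size_t len = strlen(bamurl);
    char* idxurl = malloc(len + sizeof(suffix));

    if(!idxurl)
        return NULL;

    memcpy(idxurl, bamurl, len);
    memcpy(idxurl + len, suffix, sizeof(suffix));  /* copies the '\0' too */

    return idxurl;
}
```

A fuller version might also try other naming conventions if the first index URL cannot be fetched.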
3.4 New output formats
Peter has added bedgraph and wig (wiggle) as new
xygraph output formats, available where the xygraph has a sequence:
attribute defined in ACD. The x coordinates are the sequence base
positions. The y coordinates are the data values, scaled to the range
0 to 1000 for wiggle output. The sequence name is required in bedgraph
format.
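A small sketch of the two steps described above, with hypothetical function names: linear scaling of a data value onto 0 to 1000, and formatting one bedGraph line. bedGraph uses 0-based, half-open coordinates, so 1-based base position pos becomes the interval pos-1 to pos; wig variableStep lines use the 1-based position directly.

```c
#include <stdio.h>

/* Scale y (assumed within [ymin, ymax]) linearly onto 0..1000,
** rounding to the nearest integer. A flat series maps to 0. */
int hypScale1000(double y, double ymin, double ymax)
{
    if(ymax <= ymin)
        return 0;

    return (int) ((y - ymin) / (ymax - ymin) * 1000.0 + 0.5);
}

/* Write one bedGraph data line, "name start end value", for the
** 1-based base position pos. Returns the snprintf result. */
int hypBedgraphLine(char* buf, size_t size, const char* seqname,
                    long pos, double value)
{
    return snprintf(buf, size, "%s\t%ld\t%ld\t%g\n",
                    seqname, pos - 1, pos, value);
}
```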
Jon has completed the revision of the EDAM 'topic' branch, with a clean
separation of tool-centric and data-centric terms. The next task is a
bottom-up cleanup of the 'operations' branch to check for a consistent
level of detail.
Users, especially in Bergen, have suggested a number of definition
updates and proposed merges of related terms.
Updates of the format and identifier branches are complete.
Michael and Peter discussed the pros and cons of
providing EMBOSS on GitHub. Unfortunately, there is already an EMBOSS
project as part of biolib, and an emboss user as part of
BioPerl. Maintaining the existing CVS repository may be the simplest
option.
5. Documentation and Training
6. User queries and answers
8. Date Of Next Meeting
The next EMBOSS meeting will be on Monday 24th October.