EMBOSS: Project Meeting (Mon 17th Oct 11)


Attendees

EBI: Peter Rice, Jon Ison, Mahmut Uludag, Michael Schuster
Visitors:
Apologies: Alan Bleasby

1. Minutes of the last meeting

The minutes of the meeting on 10th October 2011 are here.

2. Maintenance etc.

2.1 Applications

Jon will review the five domain EMBASSY packages to identify applications which can be moved into the main EMBOSS package, and to mark as obsolete those used only by students at HGMP.

2.2 Libraries

Peter has reduced redundancy in the AJAX and NUCLEUS header file includes. All references to ajax.h or emboss.h in the library source files have been replaced by the specific set of include files each module requires.

A new include file ajlib.h provides ajdefine, ajarch, ajmem, ajmess, ajfmt and ajstr as these were used almost universally.

All header files have been tested in stand-alone compilation to make sure they resolve all definitions independently.

Peter has fixed the ajListDrop function, as suggested last week.

Michael is relocating preprocessor defines from ajdefine to the domain header files as enums to improve type checking.

2.3 Other

Peter has updated the embossdoc.pl script to parse the new listNew function prototype. A kludge was needed for the assert function, whose name is written in parentheses to avoid a conflict with the assert macro. It is not clear whether this function is required for compilation.

3. New developments

3.1 Assemblies

Mahmut has added nine test cases for BAM index output testing. The code now uses AjPTable objects as hash tables, and quicksort in place of the samtools ksort.

Code will be checked in when functions and datatypes are renamed to EMBOSS standards. Local datatypes can keep their samtools names to simplify merges with samtools code updates. Some structures are only defined by name in the public header file, with the detailed structure restricted to the library source file. Any documentation conflicts can be resolved later for these. Eclipse makes it easy to compare recent samtools code with the latest repository.

Index code will be in a new ajbamindex source file, with general functions added to the existing ajseqbam sources.

3.2 EDAM and DRCAT

3.3 Remote input

Peter proposes adding tests that use http and ftp URLs as input for all input query types. An example would be the reference sequence data for assembly reading, which is often stored as a URL in a BAM file. The data need to be in a text-based format for automatic detection to work. In general, reference sequences are in FASTA format. Samtools includes code which can be used as a model for FTP access.
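As a rough illustration of the two checks implied here (recognising an http or ftp query, and detecting a text-based format from the start of the data), a minimal sketch follows. It is written in Python for brevity only; EMBOSS itself is C. The function names is_remote_query and looks_like_fasta are hypothetical, and the FASTA test (first non-blank line starts with '>') is an assumed heuristic, not the EMBOSS detection code.

```python
from urllib.parse import urlparse

# Schemes named in the minutes; anything else is treated as a non-URL query.
REMOTE_SCHEMES = {"http", "ftp"}


def is_remote_query(query: str) -> bool:
    """Return True when the query looks like an http or ftp URL."""
    parsed = urlparse(query)
    # Require a host part so that "db:entry" style queries are not mistaken for URLs.
    return parsed.scheme.lower() in REMOTE_SCHEMES and bool(parsed.netloc)


def looks_like_fasta(first_bytes: bytes) -> bool:
    """Guess FASTA from the start of a stream (assumed heuristic).

    The first non-blank line of a FASTA file begins with '>'.
    """
    text = first_bytes.decode("ascii", errors="replace")
    for line in text.splitlines():
        if line.strip():
            return line.startswith(">")
    return False
```

Sniffing only the first bytes means a remote file need not be downloaded in full before its format is known, which is why a text-based format matters for automatic detection.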

Mahmut noted that samtools looks for remote files to match a BAM index, so it would be useful to preserve the URL of an open file for use in generating a URL for a related file.
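The idea of keeping the URL of an open file so that a related file can be located might be sketched as below. This is a Python illustration only, not EMBOSS code; the function name related_url is hypothetical, and appending ".bai" to the BAM path is assumed here as the index naming convention.

```python
from urllib.parse import urlsplit, urlunsplit


def related_url(source_url: str, suffix: str = ".bai") -> str:
    """Build the URL of a related file by appending a suffix to the path.

    Only the path component is changed, so the scheme, host and any
    query string of the original URL are preserved.
    """
    parts = urlsplit(source_url)
    return urlunsplit(parts._replace(path=parts.path + suffix))
```

For example, related_url("ftp://example.org/data/reads.bam") gives "ftp://example.org/data/reads.bam.bai".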

3.4 New output formats

Peter has added bedgraph and wig (wiggle) as new xygraph output formats, available where the xygraph has a sequence: attribute defined in ACD. The x coordinates are the sequence base positions. The y coordinates are the data values, scaled to the range 0 to 1000 for wiggle output. The sequence name is needed in bedgraph format.
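A small sketch of how per-base values could be written in the two formats is given below, in Python for illustration only. The function names are hypothetical, and the linear min-max scaling to 0 to 1000 is an assumption; the minutes do not say how the scaling is done. The sketch relies on bedGraph using zero-based, half-open intervals and on wiggle variableStep using one-based positions.

```python
def scale_to_range(values, top=1000.0):
    """Linearly rescale values so the minimum maps to 0 and the maximum to top.

    Assumed scaling method; a flat series maps to all zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) * top / (hi - lo) for v in values]


def bedgraph_lines(seqname, positions, values):
    """One bedGraph line per base: name, zero-based start, end, unscaled value."""
    return [f"{seqname}\t{p - 1}\t{p}\t{v:g}" for p, v in zip(positions, values)]


def wiggle_lines(seqname, positions, values):
    """Wiggle variableStep block: one-based position and value scaled to 0..1000."""
    scaled = scale_to_range(values)
    lines = [f"variableStep chrom={seqname}"]
    lines += [f"{p} {v:g}" for p, v in zip(positions, scaled)]
    return lines
```

Both formats carry the sequence name: bedGraph repeats it on every data line, while wiggle declares it once in the variableStep line.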

3.5 EDAM

Jon has completed revision of the 'topic' branch with a clean separation of tool-centric and data-centric terms. The next task is a bottom-up cleanup of the 'operations' branch to check for a consistent level of detail.

Users, especially in Bergen, have suggested a number of definition updates and proposed merges of related terms.

Updates of the format and identifier branches are complete.

4. Administration

Michael and Peter discussed the pros and cons of providing EMBOSS on github. Unfortunately there is already an EMBOSS project as part of biolib, and an emboss user as part of BioPerl. Maintaining the existing CVS repository may be the simplest option.

5. Documentation and Training

None.

6. User queries and answers

All done.

7. AOB

None.

8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 24th October.