EMBOSS: Project Meeting (Mon 7th Nov 11)


Attendees

EBI: Peter Rice, Jon Ison, Mahmut Uludag, Michael Schuster
Visitors:
Apologies: Alan Bleasby

1. Minutes of the last meeting

The meetings on 24th and 31st October were cancelled as Peter was on vacation.

The minutes of the meeting on 17th October 2011 are here.

2. Maintenance etc.

2.1 Applications

Peter has updated pepwheel following suggestions from David Mathog to improve the handling of leucine zipper helices where the residues overlap every 7 residues and rapidly reach the borders of the plot.

Further changes have been suggested and will be considered for implementation before the release. They require changes to the command line interface which are best done with the other changes planned for 6.5.0 for prettyplot.
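
The overlap problem can be illustrated with a small sketch. This is not pepwheel's code: the function names are hypothetical, and the steps and turns parameters are assumed to mean "steps residues are plotted over turns full turns of the helix". With 18 steps in 5 turns each residue advances 100 degrees, and with a heptad repeat of 7 steps in 2 turns every 7th residue lands on the same position.

```python
def wheel_slot(index, steps=18, turns=5):
    """Return the slot (0..steps-1) on the wheel for a 0-based residue index."""
    return (index * turns) % steps


def wheel_angle(index, steps=18, turns=5):
    """Return the angle in degrees at which the residue is drawn."""
    return wheel_slot(index, steps, turns) * 360.0 / steps


def first_overlap(steps, turns, length):
    """Return the first residue index that reuses an occupied slot, or None."""
    seen = set()
    for i in range(length):
        slot = wheel_slot(i, steps, turns)
        if slot in seen:
            return i
        seen.add(slot)
    return None


if __name__ == "__main__":
    # Heptad repeat: positions start to overlap at the 8th residue (index 7).
    print(first_overlap(7, 2, 30))
    # Regular alpha helix settings: no overlap until index 18.
    print(first_overlap(18, 5, 30))
```

If overlapping residues are stacked outwards from the centre, a long zipper sequence would add a new layer every 7 residues, which is consistent with the plot borders being reached quickly.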

2.2 Libraries

Peter updated the handling of ambiguity codes by fuzznuc so that ambiguity codes in the input sequence(s) can be matched. Each ambiguity code is now included in its own expansion, so a pattern S matches C, G or S in the input. Escaping the code with a backslash prevents expansion so that, for example, '\S' will match only an S in the input sequences.

Michael has recoded the Ensembl object adaptors to share one central function and avoid redundant code. Features are mapped to slices.
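
The fuzznuc ambiguity-code behaviour described above can be sketched as follows. This is an illustration only and not the fuzznuc implementation: the function names are hypothetical, the pattern is translated to a Python regular expression for convenience, and it is assumed that a code expands only to its constituent bases plus itself.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pattern_to_regex(pattern):
    """Translate a nucleotide pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped code: match the literal character, no expansion.
            parts.append(re.escape(pattern[i + 1].upper()))
            i += 2
            continue
        code = ch.upper()
        bases = IUPAC.get(code)
        if bases is None:
            parts.append(re.escape(code))
        else:
            # The code itself is included in its own expansion.
            members = sorted(set(bases) | {code})
            parts.append("[" + "".join(members) + "]")
        i += 1
    return re.compile("".join(parts))


def find_matches(pattern, sequence):
    """Return 0-based start positions of non-overlapping matches."""
    regex = pattern_to_regex(pattern)
    return [m.start() for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_matches("S", "ACSG"))    # C, S and G all match
    print(find_matches("\\S", "ACSG"))  # only the literal S matches
```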

Michael noted that a strndup function call in the latest BAM code is specific to the GNU C library. It compiles on other systems but fails at run time, for example on Mac OS X, which uses a BSD-style C library.

Michael noted that the include statements need to start with the ajdefine.h file to set up memcheck validation which is specified in the config.h file. Peter will update the source files.

Michael will soon commit changes to redefine domain headers for EMBASSY-related utilities as enumerated types.

Michael noted that some EMBASSY packages generate many warnings when the configure files are synchronized with the main EMBOSS configure and the devwarnings options are used. These include shadowing variable names in emnu and type issues with strlen. Peter proposed committing the revised configure files and removing the more serious warnings. The configures could be changed to turn off these warnings later if they are considered harmless.

2.3 Other

3. New developments

3.1 Assemblies

Mahmut is improving BAM and SAM format support, preserving header tags from alignment content. The updated code has been tested on various public example files.
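
As an illustration of what preserving header tags involves, the sketch below reads a SAM header line into its record type and an ordered list of TAG:VALUE pairs, then writes it back unchanged. It is not the EMBOSS code; the function names are hypothetical, and only the text SAM header layout (tab-separated fields, '@' plus a two-letter record type, free-text @CO comments) is assumed.

```python
def parse_header_line(line):
    """Split a SAM header line into (record_type, [(tag, value), ...])."""
    fields = line.rstrip("\n").split("\t")
    record_type = fields[0][1:]  # drop the leading '@'
    if record_type == "CO":
        # Comment lines carry free text rather than TAG:VALUE pairs.
        return record_type, [("", "\t".join(fields[1:]))]
    tags = []
    for field in fields[1:]:
        # Split on the first colon only, so values may contain colons.
        tag, _, value = field.partition(":")
        tags.append((tag, value))
    return record_type, tags


def format_header_line(record_type, tags):
    """Rebuild the header line, keeping tags in their original order."""
    if record_type == "CO":
        return "@CO\t" + tags[0][1]
    return "\t".join(["@" + record_type] + [f"{t}:{v}" for t, v in tags])


if __name__ == "__main__":
    line = "@SQ\tSN:chr1\tLN:248956422"
    record_type, tags = parse_header_line(line)
    print(record_type, tags)
    print(format_header_line(record_type, tags) == line)
```

Keeping the tags as an ordered list rather than a dictionary means unknown or repeated tags survive a read and write cycle unchanged.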

3.2 EDAM and DRCAT

Jon has cleaned up the operations branch in EDAM and will provide a copy for validation and updating of EMBOSS ACD files. Some cleanup in the data branch is still required.

3.3 Remote input

Peter has implemented HTTP and FTP URLs as valid queries for any data type. The input string has to be treated as the whole URL. We can implement new qualifiers like -iformat to specify the query and the offset (we need an alternative to the %offset syntax on Windows in any case) but it is not easy to find a suitable name for these qualifiers.
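
A minimal sketch of the decision involved, assuming a hypothetical helper that is not part of EMBOSS: a query that starts with an HTTP or FTP scheme is treated as a URL in its entirety, and anything else is left to the usual query parsing.

```python
REMOTE_PREFIXES = ("http://", "ftp://")


def is_remote_query(query):
    """Return True if the whole query string should be treated as a URL."""
    return query.strip().lower().startswith(REMOTE_PREFIXES)


if __name__ == "__main__":
    # example.org is a placeholder host used only for illustration.
    for query in ("http://example.org/seq.fasta",
                  "ftp://example.org/pub/seq.embl",
                  "embl:X12345",
                  "C:\\data\\seq.fasta"):
        print(query, is_remote_query(query))
```

Because a URL already contains colons and may contain percent signs, the format and offset cannot be embedded in the same string without ambiguity, which is why separate qualifiers are being considered.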

3.4 Variation data

Peter has implemented variationget to read and write VCF variation data files in 4.0 and 4.1 formats. The code needs further testing and will be committed in a few days.
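
For readers unfamiliar with VCF, the sketch below shows the kind of parsing involved: checking the ##fileformat line for version 4.0 or 4.1 and splitting one data line into its eight fixed columns. It is an illustration under those assumptions and not the variationget code; the function names are hypothetical.

```python
SUPPORTED_VERSIONS = {"VCFv4.0", "VCFv4.1"}
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def read_fileformat(first_line):
    """Return the VCF version named on the first line, or raise ValueError."""
    prefix = "##fileformat="
    if not first_line.startswith(prefix):
        raise ValueError("missing ##fileformat line")
    version = first_line[len(prefix):].strip()
    if version not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported VCF version: " + version)
    return version


def parse_record(line):
    """Parse the eight fixed columns of one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(FIXED_COLUMNS):
        raise ValueError("expected at least 8 columns")
    record = dict(zip(FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")
    info = {}
    if record["INFO"] != ".":
        for item in record["INFO"].split(";"):
            key, sep, value = item.partition("=")
            # Flags such as DB have no value and are stored as True.
            info[key] = value if sep else True
    record["INFO"] = info
    return record


if __name__ == "__main__":
    print(read_fileformat("##fileformat=VCFv4.1\n"))
    print(parse_record("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB"))
```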

Mahmut has looked into the binary BCF format which uses a BAM index for VCF and GFF files with indexing of intervals. The BCF code in samtools is not complex and can be used as a model for this format.

3.5 EDAM

4. Administration

All systems recovered after the weekend shutdown.

5. Documentation and Training

Jon still needs a final review of the new website, and we need to seek permission to use the new book covers as a logo.

Peter will request EBI E-Learning accounts to set up test courses. The primary contact will be Jon.

6. User queries and answers

All done.

7. AOB

None.

8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 14th November.