EMBOSS: Project Meeting (Mon 14th June 10)


Attendees

EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag
Visitors:
Apologies:

1. Minutes of the last meeting

Minutes of the meeting of 7th June 2010 are here.

2. Maintenance etc.

2.1 Applications

Mahmut has modified needle to allow successive gaps in both strands. Complexities in the alignment code have been rewritten and extensively tested. Output from needle can now be in SAM alignment format.

2.2 Libraries

Peter is adding the "bam" sequence format. It already works as an input format, and testing it exposed a bug in resetting buffered files that had been masked because the other binary input format ("abi") was the last to be tested. The "bam" output format will be completed this week.

BAM is a binary format and raises several issues as an input format. A test is needed for standard input, using a new function, ajFileIsFile, which will need modification to work under Windows. Reading BAM tests the last 28 bytes of the file, so the code must also allow a seek to offset -28 from the end of the file to fail for an empty or short file.
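The end-of-file test described above can be sketched as follows. This is an illustration in Python, not the EMBOSS C code: the function name has_bam_eof_marker is hypothetical, and the 28-byte constant is the empty BGZF block that the SAM/BAM specification uses as its end-of-file marker. The sketch checks the file size before seeking, so an empty or short file is reported as "no marker" rather than as an error, and a non-seekable stream such as standard input is skipped.

```python
import io
import os

# 28-byte empty BGZF block used as the BAM end-of-file marker
# (taken from the SAM/BAM specification, not from the EMBOSS source).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bam_eof_marker(stream):
    """Return True if a binary stream ends with the BGZF EOF block.

    Hypothetical helper: returns False for streams that cannot seek
    (for example standard input) and for empty or short files.
    """
    if not stream.seekable():
        return False
    size = stream.seek(0, os.SEEK_END)
    if size < len(BGZF_EOF):
        # Seeking to -28 from the end would fail here, so stop early.
        return False
    stream.seek(size - len(BGZF_EOF))
    return stream.read(len(BGZF_EOF)) == BGZF_EOF


if __name__ == "__main__":
    print(has_bam_eof_marker(io.BytesIO(b"payload" + BGZF_EOF)))
    print(has_bam_eof_marker(io.BytesIO(b"")))
```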

Alan has corrected the BAM code to compile on Windows. There were issues with inline functions in headers and the naming of off_t which should be replaced by sys_t.

Peter has updated the Sequence Ontology terms in feature table internals to match the latest 2.4.3 release of SO and SOFA. The variation term previously used is obsolete, and is replaced by one which has a suitable name although it is currently labelled as polypeptide-specific.

Mahmut is testing SAM and BAM formats using picard (a Java version of samtools), which has extensive unit tests for its detailed support of these formats. Initial tests use fastq data converted to SAM and BAM files.

Mahmut has also tested alignments in SAM files, for example as supermatcher output. New CIGAR string handling functions were required.
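As an illustration of what CIGAR string handling involves, the following Python sketch parses a CIGAR string into (length, operation) pairs and derives the reference and query lengths. It is not the new EMBOSS functions: the names parse_cigar, reference_length and query_length are hypothetical, and the operation letters and the sets of operations that consume reference or query bases follow the SAM specification.

```python
import re

# One CIGAR element: a run length followed by an operation letter.
_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance along the reference / the query (SAM specification).
REF_OPS = frozenset("MDN=X")
QUERY_OPS = frozenset("MIS=X")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) pairs."""
    if cigar == "*":  # SAM uses "*" when no alignment is available
        return []
    ops = []
    pos = 0
    for match in _CIGAR_ELEMENT.finditer(cigar):
        if match.start() != pos:
            raise ValueError("invalid CIGAR string: %r" % cigar)
        ops.append((int(match.group(1)), match.group(2)))
        pos = match.end()
    if pos != len(cigar):
        raise ValueError("invalid CIGAR string: %r" % cigar)
    return ops


def reference_length(ops):
    """Number of reference bases spanned by the alignment."""
    return sum(n for n, op in ops if op in REF_OPS)


def query_length(ops):
    """Number of query bases implied by the CIGAR, including soft clips."""
    return sum(n for n, op in ops if op in QUERY_OPS)


if __name__ == "__main__":
    ops = parse_cigar("8M2I4M1D3M")
    print(ops, reference_length(ops), query_length(ops))
```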

SRS retrieval should check the number of entries and work with a limited chunk size, which must be small to handle very large assembly entries. This will need at least one extra call for each USA. Peter will make the code changes.
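The chunking idea can be sketched as below. This Python fragment is only an illustration and makes no SRS calls: it assumes the entry count is already known from the extra call, and the function name chunk_ranges and the 1-based inclusive ranges are assumptions, not part of the SRS interface or the EMBOSS code.

```python
def chunk_ranges(total_entries, chunk_size):
    """Yield 1-based inclusive (first, last) entry ranges of limited size.

    Hypothetical helper: total_entries would come from a preliminary
    count query, and each range would drive one retrieval request.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for first in range(1, total_entries + 1, chunk_size):
        last = min(first + chunk_size - 1, total_entries)
        yield first, last


if __name__ == "__main__":
    # Ten entries fetched at most four at a time.
    for first, last in chunk_ranges(10, 4):
        print(first, last)
```

A small chunk size keeps each request bounded even when individual entries are very large, at the cost of more requests per USA.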

2.3 Other

Alan is considering adding configure options to enable the use of installed copies of zlib and expat on the local system. Additional variables will be needed for each library to be tested.

3. New developments

3.1 BioMart

Alan has checked the Mart code in ajseqdb.c and committed changes. Memory leaks in other Mart functions have been fixed.

Alan has looked into column ordering in Mart server results processed by ajMartCheckHeader. Function ajMartSetHeader can be used before the query to specify that a column header is required, returning the long name in the header. ajMartCheckHeader does an attributes query and matches the column names, returning a NULL-terminated string array. This is only needed once per query. It can be activated to check that the first value in the array matches the expected sequence attribute.

BioMart uses RESTful URL access, recommended by Syed Haider. Michael Schulster's Ensembl API code uses SQL access. BioMart can also be used to access Ensembl data.

3.2 EDAM ontology

Jon has included ELIXIR database survey results in dvxref.dat. Queries need to be annotated. The priority has been to cover databases cross-referenced by EMBL and UniProt.

Data providers have been contacted by email to comment on the query records.

Jon has committed ACD files with corrected relations attributes.

Jon has discussed SAWSDL with the BioCatalogue team at EBI. Discussion will continue on the BioCatalogue mailing lists. There is a need for common tools to browse and edit EDAM and other ontologies. A wiki page has been set up to collect recommendations. Some content on the EDAM pages has been moved to the Wiki.

EDAM user and developer mailing lists have been created.

4. Administration

Alan and Mahmut need to check that Jemboss behaves correctly and note examples to be used as tests for future releases. Mahmut notes that picard uses TestNG (an extended JUnit-style test framework) for normal Java non-GUI testing.

We are waiting for a quote from the systems group.

5. Documentation and Training

5.1 Books

Jon will work on these again after the release.

6. User queries and answers

There is one outstanding question about BioPerl.

Mahmut is looking into a report of vectorstrip not trimming with -allsequences set.

There was a report of pepwindow crashing under the Mobyle interface.

7. AOB

Peter attended the EMBnet/EMBRACE next generation sequencing workshop in Bari last week. Bastien Chevreux presented the latest MIRA 3.x release and provided much helpful detail for the changes needed in the EMIRA ACD files.

Mahmut noted that MIRA uses SSAHA to trim adaptor sequences. It is now possible to use supermatcher as an alternative, generating the same output format.

Peter noted that July 15th will be the 10th anniversary of the EMBOSS 1.0.0 release.

8. Date Of Next Meeting

The next meeting will be on Monday 21st June.