EMBOSS: Project Meeting (Mon 22nd March 2010)


Attendees

EBI: Peter Rice, Jon Ison, Mahmut Uludag
Visitors:
Apologies: Alan Bleasby

1. Minutes of the last meeting

Minutes of the meeting of 15th March 2010 are here.

2. Maintenance etc.

2.1 Applications

Alan has added casts etc. to satisfy the WIN32 compiler (e.g. in needle and needleall).

2.2 Libraries

Peter will fix the pepstats bug where one of the data file functions does not search all the possible directories.

Peter fixed a problem in loading very large (Human chromosome 1) FASTA sequences. The sequence format auto-detection required the input to be buffered, which was done with a record size (reserved size for each string) of over 2000 bytes. Several fixes were applied. The buffered records use the minimum string size (ajStrAssignS no longer copies the original reserved size). More significantly, as FASTA format cannot fail once it has accepted the ID record, buffering is now turned off for FASTA format and for any other format once there is no "return ajFalse" in the ajseqread.c function that parses it. Run times improved dramatically as a result.

Alan has rewritten the HttpGet functions/sub-functions in ajseqdb to make them IPv6-compliant. This also involved additions to ajsys for socket handling. The HttpGet functions now contain no ifdefs. Committed to CVS.
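For illustration only, the buffering change for FASTA input can be pictured with the Python sketch below. It is not the EMBOSS C code: LineSource, read_id_line_format, read_fasta and detect_and_read are hypothetical names, and the "ID-line" format is a made-up stand-in for any format tried before FASTA. The point shown is that lines are kept for replay only while a parser might still reject the input; once the FASTA parser has accepted the ID record it switches buffering off.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (identifier text, sequence)


class LineSource:
    """Line reader that keeps a replay buffer only while buffering is on."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._source: Iterator[str] = iter(lines)
        self._buffer: List[str] = []
        self._pos = 0
        self._buffering = True
        self.max_buffered = 0  # high-water mark, for demonstration

    def next_line(self) -> Optional[str]:
        if self._pos < len(self._buffer):  # replay a buffered line first
            line = self._buffer[self._pos]
            self._pos += 1
            return line
        line = next(self._source, None)
        if line is not None and self._buffering:
            self._buffer.append(line)
            self._pos += 1
            self.max_buffered = max(self.max_buffered, len(self._buffer))
        return line

    def rewind(self) -> None:
        """Go back to the first buffered line so another parser can try."""
        self._pos = 0

    def stop_buffering(self) -> None:
        """Called by a parser once it can no longer reject the input."""
        self._buffering = False
        self._buffer = self._buffer[self._pos:]  # drop lines already consumed
        self._pos = 0


def read_id_line_format(src: LineSource) -> Optional[Record]:
    """Hypothetical format whose first line must start with 'ID '."""
    first = src.next_line()
    if first is None or not first.startswith("ID "):
        return None  # reject: the caller rewinds and tries the next format
    parts = []
    while (line := src.next_line()) is not None:
        parts.append(line.strip())
    return first[3:].strip(), "".join(parts)


def read_fasta(src: LineSource) -> Optional[Record]:
    header = src.next_line()
    if header is None or not header.startswith(">"):
        return None
    src.stop_buffering()  # ID record accepted: FASTA cannot fail from here
    parts = []
    while (line := src.next_line()) is not None:
        parts.append(line.strip())
    return header[1:].strip(), "".join(parts)


PARSERS = [("idline", read_id_line_format), ("fasta", read_fasta)]


def detect_and_read(src: LineSource) -> Optional[Tuple[str, Record]]:
    """Try each parser in turn, rewinding the source before each attempt."""
    for name, parser in PARSERS:
        src.rewind()
        record = parser(src)
        if record is not None:
            return name, record
    return None
```

In this sketch a 1000-line FASTA entry leaves max_buffered at 1, since only the ID line is ever held for replay.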

Alan spent some time looking at SIGALRM equivalents for WIN32. Preliminary investigation shows that some Microsoft example code snippets do not appear to work quite as advertised using the Express compiler.

2.3 Other

Mahmut is working on the SoapLab "read timed out" error.

3. New developments

3.1 BioMart access

Alan reported on discussions with Syed with respect to BioMart metadata.

A new application martseqs queries a registry and marks those marts/datasets/attributes that can return sequence information. The library has been extended to cope. Code is committed.

Alan wrote HTTP URL routines to parse URLs. This is done in an IPv6-compliant way following W3 recommendations. They are currently in ajmart.c but can be moved to (e.g.) ajstr.c at some stage (or equivalents added). Code committed.
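As a rough illustration of what IPv6-compliant URL parsing has to cope with, the Python sketch below uses the standard library's urllib.parse.urlsplit instead of the routines in ajmart.c. The function name parse_http_url and the example URLs are assumptions made for this sketch. The key point is that an IPv6 literal is written in square brackets, so the host cannot be found by splitting at the first colon.

```python
from urllib.parse import urlsplit


def parse_http_url(url: str) -> dict:
    """Split an HTTP(S) URL into host, port, path and query.

    Hypothetical helper for illustration; it relies on urlsplit to
    handle bracketed IPv6 literals such as http://[2001:db8::1]:8080/.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP URL: {url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return {
        "host": parts.hostname,              # brackets already removed
        "port": parts.port or default_port,  # fall back to scheme default
        "path": parts.path or "/",
        "query": parts.query,
    }


# Example addresses are placeholders, not real BioMart servers.
ipv6 = parse_http_url("http://[2001:db8::1]:8080/martservice?type=registry")
named = parse_http_url("http://example.org/martservice")
```

Here ipv6["host"] is the bare address 2001:db8::1 with port 8080, while named falls back to port 80.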

3.2 Interfaces

Peter plans to investigate automation of EMBOSS adaptors for GALAXY as soon as the books are completed. He will go to the GALAXY developers meeting in May to present the results.

3.3 EDAM

Jon reported on discussions with the Sequence Ontology group at EBI, who have a new web browser for SO terms. Jon will test this when a new prototype is available. This has potential as a way to propose new terms and gives a good graphical view of the structure and terms of an ontology, with the ability to save terms in various formats.

The Swiss Institute of Bioinformatics are reviewing EDAM and will report back to Jon.

3.4 Future plans

Peter plans to implement BAM sequence format before the next release.

Peter would like to implement a parser and indexing for OBO format ontology data (EDAM, GO, SO, and others) and for the NCBI taxonomy so that these can be used to enhance EMBOSS results.
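To make the idea concrete, a minimal Python sketch of parsing and indexing OBO-format terms follows. It is only an outline of the approach, not a proposed EMBOSS design: parse_obo_terms and index_children are hypothetical names, the EX: identifiers are invented sample data, and only [Term] stanzas with simple tag: value lines are handled (no escapes, no other stanza types).

```python
from collections import defaultdict
from typing import Dict, List

SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0001
name: root term

[Term]
id: EX:0002
name: child term
is_a: EX:0001 ! root term

[Term]
id: EX:0003
name: another child
is_a: EX:0001 ! root term

[Typedef]
id: part_of
"""


def parse_obo_terms(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Return {term id: {tag: [values]}} for every [Term] stanza."""
    terms: Dict[str, Dict[str, List[str]]] = {}
    stanza = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):  # blank line or comment line
            continue
        if line.startswith("["):              # a new stanza begins
            stanza = defaultdict(list) if line == "[Term]" else None
            continue
        if stanza is None:                    # header or non-Term stanza
            continue
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop trailing comment
        stanza[tag.strip()].append(value)
        if tag.strip() == "id":
            terms[value] = stanza
    return terms


def index_children(terms: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Index from parent id to the ids of terms that are is_a children."""
    children: Dict[str, List[str]] = defaultdict(list)
    for term_id, tags in terms.items():
        for parent in tags.get("is_a", []):
            children[parent].append(term_id)
    return children


terms = parse_obo_terms(SAMPLE)
children = index_children(terms)
```

An index of this kind is what would let an application move from a term to its more specific terms without rescanning the ontology file.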

Peter will look into modifying the AjPFeature structure so that child features are stored in a list within the parent. The current approach of including all in one table is problematic when sorting and when processing results. The code changes should be minimal and the results would be much cleaner. At the same time, GFF3 format needs some attention to enforce stricter rules including the escaping of characters in tag values.
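The two points above can be sketched in Python as follows. This is an illustration under stated assumptions, not the AjPFeature layout: the Feature class, nest_features and escape_gff3_value are hypothetical names. The first part moves child features out of a flat table into a list held by their parent, so sorting only touches top-level features. The second percent-encodes the characters that GFF3 reserves in column 9 (semicolon, equals, ampersand, comma), plus the percent sign and control characters.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    feature_id: str
    kind: str
    start: int
    end: int
    parent_id: str = ""  # empty for top-level features
    children: List["Feature"] = field(default_factory=list)


def nest_features(flat: List[Feature]) -> List[Feature]:
    """Attach each child to its parent; return top-level features by start."""
    by_id = {f.feature_id: f for f in flat}
    roots = []
    for feat in flat:
        parent = by_id.get(feat.parent_id)
        if parent is None:
            roots.append(feat)
        else:
            parent.children.append(feat)
    roots.sort(key=lambda f: f.start)  # children travel with their parent
    return roots


def escape_gff3_value(value: str) -> str:
    """Percent-encode characters that are reserved in GFF3 tag values."""
    reserved = ";=&,%"
    out = []
    for ch in value:
        code = ord(ch)
        if ch in reserved or code < 0x20 or code == 0x7F:
            out.append(f"%{code:02X}")
        else:
            out.append(ch)
    return "".join(out)


flat = [
    Feature("mRNA1", "mRNA", 100, 900, parent_id="gene1"),
    Feature("gene1", "gene", 100, 900),
    Feature("exon1", "exon", 100, 300, parent_id="mRNA1"),
]
roots = nest_features(flat)
note = escape_gff3_value("kinase; ATP=binding, 50%")
```

With this input, roots holds the single gene, whose children list contains the mRNA, which in turn contains the exon.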

4. Administration

The new Subversion server is not yet tested.

5. Documentation and Training

5.1 Books

Jon reported on the current status. The User Manual format examples are complete.

Peter has worked through the latest 'to do' records for the Developers Manual. Most of the 'new' items were repeats of tasks already done and committed.

Jon reported that the Developers Guide is currently around 450 pages. The ACD syntax documentation could be too much for the book size and may need trimming to fit (with the full document on the web site).

We are waiting for news from the publishers on the printed size of the books. The page count estimates are for the Word version.

5.2 Training

Alan set up a Jemboss server for Lisa to use to develop material for her session in the EBI Plant Bioinformatics training course at the end of this month.

6. User queries and answers

All outstanding queries have been put on the SourceForge tracker.

7. AOB

Peter attended a next-generation sequencing meeting in Cambridge and had some discussions on Bio-Linux with Bela Tiwari.

8. Date Of Next Meeting

The next meeting will be on Monday 29th March.