EMBOSS: Project Meeting (Mon 21st June 2010)


Attendees

EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster
Visitors:
Apologies:

1. Minutes of the last meeting

Minutes of the meeting of 14th June 2010 are available.

2. Maintenance etc.

2.1 Applications

Peter has modified pepstats to support both cross-linked and reduced forms of cysteine, and has updated the data values.

Peter has added the organism (taken from the taxon cross-reference data) to the output columns in infoseq.

Mahmut has fixed and updated vectorstrip.

Mahmut has updated and committed needle; the patches are on the FTP server.

2.2 Libraries

Alan has modified the ajFileIsfile function to work under Windows.

Peter has added BAM format output. BAM is a binary version of the SAM next-generation sequence format.

SAM input can now be processed without the need for a header.

Peter has modified SRSWWW access to make an initial call that counts the number of entries, and then to loop over them, retrieving the data in chunks. This makes SRSWWW access safer, but also twice as slow.
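The count-then-chunk pattern can be sketched in C as below. This is a minimal sketch, assuming the total entry count has already been obtained from the initial query; plan_chunks is a hypothetical helper written for illustration, not the actual SRSWWW code in the EMBOSS libraries. It only computes the start offset and length of each retrieval request.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the EMBOSS API).
 * Given the total number of entries reported by an initial count query
 * and a chunk size, record the start offset and length of each retrieval
 * request. Only the first 'max' requests are stored, but the full number
 * of requests needed is always returned.
 */
size_t plan_chunks(size_t total, size_t chunk,
                   size_t *starts, size_t *lengths, size_t max)
{
    size_t first;
    size_t nreq = 0;

    assert(chunk > 0);

    for (first = 0; first < total; first += chunk) {
        size_t remaining = total - first;
        size_t len = remaining < chunk ? remaining : chunk;

        if (nreq < max) {
            starts[nreq] = first;
            lengths[nreq] = len;
        }
        nreq++;
    }
    return nreq;
}
```

For example, 25 entries fetched in chunks of 10 give three requests (offsets 0, 10 and 20, the last of length 5), on top of the initial count query.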

Mahmut noted that, when reading FASTQ input files, the size of the quality score data was not set correctly when the object was reused.

Mahmut is using the Picard tool to validate FASTQ files in EMBOSS and SAMtools.

SAM format can also be used by alignment tools (e.g. SSAHA) for pairwise alignment.

Mahmut has downloaded Mosaik, which includes C. elegans assembly files that can be used to test data formats.

Alan has added an ajStrToUlong function to avoid casts. The existing ajStrToUint function uses the signed strtol function and should be updated. The strtoul function was missing from AIX 1.0, but it is part of the ANSI C (C89) standard and should now be available on all systems.
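The point of converting with strtoul rather than strtol can be illustrated with a small C sketch. It assumes plain C strings rather than AjPStr objects; str_to_ulong is a hypothetical name, and this is not the actual ajStrToUlong implementation. Note that strtoul accepts a leading minus sign and silently negates the value (so "-1" becomes ULONG_MAX), so the sketch rejects a sign explicitly.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the EMBOSS ajStrToUlong code).
 * Converts a decimal string to unsigned long with strtoul, so no cast
 * from a signed long is needed. Returns 1 and stores the value in
 * *result on success; returns 0 for empty input, a minus sign,
 * trailing characters or overflow.
 */
int str_to_ulong(const char *s, unsigned long *result)
{
    const char *p = s;
    char *end;
    unsigned long val;

    assert(s != NULL && result != NULL);

    /* strtoul would turn "-1" into ULONG_MAX, so refuse a minus sign */
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '-')
        return 0;

    errno = 0;
    val = strtoul(p, &end, 10);
    if (end == p || *end != '\0' || errno == ERANGE)
        return 0;

    *result = val;
    return 1;
}
```

Checking errno for ERANGE and the end pointer for leftover characters are the standard ways to detect overflow and junk input with the strto* family.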

Alan noted that the latest libpng only works with the latest libgd release candidate.

2.3 Other

Alan has updated the configure script with a --enable-systemlibs switch that disables the bundled zlib and expat libraries and uses the system-installed versions instead. The Fedora and Debian bundlers have been notified. The PCRE library can be cleaned up after the release.

Mahmut has removed the Makefiles from the Java package directories; they caused recursive errors in the CVS developer code when Java is not installed.

Alan has updated the version number to 6.3.0 for the release.

Mahmut has completed wrapping the remaining types in Soaplab, with help from EBI External Services.

Mahmut noted that "config.h" files can clash, for example between EMBASSY packages and EMBOSS. Some processing is carried out before config.h is included, and there may also be clashes with PCRE and PLplot. Alan would also like to standardise version handling for Windows installations.

Alan has tested the latest Cygwin release (1.7.5). It works, but it installs an experimental version of libtool 2.

3. New developments

3.1 BioMart

Alan has written code to extract the username and password from URLs, and is testing password handling through a proxy server.
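Extracting credentials from a URL of the form scheme://user:password@host/path can be sketched in C as follows. This is an illustration under stated assumptions, not Alan's code: UrlCreds, URL_FIELD_MAX and split_url_credentials are hypothetical names, fields are fixed-size buffers, and no percent-decoding is done, so credentials are assumed not to contain '/', '?' or '#'.

```c
#include <assert.h>
#include <string.h>

#define URL_FIELD_MAX 256   /* hypothetical fixed buffer size */

/* Hypothetical result type, not an EMBOSS structure */
typedef struct {
    char user[URL_FIELD_MAX];
    char pass[URL_FIELD_MAX];
    char url[URL_FIELD_MAX];   /* URL with the credentials removed */
} UrlCreds;

/* Copy n bytes of src into dst and terminate; 0 if it will not fit */
static int copy_span(char *dst, const char *src, size_t n)
{
    if (n >= URL_FIELD_MAX)
        return 0;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 1;
}

/*
 * Split "scheme://user:password@host/path" into username, password and
 * a URL without the credentials. Returns 1 on success, 0 if there is no
 * "://" or a field is too long. Absent fields are left empty.
 */
int split_url_credentials(const char *url, UrlCreds *out)
{
    const char *sep, *auth, *auth_end, *p, *colon;
    const char *at = NULL;
    size_t prefix, rest;

    assert(url != NULL && out != NULL);
    out->user[0] = out->pass[0] = out->url[0] = '\0';

    sep = strstr(url, "://");
    if (sep == NULL)
        return 0;
    auth = sep + 3;
    auth_end = auth + strcspn(auth, "/?#");   /* end of host part */

    /* The last '@' in the host part ends the user information */
    for (p = auth; p < auth_end; p++)
        if (*p == '@')
            at = p;

    if (at == NULL)   /* no credentials: keep the URL unchanged */
        return copy_span(out->url, url, strlen(url));

    colon = memchr(auth, ':', (size_t)(at - auth));
    if (colon != NULL) {
        if (!copy_span(out->user, auth, (size_t)(colon - auth)) ||
            !copy_span(out->pass, colon + 1, (size_t)(at - colon - 1)))
            return 0;
    } else if (!copy_span(out->user, auth, (size_t)(at - auth))) {
        return 0;
    }

    /* Rebuild as "scheme://" followed by everything after the '@' */
    prefix = (size_t)(auth - url);
    rest = strlen(at + 1);
    if (prefix + rest >= URL_FIELD_MAX)
        return 0;
    memcpy(out->url, url, prefix);
    memcpy(out->url + prefix, at + 1, rest + 1);   /* includes '\0' */
    return 1;
}
```

For example, "http://alice:secret@proxy.example.org:8080/path" gives the user "alice", the password "secret" and the URL "http://proxy.example.org:8080/path". Searching for the last '@' before the path keeps a port number after the host from being mistaken for the password separator.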

Mahmut noted that BioMart access returns tab-delimited data, which could be confused with the SAM data format when formats are detected automatically. Peter will check that the formats are tested in the best order to avoid conflicts. One solution is not to test for "biomart" format automatically.

3.2 Ensembl

Michael has written code to connect to Ensembl and fetch annotations and sequences. The MySQL driver needs a username and password to access the data; these can be encoded in the database URL.

It is also possible to obtain a list of databases from the server.

The Ensembl code returns sequence objects, which entret cannot display as text. Peter will design an interface to allow entret to work with objects.

3.3 Data types

Peter has implemented access methods for OBO and data catalogue data. These will be used after the release.

3.4 EDAM ontology

Jon has updated some of the relations attributes in the ACD files.

Jon described BioNemus, an application that would benefit from adding conservative type information to EDAM: for example, describing a sequence as having a primitive string type, and describing aggregations such as a sequence record "has_a" sequence. Cardinality could also be added where applicable.

3.5 Data catalogue

Jon has received replies from more contacts. Peter proposed making updates available after the release.

4. Administration

Alan noted some issues with Java applets on the EMBOSS machines, using either the Open SDK or Sun's version. Fedora updates have made further changes, but BioNemus still fails. On all machines the Fedora SDK has been replaced by a link to the Sun version.

Alan noted that Adobe no longer produces a 64-bit beta of Flash. If there are problems, a wrapper can be added to support the 32-bit version.

Peter has ordered XML software for Jon.

Alan has installed XMLmind and OBO-Edit on the server.

5. Documentation and Training

5.1 Books

The "blurb" for the books has been checked and needs approval.

Jon has made minor corrections to the XML sources.

6. User queries and answers

All done.

7. AOB

Peter reported on the EMBRACE/EMBnet workshop on next-generation sequencing in Bari.

8. Date Of Next Meeting

The next meeting will be on Monday 28th June.