EMBOSS: Project Meeting (Mon 21st June 10)
Peter has added the organism (using the taxon cross-reference data) to the output columns in infoseq.
Mahmut has fixed and updated vectorstrip.
Mahmut has updated and committed needle with patches on the FTP server.
Peter has added BAM format output. BAM is a binary version of the SAM next-generation sequence format.
SAM input can now be processed without the need for a header.
Peter has modified SRSWWW access to make an initial call to count the number of entries and then to loop over them in chunks retrieving data. This makes SRSWWW access safer, but also twice as slow.
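The count-then-chunk pattern described above can be sketched as follows. This is only an illustration under stated assumptions, not the EMBOSS SRSWWW code: the names fetch_in_chunks, chunk_fetch_fn and count_entries are hypothetical, and the callback stands in for whatever request actually retrieves a range of entries.

```c
#include <stddef.h>

/* Hypothetical callback: retrieve entries [first, first + count).
 * Returns 0 on success, non-zero on failure. */
typedef int (*chunk_fetch_fn)(size_t first, size_t count, void *ctx);

/*
 * Loop over `total` entries in chunks of at most `chunk_size`.
 * `total` is assumed to come from an initial "count" query.
 * Returns the number of chunk requests made, or -1 on error.
 */
static long fetch_in_chunks(size_t total, size_t chunk_size,
                            chunk_fetch_fn fetch, void *ctx)
{
    long requests = 0;
    size_t first = 0;

    if (chunk_size == 0 || fetch == NULL)
        return -1;

    while (first < total) {
        size_t remaining = total - first;
        size_t count = remaining < chunk_size ? remaining : chunk_size;

        if (fetch(first, count, ctx) != 0)
            return -1;              /* stop at the first failed chunk */

        first += count;
        requests++;
    }
    return requests;
}

/* Example callback that only tallies how many entries were requested. */
static int count_entries(size_t first, size_t count, void *ctx)
{
    (void)first;
    *(size_t *)ctx += count;
    return 0;
}
```

The extra initial count request is the price of knowing in advance how many chunks to ask for, which is consistent with the slowdown noted above.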
Mahmut noted that in FASTQ input files the quality score size was not set precisely when the object was reused.
Mahmut is using the Picard tool to validate FASTQ files in EMBOSS and SamTools.
SAM format can also be used by alignment tools (e.g. SSAHA) for pairwise alignment.
Mahmut has downloaded Mosaik which has C. elegans assembly files to use in testing data formats.
Alan has added an ajStrToUlong function to avoid casts. The existing ajStrToUint function uses the older strtol function and should be updated. The strtoul function was missing from AIX 1.0, but it is part of the ANSI C (C89) standard and should now be available on all systems.
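A minimal sketch of a strtoul-based conversion follows, to show why it avoids casting a signed strtol result. This is not the ajStrToUlong implementation: str_to_ulong is a hypothetical helper, and its choices (base 10 only, rejecting a leading minus sign and trailing characters) are assumptions made for the example.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal unsigned long.
 * Returns 1 and stores the value on success, 0 on any error. */
static int str_to_ulong(const char *text, unsigned long *result)
{
    char *end = NULL;
    unsigned long value;

    if (text == NULL || result == NULL)
        return 0;

    /* strtoul accepts a leading '-' and negates the value,
     * so skip whitespace and reject a minus sign explicitly. */
    while (isspace((unsigned char)*text))
        text++;
    if (*text == '-' || *text == '\0')
        return 0;

    errno = 0;
    value = strtoul(text, &end, 10);

    if (end == text)        /* no digits were consumed */
        return 0;
    if (*end != '\0')       /* trailing characters after the number */
        return 0;
    if (errno == ERANGE)    /* value does not fit in unsigned long */
        return 0;

    *result = value;
    return 1;
}
```

Because the result is already an unsigned long, the caller needs no cast and out-of-range input is reported rather than silently wrapped.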
Alan noted that the latest libpng only works with the latest libgd release candidate.
Mahmut has removed Makefiles from the Java package directories in the CVS developers' code. They caused recursive errors when Java is not installed.
Alan has updated the version number to 6.3.0 for the release.
Mahmut has completed wrapping of the remaining types in SoapLab, with help from EBI External Services.
Mahmut noted that "config.h" files can clash, for example between embassy packages and EMBOSS. Some tasks are carried out before the include, and there are possible clashes with PCRE and PLPLOT. Alan would also like to standardise version handling for Windows installations.
Alan has tested the latest CYGWIN 1.7.5, which works but has an experimental version of libtool 2 installed.
Mahmut noted that the BioMart access returns tab-delimited data which may clash with the SAM data format. Peter will check that they are tested in the best order to avoid conflicts. One solution is to not test "biomart" format automatically.
It is also possible to obtain a list of databases from the server.
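To illustrate the tab-delimited clash with SAM noted above, here is a rough sketch of a stricter SAM test that could run before a generic tab-delimited one. It is not the EMBOSS format detection code: looks_like_sam_line is a hypothetical function. It relies only on the SAM rule that an alignment line has at least 11 tab-separated fields, and it assumes that checking FLAG, POS and MAPQ (fields 2, 4 and 5) for digits is enough to reject most other tables.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if the field [s, s + len) is a non-empty run of digits. */
static int is_unsigned_field(const char *s, size_t len)
{
    size_t i;

    if (len == 0)
        return 0;
    for (i = 0; i < len; i++)
        if (!isdigit((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Hypothetical check: does this line look like a SAM alignment record? */
static int looks_like_sam_line(const char *line)
{
    int field = 1;
    const char *start = line;
    const char *p;

    if (line == NULL)
        return 0;

    for (p = line; ; p++) {
        if (*p == '\t' || *p == '\0' || *p == '\n') {
            size_t len = (size_t)(p - start);

            /* FLAG, POS and MAPQ must be unsigned integers */
            if ((field == 2 || field == 4 || field == 5) &&
                !is_unsigned_field(start, len))
                return 0;

            if (*p != '\t')
                break;          /* end of line reached */
            field++;
            start = p + 1;
        }
    }
    return field >= 11;         /* SAM has 11 mandatory fields */
}
```

A generic tab-delimited table can still pass this test if it happens to have numeric columns in the right places, so test order, or excluding "biomart" from automatic detection, remains relevant.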
Ensembl code is returning sequence objects, not suitable for display as text by entret. Peter will design an interface to allow entret to work with objects.
Jon described BioNemus which is an application which would benefit from adding conservative type information to EDAM, for example describing a sequence as having a primitive string type and aggregations such as a sequence record "has_a" sequence. Cardinality could also be added where applicable.
Alan noted that Adobe no longer produces a beta Flash for 64 bits. If there are problems, it is possible to add a wrapper to support the 32-bit version.
Peter has ordered XML software for Jon.
Alan has installed XMLmind and OBO-Edit on the server.
Jon has made minor corrections to the XML sources.