EMBOSS: Project Meeting (Mon 14th June 10)
BAM format is a binary input format and raises several issues. A test is needed for standard input, using a new function ajFileIsFile which will need modification to work under Windows. BAM format tests the last 28 bytes of the file, so there is also an allowance for a seek to offset -28 from the end of the file to fail for an empty or short file.
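As an illustration only, the end-of-file test described above can be sketched in Python rather than in the EMBOSS C code. The function names is_regular_file and has_bam_eof_marker are hypothetical and are not AJAX library calls. The 28-byte value is the empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. The sketch assumes the input is named by a path, and it treats anything that is not a regular file (such as a pipe standing in for standard input) as untestable.

```python
import os
import stat

# The 28-byte empty BGZF block that ends a well-formed BAM file
# (value as given in the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def is_regular_file(path):
    """Return True if path names a regular file (hypothetical helper)."""
    try:
        return stat.S_ISREG(os.stat(path).st_mode)
    except OSError:
        return False


def has_bam_eof_marker(path):
    """Return True if the last 28 bytes of the file are the BGZF EOF block."""
    if not is_regular_file(path):
        # For example a pipe or terminal: seeking from the end is not possible.
        return False
    if os.path.getsize(path) < len(BGZF_EOF):
        # Empty or short file: a seek to offset -28 from the end would fail.
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read(len(BGZF_EOF)) == BGZF_EOF
```

Checking the file size before seeking is one way to make the short-file case an ordinary "no" answer rather than an error; catching the failed seek would be an equally reasonable design.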
Alan has corrected the BAM code to compile on Windows. There were issues with inline functions in headers and the naming of off_t which should be replaced by sys_t.
Peter has updated the Sequence Ontology terms in feature table internals to match the latest 2.4.3 release of SO and SOFA. The variation term previously used is obsolete, and is replaced by one which has a suitable name although it is currently labelled as polypeptide-specific.
Mahmut is testing SAM and BAM formats using picard (a Java version of samtools which supports extensive unit tests for the detailed support of these formats). Initial tests use fastq data converted to SAM and BAM files.
Mahmut has also tested alignments in SAM files, for example as supermatcher output. New CIGAR string handling functions were required.
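For readers unfamiliar with CIGAR strings, the following Python sketch shows the kind of handling involved. It is not the new EMBOSS code, and the function names are hypothetical. It follows the operation letters defined in the SAM specification (M, I, D, N, S, H, P, = and X), splits a CIGAR string into (length, operation) pairs, and works out how many reference and query positions the alignment covers.

```python
import re

# One CIGAR element: a decimal length followed by a single operation letter.
_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance along the reference and along the query (read),
# as tabulated in the SAM specification.
REF_CONSUMING = frozenset("MDN=X")
QUERY_CONSUMING = frozenset("MIS=X")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) pairs."""
    if cigar == "*":
        return []  # SAM uses "*" when no alignment is available
    elements = []
    position = 0
    for match in _CIGAR_ELEMENT.finditer(cigar):
        if match.start() != position:
            raise ValueError("unexpected text in CIGAR string: %r" % cigar)
        elements.append((int(match.group(1)), match.group(2)))
        position = match.end()
    if position != len(cigar) or not elements:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return elements


def reference_length(cigar):
    """Number of reference bases spanned by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


def query_length(cigar):
    """Number of query bases accounted for, including soft clips."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_CONSUMING)
```

For example, "3S5M2I4M1D6M" parses to six elements, spans 16 reference bases and accounts for 20 query bases.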
SRS retrieval should check the number of entries and work with a limited chunk size, which must be small to handle very large assembly entries. This will need at least one extra call for each USA. Peter will make the code changes.
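The chunked retrieval idea can be sketched in Python as follows. This is only an outline of the approach, not the planned code change. The callables count_entries and fetch_range are hypothetical stand-ins for whatever server calls are eventually used, and the default chunk size of 10 is an arbitrary value chosen for the example.

```python
def chunk_ranges(total_entries, chunk_size):
    """Yield (first, last) 1-based inclusive ranges covering all entries."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    first = 1
    while first <= total_entries:
        last = min(first + chunk_size - 1, total_entries)
        yield first, last
        first = last + 1


def fetch_all(query, count_entries, fetch_range, chunk_size=10):
    """Retrieve every entry for a query, one limited chunk at a time.

    count_entries(query) and fetch_range(query, first, last) are
    hypothetical callables supplied by the caller.
    """
    total = count_entries(query)  # the one extra call made per query
    for first, last in chunk_ranges(total, chunk_size):
        for entry in fetch_range(query, first, last):
            yield entry
```

With 25 entries and a chunk size of 10, chunk_ranges yields (1, 10), (11, 20) and (21, 25), so no single request returns more than 10 entries.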
Alan has looked into column ordering in Mart server results processed by ajMartCheckHeader. Function ajMartSetHeader can be used before the query to specify that a column header is required, returning the long name in the header. ajMartCheckHeader does an attributes query and matches the column names, returning a NULL-terminated string array. This is only needed once per query. It can be activated to check that the first value in the array matches the expected sequence attribute.
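The header matching can be pictured with the Python sketch below. It does not reproduce ajMartCheckHeader. It assumes the result header is a single tab-separated line of long column names, and that an attributes query has already produced a mapping from long name to attribute name. The function names and the attribute names used in the usage note are illustrative assumptions.

```python
def match_columns(header_line, long_to_attribute):
    """Map each header column (long name) back to its attribute name.

    Columns with no known attribute are returned as None, in header order.
    """
    columns = header_line.rstrip("\r\n").split("\t")
    return [long_to_attribute.get(column) for column in columns]


def first_column_is(header_line, long_to_attribute, expected_attribute):
    """Check that the first result column is the expected attribute."""
    names = match_columns(header_line, long_to_attribute)
    return bool(names) and names[0] == expected_attribute
```

As a usage example with made-up names, a mapping {"Coding sequence": "coding", "Gene ID": "gene_id"} and the header "Coding sequence\tGene ID" give ["coding", "gene_id"], so a check for "coding" in the first column succeeds. The matching is done on the header alone, which is why it is needed only once per query.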
BioMart uses RESTful URL access, recommended by Syed Haider. Michael Schulster's Ensembl API code uses SQL access. BioMart can also be used to access Ensembl data.
Data providers have been contacted by email to comment on the query records.
Jon has committed ACD files with corrected relations attributes.
Jon has discussed SAWSDL with the BioCatalogue team at EBI. Discussion will continue on the BioCatalogue mailing lists. There is a need for common tools to browse and edit EDAM and other ontologies. A wiki page has been set up to collect recommendations. Some content on the EDAM pages has been moved to the Wiki.
EDAM user and developer mailing lists have been created.
We are waiting for a quote from the systems group.
Mahmut is looking into a report of vectorstrip not trimming with -allsequences set.
There was a report of pepwindow crashing under the Mobyle interface.
Mahmut noted that MIRA uses SSAHA to trim adaptor sequences. It is now possible to use supermatcher as an alternative, generating the same output format.
Peter noted that July 15th will be the 10th anniversary of the EMBOSS 1.0.0 release.