EMBOSS: Project Meeting (Mon 19th April 10)
Alan has modified eprimer3 to support the new release 2.2.2b of primer3. The intermediate "Boulder I/O" format has changed considerably with new names for many tags. The modified version should be given a new name. So far no suggestion has been considered suitable.
Peter is working on updating the output format of showalign in response to a user request to number according to positions in a specified reference sequence rather than the whole alignment. This will require rewriting of the code for numbering the output. The reference sequence name is currently case sensitive, which is annoying. This will be changed to allow case-insensitive matching of sequence names.
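The two changes described for showalign can be sketched roughly as follows. This is an illustrative Python sketch only, not the showalign C code: the function names `reference_positions` and `find_reference` are hypothetical, and the gap characters are assumed to be '-' and '.'.

```python
def reference_positions(ref_aligned, gap_chars="-."):
    """Map each alignment column to a 1-based position in the reference.

    Columns where the reference sequence has a gap get None, so they can
    be left unnumbered in the output.
    """
    positions = []
    count = 0
    for ch in ref_aligned:
        if ch in gap_chars:
            positions.append(None)
        else:
            count += 1
            positions.append(count)
    return positions


def find_reference(names, wanted):
    """Return the index of the sequence whose name matches, ignoring case."""
    target = wanted.casefold()
    for i, name in enumerate(names):
        if name.casefold() == target:
            return i
    raise KeyError(wanted)
```

For example, a reference row "AC--GT" would number its four residues 1 to 4 and leave the two gap columns unnumbered.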
Peter checked the reported problem with the mira EMBASSY package. This fails to install because of a problem with missing html documentation files. All other EMBASSY packages had no problem.
Mahmut reported two bugs in wordmatch. The alignment outputs had extra headers, and a header was still printed where no matches were found. The program will be fixed to revert to the original behaviour in these cases.
Mahmut is running profiling tests on supermatcher. Finding the seed matches takes only a small proportion of the total time.
Mahmut noted that users on the mira mailing list are using SSAHA to clip adaptor sequences. SSAHA has command line sequence inputs the opposite way round compared to wordmatch. He will check the other EMBOSS applications to identify a consistent standard for the ordering of sequence inputs where sequence sets and sequence streams (seqall) are used.
On Windows, Alan has implemented the ajFileNewInPipe function to allow pipe syntax for an open file.
Also on Windows, Alan has added code to convert filenames starting with '~/' to the user's home directory (HOMEDIR) and to convert '~username/' by finding the home directory of another user from the registry. The latter requires multiple calls to inter-convert string types.
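The filename conversion described above might look something like the sketch below. It is written in Python for brevity and is not the EMBOSS implementation: the function name `expand_tilde` is hypothetical, and the `user_homes` dictionary is an assumed stand-in for the registry lookup of another user's home directory.

```python
def expand_tilde(path, home, user_homes):
    """Expand a leading '~/' or '~username/' in a filename.

    home       -- the current user's home directory (assumed already known)
    user_homes -- dict of username -> home directory, standing in for the
                  registry lookup (an assumption for this sketch)
    """
    if not path.startswith("~"):
        return path
    head, sep, rest = path[1:].partition("/")
    if not sep:
        return path              # no '/' after the tilde: leave untouched
    if head == "":
        base = home              # '~/...' is the current user
    elif head in user_homes:
        base = user_homes[head]  # '~username/...' is another user
    else:
        return path              # unknown user: leave the name unchanged
    return base.rstrip("/") + "/" + rest
```

Names that do not start with '~', or that name an unknown user, are returned unchanged in this sketch so that the caller can report the failure when the file is opened.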
Alan has updated the processing of directory delimiters in ajfile.c. These need checking to clarify whether Windows-style backslashes are already converted to forward slashes at this point.
Peter has updated ajsys to provide C char* versions of the string functions.
Peter is looking into adding new plplot devices for output as PDF and SVG. The plplot documentation suggests that these depend on third party libraries.
Mahmut suggested extending the alignment output formats to include "psl" and "pslx" (used by the UCSC browser) and GFF (used by SSAHA). SSAHA also supports output in SAM format which includes soft tag clipping. This is not yet fully implemented in EMBOSS. The aligned sequences need to be converted to "CIGAR" strings. SSAHA also has a native alignment format called "ALN", as well as SUGAR and VULGAR strings.
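As an illustration of the CIGAR conversion mentioned above, a pair of gapped sequences can be collapsed into run-length encoded operations. This is a minimal Python sketch, not EMBOSS code: the function name `cigar_from_alignment` is hypothetical, '-' is assumed to be the gap character, and only the M (aligned column), I (insertion relative to the reference) and D (deletion relative to the reference) operations are produced. Clipping is not handled.

```python
from itertools import groupby


def cigar_from_alignment(ref_aligned, query_aligned, gap="-"):
    """Build a CIGAR string from two equal-length gapped sequences."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have the same length")
    ops = []
    for r, q in zip(ref_aligned, query_aligned):
        if r == gap and q == gap:
            continue         # column empty in both sequences: skip it
        elif r == gap:
            ops.append("I")  # residue only in the query
        elif q == gap:
            ops.append("D")  # residue only in the reference
        else:
            ops.append("M")  # aligned column (match or mismatch)
    # Collapse runs of the same operation into <count><op> pairs.
    return "".join(f"{len(list(group))}{op}" for op, group in groupby(ops))
```

For a reference row "ACGT-ACGT" and a query row "ACGTTAC-T" this gives "4M1I2M1D1M".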
Alan suggested adding new sequence formats. There is an ongoing discussion on the EMBOSS mailing list.
Alan noted that the new Visual Studio 2010 no longer uses a hard-coded 32 bit Java location. It now checks for a reference to JAVA_HOME for the jdk32 directory. The latest bundlewin utility to build mEMBOSS has a directory v100 for the Visual Studio 2010 redistribution files.
Alan noted that the CVS server on OpenBio has specific files for building mEMBOSS, including the run time libraries and a recent file to set up the configuration for bundlewin.
Jon will update the relations value to include the EDAM term identifier and name space. This should make SAWSDL generation easier.
Peter has investigated automatically generating the Galaxy interface definitions for EMBOSS applications. These use python scripts and XML files. The code appears to be simple to automate.
Alan offered to set up a proxy server with password protection to test the implementation in EMBOSS. This will need to be on his home systems.
Jon had further discussions with the MicroArray group on their OBO-compatible software ontology.
Jon plans to simplify EDAM by adding simple terms to make the hierarchy easier to browse in OBO-Edit. There are some further terms to add to provide more complete coverage, including BioMOBY and others useful for BioCatalogue.
Jon will look into possible recommended browsers for EDAM, and creating web pages to be used as the end points of persistent URLs (PURLs).
Jon will further revise the EDAM documentation. Some internal cleanup is needed, but is not urgent.
For the next release, Jon aims to have definitions of the data objects and types that can be returned, and their formats.
Peter will check through the URL references for any that are not currently available.
Peter will go to the EMBnet workshop in Bari to talk on EMBOSS and next generation sequence data.
Mahmut will not be available for the proposed Marmara course.