EMBOSS: Project Meeting (Mon 19th April 10)


Attendees

EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag
Visitors:
Apologies:

1. Minutes of the last meeting

Minutes of the meeting of 12th April 2010 are here.

2. Maintenance etc.

2.1 Applications

Alan has fixed tfm for two users who reported problems. One reported success. No news yet from the second user.

Alan has modified eprimer3 to support the new release 2.2.2b of primer3. The intermediate "Boulder I/O" format has changed considerably with new names for many tags. The modified version should be given a new name. So far no suggestion has been considered suitable.

Peter is working on updating the output format of showalign in response to a user request to number according to positions in a specified reference sequence rather than the whole alignment. This will require rewriting of the code for numbering the output. The reference sequence name is currently case sensitive, which is annoying. This will be changed to allow case-insensitive matching of sequence names.
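The case-insensitive lookup of the reference sequence name could look like the following minimal sketch. It is written in Python for brevity rather than against the EMBOSS C library, and the function name and the example sequence names are hypothetical.

```python
def find_reference(names, wanted):
    """Return the index of the first name equal to `wanted`, ignoring case.

    Returns -1 when no name matches.
    """
    key = wanted.casefold()
    for index, name in enumerate(names):
        if name.casefold() == key:
            return index
    return -1


# Hypothetical sequence names in an alignment
alignment_names = ["HBA_HUMAN", "HBB_HUMAN", "MYG_PHYCA"]
reference_index = find_reference(alignment_names, "hbb_human")
```

With this approach a user can type the reference name in any case and still select the intended sequence.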

Peter checked the reported problem with the mira EMBASSY package. This fails to install because of a problem with missing html documentation files. All other EMBASSY packages had no problem.

Mahmut reported two bugs in wordmatch. The alignment outputs had extra headers, and a header was still printed where no matches were found. The program will be fixed to revert to the original behaviour in these cases.

Mahmut is running profiling tests on supermatcher. Finding the seed matches is only a small proportion of the total time.

Mahmut noted that users on the mira mailing list are using SSAHA to clip adaptor sequences. SSAHA has command line sequence inputs the opposite way round compared to wordmatch. He will check the other EMBOSS applications to identify a consistent standard for the ordering of sequence inputs where sequence sets and sequence streams (seqall) are used.

2.2 Libraries

Alan has built mEMBOSS using Microsoft Visual Studio 10. This requires upgrading to the new version for mEMBOSS development. The configuration files were DOS cr/lf stored in CVS as binary files. These are now Unix linefeed delimited files. They can be stored in CVS as text files, and work in Visual Studio except that one can no longer double click on the DLLs.sln file. It may be possible to fix this in bundlewin by restoring the DOS line termination.
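The line termination change can be illustrated with a small sketch, assuming the files are read as raw bytes. The function names are hypothetical and this is not the bundlewin code.

```python
def to_unix(data: bytes) -> bytes:
    """Convert DOS cr/lf line endings to Unix linefeeds."""
    return data.replace(b"\r\n", b"\n")


def to_dos(data: bytes) -> bytes:
    """Convert line endings to DOS cr/lf.

    The input is normalised to Unix form first so that lines which
    already end in cr/lf are not given a second carriage return.
    """
    return to_unix(data).replace(b"\n", b"\r\n")
```

Normalising before converting makes to_dos safe to apply to a file with mixed line endings.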

On Windows, Alan has implemented the ajFileNewInPipe function to allow pipe syntax for an open file.

Also on Windows, Alan has added code to convert filenames starting with '~/' to the user's home directory (HOMEDIR) and to convert '~username/' by finding the home directory of another user from the registry. The latter requires multiple calls to inter-convert string types.
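The two kinds of expansion can be sketched as follows. This is a Python illustration only: the function name is hypothetical, and a plain dictionary stands in for the registry lookup of other users' home directories.

```python
def expand_tilde(path, current_home, other_homes):
    """Expand a leading '~/' or '~username/' prefix.

    current_home: home directory of the current user.
    other_homes:  dict mapping user names to home directories
                  (a hypothetical stand-in for the registry lookup).
    Paths without a leading '~', and paths naming an unknown user,
    are returned unchanged.
    """
    if not path.startswith("~"):
        return path
    prefix, sep, rest = path.partition("/")
    if not sep:
        return path  # '~' or '~user' with no trailing '/': left alone here
    user = prefix[1:]
    if user == "":
        home = current_home
    else:
        home = other_homes.get(user)
        if home is None:
            return path
    return home.rstrip("/") + "/" + rest
```

For example, with a current home of "C:/Users/me", the path "~/data/seq.fa" becomes "C:/Users/me/data/seq.fa".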

Alan has updated the processing of directory delimiters in ajfile.c. These need checking to clarify whether Windows-style backslashes are already converted to forward slashes at this point.

Peter has updated ajsys to provide C char* versions of the string functions.

Peter is looking into adding new plplot devices for output as PDF and SVG. The plplot documentation suggests that these depend on third party libraries.

Mahmut suggested extending the alignment output formats to include "psl" and "pslx" (used by the UCSC browser) and GFF (used by SSAHA). SSAHA also supports output in SAM format, which includes soft tag clipping. This is not yet fully implemented in EMBOSS. The aligned sequences need to be converted to "CIGAR" strings. SSAHA also has a native alignment format called "ALN", as well as SUGAR and VULGAR strings.
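As an illustration of the conversion to CIGAR strings, the sketch below takes two gapped rows of a pairwise alignment and run-length encodes the columns. It is a Python sketch, not EMBOSS code: the function name is hypothetical, only the M, I and D operations are produced, and soft clipping is not handled.

```python
def aligned_to_cigar(ref_aln, query_aln, gap="-"):
    """Build a CIGAR string from two equal-length gapped alignment rows.

    M: residue in both rows
    I: residue in the query only (gap in the reference)
    D: residue in the reference only (gap in the query)
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have the same length")
    runs = []  # list of [count, operation]
    for ref_char, query_char in zip(ref_aln, query_aln):
        if ref_char == gap and query_char == gap:
            continue  # a column with gaps in both rows is skipped
        if ref_char == gap:
            op = "I"
        elif query_char == gap:
            op = "D"
        else:
            op = "M"
        if runs and runs[-1][1] == op:
            runs[-1][0] += 1
        else:
            runs.append([1, op])
    return "".join(f"{count}{op}" for count, op in runs)


cigar = aligned_to_cigar("ACGT-ACGT", "ACGTTAC-T")  # "4M1I2M1D1M"
```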

Alan suggested adding new sequence formats. There is an ongoing discussion on the EMBOSS mailing list.

Alan noted that the new Visual Studio 2010 no longer uses a hard-coded 32 bit Java location. It now checks for a reference to JAVA_HOME for the jdk32 directory. The latest bundlewin utility to build mEMBOSS has a directory v100 for the Visual Studio 2010 redistribution files.

Alan noted that the CVS server on OpenBio has specific files for building mEMBOSS, including the run time libraries and a recent file to set up the configuration for bundlewin.

2.3 SoapLab

Mahmut is generating test SAWSDL annotations using the ACD file relations attributes. The existing Java code has been modified to generate SAWSDL. Some modification is needed to separately handle the application relation values.

Jon will update the relations value to include the EDAM term identifier and name space. This should make SAWSDL generation easier.

2.4 Other

Alan has created a 32x32 logo for Jemboss using the 'j' of the current image. The icon size should be increased to 64x64.

Peter has investigated automatically generating the Galaxy interface definitions for EMBOSS applications. These use python scripts and XML files. The code appears to be simple to automate.

Alan offered to set up a proxy server with password protection to test the implementation in EMBOSS. This will need to be on his home systems.

3. New developments

3.1 BioMart access

Peter has a stub access method for BioMart. This will need new attributes to be defined for sequence databases to identify the filters (query fields) and attributes (fields to be included in output). Alan suggested martquery could list the filters and attributes.
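To show how filters and attributes fit together, the sketch below builds a query in the XML form that BioMart servers accept. It is a Python illustration, not the EMBOSS access method: the function name is hypothetical, and the dataset, filter and attribute names are example values only.

```python
import xml.etree.ElementTree as ET


def build_mart_query(dataset, filters, attributes):
    """Return a BioMart XML query string.

    dataset:    dataset name
    filters:    dict of filter name -> value (the query fields)
    attributes: list of attribute names (the fields to be output)
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "0",
    })
    dataset_node = ET.SubElement(query, "Dataset",
                                 {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(dataset_node, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(dataset_node, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Example values only; real names depend on the mart being queried
xml_query = build_mart_query("hsapiens_gene_ensembl",
                             {"chromosome_name": "1"},
                             ["ensembl_gene_id", "external_gene_id"])
```

Each database definition would need to supply the equivalent of the filters and attributes arguments.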

3.2 EDAM

Jon reported the acceptance of the registry and EDAM paper by Nucleic Acids Research. Matus is preparing a BioXSD publication with references to EDAM.

Jon had further discussions with the MicroArray group on their OBO-compatible software ontology.

Jon plans to simplify EDAM by adding simple terms to make the hierarchy easier to browse in OBO-Edit. There are some further terms to add to provide more complete coverage, including BioMOBY and others useful for BioCatalogue.

Jon will look into possible recommended browsers for EDAM, and creating web pages to be used as the end points of persistent URLs (PURLs).

Jon will further revise the EDAM documentation. Some internal cleanup is needed, but is not urgent.

For the next release, Jon aims to have definitions of the data objects and types that can be returned, and their formats.

3.3 Database list

Jon noted that we need to add further data types, at least text, HTML and URLs, to return data from the non-sequence data resources.

4. Administration

Alan has now had a reply from the EBI systems groups. We hope to soon have a test system to help define our server configuration.

5. Documentation and Training

5.1 Books

Jon will update the books on his web pages.

Peter will check through the URL references for any that are not currently available.

6. User queries and answers

All outstanding queries have been put on the SourceForge tracker.

7. AOB

Peter will go to the Galaxy Developers Meeting in Cold Spring Harbor next month.

Peter will go to the EMBnet workshop in Bari to talk on EMBOSS and next generation sequence data.

Mahmut will not be available for the proposed Marmara course.

8. Date Of Next Meeting

The next meeting will be on Monday 26th April.