EMBOSS: Project Meeting (Mon 27th November 2006)
Set CONFIG_SHELL with setenv CONFIG_SHELL /bin/tcsh (csh/tcsh syntax) or export CONFIG_SHELL=/bin/bash (sh/bash syntax).
Alan has successfully tested EMBOSS 4.0.0 with all patches on an Intel Mac.
Peter has worked through the parsing of NCBI-style IDs in FASTA files. There were various issues, including whether the database name from the input file, the -sdbname option, or the -osdbname option should be used in output. Also, in some cases a database name could be carried over from a previous entry, where NCBI-format files mixed sequences with and without database names. New attributes were required for the sequence input and output objects to separate database names set on the command line from those read from the input.
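To illustrate the two problems above, here is a minimal Python sketch. It assumes a simplified "db|accession|name" identifier layout, and the precedence it applies (output option, then the entry's own database, then the input option) is only one possible choice, since the meeting did not settle which should win. The function names and arguments are hypothetical, not EMBOSS code. Because each entry's database name is parsed afresh, a name from one entry cannot leak into the next entry that has none.

```python
from typing import List, Optional, Tuple


def parse_ncbi_id(header: str) -> Tuple[Optional[str], str]:
    """Split a FASTA header such as '>emb|X12345|NAME desc' into (db, accession).

    Assumes a simplified 'db|accession|...' layout; headers without '|'
    are treated as a plain ID with no database name.
    """
    ident = header.lstrip(">").split(None, 1)[0]
    parts = ident.split("|")
    if len(parts) >= 2 and parts[0]:
        return parts[0], parts[1]
    return None, ident


def output_names(headers: List[str],
                 cmdline_dbname: Optional[str] = None,  # hypothetical -sdbname value
                 out_dbname: Optional[str] = None       # hypothetical -osdbname value
                 ) -> List[Tuple[Optional[str], str]]:
    """Choose the database name to write for each entry.

    The entry's own database name is kept separate from the command-line
    settings and is re-read for every entry, so nothing is carried over.
    Assumed precedence: output option, then entry database, then input option.
    """
    results = []
    for header in headers:
        entry_db, acc = parse_ncbi_id(header)
        chosen = out_dbname or entry_db or cmdline_dbname
        results.append((chosen, acc))
    return results
```

For example, output_names([">emb|X12345|HSX12345 human", ">Y00001 no database"]) gives the second entry no database name rather than reusing "emb" from the first.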
Peter has added a new application, extractalign, to extract regions from a sequence alignment. It is a minor modification of extractseq. Another application to appear soon is wordfinder, a modification of supermatcher that searches a (protein) database for word-based, near-identical hits, to help the pathogen sequencing group at Sanger.
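The general idea of a word-based search for near-identical hits can be sketched as follows. This is a generic word-seeding approach written in Python for illustration, not the algorithm used by wordfinder or supermatcher; the word length k, the min_words threshold, and all function names are assumptions. Shared words between query and target are counted per diagonal, and a diagonal with enough shared words suggests a near-identical region even when a few residues differ.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def word_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every length-k word in seq to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def word_hits(query: str, target: str, k: int = 4) -> Dict[int, int]:
    """Count shared words on each diagonal (target position - query position)."""
    index = word_index(query, k)
    diagonals: Dict[int, int] = defaultdict(int)
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], []):
            diagonals[j - i] += 1
    return dict(diagonals)


def best_diagonal(query: str, target: str, k: int = 4,
                  min_words: int = 3) -> Optional[Tuple[int, int]]:
    """Return (diagonal, word count) for the best diagonal, or None if too few words."""
    diags = word_hits(query, target, k)
    if not diags:
        return None
    diag, count = max(diags.items(), key=lambda kv: kv[1])
    return (diag, count) if count >= min_words else None
```

With k=4, best_diagonal("MKTAYIAKQR", "MKTAYLAKQR") still reports diagonal 0 with three shared words despite the single I/L substitution, which is the kind of near-identical hit described above.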
Peter investigated a request for a program that can handle pattern matches with very large result sets. The user is happy with the available programs; they ran slowly only because he genuinely wants an enormous number of hits.
Peter has worked through past requests for applications and features. He will collect them together so we can set priorities as a "Dear Santa" list, with help from the user community.
Mahmut reported that LSF has been upgraded and SoapLab job hangs have been fixed. Job submission is temporarily synchronous.
The database definition for EMBL has been changed to support historic ID numbers. To fix the 2.8.0 server, some applications were copied from 4.0.0, and this appears to work.
Error messages for database access could be improved, especially for SoapLab users who are guessing the USA syntax. It would be useful if documentation URLs could be given in the SoapLab error messages.
Peter had discussions with the Taverna/OMII people in Manchester about improving metadata support. SoapLab should report pairs of mutually exclusive options (for example, direct_data and usa sequence input) as a version of the mandatory value. We should also identify more carefully the minimum set of inputs an application needs to run, so that these ports can be coloured and controlled more easily in Taverna. The discussion also covered ways to notify Taverna that an application (or an entire SoapLab server) is obsolete and to point to an alternative server.
Peter is having meetings with local EMBOSS users to discuss database configurations and application needs. The first meeting was with the Babraham Research Institute last week. The ability to combine databases would be useful, as they have split EMBL into a number of subdatabases. They are looking for an interface to suit their users' needs, which include project management functions. Customizing wEMBOSS was recommended as something worth investigating.
Alan is still working on obtaining Purify.
Remaining work includes filling in a few gaps in the text, making the style consistent, and checking the automatically generated content. The nucleus library function names still need to be cleaned up, although in most cases they are reasonably consistent already.
All new issues were considered to be resolved.
The next meeting is on Monday 11th December. This will be the last meeting of 2006.