EMBOSS: Project Meeting (Mon 4th October 2010)


EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster

1. Minutes of the last meeting

Minutes of the meeting of 27th September 2010 are here.

2. Maintenance etc.

2.1 Applications

Alan has a version of eprimer3 working with the new version of primer3_core. This application needs a new name. In the absence of any version-specific naming in primer3, the new name will be eprimer32, and it will remain the same even if primer3 version 3.3 has the same interface. The handling of external applications will require a new name to be defined for primer3_core for use in the eprimer32.acd file.

2.2 Libraries

Michael has continued to clean up the EFUNC and EDATA messages for the ensembl library, and to work on Intel compiler warnings.

2.3 Other

Alan has replaced the compiler configuration code for the main package and all the EMBASSY packages. The configure.in code only tested gcc and the proprietary cc compiler. This code has been replaced by case statements allowing tests for other compilers (e.g. the Intel icc compiler).

Similar changes simplify the operating-system dependencies in the configure.in scripts.

Alan has also looked further into large file configuration options.

3. New developments

3.1 The gsoap library

Mahmut has been working on using the gsoap library, which works with web services described via a URL. Two utility applications in gsoap convert a URL to a C header file, and use the resulting header to make stub C source code files for a SOAP client. First tests use the EBI services wsdbfetch and wsebeye.

Alan has investigated options for compiling and linking gsoap with the EMBOSS libraries. Certain functions in the stub file are required. Other functions must be in the ajaxdb library. The URL used to generate the stub code must be included in the gsoap.m4 file and cannot be user-supplied.

Alan noted that Fedora provides a shared gsoap library. By default gsoap only builds as a static library. It is possible to kludge single-pass linking with the GNU linker, but only for one version of the library build. If the library appears twice then libtool is likely to object.

Possible options include extracting libgsoap code into ajaxdb, but this will probably run into licensing issues. There are about 15k lines of code in total in gsoap. Alternatively, a kludge library could provide the functions "soap_putheader", "registration_putheader" with a callback to a renamed stub function.
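One possible reading of the kludge-library option is sketched below: the library owns a forwarding function and the stub code registers its renamed function with it at start-up. All names here (shim_register_putheader, shim_putheader, stub_putheader_renamed) and the void-pointer signature are hypothetical, chosen for illustration only; they are not the gsoap signatures and not a design agreed at the meeting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type; the real gsoap signature is not assumed here. */
typedef int (*PutHeaderFn)(void *context);

/* Callback held by the kludge library; NULL until the stub registers one. */
static PutHeaderFn putheader_callback = NULL;

/* Called once by the stub code to register its renamed function. */
void shim_register_putheader(PutHeaderFn fn)
{
    putheader_callback = fn;
}

/* Function the library provides; it forwards to the registered callback. */
int shim_putheader(void *context)
{
    if (putheader_callback == NULL)
        return -1;                    /* nothing registered yet */
    return putheader_callback(context);
}

/* Stands in for the renamed stub function; it only counts its calls. */
int stub_putheader_renamed(void *context)
{
    int *calls = context;
    assert(calls != NULL);
    (*calls)++;
    return 0;
}
```

With this arrangement the library can be linked without the stub being present, at the cost of one registration call before the first request.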

We can also consider the use of axis or csoap. The latter has a libxml2 dependency and is limited to the old SOAP 1.1 protocol with no new csoap release for the last 5 years.

3.2 DAS access methods

Mahmut has looked into gsoap and expat to handle the XML C bindings for reading DAS features. Alan suggested looking into DOM parsing with an XSD file as possibly easier to implement. There is an example application domdemo in "make check". One issue with gsoap is that we may be unable to find a satisfactory way to link and distribute it, so it is better to avoid using it for anything not directly SOAP-specific.

3.3 EBI changes to web services

Mahmut is working with EBI external services on the new EBI interface for web services through their test server. The test application is called dbfetchexplore. The new interface returns network and query information. "runtestmethods" runs a test query.

3.4 EDAM

Jon has started a cleanup of the EDAM topic branch to cover topics needed for the annotation of the current set of services in the BioCatalogue.

3.5 Text access in EMBOSS

Peter has converted all single file-based access methods in ajseqdb to handle general text inputs. A new "ajtextdb.c" source file handles text input using an AjPTextin input object. This has the attributes used for general text access by the existing sequence and OBO access methods. The AjPSeqin and AjPOboin objects include an AjPTextin as their "Input" element. Each access method defined for a database is first tested against text methods and then against methods specific to the data type. Where an access method is to be called, the code expects to find either a text access method using the AjPTextin input element, or a type-specific method using the AjPSeqin or AjPOboin object. Text access results in an open file buffer with the pointer set to the start of an entry. A parser (defined by the database "format") processes the data in the file.
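The lookup order described above (text methods first, then type-specific methods) can be sketched as follows. This is a minimal illustration only: the TextIn and SeqIn structs, the method tables, and the function names are hypothetical stand-ins for AjPTextin, AjPSeqin and the real access-method tables, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for AjPTextin and AjPSeqin (not the AJAX structs). */
typedef struct TextIn {
    const char *buffer;  /* open buffer holding the data */
    size_t entry;        /* offset of the start of the current entry */
} TextIn;

typedef struct SeqIn {
    TextIn *Input;       /* embedded text input, as in the "Input" element */
    int typed;           /* set when a type-specific method has run */
} SeqIn;

typedef int (*TextAccessFn)(TextIn *textin);
typedef int (*SeqAccessFn)(SeqIn *seqin);

typedef struct TextMethod { const char *name; TextAccessFn access; } TextMethod;
typedef struct SeqMethod { const char *name; SeqAccessFn access; } SeqMethod;

/* Toy text method: "opens" a buffer positioned at the start of an entry. */
static int text_file_access(TextIn *textin)
{
    textin->buffer = "ID   X1\n//\n";
    textin->entry = 0;
    return 1;
}

/* Toy type-specific method: needs the whole sequence input object. */
static int seq_special_access(SeqIn *seqin)
{
    seqin->typed = 1;
    return 1;
}

static const TextMethod text_methods[] = {{"file", text_file_access}, {NULL, NULL}};
static const SeqMethod seq_methods[] = {{"special", seq_special_access}, {NULL, NULL}};

/* Returns 1 if a text method ran, 2 if a type-specific method ran,
   0 if the method is unknown or the access failed. */
int seq_access(const char *method, SeqIn *seqin)
{
    size_t i;

    assert(method != NULL && seqin != NULL && seqin->Input != NULL);

    /* Text methods are tested first and only see the embedded TextIn. */
    for (i = 0; text_methods[i].name != NULL; i++)
        if (strcmp(text_methods[i].name, method) == 0)
            return text_methods[i].access(seqin->Input) ? 1 : 0;

    /* Otherwise fall back to methods specific to the data type. */
    for (i = 0; seq_methods[i].name != NULL; i++)
        if (strcmp(seq_methods[i].name, method) == 0)
            return seq_methods[i].access(seqin) ? 2 : 0;

    return 0;
}
```

Because the text methods only touch the embedded text input, the same table could be shared by an OBO input object carrying its own TextIn, which is the point of the design described above.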

Text-based access will enable the data resources in the dbxref file to be easily defined as EMBOSS data sources, at least as text entries through a URL query. Jon is updating the query lines given by the resource providers to standardise the semantics and naming using EDAM terms to define a set of interoperable field names.

The query-handling code was made more general to handle OBO terms as well as sequences. It has now been made completely general, with only an AjPQuery object defining the field name, query, and a link operator between queries. The link operator can be "Else" (id, else if that fails try accession) or "or" to continue adding more results. Further operators can be added in future, and processed explicitly for access methods such as SRS. The query language will need to be extended to allow these to be defined on the command line through a USA (or the equivalent for other input data types).
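A minimal sketch of how the two link operators might behave is given below, assuming that "Else" skips a field once an earlier field has produced a hit and that "or" always adds its matches. The QueryField and Entry structs, the enum values, and run_query are hypothetical and do not reflect the actual AjPQuery layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical link operators between query fields. */
typedef enum { LINK_INIT, LINK_ELSE, LINK_OR } QueryLink;

typedef struct QueryField {
    const char *field;  /* field name, here "id" or "acc" */
    const char *query;  /* value to look for */
    QueryLink link;     /* relation to the fields before this one */
} QueryField;

/* Toy database entry with the two fields used in the example. */
typedef struct Entry { const char *id; const char *acc; } Entry;

static int field_matches(const QueryField *q, const Entry *e)
{
    const char *value = NULL;

    if (strcmp(q->field, "id") == 0)
        value = e->id;
    else if (strcmp(q->field, "acc") == 0)
        value = e->acc;

    return value != NULL && strcmp(value, q->query) == 0;
}

/* Sets hits[i] to 1 for every matching entry and returns the hit count. */
size_t run_query(const QueryField *fields, size_t nfields,
                 const Entry *entries, size_t nentries, unsigned char *hits)
{
    size_t f, e, total = 0;

    assert(fields != NULL && entries != NULL && hits != NULL);
    memset(hits, 0, nentries);

    for (f = 0; f < nfields; f++) {
        /* "Else": only tried when nothing has matched so far. */
        if (fields[f].link == LINK_ELSE && total > 0)
            continue;
        /* First field and "or": add every new match to the results. */
        for (e = 0; e < nentries; e++)
            if (!hits[e] && field_matches(&fields[f], &entries[e])) {
                hits[e] = 1;
                total++;
            }
    }
    return total;
}
```

With entries where "A001" is the id of one entry and the accession of another, an id-else-accession query returns only the id match, while an id-or-accession query returns both.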

The code is working and will be further QA tested and passed through the valgrind test suite before it is committed.

4. Administration

4.1 Hardware

We are waiting for systems to reinstall the replacement emboss7 server. Peter will send them a reminder.

5. Documentation and Training

5.1 Books

Jon has the copy editor's version of the Developers Guide. The processing by the typesetters has introduced formatting errors, which are clear from a comparison with the original Word document. We hope the copy editor can handle the amendments.

6. User queries and answers

All done.

7. AOB


8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 11th October.