EMBOSS: Project Meeting (Wed 31st May 2006)


Attendees

EBI: Peter Rice, Jon Ison, Alan Bleasby, Mahmut Uludag, Shaun McGlinchey
Sanger:
Visitors:
Apologies: Tim Carver, Rodrigo Lopez, Lisa Mullan

1. Minutes of the last meeting

Minutes of the meeting of 16th May 2006 are here.

2. Software Development

2.1 Windows

Alan has successfully built the EMBOSS CVS code on Windows using Visual Studio C++. The embossversion program reports the correct version, seqret is able to read from the EBI SRS server, and iep has been tested with asis::d as the input.

A few "#ifndef WIN32" controls remain in emma and tfm because of issues in launching other applications. Input pipes are commented out, for example ajFileNewInpipe. Some other cleanups are still to be committed, for example in ajfile.c.

Unix pathnames have to be converted to use backslashes. ajdefine.h has 4 new definitions, including SLASH_CHAR, CURRENT_DIR and the path separator (";" on Windows, ":" on Unix).
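As an illustration only, platform-dependent definitions of this kind could look like the sketch below. The minutes do not give the actual contents of ajdefine.h, so the values shown, the PATH_SEPARATOR name and the path_to_native helper are all assumptions made for this example.

```c
#include <string.h>

/* Hypothetical values: the minutes name SLASH_CHAR and CURRENT_DIR but do
   not give their definitions. PATH_SEPARATOR is an illustrative name. */
#ifdef WIN32
#define SLASH_CHAR     '\\'
#define CURRENT_DIR    ".\\"
#define PATH_SEPARATOR ";"
#else
#define SLASH_CHAR     '/'
#define CURRENT_DIR    "./"
#define PATH_SEPARATOR ":"
#endif

/* Hypothetical helper: rewrite every Unix-style '/' in path to the native
   SLASH_CHAR, in place. On Unix this leaves the path unchanged. */
void path_to_native(char *path)
{
    for (; *path; ++path)
    {
        if (*path == '/')
            *path = SLASH_CHAR;
    }
}
```

With definitions like these, code that builds pathnames can use SLASH_CHAR throughout instead of a hard-coded '/', which keeps the "#ifdef WIN32" test in one header.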

ajnam.c was very different in the 2.10.0 EMBOSSWIN, and has been updated to the current code. It needs 4 environment variables: EMBOSS_ROOT, EMBOSS_ACDROOT, EMBOSS_DATA and PLPLOT_LIB. However, the installer asks various questions and will be able to set these automatically. The initialisation file name has been left with the Unix name of emboss.default. EMBOSSWIN 2.10.0 used its own name.

A few compiler warnings remain. Most are missing casts, for example a loss of precision with float and double. A few will be in the ajfmt functions, but many are from the PLFLT definition in plplot.
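The kind of warning described above can be sketched as follows. This is a minimal illustration, not code from EMBOSS or plplot: the SKETCH_PLFLT typedef is a hypothetical stand-in that assumes a single-precision PLFLT, and scale_coord is an invented function name.

```c
/* Hypothetical stand-in for plplot's PLFLT, assumed here to be float. */
typedef float SKETCH_PLFLT;

/* Scale a coordinate computed in double precision and return it in the
   plotting type. Without the explicit cast, some compilers warn about a
   possible loss of precision when the double result is narrowed. */
SKETCH_PLFLT scale_coord(double value, double factor)
{
    return (SKETCH_PLFLT) (value * factor);
}
```

The cast does not change the result. It only records that the narrowing is intended, which silences the warning.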

Many of the "#ifdef WIN32" lines can be removed in the next commit.

Alan has created a build utility that will generate the win32 version of the code tree from a standard CVS checkout. This creates a zip file which can be copied to Windows and unpacked, then built using Visual C++. Peter has installed Visual C++ and will test the procedure.

2.2 Pipeline Pilot

Alan has installed SciTegic's PipelinePilot on emboss1 for testing. Apparently SciTegic only support RedHat Enterprise Linux 3, and installation of a SQL component failed, but it seems to be working, at least for our testing purposes.

2.3 Other development

Peter has implemented the new EMBL ID line format, as a new format "emblnew". Both the old and new EMBL formats are supported for input when reading sequences and in dbiflat and dbxflat indexing databases.

Peter is working on support for GFF3 feature format. This requires all feature types to be in the SOFA ontology. This is implemented by defining the SO identifier as the internal name, and converting feature types between the external and internal names using these identifiers. EMBL and GenBank formats are now working as before. Some programs have been using the internal feature data structures directly, and now need to call a function to return the correct feature type.

Alan has identified and fixed a problem reported by NCBI in ajindex.c (the dbx indexing programs) where a page size of 4096 and cache of 100 led to problems with page locking.

Dom library code is on hold until Alan has free time again.

Jon has written the C code for the meme and mast wrappers. The programs still need testing and documentation. In the ACD files, enforcing all the program constraints is unreasonable, so some checking is carried out in the C code. Peter will review the ACD files to check whether applications should be split. Programs that need to check for protein or DNA input should be using the $(ACDPROTEIN) variable, in place of whatever mechanisms meme and mast may use.

Mahmut has been prototyping SoapLab; so far no new issues have arisen. It may be good to extend the Perl parsing script, which has not been updated since EMBOSS 2.9.0.

Shaun had a meeting with Alberto in External Services to discuss ACD to WSDL generation. Alberto would like to investigate the Jemboss approach and will contact Tim Carver for more information. Shaun is using Eclipse for development, which is now installed on the emboss9-16 development machines.

3. Administration

3.1 Emboss-Submit Backlog

Henrikki Almusa's contributions are a priority before the release, especially the pattern list library changes. Jon will look through the submissions to date.

3.2 Release 4.0.0

The main issue still to be addressed is implementing the Outdata ACD type for output. Other items on the shortlist are done or in progress.

Jon has reviewed the notes from meetings with Marc Colet (EMBnet Belgium) and added requests to the SourceForge feature tracker.

4. Documentation & Training

4.1 Developer training course

The responses from the EBI Industry Programme attendees to the developers' course were very good. Although one requested a more structured practical session, another rated the practicals 5/5, so we seem to have got the emphasis just right.

Some more documentation is needed on how to document new applications developed under myemboss.

The course identified a need to isolate the development utilities (acdtrace etc.) in a separate EMBASSY package "acdutils", which would also make the main EMBOSS package documentation cleaner as these are non-typical applications. Probably "acdc" will remain as an EMBOSS program.

For developers it would also help if we can document how to use Eclipse as a development environment.

In preparation for the release, Alan will set the version number to 4.0.0 and Peter will set the date for all tests and usage examples to be 15th July 2006.

4.2 Website updates

The EFUNC and EDATA documentation needs to be autogenerated with sections annotated. This is relatively simple to do, but would only benefit the major source files so far cleaned up.

The index.html files also need to be autogenerated to avoid manual maintenance.

5. User queries and answers

The list was reviewed. Most had already been resolved.

6. AOB

Alan noted that NCBI's nrdb database has id.version in the ID field of their piped FASTA IDs. We could consider applying some special regular expression to index such entries in dbxfasta and dbifasta, but only in a future release. There are some issues about how such IDs would be parsed when read by other programs.
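To make the parsing question concrete, the sketch below splits an id.version string into its two parts. It is only an illustration: it uses plain C string functions rather than a regular expression, the split_id_version name is hypothetical, and the example ID in the usage note is invented, not taken from nrdb.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "ID.N" at the last '.' into an ID part and a
   numeric version. Returns 1 on success, or 0 if there is no all-digit
   version suffix or the ID part does not fit in the caller's buffer. */
int split_id_version(const char *id, char *acc, size_t accsize, int *version)
{
    const char *dot = strrchr(id, '.');
    const char *p;
    size_t len;

    if (!dot || dot == id || !dot[1])
        return 0;

    for (p = dot + 1; *p; ++p)
    {
        if (!isdigit((unsigned char) *p))
            return 0;
    }

    len = (size_t) (dot - id);
    if (len >= accsize)
        return 0;

    memcpy(acc, id, len);
    acc[len] = '\0';
    *version = atoi(dot + 1);

    return 1;
}
```

For an invented ID such as "NP_000001.2" this yields the ID part "NP_000001" and version 2, while an ID with no version suffix is rejected. An indexer could then decide whether to store the full string, the ID part alone, or both.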

Regular expressions have been cleaned up in directory name processing as part of the Windows build. We still have a number of regular expressions that should be replaced for performance reasons, but such a major profiling effort will be after the release.

Alan noted that regular expressions are also an issue on IntelMac and Solaris systems where the PCRE library code requires reduced optimisation (level O1) to get correct results.

7. Date Of Next Meeting

Next meeting is at last on a Monday, 12th June.