EMBOSS: Project Meeting (Mon 10th Oct 11)


EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster

1. Minutes of the last meeting

The minutes of the meeting on 3rd October 2011 are here.

2. Maintenance etc.

2.1 Applications

Peter has fixed a long-standing issue in fuzznuc and other applications where a variable-length region in a pattern (for example N{0,22}) reported only the longest match. These applications call the PCRE library through ajRegExec, which uses greedy matching, as Perl does (PCRE is a Perl-compatible library). PCRE has an alternative executor for regular expressions, pcre_dfa_exec, which uses a DFA and reports all matches as a list of substrings. It is unable to handle brackets (substrings) in a regular expression because it uses the results space for the list of matches instead. This is not a problem here, as the fuzz* and *reg applications do not use substrings.

Applications updated are fuzznuc, fuzzpro, fuzztran, dreg and preg.
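
The difference between reporting only the longest match and reporting every match can be illustrated with a small sketch. It does not use PCRE or any EMBOSS code: the function names, and the restriction to a pattern of the form prefix, N{min,max}, suffix, are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not EMBOSS or PCRE code.
 * Matches prefix, then between min_gap and max_gap arbitrary characters
 * (the N{min,max} region), then suffix, anchored at seq[start].
 * Stores the length of every match in lengths[] (shortest first) and
 * returns how many were stored, up to max_out.
 */
size_t all_match_lengths(const char *seq, size_t start,
                         const char *prefix, const char *suffix,
                         size_t min_gap, size_t max_gap,
                         size_t *lengths, size_t max_out)
{
    size_t seq_len = strlen(seq);
    size_t pre_len = strlen(prefix);
    size_t suf_len = strlen(suffix);
    size_t found = 0;
    size_t gap;

    if (start > seq_len || seq_len - start < pre_len)
        return 0;
    if (strncmp(seq + start, prefix, pre_len) != 0)
        return 0;

    for (gap = min_gap; gap <= max_gap && found < max_out; gap++) {
        size_t suf_pos = start + pre_len + gap;

        if (suf_pos + suf_len > seq_len)
            break;                      /* longer gaps cannot fit either */
        if (strncmp(seq + suf_pos, suffix, suf_len) == 0)
            lengths[found++] = pre_len + gap + suf_len;
    }
    return found;
}

/*
 * What a greedy matcher reports for the same pattern: the widest gap is
 * tried first and only that single, longest match is returned.
 * Returns 0 when there is no match.
 */
size_t greedy_match_length(const char *seq, size_t start,
                           const char *prefix, const char *suffix,
                           size_t min_gap, size_t max_gap)
{
    size_t seq_len = strlen(seq);
    size_t pre_len = strlen(prefix);
    size_t suf_len = strlen(suffix);
    size_t gap;

    if (start > seq_len || seq_len - start < pre_len)
        return 0;
    if (strncmp(seq + start, prefix, pre_len) != 0)
        return 0;

    /* Count down from max_gap to min_gap without unsigned underflow. */
    for (gap = max_gap + 1; gap-- > min_gap; ) {
        size_t suf_pos = start + pre_len + gap;

        if (suf_pos + suf_len > seq_len)
            continue;                   /* too long, try a shorter gap */
        if (strncmp(seq + suf_pos, suffix, suf_len) == 0)
            return pre_len + gap + suf_len;
    }
    return 0;
}
```

For the made-up sequence ACGTACGTT with prefix AC, suffix T and a gap of 0 to 6, the first function finds matches of length 4, 8 and 9 from position 0, while the greedy version reports only the match of length 9.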

2.2 Libraries

Michael has corrected the syntax for function pointers. Where a function reference is stored in a variable and then dereferenced, it should be explicitly stored as a pointer and dereferenced as *pointer. Although gcc and other compilers automatically treat function references as pointers and dereference the pointer when it is used to call a function, this does not work with the Intel compiler. Changes were needed internally in ajlist.c and ajtable.c, though their interfaces are unaffected.
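
A minimal sketch of the explicit style described above follows. The type and function names are hypothetical and are not taken from ajlist.c or ajtable.c.

```c
#include <stddef.h>

/* Hypothetical comparison function used as the callback. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Hypothetical container that stores the callback as an explicit pointer. */
typedef struct Sorter {
    int (*compare)(const void *a, const void *b);
} Sorter;

void sorter_init(Sorter *sorter)
{
    /* Explicit address-of when storing the function reference. */
    sorter->compare = &compare_ints;
}

int sorter_compare(const Sorter *sorter, const void *a, const void *b)
{
    /* Explicit dereference of the pointer when calling through it. */
    return (*sorter->compare)(a, b);
}
```

The implicit forms would be sorter->compare = compare_ints and sorter->compare(a, b); the explicit forms shown in the sketch spell out both the pointer and its dereference.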

Enumerations can give useful warnings in switch blocks where some values are not defined in case statements and there is no default case to catch them.
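
As an illustration, here is a small sketch with a hypothetical enumeration (the names are not from the EMBOSS libraries). With gcc, the relevant warning is -Wswitch, which is enabled by -Wall.

```c
/* Hypothetical enumeration, for illustration only. */
typedef enum SeqKind {
    SEQ_KIND_DNA,
    SEQ_KIND_RNA,
    SEQ_KIND_PROTEIN
} SeqKind;

const char *seq_kind_name(SeqKind kind)
{
    /*
     * There is deliberately no default case. If one of the case labels
     * below were removed, or a new value were added to SeqKind, a
     * compiler with switch warnings enabled could report that the
     * enumeration value is not handled in this switch.
     */
    switch (kind) {
    case SEQ_KIND_DNA:
        return "DNA";
    case SEQ_KIND_RNA:
        return "RNA";
    case SEQ_KIND_PROTEIN:
        return "protein";
    }

    /* Reached only if kind holds a value outside the enumeration. */
    return "unknown";
}
```

Adding a default case would silence the warning, so leaving it out is what lets the compiler point at values that have been forgotten.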

2.3 Configuration

Alan has rebuilt mEMBOSS after the recent code commits.

2.4 Other

Alan asked whether Jemboss still uses URLs for missing documentation. Mahmut replied that the Jemboss server uses the locally installed documentation so URLs should not be needed.

3. New developments

3.1 Assemblies

Mahmut has BAM input working using 'bai' index files. The code currently uses samtools hash tables, which will be replaced by AjPTable. Range queries are being implemented using the EMBOSS query language, with qualifiers -cbegin and -cend for contig begin and end.

A significant difference in assembly data is the use of "assembly types" in SAM and BAM files, specific to MIRA, VELVET, samtools, etc.

In a query, sequence files use the sequence ID from the USA. For assemblies the equivalent is the contig name.

Alan noted one line in the BAM code used a "static const int" value as an initial array size. This was fixed.
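
The minutes do not record how this was fixed. The sketch below shows the general problem and one common remedy, an enumeration constant, using hypothetical names that are not from the BAM code.

```c
#include <stddef.h>

/*
 * Problematic form, left as a comment so that this block compiles:
 *
 *     static const int initial_bins = 16;
 *     int bins[initial_bins] = {0};
 *
 * In C a const-qualified int is not an integer constant expression.
 * At file scope the array declaration is rejected, and inside a
 * function it declares a variable length array, which may not have
 * an initializer.
 */

/* An enumeration constant is a true integer constant expression. */
enum { INITIAL_BINS = 16 };

size_t zeroed_bin_count(void)
{
    int bins[INITIAL_BINS] = {0};   /* fixed size, so it can be initialized */
    size_t n = sizeof bins / sizeof bins[0];
    size_t zeros = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (bins[i] == 0)
            zeros++;
    }
    return zeros;
}
```

A #define would work equally well; the enumeration constant has the small advantage of being visible by name in a debugger.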

3.2 EDAM and DRCAT

Jon is planning work for the EDAM beta 13 release, and a set of additions to DRCAT.

3.3 Text compression

Alan is working on text compression code.

3.4 Projects

Peter raised a suggestion to define 'projects' by adding a -project qualifier to all applications and using this to define defaults such as output directory. In future we could consider a MySQL database to hold user project information.

The original requirement was to define a directory as input for an application to read a set of feature files and apply them to a single sequence.

Suggestions are welcome for useful information that could be stored or managed by project.

4. Administration

A new patch release was issued fixing diffseq and data access using the NCBI Entrez server.

Alan reported one user experiencing problems in downloading mEMBOSS from the FTP server. The downloaded files were of various sizes. A "save as" from one browser gave a successful download. The cause is unknown but assumed to be at the user's end.

5. Documentation and Training

5.1 Web server

Jon has updated the website as discussed after last week's meeting.

A few additional index pages may be needed to complete the site, plus links to reference documentation and an HTML copy of the book texts.

The website also needs documentation for the EMBASSY packages where information is currently sparse.

Alan has added a new logo to the wiki pages. It is derived from the book covers, so we need to ask CUP for permission to use it on the new website.

5.2 Training

Jon suggested three training courses. For users we can adapt the EMBOSS tutorial. For developers we can use the "your first application" material. For administrators we can describe some of the new features in a series of short tutorials.

Peter will arrange a training account to set up the courses.

5.3 Books

Jon would like suggestions and contacts for further publicizing the books. We can meet with CUP to discuss possibilities.

6. User queries and answers

All done.

7. AOB


8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 17th October.