EMBOSS: Project Meeting (Mon 10th Oct 11)
Attendees
EBI:
Peter Rice,
Alan Bleasby,
Jon Ison,
Mahmut Uludag,
Michael Schuster
Visitors:
Apologies:
1. Minutes of the last meeting
The minutes of the meeting on 3rd October 2011 are
here.
2. Maintenance etc.
2.1 Applications
Peter has fixed a long-standing issue in fuzznuc and
other applications where a variable-length region in a pattern (for
example N{0,22}) gives only the longest match with the PCRE library
used by ajRegExec. PCRE uses greedy matching, as Perl does
(it is a Perl-compatible library). PCRE has an alternative executor
for regular expressions using a DFA (pcre_dfa_exec), which reports all
matches in the space normally used for the list of substrings. For
that reason it cannot handle brackets (substrings) in a regular
expression: the results space that would hold the submatches holds
the alternative matches instead. The fuzz* and *reg applications do
not use substrings, so this restriction does not affect them.
Applications updated are fuzznuc, fuzzpro, fuzztran,
dreg and preg.
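The difference between greedy matching and reporting every match can
be sketched without PCRE. The code below is a toy illustration only,
not the EMBOSS fix and not PCRE code: the function names and the
restricted pattern shape (a literal prefix, a gap N{min,max} matching
any character, and a literal suffix) are assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical toy matcher (not PCRE, not EMBOSS code).
 * Pattern shape: <prefix> N{min,max} <suffix>, where N is any character.
 * Tries to match at text[start] only and stores the end offset of every
 * successful match, longest first. Returns the number of offsets stored
 * (at most maxends).
 */
size_t all_matches_at(const char *text, size_t start,
                      const char *prefix, size_t min, size_t max,
                      const char *suffix,
                      size_t *ends, size_t maxends)
{
    size_t tlen = strlen(text);
    size_t plen = strlen(prefix);
    size_t slen = strlen(suffix);
    size_t found = 0;
    size_t gap;

    assert(ends != NULL);

    if (start > tlen || tlen - start < plen)
        return 0;
    if (strncmp(text + start, prefix, plen) != 0)
        return 0;

    /* Walk the gap length from max down to min: longest match first. */
    for (gap = max + 1; gap-- > min; )
    {
        size_t spos = start + plen + gap;   /* where the suffix must begin */

        if (spos > tlen || tlen - spos < slen)
            continue;
        if (strncmp(text + spos, suffix, slen) == 0 && found < maxends)
            ends[found++] = spos + slen;
    }

    return found;
}

/* Greedy behaviour: report only the longest match. Returns 1 on a match. */
int greedy_match_at(const char *text, size_t start,
                    const char *prefix, size_t min, size_t max,
                    const char *suffix, size_t *end)
{
    return all_matches_at(text, start, prefix, min, max, suffix, end, 1) == 1;
}
```

For the text "ACGTTAGT" and the pattern AC N{0,5} T at offset 0, the
greedy call reports only the match ending at offset 8, while the
exhaustive call also reports the shorter matches ending at offsets 5
and 4.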
2.2 Libraries
Michael has corrected the syntax for function pointers. Where a
function reference is stored in a variable and later called through
that variable, it should be explicitly stored as a pointer and
dereferenced as *pointer. Although gcc and other compilers
automatically treat function references as pointers, and dereference
the pointer when it is used to call a function, this does not work
on the Intel compiler. Changes were needed internally in ajlist.c
and ajtable.c, though their interfaces are unaffected.
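The explicit style described above can be sketched as follows. This
is a minimal illustration, not code from ajlist.c or ajtable.c: the
type CompareFn, the struct Sorter and all function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: an explicit pointer-to-function typedef. */
typedef int (*CompareFn)(const void *a, const void *b);

/* Example callback comparing two ints; returns <0, 0 or >0. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Hypothetical container that keeps a callback for later use. */
typedef struct
{
    CompareFn cmp;
} Sorter;

/* Store the callback; callers pass it explicitly as &function. */
void sorter_init(Sorter *s, CompareFn cmp)
{
    assert(s != NULL && cmp != NULL);
    s->cmp = cmp;
}

/* Call through the stored pointer with an explicit dereference. */
int sorter_compare(const Sorter *s, const void *a, const void *b)
{
    return (*s->cmp)(a, b);
}
```

A caller would write sorter_init(&s, &compare_ints) rather than
passing the bare function name, so the pointer is explicit both where
it is stored and where it is called.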
Using enumerations allows the compiler to give useful warnings in
switch blocks where some enumeration values have no case statement
and there is no default case to catch them.
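A minimal sketch of the kind of switch where this warning helps. The
enumeration Strand and the function strand_name are hypothetical
names chosen for the example. With gcc, the relevant warning is
-Wswitch, which is enabled by -Wall.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enumeration, for illustration only. */
typedef enum
{
    STRAND_FORWARD,
    STRAND_REVERSE,
    STRAND_BOTH
} Strand;

/*
 * No default case on purpose: if a new value is added to Strand and
 * not handled here, gcc -Wall reports that the enumeration value is
 * not handled in the switch.
 */
const char *strand_name(Strand s)
{
    switch (s)
    {
        case STRAND_FORWARD:
            return "forward";
        case STRAND_REVERSE:
            return "reverse";
        case STRAND_BOTH:
            return "both";
    }

    /* Reached only if s holds a value outside the enumeration. */
    return "unknown";
}
```

Adding a default case would silence the warning, so the fallback is
placed after the switch instead.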
2.3 Configuration
Alan has rebuilt mEMBOSS after the recent code commits.
2.4 Other
Alan asked whether Jemboss still uses URLs for missing
documentation. Mahmut replied that the Jemboss server uses the
locally installed documentation so URLs should not be needed.
3. New developments
3.1 Assemblies
Mahmut has BAM input working using 'bai' index files. The code
currently uses samtools hash tables, which will be replaced by AjPTable.
Range queries are being implemented using the EMBOSS query language
with qualifiers -cbegin and -cend for contig begin and end.
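As a rough sketch of what a contig range query selects, the code
below filters reads by overlap with a begin/end range. It is not the
EMBOSS or samtools implementation: the ReadSpan type, the function
names and the choice of 1-based inclusive coordinates are all
assumptions, and a linear scan stands in for the index lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read record: 1-based inclusive positions on a contig. */
typedef struct
{
    long begin;
    long end;
} ReadSpan;

/* True if the read overlaps the query range [cbegin, cend]. */
int span_overlaps(const ReadSpan *r, long cbegin, long cend)
{
    return r->begin <= cend && r->end >= cbegin;
}

/* Count the reads overlapping the range (linear scan, no index). */
size_t count_in_range(const ReadSpan *reads, size_t n,
                      long cbegin, long cend)
{
    size_t i;
    size_t hits = 0;

    assert(reads != NULL || n == 0);

    for (i = 0; i < n; i++)
        if (span_overlaps(&reads[i], cbegin, cend))
            hits++;

    return hits;
}
```

Here cbegin and cend play the role of the values supplied by the
-cbegin and -cend qualifiers.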
A significant difference in assembly data is the use of "assembly
types" in SAM and BAM files, specific to MIRA, VELVET, samtools, etc.
In a query, sequence files use the sequence ID from the USA. For
assemblies the equivalent is the contig name.
Alan noted one line in the BAM code used a "static const int"
value as an initial array size. This was fixed.
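For background, in C a "static const int" object is not a constant
expression, so it cannot portably be used as the size of a fixed
array. The minutes do not record how the BAM code was changed. The
sketch below shows one common alternative, an enumeration constant,
with a hypothetical name and value.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Problematic form in C (shown as a comment only):
 *
 *     static const int initial_size = 16;
 *     static int buffer[initial_size];
 *
 * initial_size is a read-only object, not a constant expression, so
 * it is rejected as the size of an array with static storage.
 */

/* An enumeration constant is a true integer constant expression. */
enum { INITIAL_SIZE = 16 };   /* hypothetical name and value */

static int buffer[INITIAL_SIZE];

/* Number of elements in the buffer. */
size_t buffer_capacity(void)
{
    return sizeof buffer / sizeof buffer[0];
}
```

A #define would work equally well. The enumeration constant has the
advantage of being visible to a debugger.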
3.2 EDAM and DRCAT
Jon is planning work for the EDAM beta 13 release, and a set of
additions to DRCAT.
3.3 Text compression
Alan is working on text compression code.
3.4 Projects
Peter raised a suggestion to define 'projects' by adding a
-project qualifier to all applications and using this to define
defaults such as output directory. In future we could consider a MySQL
database to hold user project information.
The original requirement was to define a directory as input for an
application to read a set of feature files and apply them to a single
sequence.
Suggestions are welcome for useful information that could be stored
or managed by project.
4. Administration
A new patch release was issued fixing diffseq and data access
using the NCBI Entrez server.
Alan reported one user experiencing problems in downloading
mEMBOSS from the FTP server. The downloaded files were of various
sizes. A "save as" from one browser gave a successful download. The
cause is unknown but assumed to be at the user's end.
5. Documentation and Training
5.1 Web server
Jon has updated the website as discussed after last week's meeting.
A few additional index pages may be needed to complete the site, plus
links to reference documentation and an HTML copy of the book texts.
The website also needs documentation for the EMBASSY packages where
information is currently sparse.
Alan has added a new logo to the wiki pages. It is derived from the
book covers so we need to ask CUP for permission to use it on the new
website.
5.2 Training
Jon suggested three training courses. For users we can adapt
the EMBOSS tutorial. For developers we can use the "your first
application" material. For administrators we can describe some of the
new features in a series of short tutorials.
Peter will arrange a training account to set up the courses.
5.3 Books
Jon would like suggestions and contacts for further publicizing the
books. We can meet with CUP to discuss possibilities.
6. User queries and answers
All done.
7. AOB
None.
8. Date Of Next Meeting
The next EMBOSS meeting will be on Monday 17th October.