EMBOSS: Project Meeting (Mon 22nd Aug 11)
Attendees
EBI:
Peter Rice,
Alan Bleasby,
Jon Ison,
Mahmut Uludag,
Michael Schuster
Visitors:
Apologies:
1. Minutes of the last meeting
Minutes of the meeting of 15th August 2011 are
here.
2. Maintenance etc.
2.1 Applications
2.2 Libraries
Peter has implemented parsing of EMBL/GenBank CON entries for
reference sequences and for sequence data. CON entries have features,
but refer to whole genome shotgun entries for the sequence
data. Several errors were found in the current EMBL release where the
WGS entries have moved but the CON entry has not been updated.
Peter has been in discussions on the GFF3 feature format on the
GMOD and Sequence Ontology mailing lists.
EMBOSS GFF3 handling has been improved to correct a few feature
names, support circular features, and follow GFF3 style for tags
and parent links. SO terms still fail to cover all protein
features. GFF3 format has no way to formally identify features for a
protein sequence, so EMBOSS will continue to use a comment in the
header for this.
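For readers unfamiliar with the GFF3 style for tags and parent links
mentioned above, the following is a minimal sketch in Python. It is not
EMBOSS code: the function names gff3_attributes and gff3_line, the
"sketch" source column and the sample identifiers are all hypothetical.
It only illustrates the nine tab-separated columns, tag=value
attributes joined by semicolons, and a child feature pointing to its
parent through the Parent tag.

```python
from urllib.parse import quote


def gff3_attributes(tags):
    """Build GFF3 column 9 from a dict of tag/value pairs.

    Characters outside letters, digits, "_.-~" and space are
    percent-encoded, which covers the reserved ";", "=", "&" and ",".
    """
    return ";".join(f"{tag}={quote(str(value), safe=' ')}"
                    for tag, value in tags.items())


def gff3_line(seqid, source, ftype, start, end, strand, tags):
    """Return one GFF3 feature line (hypothetical helper, not an EMBOSS API).

    Score and phase are written as "." because this sketch does not use them.
    """
    columns = [seqid, source, ftype, str(start), str(end),
               ".", strand, ".", gff3_attributes(tags)]
    return "\t".join(columns)


# A gene and an mRNA child linked to it through the Parent tag.
gene = gff3_line("chr1", "sketch", "gene", 100, 900, "+",
                 {"ID": "gene1", "Name": "demo"})
mrna = gff3_line("chr1", "sketch", "mRNA", 100, 900, "+",
                 {"ID": "mrna1", "Parent": "gene1"})
print(gene)
print(mrna)
```

The parent link is carried entirely by the attribute column, so a
reader of the file has to resolve Parent values against ID values to
rebuild the feature hierarchy.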
Peter has extended database definitions to use a new "special:"
attribute where library code can look for known prefixes in the tag
value. These will replace the current specially formatted comment
attributes in the Ensembl code.
Peter will add LGPL and CVS tags in the header files.
Mahmut asked whether the libraries could include header files more
selectively, as any header change currently causes a
complete rebuild in Eclipse.
Michael would like to document enumerated types and
constants. Peter will extend the documentation scripts to cover
these declarations.
2.3 Other
Michael is working on updates to configuration and will be
ready to commit soon. Unused symbols are being removed and definitions
relocated as appropriate in ajdefine.h and ajarch.h.
Mahmut noted that Jemboss now requires the AWT Desktop class introduced
in Java 1.6, so compilation fails with Java 1.5. A version test could be
added to the Makefile, which currently states 1.4 as the requirement.
Michael noted that some CYGWIN tests in configure appear to be
unused.
Michael offered to demonstrate a Solaris virtual machine that
can be used in VirtualBox for testing.
3. New developments
3.1 Assembly data
Mahmut has a parser for MAF format assembly data from MIRA, and
for data in SAM format. Peter Cock's maf2sam script was
particularly helpful in resolving differences between the formats. The
SAM format extends the sequence parsing to include mapping and header
information.
3.2 Ensembl
Michael has data structures for assemblies and mapping
coordinates at the chromosome, clone and contig levels. These can be
used to map and remap features at each level.
Michael will compare the CIGAR alignment string handling in
Ensembl with Mahmut's latest code to check for consistency,
especially in the use of non-standard characters.
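As an illustration of the kind of consistency check discussed above,
here is a minimal Python sketch of CIGAR parsing. It is not the Ensembl
or EMBOSS implementation: the function names are hypothetical, and the
only assumption about the format is the set of operation characters
defined by the SAM specification (MIDNSHP=X). Any other character is
reported as non-standard.

```python
import re

# Operation characters defined by the SAM specification.
SAM_CIGAR_OPS = "MIDNSHP=X"

# One token is a run of digits followed by a single operation character.
_TOKEN = re.compile(r"(\d+)(.)")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs.

    Raises ValueError for malformed input or for an operation
    character outside the SAM set. "*" (CIGAR unavailable) gives [].
    """
    if cigar == "*":
        return []
    pairs = []
    pos = 0
    while pos < len(cigar):
        match = _TOKEN.match(cigar, pos)
        if match is None:
            raise ValueError(f"malformed CIGAR at offset {pos}: {cigar!r}")
        length, op = int(match.group(1)), match.group(2)
        if op not in SAM_CIGAR_OPS:
            raise ValueError(f"non-standard CIGAR operation {op!r} in {cigar!r}")
        pairs.append((length, op))
        pos = match.end()
    return pairs


def reference_span(pairs):
    """Number of reference bases covered (operations M, D, N, = and X)."""
    return sum(length for length, op in pairs if op in "MDN=X")


def query_length(pairs):
    """Number of query bases consumed (operations M, I, S, = and X)."""
    return sum(length for length, op in pairs if op in "MIS=X")


pairs = parse_cigar("8M2I4M1D3M")
print(pairs, reference_span(pairs), query_length(pairs))
```

Two implementations can be compared by running the same strings through
both and checking that the token lists and the two derived lengths
agree, and that both reject the same non-standard characters.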
Michael would like to add Wiki pages to describe the Ensembl
API code at a high level.
3.3 DRCAT
Jon has a set of 1700 resource definitions from the Nucleic
Acids Research website which could be added to DRCAT. It would be
helpful if adding these could be automated, but there are many special cases.
3.4 EDAM
EDAM topics will need extending to have a wider scope, and some topic
terms are tool-centric rather than data-centric. A refactoring could
be helpful.
Jon reported a request for annotation of web services with
EDAM. He was referred to the EDAM website and BioCatalogue, and will
be invited to the next EDAM meeting.
4. Administration
None.
5. Documentation and Training
5.1 Web server
Peter and Jon will work on possibilities for a revised
home page.
5.2 Books
We should meet soon with CUP to discuss the books.
Jon suggested contacting training providers and asking for feedback.
6. User queries and answers
All done.
7. AOB
None.
8. Date Of Next Meeting
Monday 29th August is a public holiday.
The next EMBOSS meeting will be on Monday 5th September.