EMBOSS: Project Meeting (Mon 18th October 2010)
Attendees
EBI:
Peter Rice,
Alan Bleasby,
Jon Ison,
Michael Schuster
Visitors:
Apologies:
Mahmut Uludag,
1. Minutes of the last meeting
Minutes of the meeting of 11th October 2010 are
here.
2. Maintenance etc.
2.1 Applications
2.2 Libraries
Peter has renamed the "gff" feature definition files to "gff2"
as the default is now "gff3".
2.3 Other
Alan has updated the configuration files to support large
files. We need to announce the new system-specific configurations and
get feedback from developers.
3. New developments
3.1 Axis2C
Mahmut is using the AJAX ajdom code for document
parsing. It may be possible to enrich some methods, for example to
process lists of features in DASGFF.
An alternative is the Axiom (Axis XML) library for SOAP results,
which provides an easy way to retrieve values deep within the XML
structure. Mahmut reported that Axis2 has a very granular
architecture, with many libraries required even for a simple
application.
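As a rough illustration of retrieving values deep within an XML
structure, the sketch below lists the features in a DASGFF document.
It uses Python's standard xml.etree.ElementTree module as a stand-in,
not the AJAX ajdom code or the Axiom library, and the sample document
and the list_features helper are hypothetical, written only for this
example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written DASGFF fragment for illustration only.
SAMPLE = """<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features">
    <SEGMENT id="chr1" start="1" stop="1000">
      <FEATURE id="f1" label="exon 1">
        <TYPE id="exon">exon</TYPE>
        <START>100</START>
        <END>200</END>
      </FEATURE>
      <FEATURE id="f2" label="exon 2">
        <TYPE id="exon">exon</TYPE>
        <START>300</START>
        <END>450</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def list_features(xml_text):
    """Return (id, type, start, end) for every FEATURE in a DASGFF document."""
    root = ET.fromstring(xml_text)
    features = []
    # One path expression reaches elements several levels below the root.
    for feat in root.findall("./GFF/SEGMENT/FEATURE"):
        features.append((
            feat.get("id"),
            feat.findtext("TYPE"),
            int(feat.findtext("START")),
            int(feat.findtext("END")),
        ))
    return features
```

A single path lookup replaces nested loops over child nodes, which is
the kind of convenience a richer DOM method for feature lists could
offer.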
The gSOAP library requires 11 functions in stubs to create a simple
interface. The Axis2 interface is more natural, avoiding stub code.
Alan has received no reply from the gSOAP author on licensing
issues. There are no problems with using Axis2 as there is no need to
import code. Axis2 is a more recent project, from the Apache
group, and is considered to be a more reliable choice.
Mahmut noted that Jemboss uses a Java version of Axis1.
3.2 Data access methods
Alan is looking at alternative indexing methods. So far, B+
trees look to be the best for identifiers and accessions. Long keys
are problematic. There are compressed free-text indexing methods used
in bibliographic searching. The indexes are compressed and fast, using
Huffman encoding, with a size approximately 50% of the original source
text. The MG (Managing Gigabytes) system has been used for image
searches.
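For reference, a minimal sketch of Huffman encoding is given below. It
works on single characters, which is an assumption made to keep the
example short and is not a description of how MG or any EMBOSS index
is implemented. The function names are hypothetical and the
compression ratio depends entirely on the input text.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a prefix-free bit code (as '0'/'1' strings) for each symbol."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the code of each symbol in text."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Recover the text; works because no code is a prefix of another."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)
```

Frequent symbols receive shorter codes, so the encoded text needs
fewer bits than a fixed 8 bits per character.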
3.3 EDAM
Jon noted a request from Matus to add BioXSD as an input/output
format for sequences and features.
3.4 Data types
Peter has split the ajfeat code into data, read and
write source files as for other data types.
For text data, the input is stored as strings for each input line and
copied on output. Parsers can be added to handle HTML or XML markup.
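The text data handling described above can be sketched as follows.
This is an illustration in Python, not the AJAX C code, and read_text
and write_text are hypothetical names: each input line is kept as a
separate string and written back unchanged, so markup passes through
untouched until a parser is added.

```python
import io


def read_text(handle):
    """Store the input as one string per line, without interpreting it."""
    return [line.rstrip("\n") for line in handle]


def write_text(lines, handle):
    """Copy the stored lines to the output, one per line."""
    for line in lines:
        handle.write(line + "\n")


# Usage: markup in the input is copied through as plain text.
source = io.StringIO("<p>one</p>\ntwo\n")
stored = read_text(source)
target = io.StringIO()
write_text(stored, target)
```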
4. Administration
The file server is online again. The temporary user home directories
have been removed.
5. Documentation and training
5.1 Books
Alan has essentially completed the replies to the copy editor
for the Administrators' Guide.
Jon is looking into indexing. We can create an index of terms,
but also need to identify the final page numbers. We need more
discussion with the publishers on the best way to send index data.
6. User queries and answers
All done.
7. AOB
None.
8. Date of next meeting
The next EMBOSS meeting will be on Monday 25th October. Peter will
be away.