EMBOSS: Project Meeting (Mon 16th May 2011)


Attendees

EBI: Peter Rice, Jon Ison, Michael Schuster
Visitors:
Apologies: Alan Bleasby, Mahmut Uludag

1. Minutes of the last meeting

Minutes of the meeting of 9th May 2011 are here.

2. Maintenance etc.

2.1 Applications

None.

2.2 Libraries

None.

2.3 mEMBOSS

Peter has updated the QA tests for mEMBOSS, and almost all now work. Test data is still needed for indexing EDAM and DRCAT. The current tests index the full contents, which are in different locations.

File paths are converted to backslashes on output by the new functions ajFileGetPrintNameC and ajFileGetPrintNameS. This allows emboss.standard and the server cache files to work unchanged in mEMBOSS. Directories and filenames containing forward slashes will not work in most cases in mEMBOSS.
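As a rough illustration of the output conversion described above, here is a Python sketch. It is not the ajFileGetPrintNameC code; the function name and the sample path are hypothetical.

```python
def to_windows_print_name(path: str) -> str:
    """Return a copy of path with forward slashes shown as backslashes.

    Hypothetical helper for illustration only. The stored path is left
    untouched; only the printed form changes.
    """
    return path.replace("/", "\\")


if __name__ == "__main__":
    # Hypothetical path, used only to show the conversion.
    print(to_windows_print_name("data/EDAM/edam.obo"))
```

Converting only at output time means the stored definitions keep their forward slashes, which matches the aim of leaving emboss.standard and the cache files unchanged.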

Pre- and post-processing commands work with few changes. Environment variables use the SET preprocessing command and ignore the EXPORT command. The commands cp, 'rm -rf' and rm are converted to their Windows equivalents when written to the qatest.bat script file. The copy command with a wildcard always writes to the screen via standard error, but this does not interfere with the test results.
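The command conversion could be sketched along these lines in Python. The minutes do not name the Windows commands, so the mapping (cp to copy, rm to del, 'rm -rf' to 'rmdir /s /q') and the backslash conversion of arguments are assumptions, and the function name is hypothetical.

```python
def to_windows_command(command: str) -> str:
    """Translate a Unix pre/post-processing command for a .bat script.

    Assumed mapping, not taken from the QA test code:
      cp     -> copy
      rm -rf -> rmdir /s /q
      rm     -> del
    Any other command is returned unchanged.
    """
    words = command.split()
    if words[:2] == ["rm", "-rf"]:
        prefix, args = "rmdir /s /q", words[2:]
    elif words[:1] == ["rm"]:
        prefix, args = "del", words[1:]
    elif words[:1] == ["cp"]:
        prefix, args = "copy", words[1:]
    else:
        return command
    # Assumption: path arguments also need backslashes, because the
    # Windows commands treat "/" as the start of a switch.
    args = [arg.replace("/", "\\") for arg in args]
    return " ".join([prefix] + args)


if __name__ == "__main__":
    for unix_command in ("cp in.seq out/", "rm -rf tmpdir", "rm a.out"):
        print(to_windows_command(unix_command))
```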

Standard output from clustalw, run through emma, is not visible in a QA test but appears as expected from the command line. Alan is investigating.

3. New developments

3.1 Access methods

Michael hopes to complete updating to Ensembl 61 this week. Only 4 modules remain to be updated.

The variation adaptor has a new subclass to support the 1000 Genomes Project. There are new general iterator methods to avoid 1 million or more objects being created at once in Perl. Memory management in C is less expensive, so implementation in C is not urgent.
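One common way to avoid holding every object in memory is to fetch and yield them in batches. The Python sketch below shows that general idea only. It does not reproduce the Ensembl iterator methods; the function name, the fetch_batch callback and the batch size are all hypothetical.

```python
from typing import Callable, Iterator, List, Sequence


def iterate_in_batches(
    ids: Sequence[int],
    fetch_batch: Callable[[Sequence[int]], List[dict]],
    batch_size: int = 1000,
) -> Iterator[dict]:
    """Yield objects one at a time, fetching at most batch_size per call.

    fetch_batch is a hypothetical callback standing in for a database
    query. Only one batch of objects exists in memory at any moment.
    """
    for start in range(0, len(ids), batch_size):
        for obj in fetch_batch(ids[start:start + batch_size]):
            yield obj


if __name__ == "__main__":
    def fake_fetch(batch: Sequence[int]) -> List[dict]:
        # Stand-in for a real query; builds simple records.
        return [{"id": identifier} for identifier in batch]

    total = sum(1 for _ in iterate_in_batches(range(10), fake_fetch, batch_size=4))
    print(total)
```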

The server cache file is written by an application, showensembl, which reduces the number of SQL queries needed to return a sequence object from 560 to 18. The DBALIAS attributes are generated and are working well for Ensembl. Aliases include all names with underscores, one of which is the NCBI taxon. All names and aliases are in lower case.

Ensembl identifiers include the exon id, transcript id and translation id. They may also include identifiers generated from the gene stable id.

Havana data works well in EMBOSS, with plotorf able to find the longest ORF in a Havana conserved region.

The improved efficiency of the C API could be useful for variant effect predictors. These now account for up to 75% of the Ensembl hits.

3.2 New applications

Michael asked about possible Ensembl applications.

Extensions to servertell and dbtell could be useful. These could include a way to link related databases through some additional attributes in their cache file definitions, for example Ensembl databases reporting different sequence object types from the same species.

4. Administration

Jon will present a flash (short) talk on EDAM at the ISMB Bio-Ontologies SIG in Vienna. The referees' comments were constructive and will be addressed. One asked for a website to provide access to DRCAT. Another complained that definitions include the term name, although this is often unavoidable.

The replacement backup drive has been installed for emboss5.

5. Documentation and Training

6. User queries and answers

All done.

7. AOB

None.

8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 23rd May.