Notes on application for codon usage / composition bias

Possibly new application to build alignment from BLAST HSP


I have a BLAST alignment of a query sequence against a database
sequence.

The alignment shows only the HSP from the BLAST output, as expected;
however, I want to build an alignment of the entire database sequence
against my query sequence.

I tried using needle from EMBOSS, but it aligns the sequences
completely differently than BLAST does. What I'd really like is a way
to anchor the alignment on the BLAST HSP. Does anyone know how to do
this, or what tool(s) will allow me to do this?

Ryan



You are quite right that EMBOSS may align the sequences completely
differently - unless the HSPs are very significant and cover most of
the sequence, this will be true of any attempt to simply realign. There
has to be some way to pass on the HSPs as fixed positions, as in the
BioPerl solution.

However, it could make a nice EMBOSS application - the only question
would be how you would like to specify the HSPs. Perhaps we could read
BLAST output (in some specified format), or perhaps some other way to
give the input alignments.

We do have at least one EMBOSS application that does something similar
(finds all long perfect matches and interpolates) - we just need to
reuse the interpolation code, which is basically doing a global
alignment of the bits in between. That also tackles the problem of
choosing which non-compatible initial matches to use.
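
As a rough illustration of the idea (not the EMBOSS code itself), the
Python sketch below keeps one HSP fixed and globally aligns the
flanking regions on either side of it. The function names, the scoring
values and the input convention (gapped HSP strings plus 1-based,
plus-strand start coordinates) are all assumptions made for the
example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty (toy scoring)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_alignment(query, subject, hsp_q, hsp_s, q_start, s_start):
    """Keep the HSP as given and globally align what lies on either side.

    hsp_q / hsp_s are the gapped HSP strings; q_start / s_start are the
    1-based start coordinates of the HSP (plus strand assumed).
    """
    q_end = q_start - 1 + len(hsp_q.replace("-", ""))
    s_end = s_start - 1 + len(hsp_s.replace("-", ""))
    left_q, left_s = needleman_wunsch(query[:q_start - 1], subject[:s_start - 1])
    right_q, right_s = needleman_wunsch(query[q_end:], subject[s_end:])
    return left_q + hsp_q + right_q, left_s + hsp_s + right_s


if __name__ == "__main__":
    aligned_q, aligned_s = anchored_alignment(
        "AAACGTACGTTT", "GGCGTACGTCC", "CGTACGT", "CGTACGT", 4, 3)
    print(aligned_q)
    print(aligned_s)
```

With several compatible HSPs the same approach would apply to each
stretch between consecutive anchors; choosing among incompatible HSPs
is not handled here.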

Hope that helps,

Peter




Hi Peter,

> You are quite right that EMBOSS may align the sequences completely
> differently - unless the HSPs are very significant and cover most
> of the sequence this will be true of any attempt to simply realign.
> There has to be some way to pass on the HSPs as fixed positions,
> as in the BioPerl solution.

I looked for a BioPerl method, but can't seem to find one that will
accomplish this.

> However, it could make a nice EMBOSS application - the only question
> would be how you would like to specify the HSPs. Perhaps we could read
> BLAST output (in some specified format), or perhaps some other way to
> give the input alignments.

Yes, I agree. I suppose the best way would be to specify the two
sequences and the BLAST output. The application could then construct an
alignment based on a particular HSP (probably the first one, or
whichever one the user specifies).
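
To make the "particular HSP" part concrete, here is a minimal Python
sketch that reads HSP coordinates from 12-column tabular BLAST output
(query id, subject id, percent identity, length, mismatches, gap
opens, query start/end, subject start/end, E-value, bit score) and
picks one HSP for a given subject. The names HSP, read_hsps and
pick_hsp, and the sample lines, are made up for the example. Tabular
output carries only coordinates, so the gapped HSP strings would have
to come from another BLAST output format.

```python
from typing import NamedTuple


class HSP(NamedTuple):
    query_id: str
    subject_id: str
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bitscore: float


def read_hsps(lines):
    """Parse 12-column tab-separated BLAST lines, skipping comments and blanks."""
    hsps = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hsps.append(HSP(f[0], f[1], int(f[6]), int(f[7]),
                        int(f[8]), int(f[9]), float(f[10]), float(f[11])))
    return hsps


def pick_hsp(hsps, subject_id, index=0):
    """Return the index-th HSP (in file order) reported for subject_id."""
    matching = [h for h in hsps if h.subject_id == subject_id]
    return matching[index]


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        "# comment line",
        "q1\tdb1\t98.5\t200\t3\t0\t10\t209\t55\t254\t2e-30\t350",
        "q1\tdb1\t90.0\t50\t5\t0\t300\t349\t400\t449\t1e-05\t60.2",
        "q1\tdb2\t85.0\t80\t12\t0\t1\t80\t5\t84\t3e-10\t95.1",
    ]
    print(pick_hsp(read_hsps(sample), "db1"))
```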

Ryan


Have you tried this:
http://bioweb.pasteur.fr/seqanal/interfaces/seqsblast.html

It is based on BioPerl. Check the "Get HSP" option (you can even
extend it).

Best,

--
Catherine Letondal -- Institut Pasteur -- Computing Center




Hi all,

I didn't read this before, sorry for the lapse. And apologies, Ryan,
if what follows is not exactly what you needed.

What you are looking for is _MVIEW_, an old but nice application.
Use scholar.google.com or PubMed to find more information about it; I
remember that there are web servers running it as a CGI somewhere. It
is possible that in the last few years somebody has published a better
tool or a new MVIEW version, so look for that too.

MVIEW is a parser for your BLAST output.
It fits your problem because you want to align a single sequence (as a
template) against an entire database (presumably with some cutoff on
the E-value or P-value, at least the default, that is, ten), against a
set of sequences, or against just one other sequence (a two-sequence
alignment).

Some considerations about aligning BLAST HSPs the way you intend, and
the way MVIEW does. They are important and take only a minute to read:
Remember, what you get is what you asked for, but it is not the real
thing (this is very typical in bioinformatics - and in all science -
hahaha).
You don't get a real multiple alignment; you get an artifact: the BLAST
HSPs of every gene in the database, piled up on a template gene (your
sequence).
All right then. You do not have, by any means, a multiple alignment,
not even an alignment of the genes by their HSPs, because some HSPs
that are alignable between sequences in the database stay hidden when
the sequences are piled on your sequence: your sequence lacks those
HSPs, so they are _ignored_.
Why is this so important?
What I mean is that if you use these "sequences piled on a template"
as a multiple alignment, you will be lying about the underlying
topology (underlying, that is, not lying ;-) of the gene network that
arises from your database plus your sequence when they are correctly
aligned, that is, all against all, etc., etc.
Well, that is the mathematically exhaustive, optimal way... normally
we use heuristics again, and again, and again... But "all against all"
is the key concept in the multiple alignment problem. It is very
important to be aware of these things.
needle is the optimal way (for a global pairwise alignment) <-> BLAST
is the heuristic one.
Clustal is also a very heuristic solution to the massive problem of
multiple alignment. Personally, I prefer MUSCLE, which uses a better
mathematical model and is (right now) the quickest aligner in most
cases.
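
To show what "piled on a template" means in practice, here is a small
Python sketch of the stacking step. It is not MVIEW's code; the
function name, the input tuples and the toy sequences are invented for
the example. Every subject residue that aligns to a gap in the
template has no column to go into and is dropped, even when two
database sequences share that insertion.

```python
def stack_on_template(template, hits):
    """Pile pairwise HSPs onto the template's own coordinates.

    hits: list of (name, q_start, aligned_query, aligned_subject), where
    q_start is the 1-based template position at which the HSP begins.
    Returns {name: (row, dropped)}; dropped counts subject residues that
    fell opposite a gap in the template and therefore vanished.
    """
    rows = {}
    for name, q_start, aln_q, aln_s in hits:
        row = ["-"] * len(template)
        pos = q_start - 1
        dropped = 0
        for q_char, s_char in zip(aln_q, aln_s):
            if q_char == "-":
                # Insertion relative to the template: no column exists for it.
                dropped += 1
                continue
            row[pos] = s_char
            pos += 1
        rows[name] = ("".join(row), dropped)
    return rows


if __name__ == "__main__":
    template = "ACGTACGT"
    hits = [
        ("hit1", 2, "CGT--ACG", "CGTTTACG"),
        ("hit2", 1, "ACGT--AC", "ACGTTTAC"),
    ]
    for name, (row, dropped) in stack_on_template(template, hits).items():
        print(name, row, "dropped:", dropped)
```

In this toy case both hits carry the same "TT" insertion, which a true
multiple alignment would place in shared columns, yet neither stacked
row shows it.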

I am sure that most of you know this already.
I hope it is useful for newbies and others, so forgive me for the
long, tedious discourse...