ehmmalign

 

Wiki

The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

Please help by correcting and extending the Wiki pages.

Function

Align sequences to an HMM profile

Description

EMBASSY HMMER is a suite of application wrappers to the original hmmer v2.3.2 applications written by Sean Eddy. hmmer v2.3.2 must be installed on the same system as EMBOSS and the location of the hmmer executables must be defined in your path for EMBASSY HMMER to work.

Usage:
ehmmalign [options] hmmfile seqfile outfile

The outfile parameter is new to EMBASSY HMMER. The multiple sequence alignment is always written to outfile. The name of outfile is specified by the -o option as normal.

hmmalign reads an HMM profile and a set of sequences , aligns the sequences to the profile HMM, and outputs a multiple sequence alignment . The set of sequences may be unaligned or aligned. If it is aligned, the existing alignment is ignored and hmmalign will align them the way it wants.

Algorithm

Please read the Userguide.pdf distributed with the original HMMER and included in the EMBASSY HMMER distribution under the DOCS directory.

Usage

Here is a sample session with ehmmalign


% ehmmalign ../ehmmcalibrate-ex-keep/globino.hmm globins630.fa globins630.ali 
Align sequences to an HMM profile

hmmalign - align sequences to an HMM profile
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:             ../ehmmcalibrate-ex-keep/globino.hmm
Sequence file:        ./ehmmalign-1234567890.1234
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Alignment saved in file globins630.ali

/shared/software/bin/hmmalign  --informat FASTA --outformat A2M  -o globins630.ali ../ehmmcalibrate-ex-keep/globino.hmm ./ehmmalign-1234567890.1234


Go to the input files for this example
Go to the output files for this example

Command line arguments

Where possible, the same command-line qualifier names and parameter order is used as in the original hmmer. There are however several unavoidable differences and these are clearly documented in the "Notes" section below.

More or less all options documented as "expert" in the original hmmer user guide are given in ACD as "advanced" options (-options must be specified on the command-line in order to be prompted for a value for them).

Align sequences to an HMM profile
Version: EMBOSS:6.2.0

   Standard (Mandatory) qualifiers:
  [-hmmfile]           infile     File containing a HMM profile
  [-seqfile]           seqset     File containing a (set of) sequence(s)
  [-o]                 align      [*.ehmmalign] Multiple sequence alignment
                                  output file.

   Additional (Optional) qualifiers:
   -m                  boolean    [N] Include in the alignment only those
                                  symbols aligned to match states. Do not show
                                  symbols assigned to insert states.
   -q                  boolean    [N] Quiet; suppress all output except the
                                  alignment itself. Useful for piping or
                                  redirecting the output.

   Advanced (Unprompted) qualifiers:
   -mapali             infile     Reads an alignment from file and aligns it
                                  as a single object to the HMM; e.g. the
                                  alignment in file is held fixed. This allows
                                  you to align sequences to a model with
                                  hmmalign and view them in the context of an
                                  existing trusted multiple alignment. The
                                  alignment to the alignment is defined by a
                                  'map' kept in the HMM, and so is fast and
                                  guaranteed to be consistent with the way the
                                  HMM was constructed from the alignment. The
                                  alignment in the file must be exactly the
                                  alignment that the HMM was built from.
                                  Compare the -withali option.
   -withali            infile     Reads an alignment from file and aligns it
                                  as a single object to the HMM; e.g. the
                                  alignment in file is held fixed. This allows
                                  you to align sequences to a model with
                                  hmmalign and view them in the context of an
                                  existing trusted multiple alignment. The
                                  alignment to the alignment is done with a
                                  heuristic (nonoptimal) dynamic programming
                                  procedure, which may be somewhat slow and is
                                  not guaranteed to be completely consistent
                                  with the way the HMM was constructed (though
                                  it should be quite close). However, any
                                  alignment can be used, not just the
                                  alignment that the HMM was built from.
                                  Compare the -mapali option.

   Associated qualifiers:

   "-seqfile" associated qualifiers
   -sbegin2            integer    Start of each sequence to be used
   -send2              integer    End of each sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -sformat2           string     Input sequence format
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-o" associated qualifiers
   -aformat3           string     Alignment format
   -aextension3        string     File name extension
   -adirectory3        string     Output directory
   -aname3             string     Base file name
   -awidth3            integer    Alignment width
   -aaccshow3          boolean    Show accession number in the header
   -adesshow3          boolean    Show description in the header
   -ausashow3          boolean    Show the full USA in the alignment
   -aglobal3           boolean    Show the full sequence in alignment

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit

Qualifier Type Description Allowed values Default
Standard (Mandatory) qualifiers
[-hmmfile]
(Parameter 1)
infile File containing a HMM profile Input file Required
[-seqfile]
(Parameter 2)
seqset File containing a (set of) sequence(s) Readable set of sequences Required
[-o]
(Parameter 3)
align Multiple sequence alignment output file. Alignment output file <*>.ehmmalign
Additional (Optional) qualifiers
-m boolean Include in the alignment only those symbols aligned to match states. Do not show symbols assigned to insert states. Boolean value Yes/No No
-q boolean Quiet; suppress all output except the alignment itself. Useful for piping or redirecting the output. Boolean value Yes/No No
Advanced (Unprompted) qualifiers
-mapali infile Reads an alignment from file and aligns it as a single object to the HMM; e.g. the alignment in file is held fixed. This allows you to align sequences to a model with hmmalign and view them in the context of an existing trusted multiple alignment. The alignment to the alignment is defined by a 'map' kept in the HMM, and so is fast and guaranteed to be consistent with the way the HMM was constructed from the alignment. The alignment in the file must be exactly the alignment that the HMM was built from. Compare the -withali option. Input file Required
-withali infile Reads an alignment from file and aligns it as a single object to the HMM; e.g. the alignment in file is held fixed. This allows you to align sequences to a model with hmmalign and view them in the context of an existing trusted multiple alignment. The alignment to the alignment is done with a heuristic (nonoptimal) dynamic programming procedure, which may be somewhat slow and is not guaranteed to be completely consistent with the way the HMM was constructed (though it should be quite close). However, any alignment can be used, not just the alignment that the HMM was built from. Compare the -mapali option. Input file Required
Associated qualifiers
"-seqfile" associated seqset qualifiers
-sbegin2
-sbegin_seqfile
integer Start of each sequence to be used Any integer value 0
-send2
-send_seqfile
integer End of each sequence to be used Any integer value 0
-sreverse2
-sreverse_seqfile
boolean Reverse (if DNA) Boolean value Yes/No N
-sask2
-sask_seqfile
boolean Ask for begin/end/reverse Boolean value Yes/No N
-snucleotide2
-snucleotide_seqfile
boolean Sequence is nucleotide Boolean value Yes/No N
-sprotein2
-sprotein_seqfile
boolean Sequence is protein Boolean value Yes/No N
-slower2
-slower_seqfile
boolean Make lower case Boolean value Yes/No N
-supper2
-supper_seqfile
boolean Make upper case Boolean value Yes/No N
-sformat2
-sformat_seqfile
string Input sequence format Any string  
-sdbname2
-sdbname_seqfile
string Database name Any string  
-sid2
-sid_seqfile
string Entryname Any string  
-ufo2
-ufo_seqfile
string UFO features Any string  
-fformat2
-fformat_seqfile
string Features format Any string  
-fopenfile2
-fopenfile_seqfile
string Features file name Any string  
"-o" associated align qualifiers
-aformat3
-aformat_o
string Alignment format Any string fasta
-aextension3
-aextension_o
string File name extension Any string  
-adirectory3
-adirectory_o
string Output directory Any string  
-aname3
-aname_o
string Base file name Any string  
-awidth3
-awidth_o
integer Alignment width Any integer value 0
-aaccshow3
-aaccshow_o
boolean Show accession number in the header Boolean value Yes/No N
-adesshow3
-adesshow_o
boolean Show description in the header Boolean value Yes/No N
-ausashow3
-ausashow_o
boolean Show the full USA in the alignment Boolean value Yes/No N
-aglobal3
-aglobal_o
boolean Show the full sequence in alignment Boolean value Yes/No N
General qualifiers
-auto boolean Turn off prompts Boolean value Yes/No N
-stdout boolean Write first file to standard output Boolean value Yes/No N
-filter boolean Read first file from standard input, write first file to standard output Boolean value Yes/No N
-options boolean Prompt for standard and additional values Boolean value Yes/No N
-debug boolean Write debug output to program.dbg Boolean value Yes/No N
-verbose boolean Report some/full command line options Boolean value Yes/No Y
-help boolean Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose Boolean value Yes/No N
-warning boolean Report warnings Boolean value Yes/No Y
-error boolean Report errors Boolean value Yes/No Y
-fatal boolean Report fatal errors Boolean value Yes/No Y
-die boolean Report dying program messages Boolean value Yes/No Y
-version boolean Report version number and exit Boolean value Yes/No N

Input file format

Alignment and sequence formats

Input and output of alignments and sequences is limited to the formats that the original hmmer supports. These include stockholm, SELEX, MSF, Clustal, Phylip and A2M /aligned FASTA (alignments) and FASTA, GENBANK, EMBL, GCG, PIR (sequences). It would be fairly straightforward to adapt the code to support all EMBOSS-supported formats.

Compressed input files

Automatic processing of gzipped files is not supported.

ehmmalign reads any normal sequence USAs.

Input files for usage example

File: ../ehmmcalibrate-ex-keep/globino.hmm

HMMER2.0  [2.3.2]
NAME  globins50
LENG  143
ALPH  Amino
RF    no
CS    no
MAP   yes
COM   /shared/software/bin/hmmbuild -n globins50 --pbswitch 1000 --archpri 0.850000 --idlevel 0.620000 --swentry 0.500000 --swexit 0.500000 --wgsc -A -F globin.hmm ../../data/hmmnew/globins50.msf
COM   /shared/software/bin/hmmcalibrate --mean 350.000000 --num 5000 --sd 350.000000 --seed 1 ../ehmmbuild-ex-keep/globin.hmm
NSEQ  50
DATE  Tue Jan 12 15:24:37 2010
CKSUM 9858
XT      -8455     -4  -1000  -1000  -8455     -4  -8455     -4 
NULT      -4  -8455
NULE     595  -1558     85    338   -294    453  -1158    197    249    902  -1085   -142    -21   -313     45    531    201    384  -1998   -644 
EVD   -35.959286   0.267496
HMM        A      C      D      E      F      G      H      I      K      L      M      N      P      Q      R      S      T      V      W      Y    
         m->m   m->i   m->d   i->m   i->i   d->m   d->d   b->m   m->e
         -450      *  -1900
     1    591  -1587    159   1351  -1874   -201    151  -1600    998  -1591   -693    389  -1272    595     42    -31     27   -693  -1797  -1134    14
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378   -450      * 
     2   -926  -2616   2221   2269  -2845  -1178   -325  -2678   -300  -2596  -1810    220  -1592    939   -974   -671   -939  -2204  -2785  -1925    15
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     3   -638  -1715   -680    497  -2043  -1540     23  -1671   2380  -1641   -840   -222  -1595    437   1040   -564   -523  -1363   2124  -1313    16
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     4    829  -1571    -37    660  -1856   -873    152  -1578    894  -1573   -678    769  -1273   1284     58    224    447  -1175  -1782  -1125    17
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     5    369   -433   -475    286   -974  -1312    -19   -412    664    398    406   1030  -1394    388   -214   -261     85   -166  -1227   -725    18
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     6  -1291   -884  -3696  -3261  -1137  -3425  -2802   2322  -3066    111     19  -3028  -3275  -2855  -3100  -2670  -1269   2738  -2450  -2062    19
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     7    157   -413   -236    316  -1387  -1231     89   -863   1084   -431   -348    910  -1319    635    297     15    704   -483  -1497   -922    20
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     8    770  -1431    -43    459  -1751   -340     78  -1449    440  -1497   -631    866  -1302    825    -51    953    364  -1076  -1750  -1121    21
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     9    420   -186  -2172  -1577      8  -1818   -694   1477  -1281    760    614  -1299  -1867  -1001  -1262   -189    -12   1401   -722   -364    22
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
    10   -961   -879  -2277  -1821   1366  -2213   -204   -399  -1500   -130    -39  -1427  -2266  -1186  -1511   -159   -913   -367   4721   1177    23
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
    11    -48  -1782    809    844  -2073   1456      8  -1811    315  -1803   -932    180  -1365    921   -218    173   -115  -1399  -2018  -1327    24


  [Part of this file has been deleted for brevity]

     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   128   -415  -1926   1575   1399  -2219  -1163     17  -1983    527  -1929  -1039    341  -1367   1597   -212    257   -222  -1536  -2109  -1387   144
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   129   -529  -1434   -629   -143  -1926   -626   -171  -1460   2679  -1597   -839   -309  -1599    207    317   -530   -510   -130  -1840  -1369   145
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   130    811   -397  -2389  -1807   1883  -2039   -907    594  -1512   1077    687  -1532  -2065  -1201  -1483  -1125   -465   1067   -843   -472   146
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   131   -241   -102  -2327  -1710    724  -1767   -616    650  -1363   1074   1765   -718  -1809  -1026  -1252   -842   -181   1331   -541    695   147
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   132    723     95    385    823  -1820  -1168    167  -1540    875  -1362   -644    320  -1261    810    246    693    -67  -1141  -1753  -1098   148
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   133    551   -430  -1049   -481   -442    469   -241    465   -313    133    947   -411  -1543    197   -587   -146    202    522   -843   -429   149
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   134  -1086   -777  -3351  -2800    816  -2898  -1861   1501  -2515   1149    586  -2483  -2775  -2108  -2400  -2046  -1030   2380  -1511  -1216   150
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   135   1393   1409   -876   -345   -997   -525   -315   -590   -198   -847   -109   -420  -1441    -97    412    766   -130    139  -1306   -858   151
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   136     98  -1299     36    365  -1495  -1211   1241   -404    523   -952   -426   1174  -1303    511    -18    347    882   -853  -1566   -970   152
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   137   1308   -787    564   -132   -966  -1332   -203   -362    -49   -395    -57   -305  -1481     49   -437   -190   -182   1020  -1282   -802   153
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   138  -1746  -1358  -3897  -3341   -216  -3621  -2478   1774  -3040   2442   1157  -3189  -3229  -2422  -2853  -2824  -1659    392  -1720  -1647   154
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   139   1176  -1289   -179    534  -1606   -607     34  -1278    734  -1372   -534     44  -1325    433    -89    521    826   -941  -1666  -1072   155
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   140    602  -1500   -135    850  -1753  -1214   1951  -1452    838  -1484    431    118  -1306    555    347    489   -153  -1085  -1723  -1092   156
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -22  -6602  -7644   -894  -1115   -701  -1378      *      * 
   141    351  -1646   -165    546  -1976   -498     46  -1667   2193  -1662   -798     35  -1405    476    311    -73   -306  -1287  -1859  -1254   157
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6561  -7603   -894  -1115   -701  -1378      *      * 
   142  -1995  -1606  -3095  -2870   1739  -3015    -98  -1012  -2520   -730    655  -1990  -2962  -1884  -2326  -2167  -1915  -1128    548   4089   158
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -25  -6455  -7497   -894  -1115   -701  -1378      *      * 
   143   -253  -1373   -267    301   -911   -565   1956   -450   1188  -1330   -497     33  -1352    502   1358   -205   -184   -941  -1604  -1026   159
     -      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      * 
     -      *      *      *      *      *      *      *      *      0 
//

File: globins630.fa

> BAHG_VITSP
MLDQQTINIIKATVPVLKEHGVTITTTFYKNLFAKHPEVRPLFDMGRQESLEQPKALAM
TVLAAAQNIENLPAILPAVKKIAVKHCQAGVAAAHYPIVGQELLGAIKEVLGDAATDDIL
DAWGKAYGVIADVfiqveadLYAQAVE
> GLB1_ANABR
PSVQGAAAQLTADVKKDLRDSWKVIGSDKKGNGVALMTTLFADNQETIGYFKRLGNVSQ
GMANDKLRGHSITLMYALQNFIDQLDNTDDLVCVVEKFAVNHITRKISAAEFGKINGPIK
KVLASKNFGDKYANAWAKLVAVVQAAL
> GLB1_ARTSX
ERVDPITGLSGLEKNAILDTWGKVRGNLQEVGKATFGKLFAAHPEYQQMFRFFQGVQLA
FLVQSPKFAAHTQRVVSALDQTLLALNRPSDQFVYMIKELGLDHINRGTDRSFVEYLKES
LGDSVDEFTVQSFGEVIVNFLNEGLRQA
> GLB1_CALSO
VSANDIKNVQDTWGKLYDQWDAVHAsKFYNKLFKDSEDISEAFVKAGTGSGIAMKRQAL
VFGAILQEFVANLNDPTALTLKIKGLCATHKTRGITNMELFAFALADLVAYMGTtISFTA
AQKASWTAVNDVILHQMSSYFATVA
> GLB1_CHITH
GPSGDQIAAAKASWNTVKNNQVDILYAVFKANPDIQTAFSQFAGKDLDSIKGTPDFSKH
AGRVVGLFSEVMDLLGNDANTPTILAKAKDFGKSHKSRASPAQLDNFRKSLVVYLKGATK
WDSAVESSWAPVLDFVFSTLKNEL
> GLB1_GLYDI
GLSAAQRQVIAATWKDIAGADNGAGVGKDCLIKFLSAHPQMAAVFGFSGASDPGVAALG
AKVLAQIGVAVSHLGDEGKMVAQMKAVGVRHKGYGNKHIKAQYFEPLGASLLSAMEHRIG
GKMNAAAKDAWAAAYADISGALISGLQS
> GLB1_LUMTE
ECLVTEGLKVKLQWASAFGHAHQRVAFGLELwkgILREHPEIKAPFSRVRGDNIYSPQF
GAHSQRVLSGLDITISMLDTPDmLAAQLAHLKVQHVERNLKPEFFDIFLKHLLHVLGDRL
GTHFDFGAWHDCVDQIIDGIKDI
> GLB1_MORMR
PIVDSGSVSPLSDAEKNKIRAAWDLVYKDYEKTGVDILVKFFTGTPAAQAFFPKFKGLT
TADDLKQSSDVRWHAERIINAVNDAVKSMDDTEKMSMKLKELSIKHAQSFYVDRQYFKVL
AGIIADTTAPGDAGFEKLMSMICILLSSAY
> GLB1_PARCH
GGTLAIQSHGDLTLAQKKIVRKTWHQLMRNKTSFVTDLFIRIFAYDPAAQNKFPQMAGM
SASQLRSSRQMQAHAIRVSSIMSEYIEELDSDILPELLATLARTHDLNKVGPAHYDLFAK
VLMEALQAELGSDFNQKTRDSWAKAFSIVQAVLLVKHG
> GLB1_PETMA
PIVDSGSVPALTAAEKATIRTAWAPVYAKYQSTGVDILIKFFTSNPAAQAFFPKFQGLT
SADQLKKSMDVRWHAERIINAVNDAVVAMDDTEKMSLKLRELSGKHAKSFQVDPQYFKVL
AAVIVDTVLPGDAGLEKLMSMICILLRSSY
> GLB1_PHESE
DCNTLKRFKVKHQWQQVFSGEhHRTEFSLHFWKEFLHDHPDLVSLFKRVQGENIYSPEF
QAHGIRVLAGLDSVIGVLDEDDTFTVQLAHLKAQHTERGTKPEYFDLFGTQLFDILGDKL
GTHFDQAAWRDCYAVIAAGIKP
> GLB1_SCAIN
PSVYDAAAQLTADVKKDLRDSWKVIGSDKKGNGVALMTTLFADNQETIGYFKRLGNVSQ
GMANDKLRGHSITLMYALQNFIDQLDNPDDLVCVVEKFAVNHITRKISAAEFGKINGPIK
KVLASKNFGDKYANAWAKLVAVVQAAL
> GLB1_TYLHE
TDCGILQRIKVKQQWAQVYSVGESRTDFAIDVFNNFFRTNPDRSLFNRVNGDNVYSPEF


  [Part of this file has been deleted for brevity]

GLSDAEWQLVLNVWGKVEADLAGHGQEVLIRLFHTHPETLEKFDKFKHLKSEDEMKASE
DLKKHGNTVLTALGAILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISEAIIHVLHSKH
PGDFGADAQAAMSKALELFRNDIAAQYKELGFQG
> MYG_ROUAE
GLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE
DLKKHGATVLTALGGILKKKGQHEAQLKPLAQSHATKHKIPVKYLEFISEVIIQVLQSKH
PGDFGADAQGAMGKALELFRNDIAAKYKELGFQG
> MYG_SAISC
GLSDGEWQLVLNIWGKVEADIPSHGQEVLISLFKGHPETLEKFDKFKHLKSEDEMKASE
ELKKHGTTVLTALGGILKKKGQHEAELKPLAQSHATKHKIPVKYLELISDAIVHVLQKKH
PGDFGADAQGAMKKALELFRNDMAAKYKELGFQG
> MYG_SHEEP
GLSDGEWQLVLNAWGKVEADVAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASE
DLKKHGNTVLTALGGILKKKGHHEAEVKHLAESHANKHKIPVKYLEFISDAIIHVLHAKH
PSNFGADAQGAMSKALELFRNDMAAEYKVLGFQG
> MYG_SPAEH
GLSDGEWQLVLNVWGKVEGDLAGHGQEVLIKLFKNHPETLEKFDKFKHLKSEDEMKGSE
DLKKHGNTVLTALGGILKKKGQHAAEIQPLAQSHATKHKIPIKYLEFISEAIIQVLQSKH
PGDFGADAQGAMSKALELFRNDIAAKYKELGFQG
> MYG_TACAC
GLSDGEWQLVLKVWGKVETDITGHGQDVLIRLFKTHPETLEKFDKFKHLKTEDEMKASA
DLKKHGGVVLTALGSILKKKGQHEAELKPLAQSHATKHKISIKFLEFISEAIIHVLQSKH
SADFGADAQAAMGKALELFRNDMATKYKEFGFQG
> MYG_THUAL
ADFDAVLKCWGPVEADYTTMGGLVLTRLFKEHPETQKLFPKFAGIAQADIAGNAAISAH
GATVLKKLGELLKAKGSHAAILKPLANSHATKHKIPINNFKLISEVLVKVMHEKAGLDAG
GQTALRNVMGIIIADLEANYKELGFSG
> MYG_TUPGL
GLSDGEWQLVLNVWGKVEADVAGHGQEVLIRLFKGHPETLEKFDKFKHLKTEDEMKASE
DLKKHGNTVLSALGGILKKKGQHEAEIKPLAQSHATKHKIPVKYLEFISEAIIQVLQSKH
PGDFGADAQAAMSKALELFRNDIAAKYKELGFQG
> MYG_TURTR
GLSDGEWQLVLNVWGKVEADLAGHGQDVLIRLFKGHPETLEKFDKFKHLKTEADMKASE
DLKKHGNTVLTALGAILKKKGHHDAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH
PAEFGADAQGAMNKALELFRKDIAAKYKELGFHG
> MYG_VARVA
GLSDEEWKKVVDIWGKVEPDLPSHGQEVIIRMFQNHPETQDRFAKFKNLKTLDEMKNSE
DLKKHGTTVLTALGRILKQKGHHEAEIAPLAQTHANTHKIPIKYLEFICEVIVGVIAEKH
SADFGADSQEAMRKALELFRNDMASRYKELGFQG
> MYG_VULCH
GLSDGEWQLVLNIWGKVETDLAGHGQEVLIRLFKNHPETLDKFDKFKHLKTEDEMKGSE
DLKKHGNTVLTALGGILKKKGHHEAELKPLAQSHATKHKIPVKYLEFISDAIIQVLQSKH
SGDFHADTEAAMKKALELFRNDIAAKYKELGFQG
> MYG_ZALCA
GLSDGEWQLVLNIWGKVEADLVGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKRSE
DLKKHGKTVLTALGGILKKKGHHDAELKPLAQSHATKHKIPIKYLEFISEAIIHVLQSKH
PGDFGADTHAAMKKALELFRNDIAAKYRELGFQG
> MYG_ZIPCA
GLSEAEWQLVLHVWAKVEADLSGHGQEILIRLFKGHPETLEKFDKFKHLKSEAEMKASE
DLKKHGHTVLTALGGILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSRH
PSDFGADAQAAMTKALELFRKDIAAKYKELGFHG

Output file format

ehmmalign outputs a graph to the specified graphics device. outputs a report format file. The default format is ...

Output files for usage example

File: globins630.ali

>BAHG_VITSP 
.................mldqQTINIIKATV.PV...L.....K...E...H...GV.T.
....I...TTTFYKN..LFAKHPEVRPLFDMGRQESL-------..EQPKALAMTVLAAA
QN-IENLPA......ILPAVKKIAVKH..CQAG-..VAAAHYPIV.GQELLGAIKEV.L.
G.D.AATDDILDAWGKAYGVIADVFIQVEAdlyaqave
>GLB1_ANABR 
.........psvqgaaaqltaDVKKDLRDSW.KV...I.....G...S...D...KK.G.
....N...GVALMTT..LFADNQETIGYFKRLGNVSQ---GMAN..DKLRGHSITLMYAL
QNFIDQLDNt...ddLVCVVEKFAVNH..ITR-K..ISAAEFGKI.NGPIKKVLAS-.-.
-.K.NFGDKYANAWAKLVAVVQAAL-----........
>GLB1_ARTSX 
..........ervdpitglsgLEKNAILDTW.GK...V.....R...G...N...LQ.E.
....V...GKATFGK..LFAAHPEYQQMFRFFQGVQL-AFLVQS..PKFAAHTQRVVSAL
DQTLLALNRps..dqFVYMIKELGLDH..INRGT..-DRSFVEYL.KESL-----GD.S.
V.D.EFT------VQSFGEVIVNFLNEGLRqa......
>GLB1_CALSO 
..................vsaNDIKNVQDTW.GK...L.....Y...D...Q...WD.A.
....V..hASKFYNK..LFKDSEDISEAFVKAGT---GSGIAMK..RQALVFGA-ILQEF
VANLNDPTA......LTLKIKGLCATH..KTRGI..TNMELFAFA.LADLVAYMGTT.I.
S.-.-FTAAQKASWTAVNDVILHQMSSYFAtva.....
>GLB1_CHITH 
.................gpsgDQIAAAKASW.NT...V.....-...-...-...-K.N.
....N...QVDILYA..VFKANPDIQTAFSQFAG-KDLDSIKGT..PDFSKHAGRVVGLF
SEVMDLLGNdantptILAKAKDFGKSH..--KSR..ASPAQLDNF.RKSLVVYLKGA.-.
-.T.KWDSAVESSWAPVLDFVFSTLKNEL-........
>GLB1_GLYDI 
.................glsaAQRQVIAATW.KD...I.....A...Ga.dN...GA.G.
....V...GKDCLIK..FLSAHPQMAAVFGFSGA--------SD..PGVAALGAKVLAQI
GVAVSHLGDe...gkMVAQMKAVGVRHkgYGNKH..IKAQYFEPL.GASLLSAMEHR.I.
G.G.KMNAAAKDAWAAAYADISGALISGLQs.......
>GLB1_LUMTE 
.............eclvteglKVKLQWASAF.GH...A.....-...H...Q...RV.A.
....F...GLELWKG..ILREHPEIKAPFSRVRG-DN----IYS..PQFGAHSQRVLSGL
DITISMLDTp...dmLAAQLAHLKVQH..VER-N..LKPEFFDIF.LKHLLHVLGDR.L.
G.T.HFDF---GAWHDCVDQIIDGIKD--I........
>GLB1_MORMR 
........pivdsgsvsplsdAEKNKIRAAW.DL...V.....Y...K...D...YE.K.
....T...GVDILVK..FFTGTPAAQAFFPKFKGLTTADDLKQS..SDVRWHAERIINAV
NDAVKSMDDt...ekMSMKLKELSIKH..AQSFY..VDRQYFKVL.AGII-------.-.
-.A.DTTAPGDAGFEKLMSMICILLSSAY-........
>GLB1_PARCH 
.......ggtlaiqshgdltlAQKKIVRKTW.HQ...L.....M...R...N...KT.S.
....F...VTDLFIR..IFAYDPAAQNKFPQMAGMSA-SQLRSS..RQMQAHAIRVSSIM
SEYIEELDSd....iLPELLATLARTH..-DLNK..VGPAHYDLF.AKVLMEALQAE.L.
G.S.DFNQKTRDSWAKAFSIVQAVLLVKHG........
>GLB1_PETMA 
........pivdsgsvpaltaAEKATIRTAW.AP...V.....Y...A...K...YQ.S.
....T...GVDILIK..FFTSNPAAQAFFPKFQGLTSADQLKKS..MDVRWHAERIINAV
NDAVVAMDDt...ekMSLKLRELSGKH..AKSFQ..VDPQYFKVL.AAVI-------.-.
-.V.DTVLPGDAGLEKLMSMICILLRSSY-........


  [Part of this file has been deleted for brevity]

P.G.DFGADAQGAMKKALELFRNDMAAKYKelgfqg..
>MYG_SHEEP 
.................glsdGEWQLVLNAW.GK...V.....E...A...D...VA.G.
....H...GQEVLIR..LFTGHPETLEKFDKFKHLKTEAEMKAS..EDLKKHGNTVLTAL
GGILKKKGH......HEAEVKHLAESH..ANKHK..IPVKYLEFI.SDAIIHVLHAK.H.
P.S.NFGADAQGAMSKALELFRNDMAAEYKvlgfqg..
>MYG_SPAEH 
.................glsdGEWQLVLNVW.GK...V.....E...G...D...LA.G.
....H...GQEVLIK..LFKNHPETLEKFDKFKHLKSEDEMKGS..EDLKKHGNTVLTAL
GGILKKKGQ......HAAEIQPLAQSH..ATKHK..IPIKYLEFI.SEAIIQVLQSK.H.
P.G.DFGADAQGAMSKALELFRNDIAAKYKelgfqg..
>MYG_TACAC 
.................glsdGEWQLVLKVW.GK...V.....E...T...D...IT.G.
....H...GQDVLIR..LFKTHPETLEKFDKFKHLKTEDEMKAS..ADLKKHGGVVLTAL
GSILKKKGQ......HEAELKPLAQSH..ATKHK..ISIKFLEFI.SEAIIHVLQSK.H.
S.A.DFGADAQAAMGKALELFRNDMATKYKefgfqg..
>MYG_THUAL 
.....................ADFDAVLKCW.GP...V.....E...A...D...YT.T.
....M...GGLVLTR..LFKEHPETQKLFPKFAGIAQ-ADIAGN..AAISAHGATVLKKL
GELLKAKGS......HAAILKPLANSH..ATKHK..IPINNFKLI.SEVLVKVMHEK.A.
G.-.-LDAGGQTALRNVMGIIIADLEANYKelgfsg..
>MYG_TUPGL 
.................glsdGEWQLVLNVW.GK...V.....E...A...D...VA.G.
....H...GQEVLIR..LFKGHPETLEKFDKFKHLKTEDEMKAS..EDLKKHGNTVLSAL
GGILKKKGQ......HEAEIKPLAQSH..ATKHK..IPVKYLEFI.SEAIIQVLQSK.H.
P.G.DFGADAQAAMSKALELFRNDIAAKYKelgfqg..
>MYG_TURTR 
.................glsdGEWQLVLNVW.GK...V.....E...A...D...LA.G.
....H...GQDVLIR..LFKGHPETLEKFDKFKHLKTEADMKAS..EDLKKHGNTVLTAL
GAILKKKGH......HDAELKPLAQSH..ATKHK..IPIKYLEFI.SEAIIHVLHSR.H.
P.A.EFGADAQGAMNKALELFRKDIAAKYKelgfhg..
>MYG_VARVA 
.................glsdEEWKKVVDIW.GK...V.....E...P...D...LP.S.
....H...GQEVIIR..MFQNHPETQDRFAKFKNLKTLDEMKNS..EDLKKHGTTVLTAL
GRILKQKGH......HEAEIAPLAQTH..ANTHK..IPIKYLEFI.CEVIVGVIAEK.H.
S.A.DFGADSQEAMRKALELFRNDMASRYKelgfqg..
>MYG_VULCH 
.................glsdGEWQLVLNIW.GK...V.....E...T...D...LA.G.
....H...GQEVLIR..LFKNHPETLDKFDKFKHLKTEDEMKGS..EDLKKHGNTVLTAL
GGILKKKGH......HEAELKPLAQSH..ATKHK..IPVKYLEFI.SDAIIQVLQSK.H.
S.G.DFHADTEAAMKKALELFRNDIAAKYKelgfqg..
>MYG_ZALCA 
.................glsdGEWQLVLNIW.GK...V.....E...A...D...LV.G.
....H...GQEVLIR..LFKGHPETLEKFDKFKHLKSEDEMKRS..EDLKKHGKTVLTAL
GGILKKKGH......HDAELKPLAQSH..ATKHK..IPIKYLEFI.SEAIIHVLQSK.H.
P.G.DFGADTHAAMKKALELFRNDIAAKYRelgfqg..
>MYG_ZIPCA 
.................glseAEWQLVLHVW.AK...V.....E...A...D...LS.G.
....H...GQEILIR..LFKGHPETLEKFDKFKHLKSEAEMKAS..EDLKKHGHTVLTAL
GGILKKKGH......HEAELKPLAQSH..ATKHK..IPIKYLEFI.SDAIIHVLHSR.H.
P.S.DFGADAQAAMTKALELFRKDIAAKYKelgfhg..

Data files

None.

Notes

1. Command-line arguments

The following original HMMER options are not supported:
-h         : Use -help to get help information instead.
-informat  : All common sequence file formats are supported automatically.
-oneline   : Alignment format output is specified via the ACD file and / 
             or the -aformat command line qualifier.
-outformat : Alignment format output is specified via the ACD file and / or 
             the -aformat command line qualifier.

2. Installing EMBASSY HMMER

The EMBASSY HMMER package contains "wrapper" applications providing an EMBOSS-style interface to the applications in the original HMMER package version 2.3.2 developed by Sean Eddy. Please read the file INSTALL in the EMBASSY HMMER package distribution for installation instructions.

3. Installing original HMMER

To use EMBASSY HMMER, you will first need to download and install the original HMMER package. Please read the file 00README in the the original HMMER package distribution for installation instructions:
WWW home:       http://hmmer.wustl.edu/
Distribution:   ftp://ftp.genetics.wustl.edu/pub/eddy/hmmer/

4. Setting up HMMER

For the EMBASSY HMMER package to work, the directory containing the original HMMER executables *must* be in your path. For example if you executables were installed to "/usr/local/hmmer/bin", then type:
set path=(/usr/local/hmmer/bin/ $path)
rehash

5. Getting help

Please read the Userguide.pdf distributed with the original HMMER and included in the EMBASSY HMMER distribution under the DOCS directory. The first 3 chapters (Introduction, Installation and Tutorial) are particularly useful.

Please read the 'Notes' section below for a description of the differences between the original and EMBASSY HMMER, particularly which application command line options are supported.

References

None.

Warnings

Types of input data

hmmer v3.2.1 and therefore EMBASSY HMMER is only recommended for use with protein sequences. If you provide a non-protein sequence you will be reprompted for a protein sequence. To accept nucleic acid sequences you must replace instances of < type: "protein" > in the application ACD files with .

Environment variables

The original hmmer uses BLAST environment variables (below), if defined, to locate files. The EMBASSY HMMER does not.
BLASTDB   location of sequence databases to be searched
BLASMAT   location of substitution matrices
HMMERDB   location of HMMs

Sequence input

ehmmalign makes a temporary local copy of its input sequence data. You must ensure there is sufficient disk space for this in the directory that ehmmalign is run.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name Description
ehmmbuild Build a profile HMM from an alignment
ehmmcalibrate Calibrate HMM search statistics
ehmmconvert Convert between profile HMM file formats
ehmmemit Generate sequences from a profile HMM
ehmmfetch Retrieve an HMM from an HMM database
ehmmindex Create a binary SSI index for an HMM database
ehmmpfam Search one or more sequences against an HMM database
ehmmsearch Search a sequence database with a profile HMM
oalistat Statistics for multiple alignment files
ohmmalign Align sequences with an HMM
ohmmbuild Build HMM
ohmmcalibrate Calibrate a hidden Markov model
ohmmconvert Convert between HMM formats
ohmmemit Extract HMM sequences
ohmmfetch Extract HMM from a database
ohmmindex Index an HMM database
ohmmpfam Align single sequence with an HMM
ohmmsearch Search sequence database with an HMM

Author(s)

This program is an EMBOSS conversion of a program written by Sean Eddy as part of his HMMER package.

Although we take every care to ensure that the results of the EMBOSS version are identical to those from the original package, we recommend that you check your inputs give the same results in both versions before publication.

Please report all bugs in the EMBOSS version to the EMBOSS bug team, not to the original author. Jon Ison (jison © ebi.ac.uk)
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

This program is an EMBASSY wrapper to a program written by Sean Eddy as part of his hmmer package.

Please report any bugs to the EMBOSS bug team in the first instance, not to Sean Eddy.

History

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.