ehmmsearch

 

Wiki

The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

Please help by correcting and extending the Wiki pages.

Function

Search a sequence database with a profile HMM

Description

EMBASSY HMMER is a suite of application wrappers to the original hmmer v2.3.2 applications written by Sean Eddy. hmmer v2.3.2 must be installed on the same system as EMBOSS and the location of the hmmer executables must be defined in your path for EMBASSY HMMER to work.

Usage:
hmmsearch [options] hmmfile seqfile outfile

The outfile parameter is new to EMBASSY HMMER.

hmmsearch reads an HMM from and searches for signifcantly similar sequence matches. seqfile will be looked for first in the current working directory, then in a directory named by the environment variable BLASTDB. This lets users use existing BLAST databases, if BLAST has been confgured for the site. hmmsearch may take minutes or even hours to run, depending on the size of the sequence database. The output is written to a file called . The output consists of four sections: a ranked list of the best scoring sequences, a ranked list of the best scoring domains, alignments for all the best scoring domains, and a histogram of the scores. A sequencescore may be higher than a domain score for the same sequence if there is more than one domain in the sequence; the sequence score takes into account all the domains. All sequences scoring above the -E and -T cutoffs are shown in the frst list, then every domain found in this list is shown in the second list of domain hits. If desired, E-value and bit score thresholds may also be applied to the domain list using the -domE and -domT options.

Algorithm

Please read the Userguide.pdf distributed with the original HMMER and included in the EMBASSY HMMER distribution under the DOCS directory.

Usage

Here is a sample session with ehmmsearch


% ehmmsearch ../ehmmcalibrate-ex-keep/globino.hmm Artemia.fa globino.ehmmsearch -a 100 
Search a sequence database with a profile HMM


hmmsearch -A 100 -E 10.000000 -T -1000000.000000 -Z 0 --domE 1000000.000000 --domT -1000000.000000  ../ehmmcalibrate-ex-keep/globino.hmm ../../data/hmmnew/Artemia.fa > globino.ehmmsearch


Go to the input files for this example
Go to the output files for this example

Command line arguments

Where possible, the same command-line qualifier names and parameter order is used as in the original hmmer. There are however several unavoidable differences and these are clearly documented in the "Notes" section below.

More or less all options documented as "expert" in the original hmmer user guide are given in ACD as "advanced" options (-options must be specified on the command-line in order to be prompted for a value for them).

   Standard (Mandatory) qualifiers:
  [-hmmfile]           infile     File of HMMs.
  [-seqfile]           seqall     File of input sequences.
   -a                  integer    [100] Limits the alignment output to the 
                                  best scoring domains. -A0 shuts off the
                                  alignment output and can be used to reduce
                                  the size of output files. (Any integer
                                  value)
  [-outfile]           outfile    [*.ehmmsearch] The output consists of four
                                  sections: a ranked list of the best scoring
                                  sequences, a ranked list of the best scoring
                                  domains, alignments for all the best
                                  scoring domains, and a histogram of the
                                  scores. A sequence score may be higher than
                                  a domain score for the same sequence if
                                  there is more than one domain in the
                                  sequence; the sequence score takes into
                                  account all the domains. All sequences
                                  scoring above the -E and -T cutoffs are
                                  shown in the frst list, then every domain
                                  found in this list is shown in the second
                                  list of domain hits. If desired, E-value and
                                  bit score thresholds may also be applied to
                                  the domain list using the -domE and -domT
                                  options.

   Additional (Optional) qualifiers:
   -e                  float      [10.] Set the E-value cutoff for the
                                  per-sequence ranked hit list to , where
                                   is a positive real number. The default
                                  is 10.0. Hits with E-values better than
                                  (less than) this threshold will be shown.
                                  (Any numeric value)
   -t                  float      [-1000000.] Set the bit score cutoff for the
                                  per-sequence ranked hit list to , where
                                   is a real number. The default is
                                  negative infinity; by default, the threshold
                                  is controlled by E-value and not by bit
                                  score. Hits with bit scores better than
                                  (greater than) this threshold will be shown.
                                  (Any numeric value)

   Advanced (Unprompted) qualifiers:
   -z                  integer    [0] Calculate the E-value scores as if we
                                  had seen a sequence database of 
                                  sequences. The default is the number of
                                  sequences seen in your database file
                                  . (Any integer value)
   -compat             boolean    [N] Use the output format of HMMER 2.1.1,
                                  the 1998-2001 public release; provided so
                                  2.1.1 parsers don't have to be rewritten.
   -cpu                integer    [0] Sets the maximum number of CPUs that the
                                  program will run on. The default is to use
                                  all CPUs in the machine. Overrides the HMMER
                                  NCPU environment variable. Only affects
                                  threaded versions of HMMER (the default on
                                  most systems). (Any integer value)
   -cutga              boolean    [N] Use Pfam GA (gathering threshold) score
                                  cutoffs. Equivalent to -globT  -domT
                                  , but the GA1 and GA2 cutoffs are read
                                  from each HMM in the input HMM database
                                  individually. hmmbuild puts these cutoffs
                                  there if the alignment file was annotated in
                                  a Pfam-friendly alignment format (extended
                                  SELEX or Stockholm format) and the optional
                                  GA annotation line was present. If these
                                  cutoffs are not set in the HMM file, -cut ga
                                  doesn't work.
   -cuttc              boolean    [N] Use Pfam TC (trusted cutoff) score
                                  cutoffs. Equivalent to -globT  -domT
                                  , but the TC1 and TC2 cutoffs are read
                                  from each HMM in hmmfile individually.
                                  hmmbuild puts these cutoffs there if the
                                  alignment file was annotated in a
                                  Pfam-friendly alignment format (extended
                                  SELEX or Stockholm format) and the optional
                                  TC annotation line was present. If these
                                  cutoffs are not set in the HMM file, -cut tc
                                  doesn't work.
   -cutnc              boolean    [N] Use Pfam NC (noise cutoff) score
                                  cutoffs. Equivalent to -globT  -domT
                                  , but the NC1 and NC2 cutoffs are read
                                  from each HMM in hmmfile individually.
                                  hmmbuild puts these cutoffs there if the
                                  alignment file was annotated in a
                                  Pfam-friendly alignment format (extended
                                  SELEX or Stockholm format) and the optional
                                  NC annotation line was present. If these
                                  cutoffs are not set in the HMM file, -cut nc
                                  doesn't work.
   -dome               float      [1000000.] Set the E-value cutoff for the
                                  per-domain ranked hit list to , where 
                                  is a positive real number. The default is
                                  infinity; by default, all domains in the
                                  sequences that passed the frst threshold
                                  will be reported in the second list, so that
                                  the number of domains reported in the
                                  per-sequence list is consistent with the
                                  number that appear in the per-domain list.
                                  (Any numeric value)
   -domt               float      [-1000000.] Set the bit score cutoff for the
                                  per-domain ranked hit list to , where
                                   is a real number. The default is
                                  negative infinity; by default, all domains
                                  in the sequences that passed the frst
                                  threshold will be reported in the second
                                  list, so that the number of domains reported
                                  in the per-sequence list is consistent with
                                  the number that appear in the per-domain
                                  list. Important note: only one domain in a
                                  sequence is absolutely controlled by this
                                  parameter, or by -domT. The second and
                                  subsequent domains in a sequence have a de
                                  facto bit score threshold of 0 because of
                                  the details of how HMMER works. HMMER
                                  requires at least one pass through the main
                                  model per sequence; to do more than one pass
                                  (more than one domain) the multidomain
                                  alignment must have a better score than the
                                  single domain alignment, and hence the extra
                                  domains must contribute positive score. See
                                  the Users' Guide for more detail. (Any
                                  numeric value)
   -forward            boolean    [N] Use the Forward algorithm instead of the
                                  Viterbi algorithm to determine the
                                  per-sequence scores. Per-domain scores are
                                  still determined by the Viterbi algorithm.
                                  Some have argued that Forward is a more
                                  sensitive algorithm for detecting remote
                                  sequence homologues; my experiments with
                                  HMMER have not confrmed this, however.
   -nulltwo            boolean    [N] Turn off the post hoc second null model.
                                  By default, each alignment is rescored by a
                                  postprocessing step that takes into account
                                  possible biased composition in either the
                                  HMM or the target sequence. This is almost
                                  essential in database searches, especially
                                  with local alignment models. There is a very
                                  small chance that this postprocessing might
                                  remove real matches, and in these cases
                                  --null2 may improve sensitivity at the
                                  expense of reducing specifcity by letting
                                  biased composition hits through.
   -pvm                boolean    [N] Run on a Parallel Virtual Machine (PVM).
                                  The PVM must already be running. The client
                                  program hmmpfam-pvm must be installed on
                                  all the PVM nodes. The HMM database hmmfile
                                  and an associated GSI index file hmmfile.gsi
                                  must also be installed on all the PVM
                                  nodes. (The GSI index is produced by the
                                  program hmmindex.) Because the PVM
                                  implementation is I/O bound, it is highly
                                  recommended that each node have a local copy
                                  of hmmfile rather than NFS mounting a
                                  shared copy. Optional PVM support must have
                                  been compiled into HMMER for -pvm to
                                  function.
   -xnu                boolean    [N] Turn on XNU filtering of target protein
                                  sequences. Has no effect on nucleic acid
                                  sequences. In trial experiments, -xnu
                                  appears to perform less well than the
                                  default post hoc null2 model.

   Associated qualifiers:

   "-seqfile" associated qualifiers
   -sbegin2            integer    Start of each sequence to be used
   -send2              integer    End of each sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -sformat2           string     Input sequence format
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-outfile" associated qualifiers
   -odirectory3        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Standard (Mandatory) qualifiers Allowed values Default
[-hmmfile]
(Parameter 1)
File of HMMs. Input file Required
[-seqfile]
(Parameter 2)
File of input sequences. Readable sequence(s) Required
-a Limits the alignment output to the <n> best scoring domains. -A0 shuts off the alignment output and can be used to reduce the size of output files. Any integer value 100
[-outfile]
(Parameter 3)
The output consists of four sections: a ranked list of the best scoring sequences, a ranked list of the best scoring domains, alignments for all the best scoring domains, and a histogram of the scores. A sequence score may be higher than a domain score for the same sequence if there is more than one domain in the sequence; the sequence score takes into account all the domains. All sequences scoring above the -E and -T cutoffs are shown in the frst list, then every domain found in this list is shown in the second list of domain hits. If desired, E-value and bit score thresholds may also be applied to the domain list using the -domE and -domT options. Output file <*>.ehmmsearch
Additional (Optional) qualifiers Allowed values Default
-e Set the E-value cutoff for the per-sequence ranked hit list to <x>, where <x> is a positive real number. The default is 10.0. Hits with E-values better than (less than) this threshold will be shown. Any numeric value 10.
-t Set the bit score cutoff for the per-sequence ranked hit list to <x>, where <x> is a real number. The default is negative infinity; by default, the threshold is controlled by E-value and not by bit score. Hits with bit scores better than (greater than) this threshold will be shown. Any numeric value -1000000.
Advanced (Unprompted) qualifiers Allowed values Default
-z Calculate the E-value scores as if we had seen a sequence database of <n> sequences. The default is the number of sequences seen in your database file <seqfile>. Any integer value 0
-compat Use the output format of HMMER 2.1.1, the 1998-2001 public release; provided so 2.1.1 parsers don't have to be rewritten. Boolean value Yes/No No
-cpu Sets the maximum number of CPUs that the program will run on. The default is to use all CPUs in the machine. Overrides the HMMER NCPU environment variable. Only affects threaded versions of HMMER (the default on most systems). Any integer value 0
-cutga Use Pfam GA (gathering threshold) score cutoffs. Equivalent to -globT <GA1> -domT <GA2>, but the GA1 and GA2 cutoffs are read from each HMM in the input HMM database individually. hmmbuild puts these cutoffs there if the alignment file was annotated in a Pfam-friendly alignment format (extended SELEX or Stockholm format) and the optional GA annotation line was present. If these cutoffs are not set in the HMM file, -cut ga doesn't work. Boolean value Yes/No No
-cuttc Use Pfam TC (trusted cutoff) score cutoffs. Equivalent to -globT <TC1> -domT <TC2>, but the TC1 and TC2 cutoffs are read from each HMM in hmmfile individually. hmmbuild puts these cutoffs there if the alignment file was annotated in a Pfam-friendly alignment format (extended SELEX or Stockholm format) and the optional TC annotation line was present. If these cutoffs are not set in the HMM file, -cut tc doesn't work. Boolean value Yes/No No
-cutnc Use Pfam NC (noise cutoff) score cutoffs. Equivalent to -globT <NC1> -domT <NC2>, but the NC1 and NC2 cutoffs are read from each HMM in hmmfile individually. hmmbuild puts these cutoffs there if the alignment file was annotated in a Pfam-friendly alignment format (extended SELEX or Stockholm format) and the optional NC annotation line was present. If these cutoffs are not set in the HMM file, -cut nc doesn't work. Boolean value Yes/No No
-dome Set the E-value cutoff for the per-domain ranked hit list to <x>, where <x> is a positive real number. The default is infinity; by default, all domains in the sequences that passed the frst threshold will be reported in the second list, so that the number of domains reported in the per-sequence list is consistent with the number that appear in the per-domain list. Any numeric value 1000000.
-domt Set the bit score cutoff for the per-domain ranked hit list to <x>, where <x> is a real number. The default is negative infinity; by default, all domains in the sequences that passed the frst threshold will be reported in the second list, so that the number of domains reported in the per-sequence list is consistent with the number that appear in the per-domain list. Important note: only one domain in a sequence is absolutely controlled by this parameter, or by -domT. The second and subsequent domains in a sequence have a de facto bit score threshold of 0 because of the details of how HMMER works. HMMER requires at least one pass through the main model per sequence; to do more than one pass (more than one domain) the multidomain alignment must have a better score than the single domain alignment, and hence the extra domains must contribute positive score. See the Users' Guide for more detail. Any numeric value -1000000.
-forward Use the Forward algorithm instead of the Viterbi algorithm to determine the per-sequence scores. Per-domain scores are still determined by the Viterbi algorithm. Some have argued that Forward is a more sensitive algorithm for detecting remote sequence homologues; my experiments with HMMER have not confrmed this, however. Boolean value Yes/No No
-nulltwo Turn off the post hoc second null model. By default, each alignment is rescored by a postprocessing step that takes into account possible biased composition in either the HMM or the target sequence. This is almost essential in database searches, especially with local alignment models. There is a very small chance that this postprocessing might remove real matches, and in these cases --null2 may improve sensitivity at the expense of reducing specifcity by letting biased composition hits through. Boolean value Yes/No No
-pvm Run on a Parallel Virtual Machine (PVM). The PVM must already be running. The client program hmmpfam-pvm must be installed on all the PVM nodes. The HMM database hmmfile and an associated GSI index file hmmfile.gsi must also be installed on all the PVM nodes. (The GSI index is produced by the program hmmindex.) Because the PVM implementation is I/O bound, it is highly recommended that each node have a local copy of hmmfile rather than NFS mounting a shared copy. Optional PVM support must have been compiled into HMMER for -pvm to function. Boolean value Yes/No No
-xnu Turn on XNU filtering of target protein sequences. Has no effect on nucleic acid sequences. In trial experiments, -xnu appears to perform less well than the default post hoc null2 model. Boolean value Yes/No No

Input file format

Alignment and sequence formats

Input and output of alignments and sequences is limited to the formats that the original hmmer supports. These include stockholm, SELEX, MSF, Clustal, Phylip and A2M /aligned FASTA (alignments) and FASTA, GENBANK, EMBL, GCG, PIR (sequences). It would be fairly straightforward to adapt the code to support all EMBOSS-supported formats.

Compressed input files

Automatic processing of gzipped files is not supported.

ehmmsearch reads any normal sequence USAs.

Input files for usage example

File: ../ehmmcalibrate-ex-keep/globino.hmm

HMMER2.0  [2.3.2]
NAME  globins50
LENG  143
ALPH  Amino
RF    no
CS    no
MAP   yes
COM   hmmbuild -n globins50 --pbswitch 1000 --archpri 0.850000 --idlevel 0.620000 --swentry 0.500000 --swexit 0.500000 --wgsc -A -F globin.hmm ../../data/hmmnew/globins50.msf
COM   hmmcalibrate --mean 350.000000 --num 5000 --sd 350.000000 --seed 1 ../ehmmbuild-ex-keep/globin.hmm
NSEQ  50
DATE  Tue Jul 15 12:00:00 2008
CKSUM 9858
XT      -8455     -4  -1000  -1000  -8455     -4  -8455     -4 
NULT      -4  -8455
NULE     595  -1558     85    338   -294    453  -1158    197    249    902  -1085   -142    -21   -313     45    531    201    384  -1998   -644 
EVD   -35.959286   0.267496
HMM        A      C      D      E      F      G      H      I      K      L      M      N      P      Q      R      S      T      V      W      Y    
         m->m   m->i   m->d   i->m   i->i   d->m   d->d   b->m   m->e
         -450      *  -1900
     1    591  -1587    159   1351  -1874   -201    151  -1600    998  -1591   -693    389  -1272    595     42    -31     27   -693  -1797  -1134    14
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378   -450      * 
     2   -926  -2616   2221   2269  -2845  -1178   -325  -2678   -300  -2596  -1810    220  -1592    939   -974   -671   -939  -2204  -2785  -1925    15
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     3   -638  -1715   -680    497  -2043  -1540     23  -1671   2380  -1641   -840   -222  -1595    437   1040   -564   -523  -1363   2124  -1313    16
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     4    829  -1571    -37    660  -1856   -873    152  -1578    894  -1573   -678    769  -1273   1284     58    224    447  -1175  -1782  -1125    17
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     5    369   -433   -475    286   -974  -1312    -19   -412    664    398    406   1030  -1394    388   -214   -261     85   -166  -1227   -725    18
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     6  -1291   -884  -3696  -3261  -1137  -3425  -2802   2322  -3066    111     19  -3028  -3275  -2855  -3100  -2670  -1269   2738  -2450  -2062    19
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     7    157   -413   -236    316  -1387  -1231     89   -863   1084   -431   -348    910  -1319    635    297     15    704   -483  -1497   -922    20
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     8    770  -1431    -43    459  -1751   -340     78  -1449    440  -1497   -631    866  -1302    825    -51    953    364  -1076  -1750  -1121    21
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     9    420   -186  -2172  -1577      8  -1818   -694   1477  -1281    760    614  -1299  -1867  -1001  -1262   -189    -12   1401   -722   -364    22
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
    10   -961   -879  -2277  -1821   1366  -2213   -204   -399  -1500   -130    -39  -1427  -2266  -1186  -1511   -159   -913   -367   4721   1177    23
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
    11    -48  -1782    809    844  -2073   1456      8  -1811    315  -1803   -932    180  -1365    921   -218    173   -115  -1399  -2018  -1327    24


  [Part of this file has been deleted for brevity]

     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   128   -415  -1926   1575   1399  -2219  -1163     17  -1983    527  -1929  -1039    341  -1367   1597   -212    257   -222  -1536  -2109  -1387   144
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   129   -529  -1434   -629   -143  -1926   -626   -171  -1460   2679  -1597   -839   -309  -1599    207    317   -530   -510   -130  -1840  -1369   145
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   130    811   -397  -2389  -1807   1883  -2039   -907    594  -1512   1077    687  -1532  -2065  -1201  -1483  -1125   -465   1067   -843   -472   146
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   131   -241   -102  -2327  -1710    724  -1767   -616    650  -1363   1074   1765   -718  -1809  -1026  -1252   -842   -181   1331   -541    695   147
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   132    723     95    385    823  -1820  -1168    167  -1540    875  -1362   -644    320  -1261    810    246    693    -67  -1141  -1753  -1098   148
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   133    551   -430  -1049   -481   -442    469   -241    465   -313    133    947   -411  -1543    197   -587   -146    202    522   -843   -429   149
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   134  -1086   -777  -3351  -2800    816  -2898  -1861   1501  -2515   1149    586  -2483  -2775  -2108  -2400  -2046  -1030   2380  -1511  -1216   150
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   135   1393   1409   -876   -345   -997   -525   -315   -590   -198   -847   -109   -420  -1441    -97    412    766   -130    139  -1306   -858   151
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   136     98  -1299     36    365  -1495  -1211   1241   -404    523   -952   -426   1174  -1303    511    -18    347    882   -853  -1566   -970   152
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   137   1308   -787    564   -132   -966  -1332   -203   -362    -49   -395    -57   -305  -1481     49   -437   -190   -182   1020  -1282   -802   153
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   138  -1746  -1358  -3897  -3341   -216  -3621  -2478   1774  -3040   2442   1158  -3189  -3229  -2422  -2853  -2824  -1659    392  -1720  -1647   154
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   139   1176  -1289   -179    534  -1606   -607     34  -1278    734  -1372   -534     44  -1325    433    -89    521    826   -941  -1666  -1072   155
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   140    602  -1500   -135    850  -1753  -1214   1951  -1452    838  -1484    431    118  -1306    555    347    489   -153  -1085  -1723  -1092   156
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -22  -6602  -7644   -894  -1115   -701  -1378      *      * 
   141    351  -1646   -165    546  -1976   -498     46  -1667   2193  -1662   -798     35  -1405    476    311    -73   -306  -1287  -1859  -1254   157
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6561  -7603   -894  -1115   -701  -1378      *      * 
   142  -1995  -1606  -3095  -2870   1739  -3015    -98  -1012  -2520   -730    655  -1990  -2962  -1884  -2326  -2167  -1915  -1128    548   4089   158
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -25  -6455  -7497   -894  -1115   -701  -1378      *      * 
   143   -253  -1373   -267    301   -911   -565   1956   -450   1188  -1330   -497     33  -1352    502   1358   -205   -184   -941  -1604  -1026   159
     -      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      * 
     -      *      *      *      *      *      *      *      *      0 
//

File: Artemia.fa

>S13421 S13421 GLOBIN - BRINE SHRIMP
DKATIKRTWATVTDLPSFGRNVFLSVFAAK
PEYKNLFVEFRNIPASELASSERLLYHGGR
VLSSIDEAIAGIDTPDRAVKTLLALGERHI
SRGTVRRHFEAFSYAFIDELKQRGVESADL
AAWRRGWDNIVNVLEAGLLRRQIDLEVTGL
SCVDVANIQESWSKVSGDLKTTGSVVFQRM
INGHPEYQQLFRQFRDVDLDKLGESNSFVA
HVFRVVAAFDGIIHELDNNQFIVSTLKKLG
EQHIARGTDISHFQNFRVTLLEYLKENGMN
GAQKASWNKAFDAFEKYISMGLSSLKRVDP
ITGLSGLEKNAILSTWGKVRGNLQEVGKAT
FGKLFTAHPEYQQMFRFSQGMPLASLVESP
KFAAHTQRVVSALDQTLLALNRPSDFVYMI
KELGLDHINRGTDRSHFENYQVVFIEYLKE
TLGDSLDEFTVKSFNHVFEVIISFLNEGLR
QADIVDPVTHLTGRQKEMIKASWSKARTDL
RSLGQELFMRMFKAHPEYQTLFVNKGFADV
PLVSLREDERFISHMANVLGGFDTLLQNLD
ESSYFIYSLRNLGDAHIQRKAGTQHFRSFE
AILIPILQESQGLDAASVEAWKKFFDVSIG
VIAQGLKVATSEEADPVTGLYGKEIVALRQ
AFAAVTPRNVEIGKRVFAKLFAAHPEYKNL
FKKFEQYSVEELPSTDAFHYHISLVMNRFS
SIGKVIDDNVSFVYLLKKLGREHIKRGLSR
KQFDQFVELYIAEISSELSDTGRNGLEKVL
TFATGVIEQGLFQLGQVDSNTLTALEKQSI
QDIWSNLRSTGLQDLAVKIFTRLFSAHPEY
KLLFTGRFGNVDNINENAPFKAHLHRVLSA
FDIVISTLDDSEHLIRQLKDLGLFHTRLGM
TRSHFDNFATAFLSVAQDIAPNQLTVLGRE
SLNKGFKLMHGVIEEGLLQLERINPITGLS
AREVAVVKQTWNLVKPDLMGVGMRIFKSLF
EAFPAYQAVFPKFSDVPLDKLEDTPAVGKH
SISVTTKLDELIQTLDEPANLALLARQLGE
DHIVLRVNKPMFKSFGKVLVRLLENDLGQR
FSSFASRSWHKAYDVIVEYIEEGLQQSYKQ
DPVTGITDAEKALVQESWDLLKPDLLGLGR
KIFTKVFTKHPDYQILFTRTGFGDTPLTKL
DDNPAFGTHIIKVMRAFDHVIQILGKPKTL
MAYLRSVGADHIATNVERRHFQAFSNALIP
VMQHDLKAQLRPDAVAAWRKGLDRIIGIID
QGLIGLKEVNPQNAFSAYDIQAVQRTWALA
KPDLMGKGAMVFKQLFTDHGYQPLFSNLAQ
YEITGLEGSPELNTHARNVMAQLDTLVGSL
QNSIELGQSLAQLGKDHVPRKVNRVHFKDF
AEHFIPLMKADLGDEFTPLAESAWKRAFDV
MIATIEQGQEGSSHALSSFLTNPVA

Output file format

ehmmsearch outputs a graph to the specified graphics device. outputs a report format file. The default format is ...

Output files for usage example

File: globino.ehmmsearch

hmmsearch - search a sequence database with a profile HMM
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                   ../ehmmcalibrate-ex-keep/globino.hmm [globins50]
Sequence database:          ../../data/hmmnew/Artemia.fa
per-sequence score cutoff:  >= -1000000.0
per-domain score cutoff:    >= -1000000.0
per-sequence Eval cutoff:   <= 10        
per-domain Eval cutoff:     <=      1e+06
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query HMM:   globins50
Accession:   [none]
Description: [none]
  [HMM has been calibrated; E-values are empirical estimates]

Scores for complete sequences (score includes all domains):
Sequence Description                                    Score    E-value  N 
-------- -----------                                    -----    ------- ---
S13421   S13421 GLOBIN - BRINE SHRIMP                   474.3   1.7e-143   9

Parsed for domains:
Sequence Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
-------- ------- ----- -----    ----- -----      -----  -------
S13421     7/9     932  1075 ..     1   143 []    76.9  7.3e-24
S13421     2/9     153   293 ..     1   143 []    63.7  6.8e-20
S13421     3/9     307   450 ..     1   143 []    59.8  9.8e-19
S13421     8/9    1089  1234 ..     1   143 []    57.6  4.5e-18
S13421     9/9    1248  1390 ..     1   143 []    52.3  1.8e-16
S13421     1/9       1   143 [.     1   143 []    51.2    4e-16
S13421     4/9     464   607 ..     1   143 []    46.7  8.6e-15
S13421     6/9     775   918 ..     1   143 []    42.2    2e-13
S13421     5/9     623   762 ..     1   143 []    23.9  6.6e-08

Alignments of top-scoring domains:
S13421: domain 7 of 9, from 932 to 1075: score 76.9, E = 7.3e-24
                   *->eekalvksvwgkveknveevGaeaLerllvvyPetkryFpkFkdLss
                      +e a vk+ w+ v+ ++  vG  +++ l++ +P+ +++FpkF d+  
      S13421   932    REVAVVKQTWNLVKPDLMGVGMRIFKSLFEAFPAYQAVFPKFSDVPL 978  

                   adavkgsakvkahgkkVltalgdavkkldd...lkgalakLselHaqklr
                    d++++++ v +h   V t+l++ ++ ld++ +l+   ++L+e H+  lr
      S13421   979 -DKLEDTPAVGKHSISVTTKLDELIQTLDEpanLALLARQLGEDHIV-LR 1026 

                   vdpenfkllsevllvvlaeklgkeftpevqaalekllaavataLaakYk<
                   v+   fk +++vl+  l++ lg+ f+  ++ +++k+++++++ +++  + 
      S13421  1027 VNKPMFKSFGKVLVRLLENDLGQRFSSFASRSWHKAYDVIVEYIEEGLQ  1075 



  [Part of this file has been deleted for brevity]

                   lrvdpenfkllsevllvvlaeklgkeftpevqaalekllaavataLaakY
                   l + + +f  +++++l v ++  ++++t     +l+k ++++  ++++  
      S13421   868 LGMTRSHFDNFATAFLSVAQDIAPNQLTVLGRESLNKGFKLMHGVIEEGL 917  

                   k<-*
                       
      S13421   918 L    918  

S13421: domain 5 of 9, from 623 to 762: score 23.9, E = 6.6e-08
                   *->eekalvksvwgkveknveevGaeaLerllvvyPetkryFpkFkdLss
                      +e  ++++++  v+    e+G+ ++++l+  +Pe k+ F+kF++ s 
      S13421   623    KEIVALRQAFAAVTPRNVEIGKRVFAKLFAAHPEYKNLFKKFEQYSV 669  

                   adavkgsakvkahgkkVltalgdavkkldd...lkgalakLselHaqklr
                    +++  ++  + h + V++ ++ + k +dd+ +    l+kL++ H+++  
      S13421   670 -EELPSTDAFHYHISLVMNRFSSIGKVIDDnvsFVYLLKKLGREHIKRGL 718  

                   vdpenfkllsevllvvlaeklgkeftpevqaalekllaavataLaakYk<
                      ++ ++++  +     ++ ++e++      lek+l     ++++    
      S13421   719 SRKQFDQFVELYI-----AEISSELSDTGRNGLEKVLTFATGVIEQGLF  762  

                   -*
                     
      S13421     -    -    


Histogram of all scores:
score    obs    exp  (one = represents 1 sequences)
-----    ---    ---
  474      1      0|=                                                          


% Statistical details of theoretical EVD fit:
              mu =   -35.9593
          lambda =     0.2675
chi-sq statistic =     0.0000
  P(chi-square)  =          0

Total sequences searched: 1

Whole sequence top hits:
tophits_s report:
     Total hits:           1
     Satisfying E cutoff:  1
     Total memory:         16K

Domain top hits:
tophits_s report:
     Total hits:           9
     Satisfying E cutoff:  9
     Total memory:         21K

Data files

None.

Notes

1. Command-line arguments

The following original HMMER options are not supported:
-h         : Use -help to get help information instead.
-informat  : All common sequence file formats are supported automatically.

The following additional options are provided:

-outfile   : Multiple sequence alignment output file.

2. Installing EMBASSY HMMER

The EMBASSY HMMER package contains "wrapper" applications providing an EMBOSS-style interface to the applications in the original HMMER package version 2.3.2 developed by Sean Eddy. Please read the file INSTALL in the EMBASSY HMMER package distribution for installation instructions.

3. Installing original HMMER

To use EMBASSY HMMER, you will first need to download and install the original HMMER package. Please read the file 00README in the the original HMMER package distribution for installation instructions:
WWW home:       http://hmmer.wustl.edu/
Distribution:   ftp://ftp.genetics.wustl.edu/pub/eddy/hmmer/

4. Setting up HMMER

For the EMBASSY HMMER package to work, the directory containing the original HMMER executables *must* be in your path. For example if you executables were installed to "/usr/local/hmmer/bin", then type:
set path=(/usr/local/hmmer/bin/ $path)
rehash

5. Getting help

Please read the Userguide.pdf distributed with the original HMMER and included in the EMBASSY HMMER distribution under the DOCS directory. The first 3 chapters (Introduction, Installation and Tutorial) are particularly useful.

Please read the 'Notes' section below for a description of the differences between the original and EMBASSY HMMER, particularly which application command line options are supported.

References

None.

Warnings

Types of input data

hmmer v3.2.1 and therefore EMBASSY HMMER is only recommended for use with protein sequences. If you provide a non-protein sequence you will be reprompted for a protein sequence. To accept nucleic acid sequences you must replace instances of < type: "protein" > in the application ACD files with .

Environment variables

The original hmmer uses BLAST environment variables (below), if defined, to locate files. The EMBASSY HMMER does not.
BLASTDB   location of sequence databases to be searched
BLASMAT   location of substitution matrices
HMMERDB   location of HMMs

Sequence input

The user must provide the full filename of a sequence database for the sequence input ("seqfile" ACD option), not an indirect reference, e.g. a USA is NOT acceptable. This is because hmmsearch (which ehmmsearch wraps) does not support USAs, and a full sequence database is too big to write to a temporary file that the original hmmsearch would understand.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name Description
ehmmalign Align sequences to an HMM profile
ehmmbuild Build a profile HMM from an alignment
ehmmcalibrate Calibrate HMM search statistics
ehmmconvert Convert between profile HMM file formats
ehmmemit Generate sequences from a profile HMM
ehmmfetch Retrieve an HMM from an HMM database
ehmmindex Create a binary SSI index for an HMM database
ehmmpfam Search one or more sequences against an HMM database
oalistat Statistics for multiple alignment files
ohmmalign Align sequences with an HMM
ohmmbuild Build HMM
ohmmcalibrate Calibrate a hidden Markov model
ohmmconvert Convert between HMM formats
ohmmemit Extract HMM sequences
ohmmfetch Extract HMM from a database
ohmmindex Index an HMM database
ohmmpfam Align single sequence with an HMM
ohmmsearch Search sequence database with an HMM

Author(s)

This program is an EMBOSS conversion of a program written by Sean Eddy as part of his HMMER package.

Although we take every care to ensure that the results of the EMBOSS version are identical to those from the original package, we recommend that you check your inputs give the same results in both versions before publication.

Please report all bugs in the EMBOSS version to the EMBOSS bug team, not to the original author. Jon Ison (jison © ebi.ac.uk)
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

This program is an EMBASSY wrapper to a program written by Sean Eddy as part of his hmmer package.

Please report any bugs to the EMBOSS bug team in the first instance, not to Sean Eddy.

History

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.