ememe
Usage:
ememe [options] dataset outfile
MEME -- Multiple EM for Motif Elicitation
MEME is a tool for discovering motifs in a group of related DNA or protein sequences.
A motif is a sequence pattern that occurs repeatedly in a group of related protein or DNA sequences. MEME represents motifs as position-dependent letter-probability matrices which describe the probability of each possible letter at each position in the pattern. Individual MEME motifs do not contain gaps. Patterns with variable-length gaps are split by MEME into two or more separate motifs.
MEME takes as input a group of DNA or protein sequences (the training set) and outputs as many motifs as requested. MEME uses statistical modeling techniques to automatically choose the best width, number of occurrences, and description for each motif.
MEME outputs its results as a hypertext (HTML) document.
The sequences in the dataset should be in Pearson/FASTA format. For example:
>ICYA_MANSE INSECTICYANIN A FORM (BLUE BILIPROTEIN)
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
LPLENENQGKCTIAEYKYDGKKASVYNSFVSNGVKEYMEGDLEIAPDA
>LACB_BOVIN BETA-LACTOGLOBULIN PRECURSOR (BETA-LG)
MKCLLLALALTCGAQALIVTQTMKGLDI
QKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKW
Sequences start with a header line followed by sequence lines. A header line has the character ">" in position one, followed by a unique name without any spaces, followed by (optional) descriptive text. After the header line come the actual sequence lines. Spaces and blank lines are ignored. Sequences may be in capital or lowercase or both.
MEME uses the first word in the header line of each sequence, truncated to 24 characters if necessary, as the name of the sequence. This name must be unique. Sequences with duplicate names will be ignored. (The first word in the title line is everything following the ">" up to the first blank.)
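The naming rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not MEME's own parser: the function name read_fasta and its max_name_len parameter are hypothetical, and "WEIGHTS" header lines are not treated specially here.

```python
def read_fasta(text, max_name_len=24):
    """Parse FASTA text into {name: sequence} using the rules described above."""
    records = {}
    name = None
    skipping = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            # The name is the first word of the header, truncated if necessary.
            words = line[1:].split(None, 1)
            name = words[0][:max_name_len] if words else ""
            # A sequence whose name was already seen is ignored.
            skipping = name in records
            if not skipping:
                records[name] = ""
        elif name is not None and not skipping:
            # Spaces inside sequence lines are ignored.
            records[name] += "".join(line.split())
    return records
```

Because names are compared after truncation, two headers that differ only beyond the 24th character would count as duplicates in this sketch.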
Sequence weights may be specified in the dataset file by special header lines where the unique name is "WEIGHTS" (all caps) and the descriptive text is a list of sequence weights. Sequence weights are numbers in the range 0 < w <= 1. Weights are assigned in order to the sequences in the file. If there are more sequences than weights, the remainder are given weight one. Weights may be specified by more than one "WEIGHTS" entry, which may appear anywhere in the file. When weights are used, sequences will contribute to motifs in proportion to their weights. Here is an example for a file of three sequences where the first two sequences are very similar and it is desired to down-weight them:
>WEIGHTS 0.5 .5 1.0
>seq1
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
>seq2
GDMFCPGYCPDVKPVGDFDLSAFAGAWHELAK
>seq3
QKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKW
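As a rough sketch of how the weight rules could be applied, the Python below collects every "WEIGHTS" entry in file order and pairs the values with the sequence names. The function name read_weights is hypothetical and the code is an illustration of the rules stated above, not MEME's implementation.

```python
def read_weights(text):
    """Return {sequence name: weight} for FASTA text with optional WEIGHTS headers."""
    weights = []
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(">"):
            continue  # only header lines matter here
        words = line[1:].split()
        if not words:
            continue
        if words[0] == "WEIGHTS":
            # Several WEIGHTS entries may appear; their values are concatenated.
            for word in words[1:]:
                value = float(word)
                if not 0.0 < value <= 1.0:
                    raise ValueError(f"weight out of range (0, 1]: {value}")
                weights.append(value)
        else:
            names.append(words[0])
    # Weights are assigned in order; sequences without a weight get 1.0.
    return {name: (weights[i] if i < len(weights) else 1.0)
            for i, name in enumerate(names)}
```

For the three-sequence example above, this would give seq1 and seq2 a weight of 0.5 each and seq3 a weight of 1.0.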
ALPHABET - control the alphabet for the motifs (patterns) that MEME will search for
DISTRIBUTION - control how MEME assumes the occurrences of the motifs are distributed throughout the training set sequences
SEARCH - control how MEME searches for motifs
SYSTEM - the -p
In what follows, < n > is an integer, < a > is a decimal number, and < string > is a string of characters.
DNA sequences must contain only the letters "ACGT", plus the ambiguous letters "BDHKMNRSUVWY*-".
Protein sequences must contain only the letters "ACDEFGHIKLMNPQRSTVWY", plus the ambiguous letters "BUXZ*-".
MEME converts all ambiguous letters to "X", which is treated as "unknown".
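The alphabet rules above might be checked with a helper like the following. It is a minimal sketch: the name normalize is hypothetical, and folding lowercase input to uppercase is an assumption made for the illustration.

```python
DNA_CORE = set("ACGT")
DNA_AMBIGUOUS = set("BDHKMNRSUVWY*-")
PROTEIN_CORE = set("ACDEFGHIKLMNPQRSTVWY")
PROTEIN_AMBIGUOUS = set("BUXZ*-")

def normalize(sequence, dna=False):
    """Replace ambiguous letters with 'X'; reject letters outside the alphabet."""
    core = DNA_CORE if dna else PROTEIN_CORE
    ambiguous = DNA_AMBIGUOUS if dna else PROTEIN_AMBIGUOUS
    out = []
    for ch in sequence.upper():  # assumption: case is folded to uppercase
        if ch in core:
            out.append(ch)
        elif ch in ambiguous:
            out.append("X")  # "X" is treated as "unknown"
        else:
            raise ValueError(f"letter not allowed in this alphabet: {ch!r}")
    return "".join(out)
```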
-dna Assume sequences are DNA; default: protein sequences
-protein Assume sequences are protein
-mod < string > The type of distribution to assume.
oops
One Occurrence Per Sequence
MEME assumes that each sequence in the dataset
contains exactly one occurrence of each motif.
This option is the fastest and most sensitive
but the motifs returned by MEME may be
"blurry" if any of the sequences is missing
them.
zoops
Zero or One Occurrence Per Sequence
MEME assumes that each sequence may contain at
most one occurrence of each motif. This option
is useful when you suspect that some motifs
may be missing from some of the sequences. In
that case, the motifs found will be more
accurate than using the first option. This
option takes more computer time than the
first option (about twice as much) and is
slightly less sensitive to weak motifs present
in all of the sequences.
anr
Any Number of Repetitions
MEME assumes each sequence may contain any
number of non-overlapping occurrences of each
motif. This option is useful when you suspect
that motifs repeat multiple times within a
single sequence. In that case, the motifs
found will be much more accurate than using
one of the other options. This option can also
be used to discover repeats within a single
sequence. This option takes much more
computer time than the first option (about ten
times as much) and is somewhat less sensitive
to weak motifs which do not repeat within a
single sequence than the other two options.
MEME uses an objective function on motifs to select the "best" motif. The objective function is based on the statistical significance of the log likelihood ratio (LLR) of the occurrences of the motif. The E-value of the motif is an estimate of the number of motifs (with the same width and number of occurrences) that would have equal or higher log likelihood ratio if the training set sequences had been generated randomly according to the (0-order portion of the) background model.
MEME searches for the motif with the smallest E-value. It searches over different motif widths, numbers of occurrences, and positions in the training set for the motif occurrences. The user may limit the range of motif widths and number of occurrences that MEME tries using the switches described below. In addition, MEME trims the motif (using a dynamic programming multiple alignment) to eliminate any positions where there is a gap in any of the occurrences.
The log likelihood ratio of a motif is
llr = log (Pr(sites | motif) / Pr(sites | back))
and is a measure of how different the sites are from the background model. Pr(sites | motif) is the probability of the occurrences given a model consisting of the position-specific probability matrix (PSPM) of the motif. (The PSPM is output by MEME.)
Pr(sites | back) is the probability of the occurrences given the background model. The background model is an n-order Markov model. By default, it is a 0-order model consisting of the frequencies of the letters in the training set. A different 0-order Markov model or higher order Markov models can be specified to MEME using the -bfile option described below.
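The formula can be illustrated with a small Python sketch for the default 0-order background. The function name, the dictionary-based PSPM layout and the use of the natural logarithm are assumptions for this example; the text does not state which logarithm base MEME uses.

```python
import math

def log_likelihood_ratio(sites, pspm, background):
    """llr = log(Pr(sites | motif) / Pr(sites | back)) with a 0-order background.

    sites      -- equal-width strings, one per motif occurrence
    pspm       -- one {letter: probability} dict per motif column
    background -- {letter: frequency} for the background model
    """
    llr = 0.0
    for site in sites:
        for column, letter in zip(pspm, site):
            # Each position contributes log(p_motif / p_background).
            llr += math.log(column[letter] / background[letter])
    return llr
```

Written this way, the total is a sum of per-column contributions, since each column's terms can be grouped together.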
The E-value reported by MEME is actually an approximation of the E-value of the log likelihood ratio. (An approximation is used because it is far more efficient to compute.) The approximation is based on the fact that the log likelihood ratio of a motif is the sum of the log likelihood ratios of each column of the motif. Instead of computing the statistical significance of this sum (its p-value), MEME computes the p-value of each column and then computes the significance of their product. Although not identical to the significance of the log likelihood ratio, this easier-to-compute objective function works very similarly in practice.
The motif significance is reported as the E-value of the motif.
The statistical significance of a motif is computed based on:
-evt < p > Quit looking for motifs if E-value exceeds < p >.
Default: infinite (so by default MEME never quits before -nmotifs < n > have been found).
C) NUMBER OF MOTIF OCCURRENCES
-nsites < n >
-minsites < n >
-maxsites < n >
The (expected) number of occurrences of each motif. If -nsites is given, only that number of occurrences is tried. Otherwise, numbers of occurrences between -minsites and -maxsites are tried as initial guesses for the number of motif occurrences. These switches are ignored if mod = oops.
Default:
-minsites: sqrt(number of sequences)
-maxsites: number of sequences (zoops); MIN(5 * number of sequences, 50) (anr)
-wnsites < n > The weight on the prior on nsites. This controls
how strong the bias towards motifs with exactly
nsites sites (or between minsites and maxsites sites)
is. It is a number in the range [0..1). The
larger it is, the stronger the bias towards
motifs with exactly nsites occurrences is.
Default: 0.8
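The default values of -minsites and -maxsites given above can be written out as a short calculation. This is a sketch only: the function name default_site_range is hypothetical, the square root is left unrounded because the text does not say how MEME rounds it, and the oops branch simply reflects "exactly one occurrence per sequence".

```python
import math

def default_site_range(num_sequences, mod):
    """Return (minsites, maxsites) defaults for a given -mod value."""
    if mod == "oops":
        # Exactly one occurrence per sequence, so the switches do not apply.
        return (num_sequences, num_sequences)
    minsites = math.sqrt(num_sequences)  # rounding is not specified in the text
    if mod == "zoops":
        maxsites = num_sequences
    elif mod == "anr":
        maxsites = min(5 * num_sequences, 50)
    else:
        raise ValueError(f"unknown distribution: {mod}")
    return (minsites, maxsites)
```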
D) MOTIF WIDTH
-w < n >
-minw < n >
-maxw < n >
The width of the motif(s) to search for.
If -w is given, only that width is tried.
Otherwise, widths between -minw and -maxw are tried.
Default: -minw 8, -maxw 50 (defined in user.h)
Note: If < n > is greater than the length of the shortest
sequence in the dataset, < n > is reset by MEME to
that value.
-nomatrim
-wg < a >
-ws < a >
-noendgaps
These switches control trimming (shortening) of
motifs using the multiple alignment method.
Specifying -nomatrim causes MEME to skip this and
causes the other switches to be ignored.
MEME finds the best motif
and then trims (shortens) it using the multiple
alignment method (described below). The number of
occurrences is then adjusted to minimize the motif
E-value, and then the motif width is further
shortened to optimize the E-value.
The multiple alignment method performs a separate
pairwise alignment of the site with the highest
probability and each other possible site.
(The alignment includes width/2 positions on either
side of the sites.) The pairwise alignment
is controlled by the switches:
-wg < a > (gap cost; default: 11),
-ws < a > (space cost; default 1), and,
-noendgaps (do not penalize endgaps; default:
penalize endgaps).
The pairwise alignments are then combined and the
method determines the widest section of the motif with
no insertions or deletions. If this alignment
is shorter than < minw >, it tries to find an alignment
allowing up to one insertion/deletion per motif
column. This continues (allowing up to 2, 3 ...
insertions/deletions per motif column) until an
alignment of width at least < minw > is found.
E) BACKGROUND MODEL
-bfile < bfile >
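The name of the file containing the background model
for sequences. The background model is the model
of random sequences used by MEME. The background
model is used by MEME
1) during EM as the 'null model',
2) for calculating the log likelihood ratio of a motif,
3) for calculating the significance (E-value) of a motif, and,
4) for creating the position-specific scoring matrix (log-odds matrix).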
Markov models of any order can be specified in < bfile > by listing frequencies of all possible tuples of length up to order+1.
Note that MEME uses only the 0-order portion (single letter frequencies) of the background model for purposes 3) and 4), but uses the full-order model for purposes 1) and 2), above.
Example: To specify a 1-order Markov background model for DNA, < bfile > might contain the following lines. Note that optional comment lines are marked by "#" and are ignored by MEME.
# tuple frequency_non_coding
a 0.324
c 0.176
g 0.176
t 0.324
# tuple frequency_non_coding
aa 0.119
ac 0.052
ag 0.056
at 0.097
ca 0.058
cc 0.033
cg 0.028
ct 0.056
ga 0.056
gc 0.035
gg 0.033
gt 0.052
ta 0.091
tc 0.056
tg 0.058
tt 0.119
Sample -bfile files are given in directory tests.
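A file in this layout could be read with a few lines of Python. The sketch below assumes one "tuple frequency" pair per line and infers the Markov order from the longest tuple (order = longest tuple length minus one); read_background is a hypothetical helper, not part of MEME or EMBOSS.

```python
def read_background(text):
    """Parse background-model text into ({tuple: frequency}, markov_order)."""
    freqs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment lines and blank lines are skipped
        letters, value = line.split()
        freqs[letters.lower()] = float(value)
    if not freqs:
        raise ValueError("no tuples found in background model")
    # Tuples of length up to order+1 are listed, so the longest gives the order.
    order = max(len(letters) for letters in freqs) - 1
    return freqs, order
```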
-pal
Choosing -pal causes MEME to look for palindromes in
DNA datasets.
MEME averages the letter frequencies in corresponding columns of the motif (PSPM) together. For instance, if the width of the motif is 10, columns 1 and 10, 2 and 9, 3 and 8, etc., are averaged together. The averaging combines the frequency of A in one column with T in the other, and the frequency of C in one column with G in the other. If this option is not chosen, MEME does not search for DNA palindromes.
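The column averaging described for -pal can be sketched as follows. The dictionary-per-column PSPM layout and the function name palindromize are assumptions for the illustration; only the pairing of columns and of complementary letters comes from the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def palindromize(pspm):
    """Average each column with its mirror column, pairing A with T and C with G."""
    width = len(pspm)
    result = []
    for i in range(width):
        mirror = pspm[width - 1 - i]  # column 1 pairs with column w, 2 with w-1, ...
        result.append({
            letter: (pspm[i][letter] + mirror[COMPLEMENT[letter]]) / 2.0
            for letter in "ACGT"
        })
    return result
```

After this step, the frequency of A in column i equals the frequency of T in the mirrored column, which is what makes the motif model palindromic.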
G) EM ALGORITHM
-maxiter < n >
The number of iterations of EM to run from
any starting point.
EM is run for < n > iterations or until convergence
(see -distance, below) from each starting point.
Default: 50
-distance < a >
The convergence criterion. MEME stops
iterating EM when the change in the
motif frequency matrix is less than < a >.
(Change is the euclidean distance between
two successive frequency matrices.)
Default: 0.001
-prior < string >
-b < a >
-plib < string >
The name of the file containing the Dirichlet prior
in the format of file prior30.plib.
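The interplay of -maxiter and -distance described in this section can be sketched as a generic iteration loop. The step argument stands in for one EM update and is purely hypothetical; only the stopping rule (Euclidean distance between successive frequency matrices, or the iteration limit) is taken from the text.

```python
import math

def matrix_distance(old, new):
    """Euclidean distance between two frequency matrices (lists of columns)."""
    return math.sqrt(sum((a - b) ** 2
                         for old_col, new_col in zip(old, new)
                         for a, b in zip(old_col, new_col)))

def run_until_converged(step, theta, maxiter=50, distance=0.001):
    """Apply step repeatedly; stop on convergence or after maxiter iterations."""
    for iteration in range(1, maxiter + 1):
        updated = step(theta)
        if matrix_distance(theta, updated) < distance:
            return updated, iteration  # change fell below the threshold
        theta = updated
    return theta, maxiter  # iteration limit reached without convergence
```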
H) SELECTING STARTS FOR EM
The default is for MEME to search the dataset for good starts for EM. How
the starting points are derived from the dataset is specified by the
following switches.
The default type of mapping MEME uses is:
-spmap uni for -dna and -alph < string >
-spmap pam for -protein
-spfuzz < a > The fuzziness of the mapping.
Possible values are greater than 0. Meaning
depends on -spmap, see below.
-spmap < string > The type of mapping function to use.
uni Use add-< a > prior when converting a substring
to an estimate of theta.
Default -spfuzz < a >: 0.5
pam Use columns of PAM < a > matrix when converting
a substring to an estimate of theta.
Default -spfuzz < a >: 120 (PAM 120)
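One plausible reading of the "uni" mapping is sketched below: each letter of the substring counts as one observation, the fuzziness value is added to every letter's count, and the column is normalised. This interpretation and the name substring_to_theta are assumptions; the exact formula used by MEME is not given in the text.

```python
def substring_to_theta(substring, alphabet="ACGT", fuzz=0.5):
    """Turn a substring into one probability column per position (add-fuzz prior)."""
    total = 1.0 + fuzz * len(alphabet)  # one observed count plus the pseudocounts
    theta = []
    for observed in substring:
        theta.append({
            letter: ((1.0 if letter == observed else 0.0) + fuzz) / total
            for letter in alphabet
        })
    return theta
```

With the default fuzziness of 0.5 and a four-letter alphabet, the observed letter receives probability 0.5 and each other letter one sixth under this reading.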
Other types of starting points
can be specified using the following switches.
-cons < string > Override the sampling of starting points
and just use a starting point derived from
< string >.
This is useful when an actual occurrence of
a motif is known and can be used as the
starting point for finding the motif.
% ememe crp0.s -mod oops
Multiple EM for motif elicitation
MEME program output file output directory [.]:
Please note the examples below are unedited excerpts of the original MEME documentation. Bear in mind the EMBASSY and original MEME options may differ in practice (see "1. Command-line arguments").
The following examples use data files provided in this release of MEME. MEME writes its output to standard output, so you will want to redirect it to a file in order to use it with MAST.
1) A simple DNA example:
meme crp0.s -dna -mod oops -pal > ex1.html
MEME looks for a single motif in the file crp0.s which contains DNA sequences in FASTA format. The OOPS model is used so MEME assumes that every sequence contains exactly one occurrence of the motif. The palindrome switch is given so the motif model (PSPM) is converted into a palindrome by combining corresponding frequency columns. MEME automatically chooses the best width for the motif in this example since no width was specified.
2) Searching for motifs on both DNA strands:
meme crp0.s -dna -mod oops -revcomp > ex2.html
This is like the previous example except that the -revcomp switch tells MEME to consider both DNA strands, and the -pal switch is absent so the palindrome conversion is omitted. When MEME uses both DNA strands, motif occurrences on the two strands may not overlap. That is, any position in the sequence given in the training set may be contained in an occurrence of a motif on the positive strand or the negative strand, but not both.
3) A fast DNA example:
meme crp0.s -dna -mod oops -revcomp -w 20 > ex3.html
This example differs from example 2) in that MEME is told to only consider motifs of width 20. This causes MEME to execute about 10 times faster. The -w switch can also be used with protein datasets if the width of the motifs is known in advance.
4) Using a higher-order background model:
meme INO_up800.s -dna -mod anr -revcomp -bfile yeast.nc.6.freq > ex4.html
In this example we use -mod anr and -bfile yeast.nc.6.freq. This specifies
that
a) the motif may have any number of occurrences in each sequence, and,
b) the Markov model specified in yeast.nc.6.freq is used as the
background model. This file contains a fifth-order Markov model
for the non-coding regions in the yeast genome.
Using a higher order background model can often result in more sensitive
detection of motifs. This is because the background model more accurately
models non-motif sequence, allowing MEME to discriminate against it and find
the true motifs.
5) A simple protein example:
meme lipocalin.s -mod oops -maxw 20 -nmotifs 2 > ex5.html
The -dna switch is absent, so MEME assumes the file lipocalin.s contains protein sequences. MEME searches for two motifs each of width less than or equal to 20. (Specifying -maxw 20 makes MEME run faster since it does not have to consider motifs longer than 20.) Each motif is assumed to occur in each of the sequences because the OOPS model is specified.
6) Another simple protein example:
meme farntrans5.s -mod anr -maxw 40 -maxsites 50 > ex6.html
MEME searches for a motif of width up to 40 with up to 50 occurrences in the entire training set. The ANR sequence model is specified, which allows each motif to have any number of occurrences in each sequence. This dataset contains motifs that repeat multiple times in each sequence. This example is fairly time consuming because the time required to initialize the motif probability tables is proportional to < maxw > times < maxsites >. By default, MEME only looks for motifs up to 29 letters wide with a maximum total number of occurrences equal to twice the number of sequences or 30, whichever is less.
7) A much faster protein example:
meme farntrans5.s -mod anr -w 10 -maxsites 30 -nmotifs 3 > ex7.html
This time MEME is constrained to search for three motifs of width exactly ten. The effect is to break up the long motif found in the previous example. The -w switch forces motifs to be *exactly* ten letters wide. This example is much faster because, since only one width is considered, the time to build the motif probability tables is only proportional to < maxsites >.
8) Splitting the sites into three:
meme farntrans5.s -mod anr -maxw 12 -nsites 24 -nmotifs 3 > ex8.html
This forces each motif to have 24 occurrences, exactly, and be up to 12 letters wide.
9) A larger protein example with E-value cutoff:
meme adh.s -mod zoops -nmotifs 20 -evt 0.01 > ex9.html
In this example, MEME looks for up to 20 motifs, but stops when a motif is found with E-value greater than 0.01. Motifs with large E-values are likely to be statistical artifacts rather than biologically significant.
Most of the options in the original meme are given in ACD as "advanced" or "additional" options. -options must be specified on the command-line in order to be prompted for a value for "additional" options but "advanced" options will never be prompted for.
Multiple EM for motif elicitation
Version: EMBOSS:6.6.0.0
Standard (Mandatory) qualifiers:
[-dataset] seqset User must provide the full filename of a set of sequences, not an indirect reference, e.g. a USA is NOT acceptable.
[-outdir] outdir [.] MEME program output file output directory
Additional (Optional) qualifiers:
-bfile infile The name of the file containing the background model for sequences. The background model is the model of random sequences used by MEME. The background model is used by MEME 1) during EM as the 'null model', 2) for calculating the log likelihood ratio of a motif, 3) for calculating the significance (E-value) of a motif, and, 4) for creating the position-specific scoring matrix (log-odds matrix). See application documentation for more information.
-plibfile infile The name of the file containing the Dirichlet prior in the format of file prior30.plib
-mod selection [zoops] If you know how occurrences of motifs are distributed in the training set sequences, you can specify it with these options. The default distribution of motif occurrences is assumed to be zero or one occurrence per sequence.
oops : One Occurrence Per Sequence. MEME assumes that each sequence in the dataset contains exactly one occurrence of each motif. This option is the fastest and most sensitive but the motifs returned by MEME may be 'blurry' if any of the sequences is missing them.
zoops : Zero or One Occurrence Per Sequence. MEME assumes that each sequence may contain at most one occurrence of each motif. This option is useful when you suspect that some motifs may be missing from some of the sequences. In that case, the motifs found will be more accurate than using the first option. This option takes more computer time than the first option (about twice as much) and is slightly less sensitive to weak motifs present in all of the sequences.
anr : Any Number of Repetitions. MEME assumes each sequence may contain any number of non-overlapping occurrences of each motif. This option is useful when you suspect that motifs repeat multiple times within a single sequence. In that case, the motifs found will be much more accurate than using one of the other options. This option can also be used to discover repeats within a single sequence. This option takes much more computer time than the first option (about ten times as much) and is somewhat less sensitive to weak motifs which do not repeat within a single sequence than the other two options.
-nmotifs integer [1] The number of *different* motifs to search for. MEME will search for and output
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | ||||
[-dataset] (Parameter 1) | seqset | User must provide the full filename of a set of sequences, not an indirect reference, e.g. a USA is NOT acceptable. | Readable set of sequences | Required |
[-outdir] (Parameter 2) | outdir | MEME program output file output directory | Output directory | . |
Additional (Optional) qualifiers | ||||
-bfile | infile | The name of the file containing the background model for sequences. The background model is the model of random sequences used by MEME. The background model is used by MEME 1) during EM as the 'null model', 2) for calculating the log likelihood ratio of a motif, 3) for calculating the significance (E-value) of a motif, and, 4) for creating the position-specific scoring matrix (log-odds matrix). See application documentation for more information. | Input file | Required |
-plibfile | infile | The name of the file containing the Dirichlet prior in the format of file prior30.plib | Input file | Required |
-mod | selection | If you know how occurrences of motifs are distributed in the training set sequences, you can specify it with these options. The default distribution of motif occurrences is assumed to be zero or one occurrence per sequence. oops : One Occurrence Per Sequence. MEME assumes that each sequence in the dataset contains exactly one occurrence of each motif. This option is the fastest and most sensitive but the motifs returned by MEME may be 'blurry' if any of the sequences is missing them. zoops : Zero or One Occurrence Per Sequence. MEME assumes that each sequence may contain at most one occurrence of each motif. This option is useful when you suspect that some motifs may be missing from some of the sequences. In that case, the motifs found will be more accurate than using the first option. This option takes more computer time than the first option (about twice as much) and is slightly less sensitive to weak motifs present in all of the sequences. anr : Any Number of Repetitions. MEME assumes each sequence may contain any number of non-overlapping occurrences of each motif. This option is useful when you suspect that motifs repeat multiple times within a single sequence. In that case, the motifs found will be much more accurate than using one of the other options. This option can also be used to discover repeats within a single sequence. This option takes the much more computer time than the first option (about ten times as much) and is somewhat less sensitive to weak motifs which do not repeat within a single sequence than the other two options. | Choose from selection list of values | zoops |
-nmotifs | integer | The number of *different* motifs to search for. MEME will search for and output <n> motifs. | Any integer value | 1 |
-text | boolean | Default output is in HTML | Boolean value Yes/No | No |
-prior | selection | The prior distribution on the model parameters. dirichlet: Simple Dirichlet prior. This is the default for -dna and -alph. It is based on the non-redundant database letter frequencies. dmix: Mixture of Dirichlets prior. This is the default for -protein. mega: Extremely low variance dmix; variance is scaled inversely with the size of the dataset. megap: Mega for all but last iteration of EM; dmix on last iteration. addone: Add +1 to each observed count. | Choose from selection list of values | dirichlet |
-evt | float | Quit looking for motifs if E-value exceeds this value. Has an extremely high default so by default MEME never quits before -nmotifs <n> have been found. A value of -1 here is a shorthand for infinity. | Any numeric value | -1 |
-nsites | integer | These switches are ignored if mod = oops. The (expected) number of occurrences of each motif. If a value for -nsites is specified, only that number of occurrences is tried. Otherwise, numbers of occurrences between -minsites and -maxsites are tried as initial guesses for the number of motif occurrences. If a value is not specified for -minsites and maxsites then the default hardcoded into MEME, as opposed to the default value given in the ACD file, is used. The hardcoded default value of -minsites is equal to sqrt(number sequences). The hardcoded default value of -maxsites is equal to the number of sequences (zoops) or MIN(5* num.sequences, 50) (anr). A value of -1 here represents nsites being unspecified. | Any integer value | -1 |
-minsites | integer | These switches are ignored if mod = oops. The (expected) number of occurrences of each motif. If a value for -nsites is specified, only that number of occurrences is tried. Otherwise, numbers of occurrences between -minsites and -maxsites are tried as initial guesses for the number of motif occurrences. If a value is not specified for -minsites and maxsites then the default hardcoded into MEME, as opposed to the default value given in the ACD file, is used. The hardcoded default value of -minsites is equal to sqrt(number sequences). The hardcoded default value of -maxsites is equal to the number of sequences (zoops) or MIN(5 * num.sequences, 50) (anr). A value of -1 here represents minsites being unspecified. | Any integer value | -1 |
-maxsites | integer | These switches are ignored if mod = oops. The (expected) number of occurrences of each motif. If a value for -nsites is specified, only that number of occurrences is tried. Otherwise, numbers of occurrences between -minsites and -maxsites are tried as initial guesses for the number of motif occurrences. If a value is not specified for -minsites and maxsites then the default hardcoded into MEME, as opposed to the default value given in the ACD file, is used. The hardcoded default value of -minsites is equal to sqrt(number sequences). The hardcoded default value of -maxsites is equal to the number of sequences (zoops) or MIN(5 * num.sequences, 50) (anr). A value of -1 here represents maxsites being unspecified. | Any integer value | -1 |
-wnsites | float | The weight of the prior on nsites. This controls how strong the bias towards motifs with exactly nsites sites (or between minsites and maxsites sites) is. It is a number in the range [0..1). The larger it is, the stronger the bias towards motifs with exactly nsites occurrences is. | Any numeric value | 0.8 |
-w | integer | The width of the motif(s) to search for. If -w is given, only that width is tried. Otherwise, widths between -minw and -maxw are tried. Note: if width is greater than the length of the shortest sequence in the dataset, width is reset by MEME to that value. A value of -1 here represents -w being unspecified. | Any integer value | -1 |
-minw | integer | The width of the motif(s) to search for. If -w is given, only that width is tried. Otherwise, widths between -minw and -maxw are tried. Note: if width is greater than the length of the shortest sequence in the dataset, width is reset by MEME to that value. | Any integer value | 8 |
-maxw | integer | The width of the motif(s) to search for. If -w is given, only that width is tried. Otherwise, widths between -minw and -maxw are tried. Note: if width is greater than the length of the shortest sequence in the dataset, width is reset by MEME to that value. | Any integer value | 50 |
-nomatrim | boolean | The -nomatrim, -wg, -ws and -noendgaps switches control trimming (shortening) of motifs using the multiple alignment method. Specifying -nomatrim causes MEME to skip this and causes the other switches to be ignored. The pairwise alignment is controlled by the switches -wg (gap cost), -ws (space cost) and -noendgaps (do not penalize endgaps). See application documentation for further information. | Boolean value Yes/No | No |
-wg | integer | The -nomatrim, -wg, -ws and -noendgaps switches control trimming (shortening) of motifs using the multiple alignment method. Specifying -nomatrim causes MEME to skip this and causes the other switches to be ignored. The pairwise alignment is controlled by the switches -wg (gap cost), -ws (space cost) and -noendgaps (do not penalize endgaps). See application documentation for further information. | Any integer value | 11 |
-ws | integer | The -nomatrim, -wg, -ws and -noendgaps switches control trimming (shortening) of motifs using the multiple alignment method. Specifying -nomatrim causes MEME to skip this and causes the other switches to be ignored. The pairwise alignment is controlled by the switches -wg (gap cost), -ws (space cost) and -noendgaps (do not penalize endgaps). See application documentation for further information. | Any integer value | 1 |
-noendgaps | boolean | The -nomatrim, -wg, -ws and -noendgaps switches control trimming (shortening) of motifs using the multiple alignment method. Specifying -nomatrim causes MEME to skip this and causes the other switches to be ignored. The pairwise alignment is controlled by the switches -wg (gap cost), -ws (space cost) and -noendgaps (do not penalise endgaps). See application documentation for further information. | Boolean value Yes/No | No |
-revcomp | boolean | Motif occurrences may be on the given DNA strand or on its reverse complement. The default is to look for DNA motifs only on the strand given in the training set. | Boolean value Yes/No | No |
-pal | boolean | Choosing -pal causes MEME to look for palindromes in DNA datasets. MEME averages the letter frequencies in corresponding columns of the motif (PSPM) together. For instance, if the width of the motif is 10, columns 1 and 10, 2 and 9, 3 and 8, etc., are averaged together. The averaging combines the frequency of A in one column with T in the other, and the frequency of C in one column with G in the other. | Boolean value Yes/No | No |
-[no]nostatus | boolean | Set this option to prevent progress reports to the terminal. | Boolean value Yes/No | Yes |
Advanced (Unprompted) qualifiers | ||||
-maxiter | integer | The number of iterations of EM to run from any starting point. EM is run for <n> iterations or until convergence (see -distance, below) from each starting point. | Any integer value | 50 |
-distance | float | The convergence criterion. MEME stops iterating EM when the change in the motif frequency matrix is less than <a>. (Change is the euclidean distance between two successive frequency matrices.) | Any numeric value | 0.001 |
-b | float | The strength of the prior on model parameters. A value of 0 means use intrinsic strength of prior if prior = dmix. The default values are 0.01 if prior = dirichlet or 0 if prior = dmix. These defaults are hardcoded into MEME (the value of the default in the ACD file is not used). A value of -1 here represents -b being unspecified. | Any numeric value | -1.0 |
-spfuzz | float | The fuzziness of the mapping. Possible values are greater than 0. Meaning depends on -spmap, see below. See the application documentation for more information. A value of -1.0 here represents -spfuzz being unspecified. | Any numeric value | -1.0 |
-spmap | selection | The type of mapping function to use. uni: Use prior when converting a substring to an estimate of theta. Default -spfuzz <a>: 0.5. pam: Use columns of PAM <a> matrix when converting a substring to an estimate of theta. Default -spfuzz <a>: 120 (PAM 120). See the application documentation for more information. | Choose from selection list of values | default |
-cons | string | Override the sampling of starting points and just use a starting point derived from <string>. This is useful when an actual occurrence of a motif is known and can be used as the starting point for finding the motif. See the application documentation for more information. | Any string | |
-maxsize | integer | Maximum dataset size in characters (-1 = use meme default). | Any integer value | -1 |
-p | integer | Only values of >0 will be applied. The -p <np> argument causes a version of MEME compiled for a parallel CPU architecture to be run. (By placing <np> in quotes you may pass installation specific switches to the 'mpirun' command. The number of processors to run on must be the first argument following -p). | Any integer value | 0 |
-time | integer | Stop the run before <t> CPU seconds have been consumed. Only values greater than 0 are applied. | Any integer value | 0 |
-sf | string | Print <sf> as name of sequence file | Any string | |
-heapsize | integer | The search for good EM starting points can be improved by using a branching search. A branching search begins with a fixed-size heap of best EM starts identified during the search of subsequences from the dataset. These starts are also called seeds. The fixed-size heap of seeds is used as the branch-heap during the first iteration of branching search. See the application documentation for more information. | Any integer value | 64 |
-xbranch | boolean | The search for good EM starting points can be improved by using a branching search. A branching search begins with a fixed-size heap of best EM starts identified during the search of subsequences from the dataset. These starts are also called seeds. The fixed-size heap of seeds is used as the branch-heap during the first iteration of branching search. See the application documentation for more information. | Boolean value Yes/No | No |
-wbranch | boolean | The search for good EM starting points can be improved by using a branching search. A branching search begins with a fixed-size heap of best EM starts identified during the search of subsequences from the dataset. These starts are also called seeds. The fixed-size heap of seeds is used as the branch-heap during the first iteration of branching search. See the application documentation for more information. | Boolean value Yes/No | No |
-bfactor | integer | The search for good EM starting points can be improved by using a branching search. A branching search begins with a fixed-size heap of best EM starts identified during the search of subsequences from the dataset. These starts are also called seeds. The fixed-size heap of seeds is used as the branch-heap during the first iteration of branching search. See the application documentation for more information. | Any integer value | 3 |
Associated qualifiers | ||||
"-dataset" associated seqset qualifiers | ||||
-sbegin1 -sbegin_dataset | integer | Start of each sequence to be used | Any integer value | 0 |
-send1 -send_dataset | integer | End of each sequence to be used | Any integer value | 0 |
-sreverse1 -sreverse_dataset | boolean | Reverse (if DNA) | Boolean value Yes/No | N |
-sask1 -sask_dataset | boolean | Ask for begin/end/reverse | Boolean value Yes/No | N |
-snucleotide1 -snucleotide_dataset | boolean | Sequence is nucleotide | Boolean value Yes/No | N |
-sprotein1 -sprotein_dataset | boolean | Sequence is protein | Boolean value Yes/No | N |
-slower1 -slower_dataset | boolean | Make lower case | Boolean value Yes/No | N |
-supper1 -supper_dataset | boolean | Make upper case | Boolean value Yes/No | N |
-scircular1 -scircular_dataset | boolean | Sequence is circular | Boolean value Yes/No | N |
-squick1 -squick_dataset | boolean | Read id and sequence only | Boolean value Yes/No | N |
-sformat1 -sformat_dataset | string | Input sequence format | Any string | |
-iquery1 -iquery_dataset | string | Input query fields or ID list | Any string | |
-ioffset1 -ioffset_dataset | integer | Input start position offset | Any integer value | 0 |
-sdbname1 -sdbname_dataset | string | Database name | Any string | |
-sid1 -sid_dataset | string | Entryname | Any string | |
-ufo1 -ufo_dataset | string | UFO features | Any string | |
-fformat1 -fformat_dataset | string | Features format | Any string | |
-fopenfile1 -fopenfile_dataset | string | Features file name | Any string | |
"-outdir" associated outdir qualifiers | ||||
-extension2 -extension_outdir | string | Default file extension | Any string | |
General qualifiers | ||||
-auto | boolean | Turn off prompts | Boolean value Yes/No | N |
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
-warning | boolean | Report warnings | Boolean value Yes/No | Y |
-error | boolean | Report errors | Boolean value Yes/No | Y |
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
-die | boolean | Report dying program messages | Boolean value Yes/No | Y |
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
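The -pal averaging and the -distance convergence test described in the qualifier table above can be illustrated with a short Python sketch. It assumes a motif frequency matrix stored as a list of per-column dictionaries keyed by A, C, G and T. The function names `palindrome_average`, `euclidean_change` and `has_converged` are hypothetical and are not part of MEME or EMBOSS; this is an illustration of the description, not MEME's own code.

```python
import math

# Watson-Crick complements used when pairing mirrored columns.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def palindrome_average(matrix):
    """Average each column with its mirror column, as described for -pal.

    Column i is paired with column (width - 1 - i); the frequency of a
    letter in one column is averaged with the frequency of its
    complement in the other column.
    """
    width = len(matrix)
    result = [dict(column) for column in matrix]
    for i in range(width):
        j = width - 1 - i
        for letter, comp in COMPLEMENT.items():
            result[i][letter] = (matrix[i][letter] + matrix[j][comp]) / 2.0
    return result


def euclidean_change(previous, current):
    """Euclidean distance between two successive frequency matrices."""
    total = 0.0
    for prev_col, curr_col in zip(previous, current):
        for letter in "ACGT":
            total += (curr_col[letter] - prev_col[letter]) ** 2
    return math.sqrt(total)


def has_converged(previous, current, distance=0.001):
    """True when the change falls below the -distance threshold.

    The default of 0.001 is the default listed for -distance above.
    """
    return euclidean_change(previous, current) < distance


if __name__ == "__main__":
    motif = [
        {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
        {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    ]
    averaged = palindrome_average(motif)
    print(averaged)
    print(has_converged(motif, averaged))
```

For a motif of odd width the middle column is paired with itself, so its A and T frequencies are averaged together, as are its C and G frequencies.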
>ce1cg TAATGTTTGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGCGTGGTGTGAAAGACTGTTTTTTTGATCGTTTTCACAA AAATGGAAGTCCACAGTCTTGACAG >ara GACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATTTGCACGGCGTCACACTTTGCT ATGCCATAGCATTTTTATCCATAAG >bglr1 ACAAATCCCAATAACTTAATTATTGGGATTTGTTATATATAACTTTATAAATTCCTAAAATTACACAAAGTTAATAACTG TGAGCATGGTCATATTTTTATCAAT >crp CACAAAGCGAAAGCTATGCTAAAACAGTCAGGATGCTACAGTAATACATTGATGTACTGCATGTATGCAAAGGACGTCAC ATTACCGTGCAGTACAGTTGATAGC >cya ACGGTGCTACACTTGTATGTAGCGCATCTTTCTTTACGGTCAATCAGCAAGGTGTTAAATTGATCACGTTTTAGACCATT TTTTCGTCGTGAAACTAAAAAAACC >deop2 AGTGAATTATTTGAACCAGATCGCATTACAGTGATGCAAACTTGTAAGTAGATTTCCTTAATTGTGATGTGTATCGAAGT GTGTTGCGGAGTAGATGTTAGAATA >gale GCGCATAAAAAACGGCTAAATTCTTGTGTAAACGATTCCACTAATTTATTCCATGTCACACTTTTCGCATCTTTGTTATG CTATGGTTATTTCATACCATAAGCC >ilv GCTCCGGCGGGGTTTTTTGTTATCTGCAATTCAGTACAAAACGTGATCAACCCCTCAATTTTCCCTTTGCTGAAAAATTT TCCATTGTCTCCCCTGTAAAGCTGT >lac AACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGG AATTGTGAGCGGATAACAATTTCAC >male ACATTACCGCCAATTCTGTAACAGAGATCACACAAAGCGACGGTGGGGCGTAGGGGCAAGGAGGATGGAAAGAGGTTGCC GTATAAAGAAACTAGAGTCCGTTTA >malk GGAGGAGGCGGGAGGATGAGAACACGGCTTCTGTGAACTAAACCGAGGTCATGTAAGGAATTTCGTGATGTTGCTTGCAA AAATCGTGGCGATTTTATGTGCGCA >malt GATCAGCGTCGTTTTAGGTGAGTTGTTAATAAAGATTTGGAATTGTGACACAGTGCAAATTCAGACACATAAAAAAACGT CATCGCTTGCATTAGAAAGGTTTCT >ompa GCTGACAAAAAAGATTAAACATACCTTATACAAGACTTTTTTTTCATATGCCTGACGGAGTTCACACTTGTAAGTTTTCA ACTACGTTGTAGACTTTACATCGCC >tnaa TTTTTTAAACATTAAAATTCTTACGTAATTTATAATCTTTAAAAAAAGCATTTAATATTGCTCCCCGAACGATTGTGATT CGATTCACATTTAAACAATTTCAGA >uxu1 CCCATGAGAGTGAAATTGTTGTGATGTGGTTAACCCAATTAGAATTCGGGATTGACATGTCTTACCAAAAGGTAGAACTT ATACGCCATCTCATCCGATGCAAGC >pbr322 CTGGCTTAACTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCATATGCGGTGTGAAATACCGCACAGATGCGTAA GGAGAAAATACCGCATCAGGCGCTC >trn9cat CTGTGACGGAAGATCACTTCGCAGAATAAATAAATCCTGGTGTCCCTGTTGATACCGGGAAGCCCTGGGCCAACTTTTGG CGAAAATGAGACGTTGATCGGCACG >tdc 
GATTTTTATACTTTAACTTGTTGATATTTAAAGGTATTTAATTGTAATAACGATACTCTGGAAAGTATTGAAAGTTAATT TGTGAGTGGTCGCACATATCCTGTT |
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <title>MEME</title> <style type="text/css"> /* START INCLUDED FILE "meme.css" */ /* The following is the content of meme.css */ body { background-color:white; font-size: 12px; font-family: Verdana, Arial, Helvetica, sans-serif;} div.help { display: inline-block; margin: 0px; padding: 0px; width: 12px; height: 13px; cursor: pointer; background-image: url("help.gif"); background-image: url(""); } div.help2 { color: #999; display: inline-block; width: 12px; height: 12px; border: 1px solid #999; font-size: 13px; line-height:12px; font-family: Helvetica, sans-serif; font-weight: bold; font-style: normal; cursor: pointer; } div.help2:hover { color: #000; border-color: #000; } p.spaced { line-height: 1.8em;} span.citation { font-family: "Book Antiqua", "Palatino Linotype", serif; color: #004a4d;} p.pad { padding-left: 30px; padding-top: 5px; padding-bottom: 10px;} td.jump { font-size: 13px; color: #ffffff; background-color: #00666a; font-family: Georgia, "Times New Roman", Times, serif;} a.jump { margin: 15px 0 0; font-style: normal; font-variant: small-caps; [Part of this file has been deleted for brevity] For use with <a href="http://blocks.fhcrc.org/blocks">BLOCKS tools</a>. </dd> <dt> <a name="format_FASTA_doc"></a>FASTA Format</dt> <dd> The FASTA format as described <a href="http://meme.nbcr.net/meme/doc/fasta-format.html">here</a>. </dd> <dt> <a name="format_raw_doc"></a>Raw Format</dt> <dd> Just the sites of the sequences that contributed to the motif. One site per line. </dd> </dl> </div> <a name="sites_doc"></a><h5 class="doc">Sites</h5> <div class="doc"><p> MEME displays the occurrences (sites) of the motif in the training set. The sites are shown aligned with each other, and the ten sequence positions preceding and following each site are also shown. 
Each site is identified by the name of the sequence where it occurs, the strand (if both strands of DNA sequences are being used), and the position in the sequence where the site begins. When the DNA strand is specified, '+' means the sequence in the training set, and '-' means the reverse complement of the training set sequence. (For '-' strands, the 'start' position is actually the position on the <b>positive</b> strand where the site ends.) The sites are <b>listed in order of increasing statistical significance</b> (<i>p</i>-value). The <i>p</i>-value of a site is computed from the the match score of the site with the <a href="#format_PSSM_doc">position specific scoring matrix</a> for the motif. The <i>p</i>-value gives the probability of a random string (generated from the background letter frequencies) having the same match score or higher. (This is referred to as the <b>position <i>p</i>-value</b> by the MAST algorithm.) </p></div> <a name="diagrams_doc"></a><h5 class="doc">Block Diagrams</h5> <div class="doc"><p> The occurrences of the motif in the training set sequences are shown as coloured blocks on a line. One diagram is printed for each sequence showing all the sites contributating to that motif in that sequence. The sequences are <b>listed in the same order as in the input</b> to make it easier to compare multiple block diagrams. Additionally the best <i>p</i>-value for the sequence/motif combination is listed though this may not be in ascending order as with the sites. The <i>p</i>-value of an occurrence is the probability of a single random subsequence the length of the motif, generated according to the 0-order background model, having a score at least as high as the score of the occurrence. When the DNA strand is specified '+', it means the motif appears from left to right on the sequence, and '-' means the motif appears from right to left on the complementary strand. A sequence position scale is shown at the end of each table of block diagrams. 
</p></div> <a name="combined_doc"></a><h5>Combined Block Diagrams</h5> <div class="doc"> <p> The motif occurrences shown in the motif summary <b>may not be exactly the same as those reported in each motif section</b> because only motifs with a position <em>p</em>-value of 0.0001 that don't overlap other, more significant motif occurrences are shown. </p> <p> See the documentation for <a href="http://meme.nbcr.net/meme/mast-output.html">MAST output</a> for the definition of position and combined <em>p</em>-values. </p> </div> </div></span><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br> </form></body> </html> |
>ce1cg TAATGTTTGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGCGTGGTGTGAAAGACTGT TTTTTTGATCGTTTTCACAAAAATGGAAGTCCACAGTCTTGACAG >ara GACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATTT GCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAG >bglr1 ACAAATCCCAATAACTTAATTATTGGGATTTGTTATATATAACTTTATAAATTCCTAAAA TTACACAAAGTTAATAACTGTGAGCATGGTCATATTTTTATCAAT >crp CACAAAGCGAAAGCTATGCTAAAACAGTCAGGATGCTACAGTAATACATTGATGTACTGC ATGTATGCAAAGGACGTCACATTACCGTGCAGTACAGTTGATAGC >cya ACGGTGCTACACTTGTATGTAGCGCATCTTTCTTTACGGTCAATCAGCAAGGTGTTAAAT TGATCACGTTTTAGACCATTTTTTCGTCGTGAAACTAAAAAAACC >deop2 AGTGAATTATTTGAACCAGATCGCATTACAGTGATGCAAACTTGTAAGTAGATTTCCTTA ATTGTGATGTGTATCGAAGTGTGTTGCGGAGTAGATGTTAGAATA >gale GCGCATAAAAAACGGCTAAATTCTTGTGTAAACGATTCCACTAATTTATTCCATGTCACA CTTTTCGCATCTTTGTTATGCTATGGTTATTTCATACCATAAGCC >ilv GCTCCGGCGGGGTTTTTTGTTATCTGCAATTCAGTACAAAACGTGATCAACCCCTCAATT TTCCCTTTGCTGAAAAATTTTCCATTGTCTCCCCTGTAAAGCTGT >lac AACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTT CCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCAC >male ACATTACCGCCAATTCTGTAACAGAGATCACACAAAGCGACGGTGGGGCGTAGGGGCAAG GAGGATGGAAAGAGGTTGCCGTATAAAGAAACTAGAGTCCGTTTA >malk GGAGGAGGCGGGAGGATGAGAACACGGCTTCTGTGAACTAAACCGAGGTCATGTAAGGAA TTTCGTGATGTTGCTTGCAAAAATCGTGGCGATTTTATGTGCGCA >malt GATCAGCGTCGTTTTAGGTGAGTTGTTAATAAAGATTTGGAATTGTGACACAGTGCAAAT TCAGACACATAAAAAAACGTCATCGCTTGCATTAGAAAGGTTTCT >ompa GCTGACAAAAAAGATTAAACATACCTTATACAAGACTTTTTTTTCATATGCCTGACGGAG TTCACACTTGTAAGTTTTCAACTACGTTGTAGACTTTACATCGCC >tnaa TTTTTTAAACATTAAAATTCTTACGTAATTTATAATCTTTAAAAAAAGCATTTAATATTG CTCCCCGAACGATTGTGATTCGATTCACATTTAAACAATTTCAGA >uxu1 CCCATGAGAGTGAAATTGTTGTGATGTGGTTAACCCAATTAGAATTCGGGATTGACATGT CTTACCAAAAGGTAGAACTTATACGCCATCTCATCCGATGCAAGC >pbr322 CTGGCTTAACTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCATATGCGGTGTGA AATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCTC >trn9cat CTGTGACGGAAGATCACTTCGCAGAATAAATAAATCCTGGTGTCCCTGTTGATACCGGGA AGCCCTGGGCCAACTTTTGGCGAAAATGAGACGTTGATCGGCACG >tdc GATTTTTATACTTTAACTTGTTGATATTTAAAGGTATTTAATTGTAATAACGATACTCTG 
GAAAGTATTGAAAGTTAATTTGTGAGTGGTCGCACATATCCTGTT |
******************************************************************************** MEME - Motif discovery tool ******************************************************************************** MEME version 4.7.0 (Release date: Wed Sep 28 17:30:10 EST 2011) For further information on how to interpret these results or to get a copy of the MEME software please access http://meme.nbcr.net. This file may be used as input to the MAST algorithm for searching sequence databases for matches to groups of motifs. MAST is available for interactive use and downloading at http://meme.nbcr.net. ******************************************************************************** ******************************************************************************** REFERENCE ******************************************************************************** If you use this program in your research, please cite: Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994. 
******************************************************************************** ******************************************************************************** TRAINING SET ******************************************************************************** DATAFILE= ./meme.fasta ALPHABET= ACGT Sequence name Weight Length Sequence name Weight Length ------------- ------ ------ ------------- ------ ------ ce1cg 1.0000 105 ara 1.0000 105 bglr1 1.0000 105 crp 1.0000 105 cya 1.0000 105 deop2 1.0000 105 gale 1.0000 105 ilv 1.0000 105 lac 1.0000 105 male 1.0000 105 malk 1.0000 105 malt 1.0000 105 ompa 1.0000 105 tnaa 1.0000 105 uxu1 1.0000 105 pbr322 1.0000 105 trn9cat 1.0000 105 tdc 1.0000 105 ******************************************************************************** ******************************************************************************** COMMAND LINE SUMMARY ******************************************************************************** This information can also be useful in the event you wish to report a problem with the MEME software. [Part of this file has been deleted for brevity] -------------------------------------------------------------------------------- TGTGA[ACT][CAG][GT][AGT][GC][TAC]TCAC -------------------------------------------------------------------------------- Time 0.50 secs. 
******************************************************************************** ******************************************************************************** SUMMARY OF MOTIFS ******************************************************************************** -------------------------------------------------------------------------------- Combined block diagrams: non-overlapping sites with p-value < 0.0001 -------------------------------------------------------------------------------- SEQUENCE NAME COMBINED P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- ce1cg 1.74e-03 63_[+1(1.91e-05)]_27 ara 4.00e-03 57_[+1(4.41e-05)]_33 bglr1 7.85e-03 78_[+1(8.66e-05)]_12 crp 4.37e-03 65_[+1(4.81e-05)]_25 cya 3.66e-03 52_[+1(4.03e-05)]_38 deop2 5.47e-04 9_[+1(6.01e-06)]_81 gale 9.45e-04 26_[+1(1.04e-05)]_64 ilv 2.54e-02 105 lac 5.39e-05 11_[+1(5.92e-07)]_79 male 2.12e-04 16_[+1(2.33e-06)]_74 malk 1.35e-02 105 malt 2.55e-03 43_[+1(2.80e-05)]_47 ompa 1.42e-03 50_[+1(1.57e-05)]_40 tnaa 2.11e-03 73_[+1(2.32e-05)]_17 uxu1 3.35e-03 19_[+1(3.69e-05)]_71 pbr322 2.12e-04 55_[+1(2.33e-06)]_35 trn9cat 5.08e-02 105 tdc 1.57e-03 80_[+1(1.73e-05)]_10 -------------------------------------------------------------------------------- ******************************************************************************** ******************************************************************************** Stopped because nmotifs = 1 reached. ******************************************************************************** CPU: peterlenovo ******************************************************************************** |
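The MOTIF DIAGRAM column in the summary above encodes each sequence as gaps and sites, for example `63_[+1(1.91e-05)]_27`. The following Python sketch shows one way to read such a string. The helper names are hypothetical, and the motif width of 15 is inferred from the ce1cg row (105 - 63 - 27), not taken from MEME documentation. The `sequence_pvalue` helper reflects an observation about the numbers in this listing only: for this single-motif example the COMBINED P-VALUE column appears consistent with 1 - (1 - p)^n, where p is the best site p-value and n is the number of possible site positions. It should not be read as a statement of how MEME computes the value.

```python
import re

# A site token looks like [+1(1.91e-05)]: optional strand, motif number, p-value.
SITE_RE = re.compile(r"\[([+-]?)(\d+)\(([^)]+)\)\]")


def parse_motif_diagram(diagram):
    """Split a motif diagram into ('gap', n) and ('site', strand, motif, p) tuples."""
    elements = []
    for token in diagram.split("_"):
        match = SITE_RE.fullmatch(token)
        if match:
            strand, motif, pvalue = match.groups()
            elements.append(("site", strand, int(motif), float(pvalue)))
        else:
            elements.append(("gap", int(token)))
    return elements


def site_starts(elements, motif_width):
    """Number of sequence positions preceding each site (single motif width assumed)."""
    starts = []
    offset = 0
    for element in elements:
        if element[0] == "gap":
            offset += element[1]
        else:
            starts.append(offset)
            offset += motif_width
    return starts


def sequence_pvalue(site_pvalue, seq_length, motif_width):
    """Chance that at least one of the possible positions scores this well.

    Assumes one strand and independent positions; an illustrative check only.
    """
    positions = seq_length - motif_width + 1
    return 1.0 - (1.0 - site_pvalue) ** positions


if __name__ == "__main__":
    elements = parse_motif_diagram("63_[+1(1.91e-05)]_27")
    print(elements)
    print(site_starts(elements, 15))
    print(sequence_pvalue(1.91e-05, 105, 15))
```

A row such as `105`, with no bracketed site, parses to a single gap covering the whole sequence, which corresponds to the sequences where no site passed the 0.0001 threshold.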
<?xml version='1.0' encoding='UTF-8' standalone='yes'?> <!-- Document definition --> <!DOCTYPE MEME[ <!ELEMENT MEME ( training_set, model, motifs, scanned_sites_summary? )> <!ATTLIST MEME version CDATA #REQUIRED release CDATA #REQUIRED > <!-- Training-set elements --> <!ELEMENT training_set (alphabet, ambigs, sequence+, letter_frequencies)> <!ATTLIST training_set datafile CDATA #REQUIRED length CDATA #REQUIRED> <!ELEMENT alphabet (letter+)> <!ATTLIST alphabet id (amino-acid|nucleotide) #REQUIRED length CDATA #REQUIRED> <!ELEMENT ambigs (letter+)> <!ELEMENT letter EMPTY> <!ATTLIST letter id ID #REQUIRED> <!ATTLIST letter symbol CDATA #REQUIRED> <!ELEMENT sequence EMPTY> <!ATTLIST sequence id ID #REQUIRED name CDATA #REQUIRED length CDATA #REQUIRED weight CDATA #REQUIRED > <!ELEMENT letter_frequencies (alphabet_array)> <!-- Model elements --> <!ELEMENT model ( command_line, host, type, nmotifs, evalue_threshold, object_function, min_width, max_width, minic, wg, ws, endgaps, minsites, maxsites, wnsites, prob, spmap, [Part of this file has been deleted for brevity] <letter_ref letter_id="letter_G"/> <letter_ref letter_id="letter_T"/> <letter_ref letter_id="letter_T"/> <letter_ref letter_id="letter_G"/> <letter_ref letter_id="letter_A"/> <letter_ref letter_id="letter_T"/> <letter_ref letter_id="letter_C"/> <letter_ref letter_id="letter_G"/> <letter_ref letter_id="letter_G"/> </site> <right_flank>CACG</right_flank> </contributing_site> </contributing_sites> </motif> </motifs> <scanned_sites_summary p_thresh="0.0001"> <scanned_sites sequence_id="sequence_0" pvalue="1.74e-03" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="63" pvalue="1.91e-05"/> </scanned_sites> <scanned_sites sequence_id="sequence_1" pvalue="4.00e-03" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="57" pvalue="4.41e-05"/> </scanned_sites> <scanned_sites sequence_id="sequence_2" pvalue="7.85e-03" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" 
position="78" pvalue="8.66e-05"/> </scanned_sites> <scanned_sites sequence_id="sequence_3" pvalue="4.37e-03" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="65" pvalue="4.81e-05"/> </scanned_sites> <scanned_sites sequence_id="sequence_4" pvalue="3.66e-03" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="52" pvalue="4.03e-05"/> </scanned_sites> <scanned_sites sequence_id="sequence_5" pvalue="5.47e-04" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="9" pvalue="6.01e-06"/> </scanned_sites> <scanned_sites sequence_id="sequence_6" pvalue="9.45e-04" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="26" pvalue="1.04e-05"/> </scanned_sites> <scanned_sites sequence_id="sequence_7" pvalue="2.54e-02" num_sites="0"></scanned_sites> <scanned_sites sequence_id="sequence_8" pvalue="5.39e-05" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="11" pvalue="5.92e-07"/> </scanned_sites> <scanned_sites sequence_id="sequence_9" pvalue="2.12e-04" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="16" pvalue="2.33e-06"/> </scanned_sites> <scanned_sites sequence_id="sequence_10" pvalue="1.35e-02" num_sites="0"></scanned_sites> <scanned_sites sequence_id="sequence_11" pvalue="2.55e-03" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="43" pvalue="2.80e-05"/> </scanned_sites> <scanned_sites sequence_id="sequence_12" pvalue="1.42e-03" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="50" pvalue="1.57e-05"/> </scanned_sites> <scanned_sites sequence_id="sequence_13" pvalue="2.11e-03" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="73" pvalue="2.32e-05"/> </scanned_sites> <scanned_sites sequence_id="sequence_14" pvalue="3.35e-03" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="19" pvalue="3.69e-05"/> </scanned_sites> <scanned_sites sequence_id="sequence_15" pvalue="2.12e-04" 
num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="55" pvalue="2.33e-06"/> </scanned_sites> <scanned_sites sequence_id="sequence_16" pvalue="5.08e-02" num_sites="0"></scanned_sites> <scanned_sites sequence_id="sequence_17" pvalue="1.57e-03" num_sites="1"><scanned_site motif_id="motif_1" strand="plus" position="80" pvalue="1.73e-05"/> </scanned_sites> </scanned_sites_summary> </MEME> |
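The `scanned_sites_summary` element at the end of the XML listing above carries the same per-sequence information as the text summary. As a minimal sketch, it can be read with Python's standard `xml.etree.ElementTree` module. The fragment below is trimmed from the listing, the function name `summarize_scanned_sites` is hypothetical, and parsing a complete MEME XML file (including its DOCTYPE) is assumed, not verified, to work the same way.

```python
import xml.etree.ElementTree as ET

# Two entries trimmed from the example output above.
FRAGMENT = """<scanned_sites_summary p_thresh="0.0001">
<scanned_sites sequence_id="sequence_0" pvalue="1.74e-03" num_sites="1">
<scanned_site motif_id="motif_1" strand="plus" position="63" pvalue="1.91e-05"/>
</scanned_sites>
<scanned_sites sequence_id="sequence_7" pvalue="2.54e-02" num_sites="0"></scanned_sites>
</scanned_sites_summary>"""


def summarize_scanned_sites(xml_text):
    """Return (sequence_id, sequence p-value, [sites]) for each scanned sequence."""
    root = ET.fromstring(xml_text)
    rows = []
    for seq in root.iter("scanned_sites"):
        sites = [
            (
                site.get("motif_id"),
                site.get("strand"),
                int(site.get("position")),
                float(site.get("pvalue")),
            )
            for site in seq.findall("scanned_site")
        ]
        rows.append((seq.get("sequence_id"), float(seq.get("pvalue")), sites))
    return rows


if __name__ == "__main__":
    for sequence_id, pvalue, sites in summarize_scanned_sites(FRAGMENT):
        print(sequence_id, pvalue, sites)
```

Sequences with `num_sites="0"` yield an empty site list, matching the rows in the text summary whose diagram is just the sequence length.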
The MEME results consist of:
-h : Use -help to get help information.
-dna : EMBOSS automatically determines whether the sequences use a DNA alphabet.
-protein : EMBOSS automatically determines whether the sequences use a protein alphabet.
The following additional options are provided:
outfile : Application output that was normally written to stdout.
Note: ememe makes a temporary local copy of its input sequence data. You must ensure there is sufficient disk space for this copy in the directory in which ememe is run.
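Because ememe copies its input into the directory it is run from, it can be worth checking the free space beforehand. The sketch below is one possible check using only the Python standard library. The function name `has_room_for_copy` is hypothetical, and the assumption that the temporary copy is about the same size as the input file is a simplification.

```python
import os
import shutil


def has_room_for_copy(dataset_path, run_directory="."):
    """True if run_directory has at least as many free bytes as the dataset file.

    Assumes the temporary copy is roughly the size of the input file.
    """
    needed = os.path.getsize(dataset_path)
    free = shutil.disk_usage(run_directory).free
    return free >= needed


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(has_room_for_copy(sys.argv[1]))
```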
WWW home: http://meme.sdsc.edu/meme/
Distribution: http://meme.nbcr.net/downloads/old_versions/
Please read the README file in the original MEMENEW package distribution for installation instructions.
set path=(/usr/local/meme/bin/ $path)
rehash
Run
meme > meme.txt
mast > mast.txt
to retrieve the meme and mast documentation into text files. The same documentation is given here and in the ememe documentation.
Please read the 'Notes' section below for a description of the differences between the original and EMBASSY MEME, particularly which application command line options are supported.
(MEME) Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.
(MAST) Timothy L. Bailey and Michael Gribskov, "Combining evidence using p-values: application to sequence homology searches", Bioinformatics, Vol. 14, pp. 48-54, 1998.
The user must provide the full filename of a sequence database for the sequence input (the "seqset" ACD option). An indirect reference such as a USA (Uniform Sequence Address) is NOT acceptable. This is because meme (which ememe wraps) does not support USAs, and a full sequence database is too big to write to a temporary file that the original meme would understand.
Program name | Description |
---|---|
antigenic | Find antigenic sites in proteins |
eiprscan | Motif detection |
elipop | Predict lipoproteins |
emast | Motif detection |
ememetext | Multiple EM for motif elicitation, text file only |
epestfind | Find PEST motifs as potential proteolytic cleavage sites |
fuzzpro | Search for patterns in protein sequences |
fuzztran | Search for patterns in protein sequences (translated) |
omeme | Motif detection |
patmatdb | Search protein sequences with a sequence motif |
patmatmotifs | Scan a protein sequence with motifs from the PROSITE database |
preg | Regular expression search of protein sequence(s) |
pscan | Scan protein sequence(s) with fingerprints from the PRINTS database |
sigcleave | Report on signal cleavage sites in a protein sequence |
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.
This program is an EMBASSY wrapper to a program written by Timothy L. Bailey as part of his meme package.
Please report any bugs to the EMBOSS bug team in the first instance, not to Timothy L. Bailey.
None.