ffactor |
Please help by correcting and extending the Wiki pages.
Note that this program has no way of converting an unordered multistate character into binary characters. Fortunately, PARS has joined the package, and it enables unordered multistate characters, in which any state can change to any other in one step, to be analyzed with parsimony.
FACTOR is really for a different case, that in which there are multiple states related on a "character state tree", which specifies for each state which other states it can change to. That graph of states is assumed to be a tree, with no loops in it.
% ffactor Multistate to binary recoding program Phylip factor program input file: factor.dat Phylip factor program output file [factor.ffactor]: Data matrix written on file "factor.ffactor" Done. |
Go to the input files for this example
Go to the output files for this example
Standard (Mandatory) qualifiers: [-infile] infile Phylip factor program input file [-outfile] outfile [*.ffactor] Phylip factor program output file Additional (Optional) qualifiers: -anc boolean [N] Put ancestral states in output file -factors boolean [N] Put factors information in output file -[no]progress boolean [Y] Print indications of progress of run Advanced (Unprompted) qualifiers: -outfactorfile outfile [*.ffactor] Phylip factor data output file (optional) -outancfile outfile [*.ffactor] Phylip ancestor data output file (optional) Associated qualifiers: "-outfile" associated qualifiers -odirectory2 string Output directory "-outfactorfile" associated qualifiers -odirectory string Output directory "-outancfile" associated qualifiers -odirectory string Output directory General qualifiers: -auto boolean Turn off prompts -stdout boolean Write first file to standard output -filter boolean Read first file from standard input, write first file to standard output -options boolean Prompt for standard and additional values -debug boolean Write debug output to program.dbg -verbose boolean Report some/full command line options -help boolean Report command line options. More information on associated and general qualifiers can be found with -help -verbose -warning boolean Report warnings -error boolean Report errors -fatal boolean Report fatal errors -die boolean Report dying program messages |
Standard (Mandatory) qualifiers | Allowed values | Default | |
---|---|---|---|
[-infile] (Parameter 1) |
Phylip factor program input file | Input file | Required |
[-outfile] (Parameter 2) |
Phylip factor program output file | Output file | <*>.ffactor |
Additional (Optional) qualifiers | Allowed values | Default | |
-anc | Put ancestral states in output file | Boolean value Yes/No | No |
-factors | Put factors information in output file | Boolean value Yes/No | No |
-[no]progress | Print indications of progress of run | Boolean value Yes/No | Yes |
Advanced (Unprompted) qualifiers | Allowed values | Default | |
-outfactorfile | Phylip factor data output file (optional) | Output file | <*>.ffactor |
-outancfile | Phylip ancestor data output file (optional) | Output file | <*>.ffactor |
This program factors a data set that contains multistate characters, creating a data set consisting entirely of binary (0,1) characters that, in turn, can be used as input to any of the other discrete character programs in this package, except for PARS. Besides this primary function, FACTOR also provides an easy way of deleting characters from a data set. The input format for FACTOR is very similar to the input format for the other discrete character programs except for the addition of character-state tree descriptions.
Note that this program has no way of converting an unordered multistate character into binary characters. Fortunately, PARS has joined the package, and it enables unordered multistate characters, in which any state can change to any other in one step, to be analyzed with parsimony.
FACTOR is really for a different case, that in which there are multiple states related on a "character state tree", which specifies for each state which other states it can change to. That graph of states is assumed to be a tree, with no loops in it.
The first line of the input file should contain the number of species and the number of multistate characters. This first line is followed by the lines describing the character-state trees, one description per line. The species information constitutes the last part of the file. Any number of lines may be used for a single species.
The first line is free format with the number of species first, separated by at least one blank (space) from the number of multistate characters, which in turn is separated by at least one blank from the options, if present.
The character-state trees are described in free format. The character number of the multistate character is given first followed by the description of the tree itself. Each description must be completed on a single line. Each character that is to be factored must have a description, and the characters must be described in the order that they occur in the input, that is, in numerical order.
The tree is described by listing the pairs of character states that are adjacent to each other in the character-state tree. The two character states in each adjacent pair are separated by a colon (":"). If character fifteen has this character state tree for possible states "A", "B", "C", and "D":
A ---- B ---- C | | | D
then the character-state tree description would be
15 A:B B:C D:B
Note that either symbol may appear first. The ancestral state is identified, if desired, by putting it "adjacent" to a period. If we wanted to root character fifteen at state C:
A <--- B <--- C | | V D
we could write
15 B:D A:B C:B .:C
Both the order in which the pairs are listed and the order of the symbols in each pair are arbitrary. However, each pair may only appear once in the list. Any symbols may be used for a character state in the input except the character that signals the connection between two states (in the distribution copy this is set to ":"), ".", and, of course, a blank. Blanks are ignored completely in the tree description so that even B:DA:BC:B.:C or B : DA : BC : B. : C would be equivalent to the above example. However, at least one blank must separate the character number from the tree description.
If no description line appears in the input for a particular character, then that character will be omitted from the output. If the character number is given on the line, but no character-state tree is provided, then the symbol for the character in the input will be copied directly to the output without change. This is useful for characters that are already coded "0" and "1". Characters can be deleted from a data set simply by listing only those that are to appear in the output.
The format for the species information is basically identical to the other discrete character programs. The first ten character positions are allotted to the species name (this value may be changed by altering the value of the constant nmlngth at the beginning of the program). The character states follow and may be continued to as many lines as desired. There is no current method for indicating polymorphisms. It is possible to either put blanks between characters or not.
There is a method for indicating uncertainty about states. There is one character value that stands for "unknown". If this appears in the input data then "?" is written out in all the corresponding positions in the output file. The character value that designates "unknown" is given in the constant unkchar at the beginning of the program, and can be changed by changing that constant. It is set to "?" in the distribution copy.
4 6 1 A:B B:C 2 A:B B:. 4 5 0:1 1:2 .:0 6 .:# #:$ #:% 999 Alpha CAW00# Beta BBX01% Gamma ABY12# Epsilon CAZ01$ |
In fact, the format of the output file for the A and F options is not correct for the current release of PHYLIP. We need to change their output to write a factors file and an ancestors file instead of putting the Factors and Ancestors information into the data file.
4 5 Alpha CA00# Beta BB01% Gamma AB12# Epsilon CA01$ |
|
|
The output should be checked for error messages. Errors will occur in the character-state tree descriptions if the format is incorrect (colons in the wrong place, etc.), if more than one root is specified, if the tree contains loops (and hence is not a tree), and if the tree is not connected, e.g.
A:B B:C D:E
describes
A ---- B ---- C D ---- E
This "tree" is in two unconnected pieces. An error will also occur if a symbol appears in the data set that is not in the tree description for that character. Blanks at the end of lines when the species information is continued to a new line will cause this kind of error.
Program name | Description |
---|---|
eclique | Largest clique program |
edollop | Dollo and polymorphism parsimony algorithm |
edolpenny | Penny algorithm Dollo or polymorphism |
efactor | Multistate to binary recoding program |
emix | Mixed parsimony algorithm |
epenny | Penny algorithm, branch-and-bound |
fclique | Largest clique program |
fdollop | Dollo and polymorphism parsimony algorithm |
fdolpenny | Penny algorithm Dollo or polymorphism |
fmix | Mixed parsimony algorithm |
fmove | Interactive mixed method parsimony |
fpars | Discrete character parsimony |
fpenny | Penny algorithm, branch-and-bound |
Although we take every care to ensure that the results of the EMBOSS version are identical to those from the original package, we recommend that you check your inputs give the same results in both versions before publication.
Please report all bugs in the EMBOSS version to the EMBOSS bug team, not to the original author.
Converted (August 2004) to an EMBASSY program by the EMBOSS team.