fneighbor |
Please help by correcting and extending the Wiki pages.
NEIGHBOR constructs a tree by successive clustering of lineages, setting branch lengths as the lineages join. The tree is not rearranged thereafter. The tree does not assume an evolutionary clock, so that it is in effect an unrooted tree. It should be somewhat similar to the tree obtained by FITCH. The program cannot evaluate a User tree, nor can it prevent branch lengths from becoming negative. However the algorithm is far faster than FITCH or KITSCH. This will make it particularly effective in their place for large studies or for bootstrap or jackknife resampling studies which require runs on multiple data sets.
The UPGMA option constructs a tree by successive (agglomerative) clustering using an average-linkage method of clustering. It has some relationship to KITSCH, in that when the tree topology turns out the same, the branch lengths with UPGMA will turn out to be the same as with the P = 0 option of KITSCH.
The programs FITCH, KITSCH, and NEIGHBOR are for dealing with data which comes in the form of a matrix of pairwise distances between all pairs of taxa, such as distances based on molecular sequence data, gene frequency genetic distances, amounts of DNA hybridization, or immunological distances. In analyzing these data, distance matrix programs implicitly assume that:
These assumptions can be traced in the least squares methods of programs FITCH and KITSCH but it is not quite so easy to see them in operation in the Neighbor-Joining method of NEIGHBOR, where the independence assumptions is less obvious.
THESE TWO ASSUMPTIONS ARE DUBIOUS IN MOST CASES: independence will not be expected to be true in most kinds of data, such as genetic distances from gene frequency data. For genetic distance data in which pure genetic drift without mutation can be assumed to be the mechanism of change CONTML may be more appropriate. However, FITCH, KITSCH, and NEIGHBOR will not give positively misleading results (they will not make a statistically inconsistent estimate) provided that additivity holds, which it will if the distance is computed from the original data by a method which corrects for reversals and parallelisms in evolution. If additivity is not expected to hold, problems are more severe. A short discussion of these matters will be found in a review article of mine (1984a). For detailed, if sometimes irrelevant, controversy see the papers by Farris (1981, 1985, 1986) and myself (1986, 1988b).
For genetic distances from gene frequencies, FITCH, KITSCH, and NEIGHBOR may be appropriate if a neutral mutation model can be assumed and Nei's genetic distance is used, or if pure drift can be assumed and either Cavalli-Sforza's chord measure or Reynolds, Weir, and Cockerham's (1983) genetic distance is used. However, in the latter case (pure drift) CONTML should be better.
Restriction site and restriction fragment data can be treated by distance matrix methods if a distance such as that of Nei and Li (1979) is used. Distances of this sort can be computed in PHYLIp by the program RESTDIST.
For nucleic acid sequences, the distances computed in DNADIST allow correction for multiple hits (in different ways) and should allow one to analyse the data under the presumption of additivity. In all of these cases independence will not be expected to hold. DNA hybridization and immunological distances may be additive and independent if transformed properly and if (and only if) the standards against which each value is measured are independent. (This is rarely exactly true).
FITCH and the Neighbor-Joining option of NEIGHBOR fit a tree which has the branch lengths unconstrained. KITSCH and the UPGMA option of NEIGHBOR, by contrast, assume that an "evolutionary clock" is valid, according to which the true branch lengths from the root of the tree to each tip are the same: the expected amount of evolution in any lineage is proportional to elapsed time.
% fneighbor Phylogenies from distance matrix by N-J or UPGMA method Phylip distance matrix file: neighbor.dat Phylip neighbor program output file [neighbor.fneighbor]: Cycle 4: species 1 ( 0.91769) joins species 2 ( 0.76891) Cycle 3: node 1 ( 0.42027) joins species 3 ( 0.35793) Cycle 2: species 6 ( 0.15168) joins species 7 ( 0.11752) Cycle 1: node 1 ( 0.04648) joins species 4 ( 0.28469) last cycle: node 1 ( 0.02696) joins species 5 ( 0.15393) joins node 6 ( 0.03982) Output written on file "neighbor.fneighbor" Tree written on file "neighbor.treefile" Done. |
Go to the input files for this example
Go to the output files for this example
Standard (Mandatory) qualifiers: [-datafile] distances Phylip distance matrix file [-outfile] outfile [*.fneighbor] Phylip neighbor program output file Additional (Optional) qualifiers (* if not always prompted): -matrixtype menu [s] Type of data matrix (Values: s (Square); u (Upper triangular); l (Lower triangular)) -treetype menu [n] Neighbor-joining or UPGMA tree (Values: n (Neighbor-joining); u (UPGMA)) * -outgrno integer [0] Species number to use as outgroup (Integer 0 or more) -jumble toggle [N] Randomise input order of species * -seed integer [1] Random number seed between 1 and 32767 (must be odd) (Integer from 1 to 32767) -replicates boolean [N] Subreplicates -[no]trout toggle [Y] Write out trees to tree file * -outtreefile outfile [*.fneighbor] Phylip tree output file (optional) -printdata boolean [N] Print data at start of run -[no]progress boolean [Y] Print indications of progress of run -[no]treeprint boolean [Y] Print out tree Advanced (Unprompted) qualifiers: (none) Associated qualifiers: "-outfile" associated qualifiers -odirectory2 string Output directory "-outtreefile" associated qualifiers -odirectory string Output directory General qualifiers: -auto boolean Turn off prompts -stdout boolean Write first file to standard output -filter boolean Read first file from standard input, write first file to standard output -options boolean Prompt for standard and additional values -debug boolean Write debug output to program.dbg -verbose boolean Report some/full command line options -help boolean Report command line options. More information on associated and general qualifiers can be found with -help -verbose -warning boolean Report warnings -error boolean Report errors -fatal boolean Report fatal errors -die boolean Report dying program messages |
Standard (Mandatory) qualifiers | Allowed values | Default | |||||||
---|---|---|---|---|---|---|---|---|---|
[-datafile] (Parameter 1) |
Phylip distance matrix file | Distance matrix | |||||||
[-outfile] (Parameter 2) |
Phylip neighbor program output file | Output file | <*>.fneighbor | ||||||
Additional (Optional) qualifiers | Allowed values | Default | |||||||
-matrixtype | Type of data matrix |
|
s | ||||||
-treetype | Neighbor-joining or UPGMA tree |
|
n | ||||||
-outgrno | Species number to use as outgroup | Integer 0 or more | 0 | ||||||
-jumble | Randomise input order of species | Toggle value Yes/No | No | ||||||
-seed | Random number seed between 1 and 32767 (must be odd) | Integer from 1 to 32767 | 1 | ||||||
-replicates | Subreplicates | Boolean value Yes/No | No | ||||||
-[no]trout | Write out trees to tree file | Toggle value Yes/No | Yes | ||||||
-outtreefile | Phylip tree output file (optional) | Output file | <*>.fneighbor | ||||||
-printdata | Print data at start of run | Boolean value Yes/No | No | ||||||
-[no]progress | Print indications of progress of run | Boolean value Yes/No | Yes | ||||||
-[no]treeprint | Print out tree | Boolean value Yes/No | Yes | ||||||
Advanced (Unprompted) qualifiers | Allowed values | Default | |||||||
(none) |
7 Bovine 0.0000 1.6866 1.7198 1.6606 1.5243 1.6043 1.5905 Mouse 1.6866 0.0000 1.5232 1.4841 1.4465 1.4389 1.4629 Gibbon 1.7198 1.5232 0.0000 0.7115 0.5958 0.6179 0.5583 Orang 1.6606 1.4841 0.7115 0.0000 0.4631 0.5061 0.4710 Gorilla 1.5243 1.4465 0.5958 0.4631 0.0000 0.3484 0.3083 Chimp 1.6043 1.4389 0.6179 0.5061 0.3484 0.0000 0.2692 Human 1.5905 1.4629 0.5583 0.4710 0.3083 0.2692 0.0000 |
As NEIGHBOR runs it prints out an account of the successive clustering levels, if you allow it to. This is mostly for reassurance and can be suppressed using menu option 2. In this printout of cluster levels the word "OTU" refers to a tip species, and the word "NODE" to an interior node of the resulting tree.
Neighbor-Joining/UPGMA method version 3.68 7 Populations Neighbor-joining method Negative branch lengths allowed +---------------------------------------------Mouse ! ! +---------------------Gibbon 1------------------------2 ! ! +----------------Orang ! +--4 ! ! +--------Gorilla ! +-5 ! ! +--------Chimp ! +-3 ! +------Human ! +------------------------------------------------------Bovine remember: this is an unrooted tree! Between And Length ------- --- ------ 1 Mouse 0.76891 1 2 0.42027 2 Gibbon 0.35793 2 4 0.04648 4 Orang 0.28469 4 5 0.02696 5 Gorilla 0.15393 5 3 0.03982 3 Chimp 0.15168 3 Human 0.11752 1 Bovine 0.91769 |
(Mouse:0.76891,(Gibbon:0.35793,(Orang:0.28469,(Gorilla:0.15393, (Chimp:0.15168,Human:0.11752):0.03982):0.02696):0.04648):0.42027,Bovine:0.91769); |
Program name | Description |
---|---|
efitch | Fitch-Margoliash and Least-Squares Distance Methods |
ekitsch | Fitch-Margoliash method with contemporary tips |
eneighbor | Phylogenies from distance matrix by N-J or UPGMA method |
ffitch | Fitch-Margoliash and Least-Squares Distance Methods |
fkitsch | Fitch-Margoliash method with contemporary tips |
Although we take every care to ensure that the results of the EMBOSS version are identical to those from the original package, we recommend that you check your inputs give the same results in both versions before publication.
Please report all bugs in the EMBOSS version to the EMBOSS bug team, not to the original author.
Converted (August 2004) to an EMBASSY program by the EMBOSS team.