PAUP* 4.0 beta3 -- Release Notes


MAJOR BUG FIXES:

- Fixed incorrect handling of user-supplied site-specific rates in
maximum-likelihood analysis (including rates set using a RateSet
command or by the "previous" option after estimating rates for a
character-partition). (David Maddison, Andrew Rambaut)
- Fixed crashing and other problems (including bogus tree lengths,
single-character step counts, and related fit measures) when the
option was set to treat multistate taxa as POLYMORPHIC, even if
there were no multistate taxa actually present. (Peter Fritsch,
Etsuko Moriyama, Helen James, Brian Wiegmann, Mike Sorenson,
Chris Schneider, Jackie Brown, Mark Siddall, Jim McGuire)
- Fixed erroneous length calculations on _nonbinary_ trees when
using user-defined stepmatrix characters with _non-integer_
costs. "Maximum" step counts were also underestimated, thus
affecting retention indices and rescaled consistency indices.
(Francois Lutzoni)
- Fixed possible incorrect calculation of likelihoods on huge
trees (e.g., more than 100 taxa) due to floating-point underflow
when single-site probabilities became too small to represent as
a standard double precision value. Now, likelihoods are
rescaled at intermittent points during tree traversal (when
necessary) to prevent the underflow from occurring. Thanks to
Ziheng Yang for suggestions on how to implement the rescaling.
(Matt Brauer, David Hillis, Paul Lewis)
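
The rescaling idea in the last item above can be sketched on a
plain product of probabilities. This is only an illustration in
Python, not PAUP*'s code: the function name, the threshold, and
the example values are assumptions, and the actual fix rescales
likelihoods during tree traversal rather than a single scalar
product.

```python
import math

def log_product_with_rescaling(factors, threshold=1e-100):
    """Return log(prod(factors)) for positive factors.

    Whenever the running product drops below `threshold`
    (a hypothetical cutoff), its log is moved into `log_scale`
    and the running product is reset to 1.0.
    """
    running = 1.0
    log_scale = 0.0
    for f in factors:
        running *= f
        if running < threshold:
            log_scale += math.log(running)
            running = 1.0
    return log_scale + math.log(running)

# 100 factors of 1e-5: the true product is 1e-500, which is far
# below the smallest positive double.
factors = [1e-5] * 100

naive = 1.0
for f in factors:
    naive *= f          # underflows to exactly 0.0

rescaled_log = log_product_with_rescaling(factors)
```

Here the naive product cannot be logged at all, while the rescaled
version returns a finite log value close to 100 * log(1e-5).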

NEW FEATURES and INTERFACE CHANGES:

- Trees may now be filtered according to the best tree score.
The command "Filter best;" (or the equivalent menu interface
choice) will retain only those trees that are optimal according
to the current criterion. This allows elimination of suboptimal
trees without having to first determine the score of the best
tree currently in memory, thereby simplifying some automated
analyses. (David Maddison)
- For models that allow rates across sites to be drawn from a
distribution (gamma and/or invariable-sites), the relative
contribution to the likelihood for a site from each rate
category is now available in conjunction with single-site
likelihood output ("Tree Scores: Likelihood" menu command,
"CategLikes" option on LScores command). This value is
expressed as a percentage, and it can be thought of as the
posterior probability that a site has been drawn from a given
rate category.
- Standard errors are now provided for maximum-likelihood branch
length estimates. Asymptotically, these branch-length estimates
are normally distributed, so usual methods can be used to
construct approximate confidence intervals (e.g., plus or minus
1.96 s.e. for a 95% confidence interval).
- A likelihood-ratio test for whether a branch-length is zero is
now available, corresponding to the test implemented in DNAML
of PHYLIP. There are two versions of the test. The test is
done in conjunction with tree description (DescribeTrees command).
The best test ("LSet ZeroLenTest=Full" or the equivalent choice
from the Likelihood Settings dialog box) optimizes all branch
lengths under the constraint that one of the branches is zero,
for each branch tested. It can be a little slow for large trees,
so an alternative test is available ("LSet ZeroLenTest=Crude")
in which the likelihood of the tree is determined after setting
each branch, in turn, to zero length, with no readjustment to
the other branches. This is the test actually implemented by
Joe Felsenstein in DNAML (who also used the word "crude" to
describe it in his documentation).
- You can now enter a one-line title to be included on each page
output by the "Print Trees" command of the Mac version. (Mark
Siddall, Mark Hershkovitz, Dave Rowell, and several others)
- You can now save trees to a file in "matrix representation."
A NEXUS file is created that contains a dummy character for
each clade on the tree; analysis of the resulting data set
under parsimony will regenerate the original tree. Optionally,
branch lengths can be stored as character weights; a weighted
parsimony analysis using these weights will regenerate the
original tree and branch lengths.
- A leading tilde (~) in file specifications under Unix is now
expanded into the current user's home directory ("~/...") or
the specified user's home directory ("~user/..."), consistent
with the behavior of many shells. (Sue Olson)
- An option was added to write user-supplied branch lengths with
trees saved to files. This mainly supports merging of treefiles
containing branch lengths using multiple 'GetTrees' commands.
Previously, no attempt was made to save user-supplied branch
length information when trees from a file were merged with trees
in memory.
- Previously, when a search was cancelled, any rearrangements that
had been recorded as potentially optimal, distinct trees were
checked (and saved if appropriate) before control was returned
to the user. For large data sets or data sets with large
numbers of equally good trees, the delay while this check
progressed could be considerable. Now, the user is asked if
s/he wants to perform the check, and if the response is "no"
control is returned to the user immediately. If the response
is "yes" and a second abort request is issued, then the checking
stops immediately upon receipt of this second request.
- The "MaxPass", "SMaxPass", "Delta", "SDelta", and "LogIter"
options are no longer relevant for distance analysis and have
been eliminated entirely from the menu interface. Attempts to
set them from the command interface will not fail, but will
elicit a warning.
- The use of separate ML convergence and iteration-limit options
for searching vs. other contexts was confusing to users and has
been dropped. Now, there is one option for maximum number of
passes (MaxPass) and one option for the convergence criterion
(Delta). The old 'SMaxPass' and 'SDelta' options are now
mapped to MaxPass and Delta, respectively, in the command
interface. The menu/dialog interface no longer shows the
special "searching" versions of these options.
- Added options for outputting tree scores to a text file under
parsimony and distance criteria (previously this was available
only for likelihood). This makes it easy, for example, to use
standard spreadsheet and graphing applications to prepare
scatter plots showing correlations between tree scores obtained
under different optimality criteria.
- Added an option to delete trees from memory that are currently
hidden by a tree filter ("Filter Purge;" from the command line,
"Filter Trees:Purge Trees" from the Trees menu).
- Added an option to ignore character weights in likelihood and
DNA-specific distance analyses ("Wts=Ignore" vs.
"Wts=RepeatCnt" on DSET/LSET commands, with corresponding
items in the Distance Settings and Likelihood Settings dialog
boxes). Previously, integer weights were always treated
as repeat counts.
- Macintosh input file selection dialogs ("Open", "Get Trees",
"Load Constraints", "Import") now have a pop-up menu for
selecting which files to list, and include an option to list all
files. This makes it possible to see files that are readable by
PAUP* but have not been assigned the filetype 'TEXT' (e.g.,
files transferred in binary rather than text mode from Unix
servers).
- Exporting of files in NEXUS format is now possible. This makes
it easy, for example, to convert files from interleaved to non-
interleaved or vice versa, or to create files that physically
delete rows and columns corresponding to deleted taxa and/or
excluded characters.
- A command-line interface to the "Export Data" command is now
available. This makes exporting available to the non-Macintosh
versions, and makes it possible to export data from a command
script.
- The default parameterization for the molecular-clock constraint
in likelihood branch-length optimization has been changed to
a method suggested by Andrew Rambaut. This version seems faster
for most data sets and is often more reliable. However, it is
clear that no parameterization strategy is 100% reliable, as
all are prone to getting stuck in local optima in some
situations. Other parameterizations are also available by using
the "ParamClock" option of the "LSet" (or LScores) command. The
parameterization used in previous versions can be requested as
ParamClock=BrLens. Other available settings are 'SplitTimes',
'Thorne', and 'MDRambaut'. All of the parameterizations other
than 'Rambaut' use multidimensional optimization of branch-
length parameters, rather than iteratively optimizing one at
a time ('MDRambaut' is simply a multidimensional version of
'Rambaut'). These parameterizations will be described in the
forthcoming Users Manual.
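
For the rate-category percentages described in the single-site
likelihood item above, the arithmetic can be sketched as follows.
This is a minimal Python illustration under assumptions: the
function name and the numbers are made up, and the per-category
site likelihoods would in practice come from the likelihood
calculation itself.

```python
def category_percentages(category_likelihoods, category_probs):
    """Percentage of a site's likelihood contributed by each
    rate category.

    category_likelihoods[k]: likelihood of the site given rate
                             category k (hypothetical values)
    category_probs[k]:       prior weight of category k
    """
    weighted = [p * l for p, l in zip(category_probs, category_likelihoods)]
    total = sum(weighted)   # the site likelihood summed over categories
    return [100.0 * w / total for w in weighted]

# Four equally weighted categories, as with a discrete gamma
# approximation (the likelihood values are invented).
pcts = category_percentages([0.002, 0.010, 0.006, 0.002], [0.25] * 4)
# approximately [10, 50, 30, 10]
```

The percentages sum to 100 for each site, which is what allows
them to be read as posterior probabilities of the rate categories.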

OTHER FIXES/CHANGES since version 4.0b2a:

Calculation errors:

- Fixed problems with tree-length calculation in branch-and-bound
and exhaustive searches that followed a partition-homogeneity
analysis in which non-unit character weights were in effect.
Branch-and-bound searches typically reported "0 trees of length
0 or less" and exhaustive searches often crashed because of
memory corruption associated with the storage of the tree-length
frequency distribution. (Olaf Bininda-Emonds)
- Fixed problem with use of maximum-likelihood distances for
approximation of initial branch lengths for maximum-likelihood
analysis when rates were set to follow a gamma distribution.
(Sam Rogers)
- Fixed problem with stateset updating that could cause incorrect
tree length calculation with asymmetric stepmatrix characters
in unusual cases (probably only with NNI and SPR swapping).
- Fixed a problem where round-off error associated with noninteger
stepmatrix costs could cause trees that were actually equal in
length to be treated as having different length. One cascade
effect of this problem caused the program to stall while finding
a long succession of trees that were each trivially shorter than
the previously best tree, when there were a large number of
equally parsimonious trees. (Kevin de Queiroz)
- No longer stores and outputs undefined restriction-site
distances as "0". (Hendrik Schaefer)
- Maximum-likelihood distances were not being calculated correctly
when site-specific rates were in effect.
- Fixed a problem that could cause one or more base frequencies to
be estimated as zero under maximum likelihood. (Rod Page)
- Branch-and-bound searches under distance criteria could find
suboptimal trees if the option for handling negative branch
lengths was "set to zero" (the factory default) or "set to
absolute value". These options are not appropriate for branch-
and-bound searches because they can cause a search path leading
to the optimal tree to be cut off prematurely. Now, these
options for negative-branch-length handling are disallowed in
branch-and-bound searches. (Anne Yoder)
- Fixed incorrect calculation of base frequencies when ambiguity
codes or multistate taxa were present (including the "N" state,
which is mapped to {ACGT} rather than "missing"). The problem
was that these states were not included in the counts for the
number of occurrences of each base, but *were* being added to
the total number of sites with non-missing data. Thus, the sum
of the base frequencies for affected taxa was less than one.
The problem was serious only if N's or other ambiguities
represented a nontrivial fraction of the total data matrix.
Now, when a k-state ambiguity is found in the matrix, the count
for that site is distributed to each of the indicated states in
proportion to each state's overall frequency in the taxon. (Jim
McGuire)
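
The ambiguity handling described in the last item above can be
sketched roughly as below. This is a Python illustration, not
PAUP*'s code. It assumes that a state's "overall frequency in the
taxon" is taken from the unambiguous sites in a single pass; the
notes do not say exactly how that frequency is obtained, and the
function name and example sequence are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes.
AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def base_frequencies(sequence):
    """Base frequencies for one taxon, with each ambiguous site's
    count split among its possible states."""
    seq = sequence.upper()
    counts = {b: 0.0 for b in "ACGT"}
    for ch in seq:
        if ch in counts:
            counts[ch] += 1.0
    unambiguous = sum(counts.values())
    # Assumption: proportions come from unambiguous sites only.
    base = {b: (counts[b] / unambiguous if unambiguous else 0.0)
            for b in "ACGT"}
    for ch in seq:
        states = AMBIGUITY.get(ch)
        if states is None:
            continue            # unambiguous base, gap, or missing
        weight = sum(base[s] for s in states)
        for s in states:
            # Fall back to an equal split if none of the indicated
            # states was observed unambiguously.
            share = base[s] / weight if weight > 0 else 1.0 / len(states)
            counts[s] += share
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no non-missing sites")
    return {b: counts[b] / total for b in "ACGT"}

freqs = base_frequencies("AAAACCGTRN?")
```

Because every ambiguous site adds exactly one count in total, the
resulting frequencies sum to one, which is the property the fix
restores.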

Problems that caused crashes or obviously incorrect results:

- Fixed crashing or other problems if the option to prohibit
negative branch lengths was requested with a heuristic search
under any distance criterion. (Daniel Dalevi, Andrew McArthur)
- Eliminated possible crash (on some systems) due to divide-by-
zero when using stepmatrices with zero off-diagonal costs.
(Mikael Thollesson, Bernard Goffinet/Randy Downer, Andrew
Mitchell)
- Eliminated crashes or other problems when non-nucleotide
characters were included for bootstrap and jackknife analyses
using likelihood or DNA-specific distance measures. (David
Geiser)
- Fixed memory allocation problem that could cause a crash during
calculation of agreement subtrees when option to find all
agreement sets was requested. (David Haasl)
- Eliminated crash when reading multiple treefiles containing
branch lengths (with the option to store these branch lengths
enabled). Previously, no attempt was made to save user-supplied
branch length information when trees from a file were merged
with trees in memory, but the crash occurred anyway with certain
modes. (Andrew Mitchell)
- Fixed crashing when a constrained search was performed, the
constraint was redefined using the same name, and then a second
constrained search was attempted. (Ziheng Yang)
- Fixed incorrect DNA distance calculation when option to
distribute ambiguous changes proportionally to unambiguous
changes (the default) was in effect and one or more of the four
bases was completely missing in a sequence (presumably, this
would never have happened with real data). (David Posada)
- Fixed crash or obviously incorrect output when "Describe Trees"
was used to output maximum-likelihood ancestral state
reconstructions (e.g., XOut=Internal or XOut=Both).
- Fixed failure to find a tree, or crash, when doing exhaustive
search under Goloboff's implied-weights criterion. (Jan
Bosselaers, Mikael Thollesson)
- Fixed a bug that could cause failures during bootstrapping,
jackknifing, or other random-number-intensive operations. The
problem only occurred with certain seed values, and either led
to a crash or a message that memory had been overwritten.
(Ron Debry)
- Fixed incorrect behavior if tree-sorting was requested while a
tree-filter was in effect. Now, this operation is simply
disallowed, and the user is prompted to either remove the filter
or purge the trees currently hidden by the filter. (David
Maddison)
- Fixed crashing of non-Mac versions when trees were described
with one or more Dollo parsimony characters in effect.
(John Barta)
- Fixed problems with minimum and maximum branch length
calculation under Dollo parsimony that could cause minimum
length to exceed maximum length in the branch length table.
- Fixed a problem that could cause a crash under the likelihood
criterion with least-squares estimation of initial branch lengths
(the crash did not occur until the second analysis in which
this option was used).
- Fixed crashing and/or incorrect calculation of tree lengths when
*asymmetric* stepmatrices contained internal inconsistencies
(especially involving infinite costs). Previously, these were
allowed to remain, but now the stepmatrix costs are adjusted to
eliminate the inconsistency. This may make some people unhappy,
but there simply is no way around it. I will try to provide a
more complete explanation in the manual. (Tobias Schneck)

Problems affecting input:

- Fixed problem that caused an alphabetic "missing" symbol (e.g.,
'n') not to be recognized if it did not match the case of the
character symbol of the first taxon in the data set for the same
character. (Sigrid Liede)

Interface and platform-specific issues:

- Fixed crash due to underflow when estimating gamma-shape
parameters using gcc-compiled binary on Alpha systems. (Jack
Dumbacher)
- Attempts were made to prevent multiple processes from writing to
the same file, to prevent other programs from modifying files
that PAUP is currently reading, or to prevent PAUP from reading
from files that have been opened for writing by another process.
On Unix systems, the POSIX fcntl-based record-locking mechanism
was used--this system relies on "advisory" rather than mandatory
locking and works only with programs that set and/or pay
attention to these locks. It also will not usually work across
NFS-mounted file systems. On the Mac, the FSpSetFLock facility
was used to lock files--unfortunately, this doesn't seem to
prevent some programs from writing to files that PAUP has
locked. At a minimum, multiple PAUP runs on the same machine
are now prevented from opening and writing to the same log
files, etc., since PAUP always respects its own locks. (Una
Smith)
- Fixed incorrect showing of filename in display output when trees
were saved to a treefile on a remote Appleshare volume (the name
of the enclosing directory was shown rather than the file name
itself). Also, the Finder information was not getting set
correctly, causing the saved file (which was named correctly)
to lose its association with PAUP. (Gary Olsen)
- Clicking the "Stop" button of command-line area now stops long
operations when it is enabled (e.g., when LSCORES calculation
is in progress). Clicking the "Pause" button is also allowed;
the calculation is suspended until "Resume" is clicked. (Ted
Schultz)
- Open-file dialog is now suppressed at startup if PAUP is
not the frontmost process when this dialog would be put up.
This eliminates hangs or crashes if, for example, PAUP is
launched from a script.
- Fixed sticking in "busy" mode if DScores output was requested
when there were no trees currently in memory (among other
problems, this left the "Quit" item on the file menu disabled,
and there was no way to get out of the program other than to
force-quit).
- Eliminated possible crash when a command-string was deleted
from the command history buffer (under specific, unusual
circumstances).
- Underscores are no longer translated to blanks on commands
passed to Unix, DOS, and VMS shells (e.g., "!ls paup_file").
(Will Fischer)
- Fixed possible problems opening files in extremely old MacOS
versions (pre-System-7).
- Loading of files with foreign line termination (Mac or Unix)
into the Win32 editor is now MUCH faster.
- Execution of files from an open editor window in the Unix
version is now vastly faster, and no longer crashes in some
cases. (David Campbell)
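
The advisory nature of the fcntl-based locking mentioned in the
file-locking item above can be illustrated with Python's standard
fcntl module, which wraps the same POSIX facility. This is a
sketch for Unix-like systems only; the function name is
hypothetical and nothing here reflects how PAUP* itself is coded.

```python
import fcntl

def try_exclusive_lock(path):
    """Open `path` for appending and try to take an advisory
    exclusive lock without blocking.

    Returns the open file object on success (keep it open for as
    long as the lock is needed), or None if the lock could not be
    obtained, e.g. because another process holds it.
    """
    f = open(path, "a")
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f
```

A program that never calls the locking routine can still write to
the file, which is the limitation the item describes: the lock
only constrains cooperating programs that check for it.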

Superficial/cosmetic problems:

- Explicitly disallowed estimation of ML model parameters (other
than branch lengths) during maximum-likelihood quartet puzzling
(eliminates bogus error message about ML distances not being
allowed when parameters were estimated). (Irby Lovette)
- Fixed mispositioning of branch-length and scale-bar text when
a tree was printed across more than one page. (Frank Blattner,
Vincent Savolainen)
- Eliminated possible duplication of "ApproxLim dynamically
readjusted" message when an ML search was started from trees
already in memory. (Ted Schultz)
- Exhaustive search is now explicitly prohibited for more than 12
and 11 taxa on unrooted and rooted trees, respectively. This
was a practical limit in any case, but attempting it for more
than this number of taxa caused the number of possible trees to
exceed the number that could be stored in a 32-bit integer, with
various undesirable consequences. (Francois Genier)
- Fixed incorrect length of scale bar on plotted phylogram if the
tree was rooted along the longest branch of the tree (which
could only happen under nondefault rooting options). (Michael
Moeller)
- Eliminated bogus complaint about an invalid block pointer when a
"Quit" command was issued from an executing file (4d65 only).
(Torsten Eriksson)
- User is no longer allowed to request branch-swapping from trees
currently stored in memory in conjunction with bootstrap or
jackknife searches using command interface (was already disabled
in menu interface). (Robert Bellsey)
- Fixed failure to show parentheses or curly braces in "Show Data
Matrix" output with multistate characters treated as "variable".
(Una Smith)
- Fixed cosmetic problem in Macintosh-version "Parsimony Settings"
dialog box when running under System 7 and earlier. (Ted
Schultz)
- Treefiles saved in PHYLIP format now include the number of trees
as the first line. (Andrew McArthur)
- Eliminated bogus warnings about triangle-inequality violation
when a stepmatrix contains non-integer values. (Chris Schneider,
Daniel Miranda-Esquivel)
- Fixed unnecessary repetition of "Search aborted. There may be
a delay while cleanup is performed" message after a search was
cancelled (DOS/Unix versions only).
- An error message is now issued if a user attempts a search after
excluding all characters. I was going to make this just a
warning, but I couldn't imagine any scenario where a user would
want to do this (tell me if you know of one). (Olaf Bininda-
Emonds)
- User-supplied branch lengths are now updated correctly if taxa
are deleted with the option to prune them from trees currently
in memory. (Gary Olsen)
- No longer complains that character weights must be integers for
DNA-specific analyses if user just wants to print a phylogram
using user-specified branch lengths.
- Fixed cosmetic problem with output of single-character
consistency indices and related measures by Tree Scores:
Parsimony (PScores) command. (Mikael Thollesson)
- Fixed several related problems associated with Win32 version:
* Dragging a document icon to the PAUP application icon now
starts PAUP, with the file associated with the dropped icon
initially opened in PAUP's editor (or executed if the shift
key is pressed at the time of the drop). (Vijay Aswani)
* Double-clicking the application icon when PAUP is already
running now invokes a new instance of the program.
* Double-clicking a document icon with a file extension that
has been associated with PAUP now opens that file in the
editor if PAUP is already running.
* If an error is encountered while executing a file, the file
is now correctly opened in the editor and positioned to the
point of the error.
- The output message "Quartets evaluated using [approximate or
exact] likelihood calculations" in quartet puzzling is now
issued only when the current criterion is likelihood.
- Clarified error messages associated with conflicts between
settings for saving N best trees and other options.
- The warning about duplicated taxon names is now less noisy
when there are a large number of duplicated names.
- MEGA file import now allows "#TITLE" as well as "TITLE" (some
MEGA files seem to contain the pound sign, even though the
original specification does not call for it).
- Non-protein GCG files are now imported as "nucleotide" format
rather than worrying about whether they are DNA or RNA (or
both).
- Eliminated possible appearance of Stop/Resume/Cancel buttons
in output-display area when the command line was hidden.
- No longer complains erroneously that data are not presence-
absence when output of a restriction-site distance is
requested from the command-line before any other actions have
been taken.
- Fixed failure to update maximum-likelihood score after adjusting
the very last branch-length parameter on a tree. The only
effect that this would have in the vast majority of cases is
that convergence might require one more pass than it should
have, slowing the program down a little. In very rare (probably
only hypothetical) cases, it could have caused premature
convergence (this would only happen if all branch-lengths other
than the last one did not change in a pass over the tree).
- Running status output ("RStatus") is now suppressed if it is the
current setting for random-addition-sequence searches, but the
searches currently being done are for a "fast" bootstrap or
jackknife. (David Maddison)
- Changed the message issued when an editor window was reactivated
after a file was opened for editing and then moved or deleted
from the Finder (previously, it said that the file had been
changed, when in fact the problem was that the disk version of
the file could not be located). The file is also now marked
as a "new" file when this happens. (David Maddison)
- Attempted to fix some anomalies in the search-status window
display with random-addition-sequence heuristic searches.
(David Maddison)
- Fixed failure of input parser to consume rest of LScores command
after outputting a message that there are no trees in memory
for which to calculate likelihood scores. If the command issued
was "LScores all;" this caused an exhaustive search to be
initiated (the "all" was taken as an abbreviation of "AllTrees").
(David Maddison)
- Windows version now updates search-status window promptly after
restoring from minimized position (no longer briefly shows "0"
as number of trees remaining to swap, etc.). (David Campbell)
- User attempts to enter integer values larger than the maximum
allowable value (= 2,147,483,647) are now trapped.
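
The exhaustive-search limits of 12 (unrooted) and 11 (rooted) taxa
noted above follow from the standard counts of fully bifurcating
trees, (2n-5)!! and (2n-3)!! respectively. The Python sketch below
checks this against the signed 32-bit maximum quoted in the last
item; the function names are hypothetical, and treating the
"32-bit integer" as signed is an assumption.

```python
INT32_MAX = 2**31 - 1   # 2,147,483,647

def num_unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 taxa:
    (2n-5)!! = 1 * 3 * 5 * ... * (2n-5)."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count

def num_rooted_trees(n):
    """Number of rooted bifurcating trees for n taxa, (2n-3)!!,
    which equals the unrooted count for n + 1 taxa."""
    return num_unrooted_trees(n + 1)

def max_taxa_for_int32(count_fn):
    """Largest n for which count_fn(n) still fits in INT32_MAX."""
    n = 3
    while count_fn(n + 1) <= INT32_MAX:
        n += 1
    return n

max_unrooted = max_taxa_for_int32(num_unrooted_trees)   # 12
max_rooted = max_taxa_for_int32(num_rooted_trees)       # 11
```

With 12 taxa there are 654,729,075 unrooted trees; with 13 there
are 13,749,310,575, which no longer fits in 32 bits.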

Other minor enhancements and changes:

- A neighbor-joining or UPGMA tree is now calculated and used to
set the initial upper-bound for branch-and-bound searches under
likelihood and distance criteria (unless user supplies an upper
bound). This change can greatly speed branch-and-bound search
times for distance and likelihood relative to the previous
default of starting with an infinite upper bound.
- UPGMA output with the "branch lengths" option now outputs the
clustering level for each cluster (internal node).
- All branch lengths (including terminals) are now allowed to go
to zero in maximum-likelihood optimization if they want to.
Previously, terminal branch lengths were not allowed to be
smaller than 10^(-8), which eliminated the possibility of
zero likelihoods and resulting undefined log likelihoods when
taxa with nonidentical sequences were connected by a path of
zero length. However, this limit was preventing likelihoods
from reaching their optimal values in certain unusual situations
not likely to be encountered with real data. I have attempted
to trap all possible negative consequences of this change, but
it's possible that it will introduce new bugs.
- Format for scores file from Likelihood Scores (LScores) was
changed slightly to improve readability and facilitate
use in spreadsheet and graphing applications.
- Added notification (warning) that character weights are
treated as repeat counts in maximum likelihood analysis, and
summary of current weight status.
- User is now prompted if a constrained search is attempted with
an uninformative constraint tree. (This was needed because
some users were unaware that the constraint tree that
they had specified was uninformative, particularly when a
constraint tree was originally informative but became
uninformative due to deletion of taxa.)
- The model with six substitution types and equal base frequencies
is now identified as the SYM model in likelihood-settings
output. (Andrew McArthur)
- Choosing an item from the "Open Recent" menu while holding
down the option key now causes that item to be deleted from
the menu. (Otherwise, the file is executed if the shift key
is down, or edited if the shift key is not down.)
- Sum of branch lengths is now shown in the "Describe Trees"
branch-length table. For parsimony, this is the same as the
tree length (except for length due to terminal polymorphisms),
but this value is useful in likelihood and distance contexts as
well.
- Output of estimated "R" matrix (= symmetric component of rate
matrix Q) for GTR model no longer shows values on diagonal,
which are not meaningful.
- Changed system for numbering internal nodes when taxa are
deleted to avoid overlap between numbers in the range between
the current number of taxa and the original number of taxa.
- Importing of simple-text files no longer requires that taxon
names start in the first position of each line (i.e., there
can be whitespace before the name, such as when taxon names
are right-justified).


