Thursday, July 5, 2012

phylocurve.perm: an R function for generating a rarefaction curve of Phylogenetic Diversity by randomisation

Here's a function I have written for the R statistical environment that calculates a rarefaction curve giving expected Phylogenetic Diversity (PD) for multiple values of sampling effort. Sampling effort can be defined in terms of the number of individuals, sites or species. This function is a variant of “phylocurve” that uses randomisation rather than an exact analytical formula to calculate an estimate of expected PD.  In addition, this version allows the calculation of estimates for the standard deviation, 95% confidence limits and minimum and maximum values of PD. The accuracy of these estimates is of course dependent on the number of permutations. I am providing it for free and without warranty under the GNU General Public License. You need to be familiar with R to use this function. The function also requires that the ape package be installed. To load the function, place the file in your working folder and type ‘source(“phylocurve_perm.R”)’.

WARNING: this function is much slower (about 10x to 100x) than “phylocurve”. For large datasets, I recommend testing with relatively large values of stepm and small values of perm to gauge the amount of computer time required.

Latest version: 17th August 2011. This version fixes a bug where rarefaction values were calculated incorrectly with community data tables containing species with total abundances of 0.

phylocurve.perm (x, phy, stepm=1, subsampling = “individual”, replacement = F, perm=1000)
x is a community data table (as in the vegan package) with species/OTUs as columns and samples/sites as rows. Columns are labelled with the names of the species/OTUs. Rows are labelled with the names of the samples/sites. Data can be either abundance or incidence (0/1).
phy is a rooted phylogenetic tree with branch lengths stored as a phylo object (as in the ape package) with terminal nodes labelled with names matching those of the community data table. Note that the function trims away any terminal taxa not present in the community data table, so it is not necessary to do this beforehand.
stepm is the size of the interval in a sequence of numbers of individuals, sites or species to which x is to be rarefied.
subsampling indicates whether the subsampling will be by “individual” (default), “site” or “species”. When there are multiple sites, rarefaction by individuals or species is done by first pooling the sites.
replacement is a logical indicating whether subsampling should be done with or without replacement.
perm is the number of iterations of the subsampling routine used to calculate expected PD.

phylocurve.perm takes a community data table and a rooted phylogenetic tree (with branch lengths) and calculates expected Phylogenetic Diversity (PD) for every specified value of m individuals, sites or species. m will range from 1 to the total number of individuals/sites/species in increments given by stepm. Estimates of expected PD as well as variance about that mean are determined by monte carlo randomisations (see Gotelli and Colwell 2001 for a more detailed explanation of the procedure as applied to species richness). When there are multiple sites in the community table and rarefaction is by individuals or species, sites are first pooled.
phylocurve.perm returns a matrix object of seven columns with each row corresponding to a particular value of m. The columns are: 1) the values of m; 2) the mean (expected) values of PD for all randomisations; 3) the standard deviation of PD for all randomisations; 4) the 0.025 quantile (lower 95% confidence limit); 5) the 0.975 quantile (upper 95% confidence limit); 6) the minimum PD; and 7) the maximum PD.
Gotelli and Colwell. 2001. Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecology Letters 4: 379–391.

No comments:

Post a Comment