Strong phylogenetic signals in global plant bioclimatic envelopes

Aim: The environmental preferences of species are an important facet of their response to changing conditions, and these have long been thought to exhibit phylogenetic conservatism. However, these bioclimatic envelopes have not previously been imputed from climate records at the date and location of occurrence, and the strength of their phylogenetic signal has not been studied at a broad scale. Here, we combine records from global climate reconstructions with contemporaneous plant occurrences for all available terrestrial plant species and test for phylogenetic niche conservatism.


| INTRODUCTION
The United Nations Framework Convention on Climate Change (UNFCCC) Paris Agreement was ratified in 2015 in an effort to keep the rise in global temperature well below 2°C above pre-industrial levels, with an ultimate goal of 1.5°C (UNFCCC, 2015). In 2018, the Intergovernmental Panel on Climate Change (IPCC) reported that we were likely to reach 1.5°C in the next few decades, with an estimated increase of c. 0.2°C per decade (Intergovernmental Panel on Climate Change, 2018). The impacts of such climate change are already being observed (Cheng et al., 2019; EASAC, 2018; Marvel et al., 2019; Slater et al., 2020) and are set to increase in severity, especially if mitigation measures against CO2 emissions are insufficient (Intergovernmental Panel on Climate Change, 2018). There is evidence that historical and current CO2 emissions are tracking representative concentration pathway (RCP) 8.5, the most extreme in terms of fossil fuel use of all RCPs used in climate modelling (Schwalm et al., 2020). Under the RCP8.5 scenario, global temperatures would be likely to increase by 2.6-4.8°C above pre-industrial levels by 2100 (IPCC, 2014). However, according to the United Nations Environment Programme, even if all of the action plans put in place as a result of the Paris Agreement were followed, we could still see a rise of 3.2°C by the end of the century (Olhoff & Christensen, 2019).
Heat waves, droughts and extreme weather events, such as cyclones, are all expected to increase in frequency over the coming years and are thought to have a particular impact on the survival and diversity of plant communities (Jentsch & Beierkuhnlein, 2008; Reyer et al., 2013). These climate extremes, along with rising sea levels, are expected to contribute towards the loss of climatically suitable range for species in the future (Nunez et al., 2019; Warren et al., 2018). Some studies already report widespread climate-change-induced local extinctions in animals and plants, especially in tropical and subtropical regions when compared with temperate regions (Wiens, 2016). In a study of c. 70,000 plant species, 16% were estimated to lose >50% of their range if warming were to reach 2°C by 2100, compared with 8% if it were restricted to the lower 1.5°C scenario (Warren et al., 2018). These predictions paint a bleak picture for the future of plant biodiversity, particularly when an estimated fifth of assessed plant species were already classified as threatened with extinction and new reports suggest this figure could be as high as 40% (Antonelli et al., 2020; Brummitt et al., 2015).
Key to understanding the responses of species to climate change are their historical and current relationships with climate. According to traditional ecological theory, species are adapted for a particular set of environmental conditions, occupying a unique niche within the ecosystem (Hutchinson, 1957). Climate makes up a large part of this fundamental niche. This climatic component, the bioclimatic envelope, is often modelled separately from other environmental variables, such as land cover or soil type, and from the biotic interactions that can limit ranges (Pearson & Dawson, 2003). Apart from the bioclimatic envelopes produced by species distribution models (SDMs), there are few other sources of information for plant climatic envelopes. Although physiological information on the climatic tolerances of species can be inferred through experiments, it is impractical to do so for the vast majority of plants on Earth (Araújo & Peterson, 2012). There is some information on plant species traits through databases, such as TRY (Kattge et al., 2011; Weigelt et al., 2019) or the Global Root Traits database (GRooT; Guerrero-Ramírez et al., 2020), but little relates to climatic variables, and what is available is either predominantly categorical or taxonomically or geographically limited.
Given that it is widely established that climate is one of the biggest drivers of plant morphology and function, it has long been posited that the climate range of plants can be predicted using information on plant functional traits (Box, 1996; Stahl et al., 2014; Woodward & Williams, 1987). Traits such as specific leaf area and woodiness are much more readily available than direct information on bioclimatic envelopes, but there is little consensus beyond small case studies on how to relate these traits to climate tolerance (Stahl et al., 2014).
Along with plant functional trait information, there is increasing incorporation of evolutionary history into the prediction of plant climate ranges. Although there is some evidence of a phylogenetic signal in climate variables, there are equally many discrepancies (Zhang et al., 2017). Progress has been hampered by a lack of resolved, global phylogenies and accompanying data on the climatic envelopes of species. It has been hypothesized that such climatic niches are phylogenetically conserved, a concept known as phylogenetic niche conservatism (PNC; Harvey & Pagel, 1991). There are a number of different factors that could contribute to a pattern of PNC, including evolutionary constraints on physiology and processes such as neutral drift and dispersal limitation (Crisp & Cook, 2012). Therefore, the measurement of PNC is confounded by the number of different processes that could give rise to such a pattern (Crisp & Cook, 2012; Losos, 2011). In many studies, measurements of phylogenetic signal (PS), such as Blomberg's K (Blomberg et al., 2003) or Pagel's λ (Pagel, 1999), are also used to detect PNC, although there is still some debate about the link between PNC and phylogenetic signal.
Here, we extract information on the climate experienced by >200,000 plant species recorded in the Global Biodiversity Information Facility (GBIF) from long-term climate re-analysis datasets created by the European Centre for Medium-Range Weather Forecasts (ECMWF), in addition to climate normals (averaged for a given month over a 30-year period) created by WorldClim. We then link these bioclimatic envelopes to a supertree of >30,000 plant species (Qian & Jin, 2016) to detect phylogenetic signals in climatic tolerances for the full tree and for the 5,000 most common species in GBIF. Given that most species in GBIF are under-recorded, we also correct for these biases and test the capacity of the models to predict bioclimatic envelopes for missing data.

| Data
We extracted the entire history of plant occurrences from the GBIF (GBIF, 2019), which totals c. 212 million individual records.
The data were filtered for species with an accepted name in The Plant List (TPL, 2013), in order to remove marine species and fossils, which left c. 215,000 species and slightly >100 million records.
The data were additionally cleaned for the presence of 3,441 botanical garden locations with available GPS coordinates, taken from Botanic Gardens Conservation International (BGCI, 2019), to eliminate records erroneously georeferenced to the locations at which the collections are stored. A buffer of 0.02 decimal degrees was taken around each botanical garden, and all points that lay within this area were excluded. The same procedure was performed for country centroids, which can be assigned erroneously to records during the georeferencing process if no locality information is available. We also explored filtering these data further for the top 100 recording institutions, excluding those likely to include garden or greenhouse observations. Given that we found comparable results, we retain the original data in the study (for the small differences observed, see Supporting Information Figure S1; Table S1). GBIF also makes no distinction between native and alien records in its data. However, given that we are interested in the climate envelopes that these species could tolerate, rather than their native ranges, information from introductions to other climates is valuable because it indicates survival in new environments.
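The buffer-based exclusion described above can be illustrated with a short Python sketch. This is not the code used in the study: the circular buffer measured in raw decimal degrees, the function names and the example coordinates are all assumptions made for illustration.

```python
import math

def within_buffer(lat, lon, site_lat, site_lon, buffer_deg=0.02):
    """True if a point lies within buffer_deg of a site.

    Assumption: distance is measured in raw decimal degrees (not kilometres),
    as a circular buffer; the source states only the 0.02 degree size.
    """
    dlon = abs(lon - site_lon)
    dlon = min(dlon, 360.0 - dlon)  # handle wrap-around at the date line
    return math.hypot(lat - site_lat, dlon) <= buffer_deg

def drop_flagged(records, sites, buffer_deg=0.02):
    """Keep only (lat, lon) records that fall outside every site buffer."""
    return [
        (lat, lon)
        for lat, lon in records
        if not any(within_buffer(lat, lon, s_lat, s_lon, buffer_deg)
                   for s_lat, s_lon in sites)
    ]

# Hypothetical coordinates, for illustration only.
example_sites = [(51.4787, -0.2956)]             # e.g. a garden or a country centroid
example_records = [(51.4800, -0.3000), (52.0000, 0.0000)]
cleaned = drop_flagged(example_records, example_sites)
```

The same function can be applied to country centroids by passing them as `sites`. For hundreds of millions of records a spatial index would be needed in practice; the nested loop here is only meant to make the rule explicit.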
We downloaded historical climate reconstructions from the ECMWF for two re-analysis datasets: CERA-20C and ERA-Interim (Dee et al., 2011; Laloyaux et al., 2018). We used the aggregated monthly datasets made available through the ECMWF, which are means of the 3 h records, for 11 climatic variables, including temperature, precipitation and solar radiation (see Table 1). We replaced the final 39 years with the more spatially resolved ERA-Interim for the same variables to create a complete time line for 1901-2018. Finally, we compared this with the WorldClim dataset (Fick & Hijmans, 2017). In contrast to the ECMWF data, WorldClim is much less temporally resolved and is averaged over a 30-year time period from 1970 to 2000 for every month in the year.
For computational efficiency, we used a grid size of 10 min of arc, or c. 18 km at the equator (ranging in size down to 1 km at the poles).
See Table 2 for an overview of the temporal and spatial resolutions of the datasets.
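The grid dimensions quoted above follow from simple spherical geometry, as the Python sketch below shows. It assumes a spherical Earth with a mean radius of 6,371 km; the function name is illustrative. Only the east-west width of a cell shrinks towards the poles, whereas the north-south height stays close to the equatorial value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius

def cell_width_km(latitude_deg, cell_arcmin=10.0):
    """East-west width of a grid cell of cell_arcmin minutes of arc."""
    cell_rad = math.radians(cell_arcmin / 60.0)
    return EARTH_RADIUS_KM * cell_rad * math.cos(math.radians(latitude_deg))

width_equator = cell_width_km(0.0)    # roughly 18.5 km
width_high_lat = cell_width_km(87.0)  # roughly 1 km
```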
In order to correct for biases in plant occurrence data, we also used the mean global enhanced vegetation index (EVI) as a proxy of plant density across the period 2001-2015 (Huete et al., 1999).
The data were originally derived from the US National Aeronautics and Space Administration (NASA) Moderate Resolution Imaging Spectroradiometer (MODIS) dataset, with processing and gap filling in cloud cover performed by Gibson and Weiss (2015). We excluded Antarctica and all islands within the Arctic circle, including Greenland, owing to poor data quality.

| Data extraction and bias correction
Using the c. 215,000 plant species we obtained from GBIF and the climate reconstructions from ECMWF (and other sources), we extracted the plant bioclimatic envelopes using the timing and location of occurrences. This created a raw dataset of global bioclimatic envelopes, and we also explored several adjustments to account for biases in historical collection practices. We present the full compiled database as an output of this work.

| Raw data
For extracting profiles from ERA-Interim and CERA-20C data, we assumed that each of the individual plants recorded in GBIF, or a parent in the case of annuals, would have been present at the site for ≥1 year before collection. Thus, for every record available in the filtered GBIF database, we extracted each of the variables in Table 1 ("climate variables") at the month and location in which they were recorded and the preceding 11 months as a profile of climates in which the species could survive. We averaged across the 12 months for every record and for each climate variable, with the exception of temperature, for which we also calculated minimum and maximum values (Table 1, "derived variables"). For WorldClim data, we extracted the entire 12 months of averaged data at the location where the plants were found. We then binned all records per species into 1,000 bins for each climate variable to produce a set of plant bioclimatic envelope profiles, to which we could then apply various corrections for bias in the underlying GBIF data.
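The per-record extraction can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the evenly spaced bins and the bin range passed to `bin_index` are hypothetical, and the lookup of gridded climate values by location is omitted.

```python
from statistics import mean

def profile_months(year, month, n_months=12):
    """(year, month) pairs for the record month and the preceding n_months - 1."""
    last = year * 12 + (month - 1)  # months counted from year 0
    return [((last - back) // 12, (last - back) % 12 + 1)
            for back in range(n_months - 1, -1, -1)]

def summarise(monthly_values, with_extremes=False):
    """Mean over the profile; minimum and maximum too (used for temperature)."""
    summary = {"mean": mean(monthly_values)}
    if with_extremes:
        summary["min"] = min(monthly_values)
        summary["max"] = max(monthly_values)
    return summary

def bin_index(value, lower, upper, n_bins=1000):
    """Index of the bin holding value (assumption: evenly spaced bins
    over a hypothetical range [lower, upper], clamped at both ends)."""
    fraction = (value - lower) / (upper - lower)
    return min(max(int(fraction * n_bins), 0), n_bins - 1)

# A record collected in March 2000 draws on April 1999 to March 2000.
months = profile_months(2000, 3)
```

Each record then contributes one summarised value per climate variable, and the binned counts of these values per species form the envelope profile.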

| Adjusted data
Plant records available in GBIF suffer from extreme spatial and taxonomic biases (Figure 1). In particular, there is a concentration of higher recording effort in Western Europe, Eastern America and Australia, and recording is particularly low throughout the tropics, where there is the highest concentration of rare and endangered species. In order to control for such biases in the raw GBIF data, we performed several corrections to reduce bias in the corresponding plant species bioclimatic envelopes. The first was to recover the true abundance of plant species at different levels of climate variables. In order to do this, we weighted the binned species preference profiles by the ratio of global distribution of plant collections against a proxy of plant abundance, the EVI, to reduce observation bias (hereafter "effort adjustment"; Supporting Information Figure S2b). The EVI has improved sensitivity to regions of high plant biomass in comparison to the more commonly used normalized difference vegetation index (NDVI) (Huete et al., 1999). The second correction was to recover species envelopes in an idealized landscape where all climates were equally available, by correcting for the observed biases in the global climate (hereafter "climate adjustment"; Supporting Information Figure S2c). This followed the observation that the climates experienced by plant species were strongly bimodal in some parameters, particularly temperature, because there were excess land areas with low (around freezing) and high (around 26°C) average monthly temperatures (Supporting Information Figure S3). Finally, to prepare the data for input into the phylogenetic models, we averaged across the climate bins, weighting by the number of records, to obtain an overall value per species for each climate variable and correction type.
This resulted in three datasets: the raw averaged bioclimatic envelope per species; the bioclimatic envelope adjusted for effort; and the bioclimatic envelope adjusted for both effort and climate.
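One possible reading of the effort adjustment and the final per-species average is sketched below in Python. The exact form of the weighting is not given in the text, so the direction of the ratio (share of EVI divided by share of collections in each climate bin, which down-weights over-sampled climates) is an assumption, as are the function names and the toy numbers.

```python
def effort_weights(records_per_bin, evi_per_bin):
    """Per-bin weights: share of vegetation (EVI) over share of collections.

    Assumption: bins with no collections receive weight 0 because no
    species record can fall in them.
    """
    total_records = sum(records_per_bin)
    total_evi = sum(evi_per_bin)
    weights = []
    for n_records, evi in zip(records_per_bin, evi_per_bin):
        if n_records == 0:
            weights.append(0.0)
        else:
            weights.append((evi / total_evi) / (n_records / total_records))
    return weights

def envelope_mean(bin_centres, species_counts, weights=None):
    """Record-weighted mean of bin centres, optionally effort-adjusted."""
    if weights is None:
        weights = [1.0] * len(bin_centres)
    adjusted = [c * w for c, w in zip(species_counts, weights)]
    return sum(b * a for b, a in zip(bin_centres, adjusted)) / sum(adjusted)

# Toy example: three temperature bins (degrees Celsius).
centres = [0.0, 10.0, 20.0]
all_records = [100, 50, 50]   # all GBIF records per bin (hypothetical)
evi_totals = [1.0, 1.0, 2.0]  # summed EVI per bin (hypothetical)
species = [2, 2, 0]           # one species' records per bin

raw_value = envelope_mean(centres, species)
adjusted_value = envelope_mean(centres, species,
                               effort_weights(all_records, evi_totals))
```

In this toy case the coldest bin is over-sampled relative to its vegetation, so the adjusted mean shifts towards the warmer bin.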
There is large variation in the climate variables and levels of plant data collection between different parts of the world, with a particular bias towards Western Europe (Figure 1). Making adjustments at the level of the individual grid square or as a spatial kernel is problematic, because there are large parts of the world for which there are no or very few plant records, which would need enormous corrections to achieve European sampling levels. We therefore also divided the data into six continental-scale areas [henceforth continents; South America, North America, Africa, Europe (excluding Russia), Asia (including Russia) and Australasia] and applied the adjustments separately to each. Thus, we have both a global and continental scale for each analysis.

| Phylogeny and estimation of evolutionary signal
There is much debate in the literature over the prevalence of

| Full tree and subtrees
Between the tree and bioclimatic envelopes, there was an overlap of >26,000 plant species. We ran analyses of Pagel's λ across all the climate variables for this full tree. We also performed the same analysis for individual taxonomic levels of the tree, including runs for each genus, family, order and class, when there were ≥50 species present. We excluded phyla from this analysis, however, because a single phylum encompasses essentially all species in the tree.
Given that there were not enough historical occurrence records to classify the bioclimatic envelopes of many of the species accurately, we also took various subsets of the data to compare the signals found.

| Subset tree
We fitted the λ models of Brownian motion to the 5,000 most common species ranked by number of occurrences, for which we were confident that there were enough GBIF records to build a bioclimatic envelope accurately (i.e., ≥1,000 records). The data from these 5,000 species encompass c. 80% of the total GBIF data. To explore whether the signals were influenced by spatial autocorrelation between congenerics, we also calculated Pagel's λ on a subset of the data for species with occurrences on more than one continent and thus with the broadest range sizes (c. 4,000 species in total). In order to see how this signal varied across the phylogeny, we included a Moran's I phylogenetic correlogram and a plot of local Moran's I index for the climate variable with the strongest λ signal (minimum temperature) using the R package phylosignal (Keck et al., 2016).
Local Moran's I index was calculated at the genus level owing to the size of the tree and the resulting computational and visualization challenges. We also examined the correlation between variables using Pearson correlation.
Finally, we used the 5,000 most common species again to test how well we could impute missing values. We carried out a 10-fold cross-validation, withholding 10% of the data at a time, to reconstruct each of the climate variables using ancestralStateReconstruction from the Julia package PhyloNetworks. This method predicts the expected values and variances of traits (the bioclimatic envelopes) for plant species with missing data by using known information for other species and estimating the evolutionary parameters of a Brownian motion model. We then quantified the performance of the reconstruction by calculating the Pearson correlation between the imputed climate profiles and the original data. We also compared the strength of the correlations against those from a tree with randomly shuffled tips. We were unable to perform this analysis on the full tree owing to computational constraints.
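The bookkeeping of the cross-validation (fold assignment and the two performance measures) can be sketched in Python as below. The imputation itself was carried out with PhyloNetworks in Julia and is not reproduced here; the function names, the random seed and the round-robin fold assignment are assumptions made for illustration.

```python
import math
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and deal them into k folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rmse(observed, imputed):
    """Root mean square error between observed and imputed values."""
    squared = [(o - i) ** 2 for o, i in zip(observed, imputed)]
    return math.sqrt(sum(squared) / len(squared))

# Each fold is withheld in turn; the remaining species inform the imputation.
folds = k_fold_indices(5000, k=10, seed=1)
```

For each fold, the withheld species would be treated as missing, imputed from the tree and the remaining species, and then compared with their observed values using `pearson` and `rmse`.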

| RESULTS
Starting with examination of the raw ECMWF data, which was uncorrected for any type of sampling bias, we observed strong correlations between the climate variables. Temperature-related variables and water-related variables showed positive correlations within groups (.01-.99) and negative correlations between groups (−.09 to −.86).
There was also a strong relationship between the raw climate variables calculated from ECMWF and those from WorldClim (.83-.97), approaching one in the case of mean and minimum temperatures.

| Phylogenetic signal in climate variables
The phylogenetic signal, Pagel's λ, was strong across the range of climate variables tested (Figure 3; Table 3). Randomizing the tips for this tree showed λ < 0.01, demonstrating that zero was a valid null. For the full tree, we found strong signals of >0.8 for most variables (Figure 3a). For the following results, we considered the subset tree only (Figure 3b), because the signal strength was very similar. Temperature, particularly minimum temperature and the soil temperature levels, had a signal of >0.9, and rainfall showed a similarly strong signal. Soil water volume, in contrast, performed relatively poorly, with a signal of 0.7, although a value this high is ordinarily still considered a sign of strong phylogenetic signal. The effort-adjusted data performed better than the raw data values, indicating that this effort adjustment was a valuable correction, except for a very low signal for soil water volume in the full tree. The subsequent climate correction reduced the signal at both the global and continental levels, suggesting that this correction was not functioning as intended. We excluded it from our further results, but we noted that it did improve the signals for soil water volume and precipitation when performed at the continental level (Supporting Information Figure S4; Table S2). The results for species with occurrences in more than one continent were very similar to the original analyses (Table S3). The Moran's I correlogram showed significant positive autocorrelation between minimum temperature and phylogenetic distance throughout the most recent 500 Ma (Supporting Information Figure S5). This signal weakened and became negative towards the root of the tree (>600 Ma). At the genus level, we found significant hotspots of phylogenetic signal amongst many of the genera, in addition to significant heterogeneity across the phylogeny (Supporting Information Figure S6). We also saw a great deal of variation in Pagel's λ across individual taxonomic levels of the tree (Supporting Information Figure S7), although the average λ value tended to increase as we included more of the tree.
There was a strong positive correlation between the equivalent parameters extracted from WorldClim and ECMWF, hence it was unsurprising that there was a similar phylogenetic signal seen in these parameters despite the 30-year temporal averaging of the WorldClim dataset (Figure 3; Table 3). Again, temperature showed a strong signal, indicating that it is an evolutionary driver, and overall, the values for λ were slightly higher than for the corresponding ECMWF variables. Solar radiation and total precipitation showed

| Imputation of bioclimatic envelopes
Given that there were strong phylogenetic signals in the parameters tested, we investigated how well we might be able to impute bioclimatic envelope data on rarer or less-observed species in GBIF.
We ran cross-validations using data available from the 5,000 species for which the most data were available. Agreement between imputed and raw data, measured by correlations and root mean square errors (RMSEs), was strong for most variables, particularly those that showed a phylogenetic signal in the previous analysis (Figure 4, imputed data).
For instance, minimum temperature, the variable with the strongest λ value, showed a correlation between the imputed values and real data of .79, and an RMSE of 4.69°C (for a plot of real against imputed results for minimum temperature, see Supporting Information Figure S8). We randomized the tips of the tree in order to ground-truth the data and found a correspondingly low correlation between imputed and raw values (Figure 4, randomized imputed data).

| DISCUSSION
The extraction of climate tolerance profiles for >200,000 species is the largest so far. The few previous studies that extracted this type of information from climate datasets have been limited in taxonomic or geographical scope (Curtis & Bradley, 2016; Feeley, 2015; Harbert & Nixon, 2015; Sparrius et al., 2018). This is a natural consequence of most studies focusing on certain regions or families of plants. Although ground-truthing the data for a project of this size is extremely challenging, the strong phylogenetic signal seen in almost all traits suggests that the extracted data are indeed a comprehensive picture of global species bioclimatic envelopes, at least for the more common species. In particular, controlling for effort in plant sampling improves the phylogenetic signals in both ECMWF and WorldClim data. We have also tested the possibility that we could infer traits for missing or rarer species in the tree using phylogenetic relationships, finding a good correlation between imputed and real values.
Most phylogenetic analyses of plant functional traits or habitat preferences focus on phenological variables (e.g., Basnett et al., 2019; Davies et al., 2013) or characteristics of the environment such as soil pH or nitrogen levels (e.g., Schreeg et al., 2010).
Of the few studies that incorporate climate parameters, some report evidence of phylogenetic signal in climatic variables, such as temperature and precipitation, at the genus or family level (Steinbauer et al., 2016; Xu et al., 2019), whereas others report little or non-significant signals (Koski & Ashman, 2016; Li et al., 2017; Liu et al., 2015) at that level. Here, however, we see a very strong phylogenetic signal in almost all the climate variables considered, even sometimes in soil water volume, but at the level of the whole plant kingdom. This unusually strong result could simply reflect the lack of studies at such a geographical and taxonomic scale, reflecting in turn the lack of supertrees until recently. Indeed, running the analysis at the genus level showed that many genera have low or very low phylogenetic signal (Supporting Information Figure S7). For instance, for the large genus Solanum, we observed phylogenetic signal reduced to levels comparable to the studies cited above (Supporting Information Figure S9), which indicates that, for some genera, this scale is too restricted to detect such signals. We also see a great deal of variation in phylogenetic signal when we look at individual genera, families, orders and classes, compared with the full tree. Some phylogenetic signals can be related to spatial autocorrelation between congenerics (Freckleton & Jetz, 2009), which would be of particular concern for rare and range-limited species. In this analysis, however, we consider the 5,000 most common species in GBIF occurrence records, for which >80% have ranges that span more than one continent, and supplementary analyses restricted to only those species with the broadest ranges yielded very similar results (Table S3).
Pearson correlation between imputed and real data for each of the climate variables (for full parameter names, see Table 1).
The finding that minimum temperature produces the strongest signal is unsurprising, because numerous studies have suggested that responses to cold temperature extremes are an indicator of plant distributions and exhibit strong evolutionary conservatism (Currie et al., 2004; Qian et al., 2016; Woodward & Williams, 1987).
In particular, freezing is thought to act as a strong selective force because it is often lethal, and mechanisms to withstand freezing are difficult to evolve (Qian et al., 2016). A Brownian motion model seems to capture this phylogenetic signal well, although we might expect this trait to respond better to discrete models of freezing tolerance. We also expected that solar radiation would show a strong phylogenetic signal because light is a limiting factor in plant growth, and we observed this, although there is little evidence in the literature of this being tested before. We also observe strong heterogeneities in phylogenetic signal for minimum temperature, particularly at the genus and family levels (Supporting Information Figure S7).
In contrast, soil water volume produced a lower, but still strong, phylogenetic signal in most cases. There is evidence that water availability is as great a driver of species and phylogenetic diversity as temperature (Qian et al., 2016; Silvertown et al., 2015), and it is expected to act as a selective pressure on plant communities. It has been hypothesized that water-related traits in plants undergo rapid local evolution and are therefore much more labile than temperature adaptations (Arène et al., 2017). Indeed, Arène et al. (2017) have reported that the base temperature at which development takes place in plants has a strong phylogenetic signal, whereas the base water potential, a measure of water moisture, does not. Although there seems to be little explanation of this phenomenon in the literature, the strong evidence of a phylogenetic signal in average precipitation and the small positive correlation between these two variables suggest that water does play a role as a selective force in plant evolution (Brodribb et al., 2013, 2014; McAdam & Brodribb, 2012).
The phylogenetic analyses we performed here were necessarily limited by the size of the tree (c. 26,000 species) and the quality of records available (either restricted to the top 5,000 species with ≥1,000 records or including species with very few records). However, the final dataset of bioclimatic envelopes includes >200,000 species with GBIF records in the past century, and we make this available as a resource for further exploration of the effect of climate change on plant species world-wide. We have already made use of such information to parameterize dynamic models of plant biodiversity across the continent of Africa for the past century (C.L. Harris, 2019). We expect that these bioclimatic envelopes could be used to drive other types of vegetation model, including dynamic global vegetation models (Scheiter et al., 2013) and forest gap models (Shugart et al., 2018), and could be used as a comparison to the output of species distribution models, which typically use WorldClim data. Information on the climates that plants can tolerate is important for understanding how different species might respond to future change, how well mitigation strategies might work, and the interaction between climate and other threats to biodiversity, such as invasive species and habitat loss. Although these bioclimatic envelopes are only a subsection of the fundamental niche that these plant species could occupy, we expect that their global scope and long temporal scales mean that their realized niches approach the fundamental in many cases (Araújo & Peterson, 2012).
GBIF plant occurrence records span hundreds of years and at least half of the estimated 400,000 species world-wide. They also suffer from numerous taxonomic, geographical and temporal biases (Meyer et al., 2016). Although there are now automatic algorithms to correct for obvious simple errors in georeferencing (CODATA, 2020), such as swapped coordinate signs, further corrections are needed to account for more complex biases, such as global collection effort.
We made several corrections to the data used in the present study, including the use of proxies for plant density to adjust the weight of different records, which resulted in a corresponding increase in phylogenetic signal. Given that there is no reason why phylogenetic signal should be boosted randomly by such corrections, as is evidenced when we randomize the tips, this approach is a candidate for guiding the choice of corrections in future analyses. There are limitations to this approach, especially for species with few or no georeferenced coordinates, for which data imputation will be necessary. We also explored correction for biases in both world-wide and continent-level climates, but our simple approaches failed to improve the phylogenetic signal. In light of this, we suggest that any climate correction should be applied
Given the strong correlation between the ECMWF data and WorldClim, it is unsurprising that there is a similarly strong phylogenetic signal in temperature and precipitation in the WorldClim data.
However, given that ECMWF has a much broader range of climate parameters, it remains the obvious choice for this type of analysis.
The ideal next step for such research would be to incorporate all data points for all species, rather than using averages of climate bins for subsets of species in the dataset.Using phylogenetic mixed models for this would both facilitate the extraction of climate profiles directly from estimated distributions and account better for more of the heterogeneity in the data (de Villemereuil & Nakagawa, 2014).
As yet, these analyses are far too computationally intensive for the c. 26,000 species in the dataset, never mind for the 200,000 species that could be added to a partially resolved supertree. Further investigation of the feasibility of data imputation is also necessary for using these phylogenetic signals to impute data for missing or rare species, although we find that there is no systematic under- or over-estimation of minimum temperature as tested in the present study (Supporting Information Figure S8).

| CONCLUSION
Global plant species will face many coming threats this century, the most devastating of which is likely to be climate change.
Understanding the relationship between plants and their climate is fundamental both to prediction of their future ranges and to the implementation of effective conservation strategies. Here, we have demonstrated that information on plant bioclimatic envelopes could be extracted reliably using historical records and climate reconstructions, which could then be used in further analyses, and we provide these variables for all 200,000 species as a public resource alongside this publication. We found a very strong phylogenetic signal in many climatic parameters, including temperature, soil temperature, solar radiation and precipitation. This evolutionary signal was improved by the implementation of a correction for the bias in collection effort world-wide and can also be used to impute data for related missing species. Future analyses could explore further the evidence for niche conservatism for climatic parameters at the supertree level.

TABLE 2 Details of the climate and vegetation datasets used, including the range of years used and the spatial resolution
a Climate normals, which are averaged over the entire time period for each month.
b Single average over the entire dataset.

FIGURE 1 Natural log recording effort for historical plant observations in the Global Biodiversity Information Facility (GBIF) since 1901 after controlling for plant density using the enhanced vegetation index (EVI). White indicates areas excluded from this analysis, namely islands from the Arctic circle and the Antarctic land mass. Grey lines above and to the right of the graph indicate summed values of recording effort at each longitude and latitude, respectively. The map is plotted in the Mollweide equal area projection.
FIGURE 2 Evolutionary tree of the 5,000 plant species with the most records in the Global Biodiversity Information Facility (GBIF), along with their reconstructed climate profiles: (a) minimum temperature (in degrees Celsius); (b) mean monthly rainfall (in millimetres); and (c) mean monthly solar radiation (in kilowatts).

in the climate variables, we used Pagel's λ, which has been shown to be more robust against incomplete phylogenies and inaccurate branch-length information than Blomberg's K (Molina-Venegas & Rodríguez, 2017). Values of Pagel's λ closer to zero indicate that traits show little phylogenetic signal, whereas values approaching one indicate that traits evolved along the phylogeny with a Brownian motion model, or random genetic drift (Harmon, 2019). We also simulated signal-free data by randomly permuting the tree tips to estimate what value of Pagel's λ would be inconsistent with the absence of a phylogenetic signal. In addition to simply detecting the phylogenetic signal, we used Pagel's λ as a barometer for the success of both the parameter extraction and the adjustments described above, because random (changes to the) data should produce weak(er) responses in λ. High values of λ (or increases in λ) were viewed as an indication that a parameter was captured correctly (or a correction had been effective). We also applied the λ models to the continentally adjusted datasets. In every case, we randomized the tips to confirm that λ ≈ 0 when no phylogenetic signal was present. For the fitting, we used the PhyloNetworks package v.0.11.0 (Solís-Lemus et al., 2017) in Julia v.1.5.2 (Bezanson et al., 2017). The code for these analyses is available under an open source license on GitHub (see data availability statement).
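As an illustration of the logic described above, the following is a minimal sketch in Python of how Pagel's λ can be estimated and then checked against rearranged tips. It is not the code used in this study (which relied on PhyloNetworks in Julia). The eight-tip balanced tree, the trait values, the grid search and all function names are assumptions made purely for illustration. The sketch scales the off-diagonal elements of the Brownian motion covariance matrix by λ and selects the value of λ that maximizes the profile likelihood.

```python
import math
import random

def lambda_transform(C, lam):
    """Pagel's lambda transform: scale off-diagonal covariances, keep tip variances."""
    n = len(C)
    return [[C[i][j] if i == j else lam * C[i][j] for j in range(n)]
            for i in range(n)]

def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def profile_loglik(C, y, lam):
    """Gaussian log-likelihood at lam, with the mean and rate set to their ML values."""
    n = len(y)
    L = cholesky(lambda_transform(C, lam))
    zy = forward_solve(L, y)
    z1 = forward_solve(L, [1.0] * n)
    mu = sum(a * b for a, b in zip(z1, zy)) / sum(a * a for a in z1)  # GLS mean
    resid = [a - mu * b for a, b in zip(zy, z1)]
    sigma2 = sum(r * r for r in resid) / n                            # ML rate
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi * sigma2) + logdet + n)

def fit_lambda(C, y, steps=100):
    """Grid search for the lambda in [0, 1] with the highest profile likelihood."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda lam: profile_loglik(C, y, lam))

def permuted_lambdas(C, y, n_perm=20, seed=1):
    """Refit lambda after randomly shuffling trait values across the tips."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        fits.append(fit_lambda(C, shuffled))
    return fits

# Hypothetical balanced tree with 8 tips and unit branch lengths (depth 3):
# covariance = shared path length from the root (3 self, 2 cherry, 1 clade, 0 otherwise).
C = [[3.0 if i == j else 2.0 if i // 2 == j // 2 else 1.0 if i // 4 == j // 4 else 0.0
      for j in range(8)] for i in range(8)]
clade_trait = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]        # follows the deep split
alternating_trait = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # same values, rearranged

lam_clade = fit_lambda(C, clade_trait)
lam_alt = fit_lambda(C, alternating_trait)
null_lams = permuted_lambdas(C, clade_trait)
print(lam_clade, lam_alt, null_lams)
```

In this toy example, the clade-structured trait yields an estimate at the upper bound of the grid, whereas the same values rearranged so that sister tips always differ yield an estimate at the lower bound. On a tree this small, individual random permutations can still return intermediate values by chance, which is why a permutation null is summarized over many shuffles rather than read from a single one.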

FIGURE 3 Phylogenetic signal, Pagel's λ, for each of the climate variables and levels of correction: Raw climate variables (Raw) and adjusted by effort (Effort). The colour scheme in the key is scaled to be centred on the mean λ value. The analyses used: (a) the full 26,466 species tree; and (b) a subset of the top 5,000 most common species.
the strongest correlation with phylogenetic distance closest to the tips, although it remained significantly positive throughout the tree until c. 500 Ma.Using local Moran's I, we see that there is heterogeneity in autocorrelation for minimum temperature across genera, but there are still large clades showing strong signals across the tree, and from the autocorrelogram we expect that this index would show even stronger correlation across the plant kingdom at species level.
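The shape of such an autocorrelogram can be illustrated with a minimal Python sketch that computes global Moran's I separately for successive classes of phylogenetic distance. This is an assumption-laden toy rather than the analysis reported here: it uses global rather than local Moran's I, binary weights within each distance class, and a hypothetical eight-tip tree with made-up trait values and distance units.

```python
def morans_i(x, W):
    """Global Moran's I for trait values x and a weight matrix W."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in W)   # total weight
    den = sum(d * d for d in dev)
    if s0 == 0 or den == 0:
        return None                   # empty distance class or constant trait
    num = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * num / den

def correlogram(x, D, breaks):
    """Moran's I per distance class (lo, hi], using binary weights within each class."""
    n = len(x)
    result = []
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        W = [[1.0 if i != j and lo < D[i][j] <= hi else 0.0 for j in range(n)]
             for i in range(n)]
        result.append(morans_i(x, W))
    return result

# Hypothetical patristic distances on a balanced 8-tip tree with unit branches:
# 2 within a cherry, 4 within a four-tip clade, 6 across the root.
D = [[0.0 if i == j else 2.0 if i // 2 == j // 2 else 4.0 if i // 4 == j // 4 else 6.0
      for j in range(8)] for i in range(8)]
trait = [1.0, 1.0, 0.5, 0.5, -0.5, -0.5, -1.0, -1.0]  # illustrative values only

profile = correlogram(trait, D, [0.0, 2.0, 4.0, 6.0])
print(profile)  # approximately [1.0, 0.8, -0.9]
```

For this toy trait, the index is highest among sister tips, remains positive within the four-tip clades and becomes negative across the root, mirroring the qualitative pattern of a signal that is strongest near the tips.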
Therefore, the reduced signal in soil water volume could instead indicate that the scale at which we extracted the data (80 km for ERA-Interim) is simply too coarse, particularly in areas such as Africa with high effort correction, because this correction weakened the signal uniquely for soil water volume in Table 3. However, perhaps more pertinently, ECMWF raise questions about the quality of soil moisture data in ERA-Interim [Copernicus Climate Change Service (C3S), 2021], stating that soil moisture values are only intended to "provide a qualitative picture of major anomalies", with plans to improve their estimates in ERA-5. Failure to identify strong and/or consistent phylogenetic signals in soil moisture variables might therefore simply reflect poor data quality.
to species individually to account for their specific available climate. This would require detailed information on dispersal and other barriers to availability of climate, such as geographical boundaries, and as such, is beyond the scope of the present study. However, when trying to account for spatial autocorrelation between species in the dataset, we did find similarly strong phylogenetic signals when we filtered for the most widespread species (those found on more than one continent). Information on processes such as dispersal and the physiological limits of species would help further to disentangle the factors contributing to the strong phylogenetic signal we see here and potential for PNC amongst global plant species.
TABLE 1 Details of the climate variables downloaded from each dataset, where ✓ and ✕ denote availability