From c808e431cc59ebd5eef4741bd75a8f6ab6c7d160 Mon Sep 17 00:00:00 2001 From: ggserver Date: Tue, 11 Feb 2025 12:16:33 +0000 Subject: [PATCH] Add updates up until 2025-02-11 12:10:29 --- .../AC/03E2AC42FF90FFF7FE1DFB6ADD2FF8D9.xml | 755 ++++++++++++++++++ 1 file changed, 755 insertions(+) create mode 100644 data/03/E2/AC/03E2AC42FF90FFF7FE1DFB6ADD2FF8D9.xml diff --git a/data/03/E2/AC/03E2AC42FF90FFF7FE1DFB6ADD2FF8D9.xml b/data/03/E2/AC/03E2AC42FF90FFF7FE1DFB6ADD2FF8D9.xml new file mode 100644 index 00000000000..8b8cc135e9a --- /dev/null +++ b/data/03/E2/AC/03E2AC42FF90FFF7FE1DFB6ADD2FF8D9.xml @@ -0,0 +1,755 @@ + + + +Contrasting the depths of divergence between gene-tree and coalescent estimates in the North American racers (Colubridae: Coluber constrictor) + + + +Author + +Myers, Edward A. +Department of Herpetology, California Academy of Sciences, 55 Music Concourse Drive, San Francisco, California 94118, USA & Department of Herpetology, The American Museum of Natural History, Central Park West and 79 Street, New York, NY 10024, USA +eddie.a.myers@gmail.com + + + +Author + +Gehara, Marcelo +Sackler Institute for Comparative Genomics, American Museum of Natural History, NY 10024, USA & Department of Biological Sciences, Rutgers University-Newark, 195 University Ave, Newark, NJ 07102, USA + + + +Author + +Burgoon, Jamie L. +Department of Herpetology, The American Museum of Natural History, Central Park West and 79 Street, New York, NY 10024, USA + + + +Author + +McKelvy, Alexander D. +Department of Biology, The Graduate School and Center, City University of New York, New York, New York 10016, USA & Snake Evolution and Biogeography, Wacissa, FL 32361, USA + + + +Author + +Vonnahme, Lauren +Department of Herpetology, The American Museum of Natural History, Central Park West and 79 Street, New York, NY 10024, USA + + + +Author + +Burbrink, Frank T. +Department of Herpetology, The American Museum of Natural History, Central Park West and 79 Street, New York, NY 10024, USA + +text + + +Zoological Journal of the Linnean Society + + +2025 + +2024-03-20 + + +203 + + +1 + + +1 +17 + + + + +https://doi.org/10.1093/zoolinnean/zlae018 + +journal article +10.1093/zoolinnean/zlae018 +0024-4082 + + + + + + +RESULTS + + + + + + +Sequence alignments + + +The final assembled dataset consisted of 3417 loci (3348 UCE loci, 61 AHE loci, and eight of the ‘legacy Sanger’ loci). The mean proportion of loci recovered within each sample was 94% (range: 65–97%). The mean length of these assembled loci after trimming was 230.2 base pairs (range: 2–1731) and the mean number of segregating sites was 3.9 (range: 0–30). All raw sequence data have been accessioned on the NCBI Sequence Read Archive (BioProject ID PRJNA1082780; Supporting Information, +Table S1 +). The mtDNA sequence assembled from these raw reads ranged in length from 124 to 7083 bp in length and were trimmed to the 1117 bp length of +Cytb +. + + +Population structure, phylogenetic relationships, and gene divergence times + + +Results from sNMF analyses indicate that +K += 5 is the best fit ( +Fig. 3 +). These five populations correspond to a population in peninsular +Florida +, an eastern population in the temperate deciduous forests, a central population within the +North American +grasslands, a western population isolated across the +Rocky Mountains +, and a population from south +Texas +. Hereafter these populations will be referred to as the +Florida +, eastern, central, western, and south +Texas +populations, respectively. The PCA plot was visualized with the +K += 5 population structure inferred from sNMF ( +Fig. 3 +). This plot shows clear separation of all five lineages, with only some overlap in PC space between the central and western populations. The network inferred from SplitsTree largely identifies these clusters, albeit with some degree of nestedness ( +Fig. 3D +). For example, the western population is nested within the central group, while the eastern population is divided into two clusters. + + +The SNAPP-based species tree recovered a pectinate phylogeny with divergences as follows: south +Texas +, peninsular +Florida +, eastern, and a sister-relationship between the western and central populations ( +Fig. 4 +). All relationships within this tree were strongly supported with posterior probabilities ≥0.99. All ESS values from this analysis were>200 as assessed in TRACER v.1.7.1 ( + +Rambaut +et al. +2018 + +). The concatenated phylogenomic tree from IQ-Tree, while finding each of these populations as a cluster, differs in their relationships. Here the monophyletic western population is nested within the central lineage, the eastern and +Florida +populations are sister-taxa, with the south +Texas +population being the earliest diverging group; again, these relationships are well supported (bootstrap ≥ 90). Within the concatenated tree there are two samples that have ‘ladder-like’ relationships between the eastern and +Florida +lineages, and both samples have high admixture proportions from sNMF (LSU-H2967 and FTB2688; both assigned to the eastern population). The mtDNA gene tree recovered the same phylogenetic relationships as the coalescent-based species tree, with moderate to high support ( +Fig. 4 +). One sample (LSU H-2967) clustered with the eastern lineage samples using genomic data but mtDNA originated from the eastern lineage. + + +Divergence times were estimated based on both a single locus mtDNA dataset and the concatenated genomic data ( +Fig. 4 +). The mtDNA analysis suggests that + +Coluber + +and + +Masticophis + +diverged 10.8 Mya (95% HPD: 8.9–12.7 Mya). In this gene tree, the earliest divergence within + +Coluber + +occurred 8.2 Mya (6.3– 10.2 Mya), corresponding to the south +Texas +lineage divergence, demonstrating that these lineages began diversifying during the Mid to Late Miocene. The concatenated genomic analysis also suggests that + +Coluber + +began diversifying in the Mid to Late Miocene with an estimated crown age of 10.7 Mya (8.7–12.8 Mya; +Fig. 4 +). The youngest divergence times between these phylogeographic lineages occurred 6.2 Mya (3.3–8.9 Mya) between the western and central populations. + + +Demographic model selection and parameter estimation + +The trained neural network classified the six models with relatively low accuracy (0.82) due to confusion among the ancient divergence models. This unidentifiability among the ancient divergence models is more evident in the classification using the mtDNA alone where the accuracy was 0.44. When comparing one of the five ancient divergence models with the recent divergence model (Is), the accuracy was 1.0 for the genomic data and 0.88 for the mtDNA data. + +The genomic data strongly supports the model with recent divergence without migration (Is), with a probability of ~1.0. Parameter estimates from this model suggest that lineage divergence of all populations occurred ~33 kya (25–43 kya 95% confidence interval) with no gene flow. These estimates also suggest small effective population sizes for the western and central populations and larger population sizes for the eastern and +Florida +populations, with a much stronger population size increase in the latter two. The timing of population size change also shows a more recent demographic change for the western population. These parameter estimates are available in the Supporting Information (Table S4; +Fig. S2 +). By contrast, the mtDNA data support one of the ancient IMD models, with the model where the western population is sister to the central population and the eastern population is sister to the +Florida +population, having the highest probability ( +P += 0.43). + + + +Figure 3. +A, population structure inferred from the genomic sequence capture data using sNMF. Different colours within the pie charts correspond to admixture coefficients between each of the five lineages; Florida, eastern, central, south Texas, and western. Light grey highlighting represents the geographic distribution of + +Coluber constrictor + +. B, results of admixture coefficients from sNMF represented as a bar plot where each bar is a sampled individual. C, PCA of the genomic data demonstrating the differentiation of these phylogeographic lineages. D, SplitsTree network showing population structure and nestedness of these populations. + + + +Ecological niche models and divergence + + +After thinning locality records, we retained +28 specimen +localities for the south +Texas +population, 116 for the +Florida +population, 138 for the western, 431 for the central, and 463 for the eastern (localities and associated natural history specimen numbers are accessioned on Dryad). +All +comparisons of niche equivalency were significant but suggest varying levels of divergence in ecological niche between lineages. +The +greatest differences were between the +Florida +and the eastern lineages ( +D +statistic = 0.02, +P +-value = 0.01) and between the central and south +Texas +lineage ( +D +statistic = 0.03, +P +-value = 0.01). +This +test suggests that both the eastern and central lineages ( +D +statistic = 0.14, +P +-value = 0.01), and the central and western lineages ( +D +statistic = 0.27, +P +-value = 0.01), occupy more similar environments. + + +Each lineage had a unique, best-fit combination of feature class and regularization multipliers (Supporting Information, Table S5) that were used to construct ENMs in MaxEnt. All ENMs performed well with high area under the receiver operating characteristic curve values (AUC = 0.89 for south +Texas +; 0.99 for +Florida +population; 0.96 for eastern; 0.93 for central; 0.96 for western). All ENMs predict the geographic distribution of the focal lineage with little overlap in predicted geographic distribution between geographically adjacent populations ( +Fig. 5 +). ENMs projected on to the Mid-Pleistocene suggests that all five lineages would have had reduced geographic distributions into refugia that were allopatric from one another ( +Fig. 5 +). + + + +Figure 4. +Gene trees and species tree based on sequence capture and by-catch mtDNA data. A, maximum likelihood concatenated tree inferred from phased sequence capture data. B, species’ tree based on SNPs from assembled genomic data inferred using SNAPP. C, divergence dated tree from BEAST based on concatenated genomic sequence data. D, maximum likelihood mtDNA phylogeny for nearly all samples for which sequence capture data was generated. E, dated mtDNA gene tree inferred in BEAST including representative species from the sistergenus + +Masticophis + +. In (A) and (D) black circles at nodes represent bootstrap support values ≥95 and in (C), (D), and (E) represent posterior probabilities ≥0.9. + + + +Morphological divergence + + +Sexual dimorphism was found in tail length, head width and length, eye diameter, and rostral length ( +P +-values <0.05). Principal component analyses are significant ( +P +-value = 0 in both cases); however, there is no clear pattern of separation between lineages ( +Fig. 6 +). PC1 accounts for 39.1% of variation and PC2 for 19.7% variation in males, with the highest loadings contributed to head width, length, and eye diameter in PC1 and SVL in PC2. Within females, PC1 accounts for 41.0% of the total variation and PC2 for 16.7%, with the highest loadings contributed to head width, head length, rostral length, prefrontal width, and internasal width on PC1. Using PERMANOVA tests, males of the eastern and +Florida +lineages are significantly different from the other three lineages but not distinct from one another in morphology; similarly, the central, western, and south +Texas +lineages are indistinguishable ( +Table 1 +). PERMANOVA tests demonstrate that with morphological data from females, the eastern lineage is distinct from the western and south +Texas +lineages, and the +Florida +lineage is distinct from the south +Texas +clade, whereas all other comparisons were not significant ( +Table 1 +). Discriminant function analysis with cross-validation demonstrates that these lineages can be differentiated from one another with an accuracy of 56.2% in males and 47.5% in females. In all attempts to classify specimens, there are misclassifications between all lineages using both morphological datasets. + + + + + +DISCUSSION + + + +Previous studies have suggested morphological variation and extensive population genetic structure across the distribution of the North American racers (e.g. +Ortenburger 1928 +, Anderson 1996, + +Burbrink +et al. +2008 + +). Here we corroborate much of this lineage divergence using a combination of genomic sequence capture data, mtDNA, morphology, and ecological niche models. We demonstrate that this structure corresponds to several well-known biogeographic barriers across North America. The genome-scale data and mtDNA show similar geographic lineages and largely agree in the phylogenetic relationships among these lineages. However, demographic models based on the genomic data strongly disagree with the best-fit model to the mtDNA in divergence times and topology. Furthermore, comparing divergence times estimated from mtDNA and concatenated genomic data to the divergence times estimated from demographic models suggests that these discrepancies are the result of differences between gene- vs. coalescent-divergence time estimates. Additionally, we find that several of these lineages are distinguishable morphologically, largely across the +Mississippi +River, and several adjacent population pairs have diverged in environmental niche. Lastly, hindcast ENMs suggest that all lineages were distributed allopatrically in the Mid-Pleistocene. + + + +Figure 5. +Ecological niche models constructed for each of the five phylogeographic lineages based on current climate and paleo-climate models from ~21 kya. All panels are labelled with their corresponding lineage and if they correspond to late Pleistocene projections. Regions shaded in green are those that were inferred to have higher environmental suitability for each lineage based on locality data used in the ENMs. + + + +Demographic model selection + + +Model selection approaches were ambiguous to the evolutionary and demographic history of the North American racers. First, the best-fit model, that of recent divergence and no gene flow, could not determine the phylogenetic relationships among these lineages and instead suggested that a polytomy best represented the divergence history. Second, the timing of divergence and amount of gene flow is unclear; the model of recent divergence (~33 kya) obtained the highest support ( +P += 1.0), contradicting results based on mtDNA and concatenation, where divergence was estimated to be several orders of magnitude older. This conundrum is unsatisfying given that the goal of phylogeography is to infer historical relationships and estimate population genetic parameters ( + +Hickerson +et al. +2010 + +), where the combination of model-based approaches with high throughput sequencing technologies have promised to do this with high accuracy and precision ( + +McCormack +et al. +2012 + +). + + +Much of this uncertainty could stem from several historical processes. For example, the rapid diversification of these lineages may have resulted in difficulty discerning relationships and, therefore, a soft polytomy may be the best representation of historical relationships of these lineages. However, this may be unlikely given the inferred relationships based on multispecies coalescent, concatenated, and mtDNA tree-based analyses ( +Fig. 3 +). It is well known that rapid and successive lineage divergence can result in high gene tree heterogeneity and this has been shown to be widespread in many taxa ( + +Linkem +et al. +2016 + +, + +Pardo-De la Hoz +et al. +2023 + +). The topological differences between our inferred species’ tree and the concatenated tree suggest that there is extensive heterogeneity among gene trees within + +C. constrictor + +. These differences between gene-tree and coalescent-based analyses are also observed in comparing estimates of divergence times. For example, the timing of diversification largely agreed between the single locus mtDNA analysis and the concatenated genomic analysis. Both of these gene-based approaches suggest that + +Coluber + +began diversifying during the Late Miocene, which contrasts the coalescent-based estimate suggesting the Late Pleistocene. Whether summary statistic-based methods, like the neural network approach that we implemented here, are influenced by extreme heterogeneity among gene trees is unknown. Future simulation studies are needed to examine this issue (but see: +Fan and Kubatko 2011 +, +Alanzi and Degnan 2017 +). + + +This extreme discordance seen between the mtDNA and sequence capture data in reconstructing the demographic history of this taxon could also be the result of sex-biased dispersal ( +Toews and Brelsford 2012 +). In this case it is possible that females are highly philopatric, with high dispersal in males. This could result in a situation where ancient mtDNA lineages are maintained with much shallower divergence throughout the nuclear genome. Mark–recapture ( +Glaudas and Rodriguez-Robles 2011 +), as well as population genetic and phylogeographic analyses ( + +Dubey +et al. +2008 + +, + +Pernetta +et al. +2011 + +), have suggested that male-biased dispersal is common within snakes. While this could explain the observed pattern here, it is expected that we would find a signal of gene flow in the nuclear genome, which is not the case in our demographic modelling. Additionally, when estimating divergence times based on the concatenated genomic data, we do not find the expected shallow divergence times. + + + +Figure 6. +Morphological differentiation between phylogeographic lineages in + +Coluber constrictor + +. A, B, PCA of morphological variables; A, males; B, females. C, D, are boxplots of each linear measurement showing the differentiation of each of these traits between the five lineages: C, represents males; D, females. In panels (C) and (D), pairwise comparisons significant at an adjusted +P +<.05 with a Tukey’s honest significant difference test are indicated by letters, where shared letters indicate that pair-wise comparisons are significantly different. + + + + +Table 1. +Morphological differences from PERMANOVA. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+F-statistic + +R +2 + +P -value +
+Male +
East—Florida0.620.009NS
East—Central4.120.140.01
East—Western6.930.220.01
East—S. Texas5.370.200.01
Florida—Central3.190.060.05
Florida—Western5.560.10.01
Florida—S. Texas4.160.080.05
Central—Western0.670.06NS
Central—S. Texas1.420.15NS
Western—S. Texas0.780.08NS
+Female +
East—Florida1.040.06NS
East—Central1.310.10NS
East—Western4.560.280.01
East—s. Texas5.170.270.01
Florida—Central0.620.06NS
Florida—Western1.880.16NS
Florida—S. Texas3.200.210.05
Central—Western1.100.16NS
Central—S. Texas1.650.22NS
Western—S. Texas0.140.15NS
+
+ +It is also possible that the history of this group is too complex for the most commonly used isolation-migration models, where repeated bouts of isolation and contact are the cause of model selection uncertainty. Complex demographic histories of rapid divergence coupled with gene flow during glacial and interglacial periods is common in many temperate zone taxa ( + +DeRaad +et al. +2022 + +, +Harrington and Burbrink 2023 +). Similar histories of divergence have been suggested to be widespread and have been called the ‘mixing-isolation-mixing’ model of divergence ( + +He +et al. +2019 + +). If this is the true underlying history of the group, then the expectation might be that model selection approaches would default to a history of ancient divergence with high gene-flow. However, previous studies have demonstrated that discriminating between isolation-only and secondary-contact models is difficult. For example, data simulated under a secondary-contact model could only be confidently assigned back to this true model when the period of isolation was sufficiently long. When shorter periods of isolation were simulated, ABC approaches were inconclusive ( + +Roux +et al. +2016 + +). + + +How then can we choose among competing models of historical demography? Our demographic analyses suggest that the phylogeographic relationships are best represented as a polytomy with very recent divergence and no migration. However, this disagrees with the best-fit model based on mtDNA, estimated gene-divergence times, and with previous phylogeographic estimates ( + +Burbrink +et al. +2008 + +). Here, we have a strong signal of divergence in the mtDNA phylogeographic analyses that matches both the population structure and SNAPP-based species’ tree topology reconstructed from the genomic data, all dating the origin of diversification to the Miocene. We note, however, that both the mtDNA and concatenated genomic divergence times are gene-divergence, not lineage-divergence times ( +Edwards and Beerli 2000 +) and we do not take them to be reflective of species divergence times. A caveat of overly relying on gene-tree based (e.g. mtDNA alone) inferences is that increased levels of genetic diversity may be explained by the persistence of large N +e +at mutation-drift equilibrium ( +Charlesworth 2009 +, + +Morgan-Richards +et al. +2017 + +). A pattern of high genetic polymorphism within sampling sites has been demonstrated to be the result of elevated levels of N +e +maintained through time ( + +Morgan-Richards +et al. +2017 + +). However, this seems less likely here given that the deep mtDNA lineages of racers are geographically structured and match population genetic structure observed with the genomic data. + + +In cases where demographic model selection and divergencetime estimates between datasets strongly disagree, there are two additional methods that may prove useful in future analyses. First, whole genome resequencing data could provide more accurate and precise model selection and demographic parameter estimates over commonly used sequence capture loci like UCEs. It is now possible to generate low-coverage, whole-genome resequencing data (~2× coverage) for similar costs compared to reduced representation sequencing methods, and by spreading sequencing effort across more samples, albeit at a lower per base pair coverage, while increasing the accuracy of many population genetic inferences ( + +Lou +et al. +2021 + +, +Reid and Pinsky 2022 +). Second, ancient DNA sampled from Pleistocene subfossils will improve our understanding of the demographic histories and evolutionary relationships of many taxa ( + +Orlando +et al. +2009 + +, + +van der Valk +et al. +2021 + +). These approaches have also recently been used to better understand both taxonomy and biogeographic history in squamate reptiles and, in some cases, DNA has been obtained with minimally destructive sampling methods ( + +Torres-Roig +et al. +2021 + +, + +Scarsbrook +et al. +2022 + +). Throughout North America, Pleistocene snake subfossils are remarkably common ( +Brattstrom 1967 +, +Holman 2000 +) and future efforts to extract and sequence DNA from these fossils may further improve our understanding of species’ diversification and demographic histories. + + + + +Coluber constrictor + +phylogeography + + + +The observed population genetic structure within + +Coluber constrictor + +corroborate previously identified phylogeographic breaks within this taxon to known barriers across North America, and, in some cases, subspecific taxonomy. This structure that we find is associated with the Florida peninsula, across the transition between the forested region of eastern North America and the central grasslands, across the Rocky Mountains, and a south Texas group that may extend as far south as southern +Mexico +and +Guatemala +. + + +Divergence between the +Florida +peninsula and continental North America, as well as across the transition between the forested region of eastern North America and the central grasslands, is commonly found in many taxa ( + +Soltis +et al. +2006 + +, + +Burbrink +et al. +2022 + +). Although both regions have been shown to be important for promoting population genetic divergence across entire communities of organisms, the timing and causes of divergence are not well known. For example, across the Florida-continental discontinuity several non-mutually exclusive hypotheses have been proposed, including Pliocene sea-level rises resulting in +Florida +being mostly submerged, consisting of only a series of islands in the centre of the peninsula ( +Webb 1990 +, + +Clark +et al. +1999 + +), Pleistocene refugia ( + +Waltari +et al. +2007 + +), which is also supported in our hindcast ENMs, or, as we have demonstrated here, the potential for divergent ecological selection on environmental niche. Similarly, population genetic structure across the middle Nearctic has contributed to both the ecotonal transition from deciduous, broadleaf forests into grasslands, as well as vicariance across the +Mississippi +River ( + +Burbrink +et al. +2000 + +, +Leaché and Reeder 2002 +, +Pyron and Burbrink 2010 +, +Satler and Carstens 2017 +, + +Myers +et al. +2020 + +). Future comparative studies should focus on the interaction of vicariance and adaptation in shaping shared patterns of biodiversity at these two regions. + + +Two lineages identified here also correspond to previously named taxonomic units, + +C. c. +mormon + +and +C. c. oaxaca +. Both of these taxa were originally described as species ( +Baird and Girard 1852 +, + +Jan 1863 + +) and there has been disagreement in the literature regarding the specific status of these taxa ( + +Fitch +et al. +1981 + +, +Greene 1984 +). The geographic distribution of the North American racers is allopatric across the Rocky Mountains and, therefore, it is not surprising to find population genetic structure across this region (but see: +Corn and Bury 1986 +). The two samples from southern Texas that form a distinct population fall within the geographic range of +C +. +c. oaxaca +; however, the full distribution of this subspecies is discontinuous from southern Texas along the east coast of +Mexico +to +Guatemala +and +Belize +. Whether the entirety of this distribution belongs to this lineage is unknown. Furthermore, with our sampling of only two individuals, we were unable to include this lineage in our demographic analyses, so whether there is gene flow between this lineage and adjacent populations is also unknown. + + +Model selection with genomic data supported population size expansion in all lineages. These demographic size changes probably reflect population expansion associated with the end of the last glacial maxima in the Late Pleistocene, a common finding in phylogeographic studies of taxa in eastern North America ( +Hewitt 2000 +), and is in agreement with previous single-locus phylogeographic analyses of this taxon ( + +Burbrink +et al. +2008 + +). The inferred population expansions also agree with the ENMs projected to the LGM where all lineages have large reductions in suitable habitat. + +
+
+
\ No newline at end of file