This repository contains the datasets and scripts used in the following paper:
- Y. Tabatabaee, S. Claramunt, S. Mirarab (2025). Coalescent-based branch length estimation improves dating of species trees. https://www.biorxiv.org/content/10.1101/2025.02.25.640207v1.abstract
For the experiments in this study, we generated three sets of simulated datasets with gene tree discordance due to incomplete lineage sorting (ILS) and analyzed two avian biological datasets from Harvey et al. (2020) and Stiller et al. (2024). The simulated datasets have model species trees with substitution-unit, generation-unit, and time-unit branch lengths. All datasets are available here.
30-taxon dataset
- The original dataset is from Mai et al. (2017) and is available at https://uym2.github.io/MinVar-Rooting/.
- Results and intermediate data from the experiments in the paper are available here.
101-taxon dataset
- The original dataset is from Zhang et al. (2018) and is available at https://gitlab.com/esayyari/ASTRALIII/.
- Results and intermediate data from the experiments in the paper are available here.
Large dataset
- This dataset has 8 model conditions with 50, 100, 200, 500, 1K, 2K, 5K, and 10K-taxon trees.
- The raw dataset, along with results and intermediate data from the experiments in the paper, is available here.
- Neoavian: 363-taxon neoavian dataset from Stiller et al. (2024) with 63,430 single-copy genes. The original data is available here. Results from the analysis in this study are available at /biological/avian-stiller.
- Suboscines: 1683-taxon suboscine dataset from Harvey et al. (2020) with 2,389 single-copy genes. The original data is available at https://github.com/mgharvey/tyranni. Results from the analysis in this study are available at /biological/suboscines-harvey.