Files used for the twisst analysis

To extract subsequences from the FASTA files, the script fasta2subseq.py was used with the pop.txt file:

https://raw.githubusercontent.com/kullrich/bio-scripts/master/fasta/fasta2subseq.py
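Conceptually, extracting a subsequence means cutting every aligned sequence to the same coordinate range. The Python sketch below illustrates this on an in-memory FASTA string. It is not the code of fasta2subseq.py: the helper names read_fasta and subseq_fasta and the 0-based, end-exclusive (BED-style) coordinates are assumptions made for illustration.

```python
# Illustrative sketch only, NOT the implementation of fasta2subseq.py.
# Assumption: start/end are 0-based and end-exclusive (BED-style).


def read_fasta(text):
    """Parse FASTA text into a dict {name: sequence}, keeping input order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # use the first word of the header as the sequence name
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def subseq_fasta(records, start, end):
    """Return FASTA text with every sequence cut to [start, end)."""
    out = []
    for name, seq in records.items():
        out.append(f">{name}")
        out.append(seq[start:end])
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    aln = ">s1\nACGTACGTAC\n>s2\nACGTTCG-AC\n"
    print(subseq_fasta(read_fasta(aln), 2, 6), end="")
```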

To calculate literal-dists distances along the genome in 25 kbp windows, bedtools makewindows was first used to create the 25 kbp windows:

bedtools: v2.31.1

bedtools makewindows -g GRCm39.primary_assembly.genome.fa.sizes.genome -w 25000 > w25kbp.bed
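With -g and -w, bedtools makewindows tiles each chromosome listed in the genome file (chromosome name and length per line) into consecutive windows of fixed width, and the last window ends at the chromosome end. A minimal Python sketch of that tiling, assuming this two-column sizes file; make_windows is a hypothetical helper, not part of bedtools, and the real tool has many more options:

```python
def make_windows(sizes_text, width):
    """Tile each chromosome into (chrom, start, end) windows.

    sizes_text: genome file content, one 'name<TAB>length' per line.
    Coordinates are 0-based and end-exclusive, as in BED.
    """
    for line in sizes_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chrom, length = fields[0], int(fields[1])
        for start in range(0, length, width):
            # the last window is shortened to end at the chromosome end
            yield chrom, start, min(start + width, length)


if __name__ == "__main__":
    sizes = "1\t60000\n2\t25000\n"
    for chrom, start, end in make_windows(sizes, 25000):
        print(f"{chrom}\t{start}\t{end}")
```

For a chromosome of 60000 bp this yields the windows 0-25000, 25000-50000 and 50000-60000.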

In a second step, the script fasta2subseq.py was run on each of the 25 kbp windows to extract the corresponding subsequences, which were passed to the software tool literal-dists to obtain pairwise distances:

https://github.com/kullrich/literal-dists

The global gap threshold was set to 0.1, i.e. a site in a 25 kbp window was discarded if its gap frequency was 10 percent or higher; the gap length was also reported.
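The gap rule can be illustrated with a short sketch that drops every alignment column whose gap frequency reaches the threshold. This is not the implementation of literal-dists; treating only '-' as a gap character and the helper name filter_gappy_sites are assumptions.

```python
GAP_CHARS = {"-"}  # assumption: only '-' counts as a gap character


def filter_gappy_sites(seqs, threshold=0.1):
    """Remove alignment columns whose gap frequency is >= threshold.

    seqs: equal-length aligned sequences (one string per sample).
    Returns the filtered sequences and the number of removed columns.
    """
    n = len(seqs)
    # keep a column only if its fraction of gap characters is below the threshold
    kept = [col for col in zip(*seqs)
            if sum(ch in GAP_CHARS for ch in col) / n < threshold]
    filtered = ["".join(col[i] for col in kept) for i in range(n)]
    removed = len(seqs[0]) - len(kept) if seqs else 0
    return filtered, removed


if __name__ == "__main__":
    aln = ["AC-T", "ACGT", "ACGA"]
    print(filter_gappy_sites(aln))  # (['ACT', 'ACT', 'ACA'], 1)
```

With fewer than ten sequences, a threshold of 0.1 means that a single gap is enough to remove a column.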

An example command for a single window (chromosome 1, positions 0 to 25000) would look like this:

python fasta2subseq.py -i pop.txt -chr 1 -start 0 -end 25000 > tmp.fasta; literal-dists -b -i -g -z 0.1 tmp.fasta > tmp.dist
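To process the whole genome, the per-window command can be repeated for every line of w25kbp.bed. The sketch below does this from Python with subprocess, reusing exactly the flags of the example command. The output directory, the per-window file names and the helpers window_commands, read_bed and run_all are hypothetical, and the sketch assumes that the chromosome names in the BED file match what fasta2subseq.py expects for -chr.

```python
import subprocess
from pathlib import Path


def window_commands(chrom, start, end, pop_file="pop.txt"):
    """Build the two argument lists of the example command for one window."""
    extract = ["python", "fasta2subseq.py", "-i", pop_file,
               "-chr", str(chrom), "-start", str(start), "-end", str(end)]
    # literal-dists flags copied verbatim from the example command;
    # the input FASTA path is appended when the command is run
    dist = ["literal-dists", "-b", "-i", "-g", "-z", "0.1"]
    return extract, dist


def read_bed(path):
    """Yield (chrom, start, end) from the first three BED columns."""
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 3:
                yield fields[0], int(fields[1]), int(fields[2])


def run_all(bed="w25kbp.bed", outdir="dists"):
    """Run extraction and literal-dists for every window (hypothetical layout)."""
    out = Path(outdir)
    out.mkdir(exist_ok=True)
    tmp_fasta = out / "tmp.fasta"
    for chrom, start, end in read_bed(bed):
        extract, dist = window_commands(chrom, start, end)
        with open(tmp_fasta, "w") as fh:
            subprocess.run(extract, stdout=fh, check=True)
        with open(out / f"{chrom}_{start}_{end}.dist", "w") as fh:
            subprocess.run(dist + [str(tmp_fasta)], stdout=fh, check=True)
```

Calling run_all() writes one distance file per window. check=True stops at the first failing command; drop it if some windows are expected to fail.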

The resulting pairwise distances were used with the function bionjs of the R package ape (ape::bionjs), which reconstructs a tree from incomplete distance matrices with the BIONJ algorithm.

All windows whose pairwise distances were too sparse to reconstruct a tree were skipped.
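The idea behind distance-based tree building can be sketched with plain neighbor joining (Saitou and Nei). This is a swapped-in, simpler technique: ape::bionjs uses BIONJ, which additionally tracks distance variances when joining nodes, and it tolerates missing distances. The sketch below instead requires a complete matrix and returns None otherwise, a much stricter rule than the one actually used to skip windows (see do_trees.R and dist2tree.Rscript). The function name neighbor_joining is hypothetical.

```python
import math


def neighbor_joining(names, matrix):
    """Plain neighbor joining on a complete, symmetric distance matrix.

    names: unique taxon labels; matrix: list of rows of distances.
    Returns an unrooted Newick string, or None if the window would be
    skipped (fewer than 3 taxa or any missing/NaN distance).
    """
    n = len(names)
    if n < 3:
        return None
    for i in range(n):
        for j in range(n):
            v = matrix[i][j]
            if i != j and (v is None or math.isnan(v)):
                return None
    # distances keyed by node label; joined nodes are labelled by their Newick subtree
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        r = len(nodes)
        total = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # pick the pair that minimises the Q criterion
        best = None
        for x in range(r):
            for y in range(x + 1, r):
                a, b = nodes[x], nodes[y]
                q = (r - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # branch lengths from the new internal node u to a and b
        la = 0.5 * d[a][b] + (total[a] - total[b]) / (2 * (r - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # resolve the last three nodes as a trifurcation (unrooted tree)
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    return f"({a}:{la:g},{b}:{d[a][b] - la:g},{c}:{d[a][c] - la:g});"


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    matrix = [[0, 3, 5, 6], [3, 0, 6, 7], [5, 6, 0, 7], [6, 7, 7, 0]]
    print(neighbor_joining(names, matrix))  # (C:3,D:4,(A:1,B:2):1);
```

For this additive four-taxon matrix the tree, including its branch lengths, is recovered exactly.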

See also:

do_trees.R
dist2tree.Rscript

Subsequently, twisst was run with different population combinations to calculate topology weightings, which were then visualized:

https://github.com/simonhmartin/twisst

An example twisst command for the population combination (((MUS, DOM), SPRE), SPIC), with SPIC as outgroup, would look like this:

twisst.py -t chr1.trees.gz -w chr1_DOM_MUS_SPRE_SPIC.csv.gz -o topologies.chr1.DOM_MUS_SPRE_SPIC.trees -g DOM -g MUS -g SPRE -g SPIC --outgroup SPIC --method complete --groupsFile groups_DOM_MUS_SPRE_SPIC.tsv
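With --method complete, twisst considers every combination of one individual per group and weights each topology by how many of these combinations give a subtree matching it (normalized to fractions in the sketch below). For four groups there are three possible unrooted topologies. The sketch mimics this weighting, but it classifies each sampled quartet with the four-point condition on patristic distances. That is an assumption which only works for trees with positive branch lengths; twisst compares topologies directly. The dist callback and the helper names quartet_split and complete_weights are hypothetical.

```python
from itertools import product


def quartet_split(dist, a, b, c, d):
    """Classify a quartet by the four-point condition.

    Returns 0 for ab|cd, 1 for ac|bd, 2 for ad|bc, or None when the
    smallest sum is tied (unresolved quartet).
    """
    sums = [dist(a, b) + dist(c, d),
            dist(a, c) + dist(b, d),
            dist(a, d) + dist(b, c)]
    best = min(sums)
    if sums.count(best) > 1:
        return None
    return sums.index(best)


def complete_weights(groups, dist):
    """Fraction of one-tip-per-group quartets supporting each topology.

    groups: four lists of tip names, e.g. in the order DOM, MUS, SPRE, SPIC.
    dist: callback returning the patristic distance between two tips.
    """
    counts = [0, 0, 0]
    for a, b, c, d in product(*groups):
        split = quartet_split(dist, a, b, c, d)
        if split is not None:  # unresolved quartets are not counted here
            counts[split] += 1
    total = sum(counts)
    return [x / total if total else 0.0 for x in counts]


if __name__ == "__main__":
    # toy tree with all edges of length 1: d1 groups with m1, d2 groups with s1
    pairs = {("d1", "m1"): 2, ("d1", "d2"): 4, ("d1", "s1"): 4, ("d1", "o1"): 3,
             ("m1", "d2"): 4, ("m1", "s1"): 4, ("m1", "o1"): 3,
             ("d2", "s1"): 2, ("d2", "o1"): 3, ("s1", "o1"): 3}
    table = {frozenset(k): v for k, v in pairs.items()}

    def dist(x, y):
        return table[frozenset((x, y))]

    print(complete_weights([["d1", "d2"], ["m1"], ["s1"], ["o1"]], dist))
    # [0.5, 0.5, 0.0]
```

With the groups ordered DOM, MUS, SPRE, SPIC, index 0 (DOM,MUS | SPRE,SPIC) corresponds to (((MUS, DOM), SPRE), SPIC) once the tree is rooted with SPIC.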