Color a sequence alignment (nucleotide or protein) to visualize haplotype/recombination blocks
Given an initial alignment of variable sites (e.g., example.aln), haploColor.R colors it so that 'haplotype' structures and recombination blocks are easier to see.
- Assign first sequence as reference.
- Paint all residues of reference a unique color C.
- Where other sequences match the reference, paint them color C.
- Identify the sequence most dissimilar to the reference, and assign it as the new reference.
- Repeat the previous three steps, using a new color for each new reference, until all sequences are completely colored.
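The steps above can be sketched in a few lines. This is a minimal illustration in Python, not the code in haploColor.R. The function name `greedy_color` is hypothetical, and the sketch makes two assumptions the text does not state: only residues that are still uncolored get painted in later rounds, and "most dissimilar" means the largest Hamming distance to the current reference among sequences that still have uncolored residues.

```python
def greedy_color(seqs):
    """Return one color index per residue for equal-length sequences."""
    n, length = len(seqs), len(seqs[0])
    colors = [[None] * length for _ in range(n)]
    ref = 0    # the first sequence is the initial reference
    color = 0  # each reference gets its own color index
    while True:
        # Paint the still-uncolored residues of the reference.
        for j in range(length):
            if colors[ref][j] is None:
                colors[ref][j] = color
        # Paint uncolored residues of other sequences that match the reference.
        for i in range(n):
            for j in range(length):
                if colors[i][j] is None and seqs[i][j] == seqs[ref][j]:
                    colors[i][j] = color
        # Stop once every residue has a color.
        pending = [i for i in range(n) if None in colors[i]]
        if not pending:
            return colors

        # Assumption: "most dissimilar" = largest Hamming distance to the reference.
        def hamming(i):
            return sum(a != b for a, b in zip(seqs[i], seqs[ref]))

        ref = max(pending, key=hamming)
        color += 1


alignment = ["AAAA", "AATT", "CCTT"]
for row in greedy_color(alignment):
    print(row)
```

In this toy alignment the second sequence ends up half in the first reference's color and half in the second's, which is the kind of pattern that suggests a recombination breakpoint.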
This is a greedy algorithm that still has some issues.
- For each sequence:
  - For its most common to least common colors
    - Compute: density = # occurrences from min to max position / (max-min)
    - If density > threshold
      - Assign that color to a block from its min to max position
  - For its most common to least common colors
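The density rule can be sketched for a single sequence as follows. This is an illustrative Python sketch, not the code in haploColor.R. The name `smooth_blocks` and the default `threshold=0.8` are assumptions, since the text gives no threshold value. The sketch also assumes that a color occurring only once is skipped (its max-min span is zero) and that each color's positions are read from the row as already modified by more common colors.

```python
from collections import Counter


def smooth_blocks(row, threshold=0.8):
    """Fill dense colors into contiguous blocks for one sequence's color row."""
    row = list(row)  # work on a copy
    # Colors ordered from most common to least common.
    order = [color for color, _ in Counter(row).most_common()]
    for color in order:
        positions = [j for j, c in enumerate(row) if c == color]
        if len(positions) < 2:
            continue  # assumption: skip colors with a zero-length span
        lo, hi = positions[0], positions[-1]
        # density = # occurrences from min to max position / (max - min)
        density = len(positions) / (hi - lo)
        if density > threshold:
            # Assign the color to the whole block from min to max position.
            row[lo:hi + 1] = [color] * (hi - lo + 1)
    return row


print(smooth_blocks([0, 0, 1, 0, 0, 0, 1, 1]))
```

With the formula as written, a fully contiguous run has a density slightly above 1 (n occurrences over a span of n-1), so any threshold below 1 always keeps contiguous runs intact.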