To examine the molecular population genetics of the M protein family of Streptococcus pyogenes (group A Streptococcus), the 5' regions of polymerase chain reaction-amplified emm products from 79 M serotypes were sequenced, and the resulting phylogeny was compared to estimates of overall genetic relationships among strains determined by multilocus enzyme electrophoresis. Although the 5' emm sequences from several strains designated as distinct M types were identical or almost identical, the overall pattern is characterized by very extensive variation. The composition of distinct emm sequence clusters generally parallels the ability of strains to express serum opacity factor and, in some cases, historical associations of certain M types with acute rheumatic fever, but it does not parallel the M types classified as nephritogenic. For many strains there is a lack of congruence between variation in 5' emm sequences and estimates of overall chromosomal relationships, which is undoubtedly due to horizontal transfer and recombination of emm sequences. The results of these studies provide insights into the nature and extent of emm sequence variation and describe how this variation 'maps' onto the population genetic structure of extant S. pyogenes lineages. The complexity of the relationships between emm sequences and streptococcal cell lineages revealed by this analysis has significant implications for understanding the evolutionary events that generate strain diversity and the epidemiology of S. pyogenes diseases.