Methods and analysis of genotyping + introduction to population genetics

Topics

This week’s assignments will guide you through the following topics:

  • Why genotype data is heavily protected: it can be used to re-identify individuals.
  • How next-generation sequencing (NGS) is used to genotype samples for genetic analysis.
  • What rare (minor allele frequency, or MAF, < 1 percent), low-frequency (1 to 5 percent), and common (> 5 percent) alleles imply for disease.

Reading

Please read the following:

Tasks

TO DO: Download plink from HERE. Choose the Development (5 Oct) version of plink2 that matches your machine. Unzip the downloaded file to make the plink2 executable usable, then test it by running ./plink2 in the directory where you unzipped it.

Using the 1000 Genomes genotype data downloaded from the LDREF link:

  • What proportion of alleles are common (MAF > 5%), and how many are low frequency (1% < MAF < 5%) or rare (MAF < 1%)? You can either use plink2 to compute allele frequencies from the bed/bim/fam files, or check the precomputed .frq (frequency) files. If you are short on time, feel free to do this for only one chromosome.
  • How many individuals in the plink genotype files also have gene expression measured in the expression dataset?
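Both tasks can be sketched in a few lines of Python. This is a minimal sketch, not a reference solution, and every function name in it is hypothetical. If you compute frequencies yourself, `plink2 --bfile <prefix> --freq --out <prefix>` writes `<prefix>.afreq`, whose ALT_FREQS column holds alternate allele frequencies. A plink 1.9-style .frq file has a MAF column instead. The sketch assumes biallelic variants and converts any allele frequency f to MAF = min(f, 1 - f). The task leaves the boundaries open, so counting a MAF of exactly 1% or 5% as low frequency is an assumption. The expression file's layout isn't described here, so its sample IDs are passed in as a plain collection and matched as exact strings against the IID (second) column of the .fam file.

```python
import math
from collections import Counter


def classify_maf(maf):
    """Bin one minor allele frequency (MAF) using this assignment's cutoffs.

    Assumption: a MAF of exactly 0.01 or 0.05 is counted as low frequency.
    """
    if maf > 0.05:
        return "common"
    if maf >= 0.01:
        return "low_frequency"
    return "rare"


def mafs_from_freq_file(path):
    """Yield one MAF per variant from a plink .frq or plink2 .afreq file.

    Finds the MAF column (.frq) or the ALT_FREQS column (.afreq) by header
    name. Assumes biallelic variants: only the first ALT frequency is used,
    and MAF = min(f, 1 - f). Variants with no frequency (NA/NaN) are skipped.
    """
    with open(path) as handle:
        header = handle.readline().lstrip("#").split()
        for name in ("MAF", "ALT_FREQS"):
            if name in header:
                col = header.index(name)
                break
        else:
            raise ValueError(f"no MAF or ALT_FREQS column in {path}")
        for line in handle:
            fields = line.split()
            if not fields:
                continue
            try:
                freq = float(fields[col].split(",")[0])
            except ValueError:  # e.g. "NA" when no genotypes were observed
                continue
            if math.isnan(freq):
                continue
            yield min(freq, 1.0 - freq)


def count_maf_bins(mafs):
    """Return (counts, proportions) of common / low-frequency / rare variants."""
    counts = Counter(classify_maf(m) for m in mafs)
    total = sum(counts.values())
    bins = ("common", "low_frequency", "rare")
    proportions = {b: counts[b] / total if total else 0.0 for b in bins}
    return counts, proportions


def fam_sample_ids(fam_path):
    """Return the individual IDs (IID, 2nd column) listed in a plink .fam file."""
    with open(fam_path) as handle:
        return {line.split()[1] for line in handle if line.strip()}


def count_shared_samples(genotype_ids, expression_ids):
    """Count individuals present in both the genotype and expression data."""
    return len(set(genotype_ids) & set(expression_ids))
```

For example, `count_maf_bins(mafs_from_freq_file(path))` on one chromosome's frequency file returns the per-bin counts and proportions. The sketch looks up columns by header name rather than by position, so it does not depend on plink's exact column order. If the .fam IIDs and the expression sample IDs use different formats, you will need to harmonize them before counting the overlap.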

Weekly Questions

Answer the following questions on Gradescope:

  • What is the tradeoff between rare and common alleles in studying the genetics of human disease?
  • What are the risks associated with the distribution of personally identifiable genetic data?