How Genome Wide Association Studies (GWAS) Work

By Alejandra Rodriguez Sosa | Published On: April 2, 2020 | Last Updated: September 25, 2022

The rapid rise of technology in the last few decades has led to huge advancements in computational science, which in turn has greatly benefited fields like bioinformatics. Genome Wide Association Studies (GWAS) are a new and exciting area of bioinformatics, combining computing power with our rapidly growing knowledge of DNA to help us better understand our genetic code. In this article, we look at the human genome and at how GWAS help identify the genetic basis of disease.

Cracking The Genetic Code

Bioinformatics combines biology, computer science, information engineering, mathematics and statistics to explain data. As new technologies generate enormous amounts of data, we require methods to sift through it. An important and growing field within bioinformatics, genome wide association studies (GWAS) work by searching for links between genetic data and disease. GWAS allow us to crack the genetic code, bringing us closer to treating and curing genetic diseases in the future.

How Does DNA Work?

To understand the data that is stored in our genetic code, we first need an introduction to how genes work. Genes are the instructions for life, telling the body how to produce the proteins necessary for our survival. These genes are stored in the hard-to-pronounce deoxyribonucleic acid (DNA). DNA is the basic hereditary material found in humans and other organisms, and it allows genes to be passed from parent to offspring.

The information in DNA is stored using a code made up of four chemical bases: adenine, guanine, cytosine and thymine. For ease of use, we refer to them by their first letter: A, G, C and T. The bases come with two structures attached to them, a sugar molecule and a phosphate molecule, as can be seen in the image below. The combination of base plus sugar and phosphate is what we call a nucleotide.

This diagram shows the chemical structure of each base pair. The colors blue, black, red and green indicate the different parts that identify each base. The P-O cross shapes represent the phosphate molecules and the pentagons with the “O” represent the sugar molecules. (Genome.gov, 2020)

Our Genes Make Us Unique

Our DNA consists of ~3 billion of these nucleotides and the way they are ordered determines the information that allows us to build and maintain an organism. These bases show preferences in what we call pairing: A prefers to pair with T and C with G. These base pairs form sequences, telling the body how to make different proteins. The instructions for making the insulin protein, for example, are contained in a sequence of 1425 base pairs.
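The pairing rule above can be written down in a few lines of Python. This is only an illustrative sketch: the function name `complement` and the example sequence are made up for this article, not taken from any particular library.

```python
# Pairing rules described in the text: A pairs with T, and C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return, position by position, the base that pairs with each base of `strand`."""
    return "".join(PAIRING[base] for base in strand.upper())


example = "ATGCCG"  # hypothetical six-base sequence
print(complement(example))  # TACGGC
```

Because each base has exactly one partner, knowing one strand is enough to work out the other.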

A gene is simply a region of base pairs on DNA that codes for a specific protein. Through variations in the base pair sequence (pairs that are missing, extra or substituted with a different pair), mutations in the gene can occur. These variations contribute to each person’s unique features. While many genes provide instructions to make proteins, some sequences of DNA don’t seem to have any function (that we know of). We still have a long way to go to fully understand our genetic code!
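The three kinds of variation mentioned above (missing, extra or substituted bases) can be told apart with a very rough rule of thumb. The sketch below is a simplification that only compares the lengths of a reference sequence and an observed sequence; the function name `classify_variant` and the sequences are hypothetical.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label the difference between a reference sequence and an observed one.

    Simplifying assumption: only the lengths of the two sequences are compared.
    """
    if ref == alt:
        return "no change"
    if len(ref) == len(alt):
        return "substitution"  # same length, different bases
    if len(alt) > len(ref):
        return "insertion"     # extra bases
    return "deletion"          # missing bases


print(classify_variant("A", "G"))    # substitution
print(classify_variant("A", "ATT"))  # insertion
print(classify_variant("ACT", "A"))  # deletion
```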

Within each of our cells are compact structures called chromosomes. Chromosomes contain our genes, which are stretches of DNA made up of two long strands of nucleotides.
(Mayo Clinic, 2020)

How Genome-Wide Association Studies Work

Genetics is a branch of biological sciences that studies genes, genetic variation and heredity in organisms. Inheritance consists of the passing of information via genes from parents to offspring. This is a complicated study area because DNA-based organisms have thousands of genes, and each gene can be different in every individual. This can determine their interactions with other genes and traits, resulting in many entangled relationships.

As mentioned above, genes can also mutate to create new traits. These mutations can be beneficial, but altering the code generally leads to detrimental outcomes, including diseases that medical genetics seeks to identify and understand. When looking for an unknown gene that could be involved in a disease, researchers use different methods to map out the set of genetic material present in an organism.

Searching the Genome for Clues

Genome wide association studies (GWAS) work by studying the entire set of genes (known as a genome) in a species, linking groups of genes to diseases or biological functions. Remember that our genome consists of 3 billion base pairs and a multitude of variations within them! Handling such massive amounts of data has only recently become possible thanks to the development of powerful computers, allowing researchers to compare millions of data points within a reasonable time frame.

This method takes hundreds of thousands to millions of genetic variants obtained from the genomes of many individuals and tests each one for an association with the disease or trait being studied. The method requires not only an understanding of gene inheritance but also an understanding of statistics, as it involves looking for patterns in large amounts of data that the researcher must interpret properly to obtain useful outcomes.
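To make the idea of testing a variant concrete, here is a minimal sketch that assumes a simple chi-square test on allele counts in people with the disease (cases) and without it (controls). Real studies typically use regression models that account for other factors. The function name `allelic_chi_square` and all counts are hypothetical. Dividing 0.05 by the number of variants tested (a Bonferroni correction) is one common way to account for running so many tests at once.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns a tuple (chi2, p_value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_sums)
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("each row and column needs at least one observation")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical study: one million variants tested, so the bar for each test is strict.
n_variants_tested = 1_000_000
threshold = 0.05 / n_variants_tested

# Made-up allele counts for a single variant.
chi2, p = allelic_chi_square(case_alt=600, case_ref=1400,
                             control_alt=450, control_ref=1550)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, passes threshold: {p < threshold}")
```

In a real study this test would be repeated for every variant, and only the variants that pass the threshold would move on to follow-up work.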

This diagram shows an overview of how genome-wide association studies are used to detect associations, from choosing what to study to obtaining a “map” with the possible genes of interest. (Tam et al., 2019)

GWAS: How it Works in 4 Steps

The steps of GWAS are as follows:

  1. Identification of the disease or trait to be studied and selection of an appropriate population to perform the study.
  2. Characterization of the genetic variants to be able to move from the statistical association to the identification of those variants and genes that are causing the disease. Note that even though a disease is sometimes associated with a single gene, it is much more common to find diseases that are regulated by the interactions between multiple genes.
  3. The selected genes can be confirmed using different experimental approaches involving cell-based systems or model organisms.
  4. Genetic variants are extremely common but rarely show up as outward differences in an individual. In our analysis, we can expect to find mostly common variants that each have a small effect, along with a few rare variants that have a large effect on the individual.
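Two numbers are often used to describe a variant found in such a study: how common it is (its allele frequency) and how strongly it is tied to the disease (here, an odds ratio). The sketch below computes both from allele counts. The function names, variant names and counts are all made up for illustration.

```python
def allele_frequency(alt_count: int, ref_count: int) -> float:
    """Fraction of all observed alleles that carry the variant."""
    return alt_count / (alt_count + ref_count)


def odds_ratio(case_alt: int, case_ref: int, control_alt: int, control_ref: int) -> float:
    """Odds of carrying the variant allele in cases divided by the odds in controls."""
    return (case_alt / case_ref) / (control_alt / control_ref)


# Hypothetical counts: (case_alt, case_ref, control_alt, control_ref)
variants = {
    "variant_1": (600, 1400, 550, 1450),
    "variant_2": (30, 1970, 10, 1990),
}

for name, (case_alt, case_ref, control_alt, control_ref) in variants.items():
    freq = allele_frequency(control_alt, control_ref)
    effect = odds_ratio(case_alt, case_ref, control_alt, control_ref)
    print(f"{name}: control allele frequency = {freq:.3f}, odds ratio = {effect:.2f}")
```

An odds ratio close to 1 means the variant is about equally frequent in cases and controls, while a value further from 1 points to a stronger association.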

The Effectiveness of GWAS

The use of GWAS can help us discover new biological mechanisms. GWAS loci are usually of unknown function or relevance, so the experiments that follow the computational identification of a genetic variant help uncover the biological pathways and processes that regulate a disease.

A good example is the use of GWAS to assess mood instability, using the UK Biobank database as a source of information. This database stores the health information of 500,000 participants, including their medical histories. Several disorders were studied, including major depressive disorder (MDD), bipolar disorder (BD), schizophrenia, attention deficit hyperactivity disorder (ADHD), anxiety and post-traumatic stress disorder (PTSD). After applying the methodology explained above, the study managed to identify four loci that seem to play a role in mood instability.

GWAS has also been successful in identifying risk loci (fixed positions on chromosomes where a gene variation responsible for a disease is located) for diseases such as anorexia nervosa, major depressive disorder, type 2 diabetes and many others.

What does this mean? The biggest conclusion is that there is a polygenic basis for mood instability: mood issues are not due to a single genetic mutation, but are related to multiple genes that we could never have identified without GWAS. As computational capabilities continue to improve alongside technological growth, GWAS will play an increasingly important role in our understanding of genetic diseases. In the future, methods such as these can be combined with gene therapy and other biologic drugs to cure currently incurable genetic diseases.


  1. National Human Genome Research Institute. (n.d.). Base Pair. https://www.genome.gov/genetics-glossary/Base-Pair
  2. Griffiths, A. J. F., Miller, J. H., Suzuki, D. T., Lewontin, R. C., & Gelbart, W. M. (2000). Genetics and the organism: introduction. In An introduction to genetic analysis (7th ed.). New York, NY: W. H. Freeman.
  3. Manolio, T. A. (2010). Genome-wide association studies and assessment of the risk of disease. The New England Journal of Medicine, 363(2), 166-176.
  4. Mayo Clinic. (2019). How genetic disorders are inherited. https://www.mayoclinic.org/tests-procedures/genetic-testing/multimedia/genetic-disorders/sls-20076216
  5. Tam, V., Patel, N., Turcotte, M., Bossé, Y., Paré, G., & Meyre, D. (2019). Benefits and limitations of genome-wide association studies. Nature Reviews Genetics, 20(8), 467-484.
  6. Genetics Home Reference. (2020). What is a gene? https://ghr.nlm.nih.gov/primer/basics/gene
  7. Wood, E. J. (1995). The encyclopedia of molecular biology. Biochemical Education, 23(2), 1165.

About the Author

Alejandra Rodriguez Sosa

Alejandra was a science writer at FTLOScience from October 2018 to April 2021.