Insights from Large-Scale Studies of Diverse Phenotypes and Populations


Platform sessions are abstract-driven sessions with 6 talks per session. Each talk is 10 minutes long, and the talks are cross-topical to represent the broad discipline of genetics and genomics. After each talk, there will be a 5-minute Q&A with the speaker. For more information, see the Details tab.

Recorded session from the 2021 virtual meeting.

A global biobank study of asthma identifies novel associations, illuminates shared genetic architecture, and improves polygenic prediction across diverse ancestry groups

Asthma is a complex and multifactorial disease that affects millions of people worldwide and varies in prevalence by an order of magnitude across geographic regions and diverse populations. However, the extent to which genetic variation contributes to these disparities is unclear, as studies probing the genetics of asthma have been primarily limited to populations of European descent. To expand our understanding of the genetic factors underlying asthma risk in different ancestral populations, we conducted the largest genome-wide association study of asthma to date (N cases=153,763 and N controls=1,647,022) via meta-analysis across 18 biobanks that use harmonized phenotype definitions and span multiple countries and genetic ancestries, collectively called the Global Biobank Meta-analysis Initiative (GBMI). This meta-analysis discovered 180 independent genome-wide significant loci (p < 5e-8) associated with asthma, 69 of which are novel. We replicate well-known associations such as TNFRSF8 and IL1RL1, and find that the novel associations tend to have smaller effects than previously discovered loci, highlighting our substantial increase in effective sample size and statistical power. Despite the considerable range in prevalence among biobanks, from 3% to 24%, the genetic effects of associated loci are largely consistent across biobanks, ancestries, biobank ascertainment types, and asthma definitions. This offers insight into the potential shared biological pathways that may be differentially affected by environmental factors and contribute to variation in prevalence. To further probe the polygenic architecture of asthma, we are constructing polygenic risk scores (PRS) using multi-ancestry approaches to establish a baseline understanding of PRS performance for asthma in different populations.
The vast increase in the scale and diversity of GBMI yields higher predictive power for asthma across the board; for example, with the multi-ancestry GBMI cohort, the phenotypic variance explained in East Asian populations reaches 0.03, compared with 0.0075, the highest value previously reported in an East Asian population. The availability of additional phenotypic information on asthma subtypes and asthma-related diseases like COPD in GBMI-participating biobanks will allow us to further tease apart the genetics underlying various aspects of the disease. In summary, we have identified novel loci associated with asthma, found remarkable consistency of genetic effects despite enormous heterogeneity in prevalence, and quantified the relative contribution of polygenic components to asthma risk.

Kristin Tsuo
Department of Genetics, Harvard Medical School


Genome-wide polygenic risk score of prostate cancer in African and European ancestry men
Genome-wide polygenic risk scores (PRS) are reported to have higher performance than standard genome-wide significant PRS across numerous traits. We compared the ability of genome-wide PRS to predict prostate cancer risk with that of our recently developed and highly predictive multi-ancestry PRS of 269 established prostate cancer risk variants. Genome-wide PRS approaches included LDpred2, PRS-CSx, and EB-PRS. Models were trained using the largest and most diverse prostate cancer GWAS to date of 107,247 cases and 127,006 controls, which was previously used to develop the multi-ancestry PRS of 269 variants. Resulting models were tested in independent samples of 1,586 cases and 1,047 controls of African ancestry from the California Uganda Study and 8,045 cases and 191,835 controls of European ancestry from the UK Biobank. Among the genome-wide PRS approaches, LDpred2 had the best performance, with AUCs of 0.649 (95% CI=0.627-0.670) in African and 0.819 (95% CI=0.815-0.823) in European ancestry men. Relative to men in the median 40-60% PRS category, African and European ancestry men in the top PRS decile had odds ratios for prostate cancer of 3.29 (95% CI=2.47-4.40) and 2.99 (95% CI=2.78-3.23), respectively. However, the PRS constructed using 269 variants had significantly larger AUCs in both African (0.679, 95% CI=0.659-0.700) and European ancestry men (0.845, 95% CI=0.841-0.849), with African and European ancestry men in the top PRS decile having larger odds ratios for prostate cancer (3.53, 95% CI=2.66-4.69 and 4.20, 95% CI=3.89-4.53, respectively). We are currently further validating these findings in diverse men from the Million Veteran Program. This investigation suggests that genome-wide PRS may not improve the ability to predict prostate cancer risk compared with a genome-wide significant PRS.
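
As a rough illustration of the two quantities this abstract compares, the sketch below computes a PRS as a weighted sum of risk-allele dosages and an AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. The function names, weights, and genotypes are hypothetical toy values chosen for illustration; they are not the study's 269 variants, its data, or any of the LDpred2, PRS-CSx, or EB-PRS procedures.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Fraction of case/control pairs where the case scores higher (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical per-variant weights (log odds ratios) and genotypes; not study data.
weights = [0.2, 0.1, 0.4]
case_genotypes = [[2, 1, 1], [1, 0, 2], [0, 2, 0]]
control_genotypes = [[0, 1, 0], [1, 1, 0], [0, 0, 1]]

case_scores = [polygenic_score(g, weights) for g in case_genotypes]
control_scores = [polygenic_score(g, weights) for g in control_genotypes]
toy_auc = auc(case_scores, control_scores)
print(round(toy_auc, 3))  # 0.778 (7 of 9 case/control pairs ranked correctly)
```

A genome-wide PRS differs from this sketch mainly in how many variants enter the sum and how the weights are estimated, not in the scoring step itself.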

Burcu F Darst
University of Southern California

Genetic association of phenotypes derived by self-supervised deep learning of retina fundus images reveals new genes for eye development
Although genome-wide association studies (GWAS) have achieved great success and identified thousands of genetic associations, the phenotypes in most existing GWAS are predefined. While these phenotypes encode valuable biomedical knowledge, they are also biased by current clinical practice and epidemiological studies. In addition, because phenotype codes are greatly simplified, they are often insufficient to capture the full complexity of human physiology and pathology. Fortunately, with the medical record becoming increasingly digitized, there are new opportunities to derive phenotypes beyond expert curation, which could avoid human bias and uncover phenotypes that were previously missed. Here, leveraging breakthroughs in self-supervised deep representation learning, we propose a new approach for phenotype discovery from medical images. We use a contrastive loss function over an Inception V3 architecture to learn a representation that captures the inherent image features of individuals. Using vessel segmentation masks generated from retina fundus images as inputs, we designed a phenotyper neural network model that generates 128 phenotypes representing retinal vasculature. After training on 40,000 images from EyePACS, our model generated phenotypes from 130,967 images of 65,629 British White participants in the UK Biobank. A GWAS of these vasculature phenotypes identified 34 independent loci, at least 5 of which are associated with vessel features. Mouse knockout experiments verified the role of the WNT7B gene, a newly found locus, in retinal vessel development. Our results establish a new framework of unsupervised image-based genome-wide genotype-phenotype association studies (iGWAS). Our framework would expand the repertoire of GWAS phenotypes and enable discovery of new biology.

Ziqian Xie
Baylor College of Medicine

Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases
Genome-wide association studies (GWAS) of human complex traits or diseases often implicate genetic loci that span hundreds of significant genetic variants. However, these loci may contain only one or a handful of causal variants. Statistical fine-mapping refines a GWAS locus to a smaller set of likely causal variants (credible set). Since the marginal effects of non-causal variants differ across populations where LD differs, capitalizing on the genomic diversity across ancestries promises to further improve the resolution of fine-mapping. However, to date, cross-population fine-mapping efforts have been limited, partly due to the lack of statistical methods that can appropriately integrate data from multiple ancestries. Building on Sum of Single Effects (SuSiE), a single-population fine-mapping model, we have developed SuSiEx, an accurate and computationally efficient method for trans-ancestry fine-mapping. SuSiEx assumes that causal variants are largely shared across populations while allowing variant effect sizes to vary across populations. Our model can integrate data from an arbitrary number of ancestries, explicitly models population-specific LD patterns, accounts for multiple causal variants in a genomic region, and can be applied to GWAS summary statistics without access to individual-level data. We showed, via simulation studies, that compared with fine-mapping 100K European samples, integrating 50K European and 50K African samples using SuSiEx enabled fine-mapping of more association signals and dramatically increased the resolution of credible sets. Compared with PAINTOR, SuSiEx had a 37% reduction in the median size of credible sets and a 54% increase in the number of high Posterior Inclusion Probability (PIP) variants. We applied SuSiEx to 25 quantitative traits that are available from both the Taiwan Biobank (TWB, n = 92,615) and UK Biobank (UKBB, n = 361,194) to fine-map genetic loci reaching genome-wide significance.
Compared with single-population fine-mapping in UKBB, cross-ancestry fine-mapping significantly reduced the size of credible sets and increased the PIP of the most probable variant. We additionally applied our method to schizophrenia GWAS summary statistics of East Asian and European ancestries. Compared with the published fine-mapping results from PGC using FINEMAP on the same data, SuSiEx reduced the size of credible sets in 70% of the fine-mapped loci. Manual inspection confirmed that SuSiEx provided more sensible results at many loci. As GWAS results from different ancestries accumulate, our method will become increasingly useful.
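
To make the notion of a credible set concrete, the sketch below builds one for a single association signal by ranking variants on their posterior probability of being causal and adding them until the cumulative probability reaches the target coverage. This is a generic illustration under the assumption that the probabilities for one signal sum to about 1; it is not the SuSiEx implementation, and the variant labels and probabilities are hypothetical.

```python
from typing import Dict, List


def credible_set(posteriors: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Smallest set of top-ranked variants whose probabilities sum to >= coverage."""
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen: List[str] = []
    cumulative = 0.0
    for variant, probability in ranked:
        chosen.append(variant)
        cumulative += probability
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posteriors for one signal: diffuse in a single population,
# concentrated after combining populations with different LD patterns.
single_population = {"var_a": 0.40, "var_b": 0.30, "var_c": 0.20, "var_d": 0.06, "var_e": 0.04}
cross_ancestry = {"var_a": 0.97, "var_b": 0.01, "var_c": 0.01, "var_d": 0.005, "var_e": 0.005}

wide_set = credible_set(single_population)
narrow_set = credible_set(cross_ancestry)
print(len(wide_set), len(narrow_set))  # 4 1
```

A smaller credible set at the same coverage is what the abstract means by improved fine-mapping resolution.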

Kai Yuan
Analytic and Translational Genetics Unit, Massachusetts General Hospital

Analysis across Taiwan Biobank, Biobank Japan, and UK Biobank identifies hundreds of novel loci for 36 quantitative traits
Genome-wide association studies (GWAS) have identified tens of thousands of genetic loci associated with human complex traits and diseases. However, the majority of GWAS were conducted in individuals of European (EUR) ancestry. Failure to capture global genetic diversity has limited biological discovery and impeded equitable delivery of genomic knowledge to diverse populations. Here we performed genome-wide analysis on 102,900 individuals across 36 human quantitative traits in the Taiwan Biobank (TWB), a major biobank effort that broadens the population diversity of genetic studies in East Asia (EAS). We identified 1,907 independent genome-wide significant loci (P-value < 5×10⁻⁸) across the 36 traits, among which 1,287 loci survived Bonferroni correction for the number of traits tested (P-value < 5×10⁻⁸/36). The number of genome-wide significant loci per trait ranged from 1 for forced expiratory volume in one second (FEV1) and FEV1 to forced vital capacity (FVC) ratio (FEV1R), to 211 for height (HT). We estimated the SNP-based heritability (h²g) for each trait, which ranged from 0.009 (FEV1R) to 0.384 (HT), and pairwise genetic correlations between these traits, which identified clusters of highly genetically correlated traits. Of the 1,907 genome-wide significant loci, 1,615 were fine-mapped to a total of 1,972 credible sets, each representing an independent association signal. Out of the 1,972 credible sets, 232 were mapped to a single variant with posterior inclusion probability (PIP) > 95%, among which 24 were missense variants. Leveraging GWAS summary statistics from Biobank Japan (BBJ) and UK Biobank (UKBB), we found that the genetic architecture of the quantitative traits examined was largely consistent within EAS and between EAS and EUR populations. Integrating TWB and BBJ GWAS identified a total of 2,975 genetic loci, among which 979 had not been reported in previous biobank studies.
We also examined whether polygenic risk scores (PRS) of biomarkers can be used to predict the risk of common complex disease, and demonstrated the potential utility of biomarker GWAS in predicting disease risk (e.g., type 2 diabetes) and the promise of multi-trait cross-population polygenic prediction. Our novel findings represent a major advance in diversifying GWAS samples and the characterization of the genetic architecture of human complex traits in EAS populations. Future endeavors on increasing the sample size and phenotype coverage in TWB, and improving cross-biobank data harmonization will further facilitate genomic discovery.
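
The Bonferroni threshold used in this abstract is the conventional genome-wide threshold divided by the number of traits tested. The short sketch below works through that arithmetic; the constant and function names are illustrative, and the example P-values are hypothetical.

```python
GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold
N_TRAITS = 36             # number of quantitative traits tested in the abstract

bonferroni_threshold = GENOME_WIDE_ALPHA / N_TRAITS
print(f"{bonferroni_threshold:.3e}")  # 1.389e-09


def survives_bonferroni(p_value: float) -> bool:
    """True if a locus P-value passes the trait-corrected threshold."""
    return p_value < bonferroni_threshold


# Hypothetical P-values: the first is genome-wide significant only,
# the second also survives correction for 36 traits.
print(survives_bonferroni(2e-8), survives_bonferroni(1e-10))  # False True
```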

Yen-Feng Lin
National Health Research Institutes

The effects of demographic-based selection bias on GWAS results in the UK Biobank
Genome-wide association studies (GWASs) are almost always based on a non-random sample of the underlying population, as obtaining very large sample sizes, rather than ensuring such samples are representative, has been key to their success. Selection bias in estimated genetic associations, including how it varies across traits, is poorly understood. A sample of particular interest is the widely used UK Biobank (UKB). Because of the need for very large samples, the UKB is included in almost all large GWASs as one of the largest cohorts. In addition, UKB's subsample of genotyped siblings (UKBSIB) has become a crucial resource for estimating genetic effects free of environmental confounding. Using nationally representative UK Census microdata as a reference, we document substantial non-random selection into the UKB, and even stronger selection into UKBSIB: individuals in the UKB and UKBSIB are more likely to be female, more highly educated, and older, compared to the underlying population that received an invitation. We also show that this non-random selection leads to significant selection bias in associations between various demographic and health-related traits estimated in the UKB. We then estimate the probability of UKB participation for each UKB participant and use these probabilities to estimate selection-corrected GWASs for multiple traits using inverse probability weighting. Based on preliminary analyses for the top 5,000 SNPs associated with BMI, education, and height, respectively, we show that the extent to which selection-corrected GWAS results differ from those of regular GWASs is trait-specific. Genetic associations for educational attainment and BMI are the most altered after correcting for volunteer bias, whereas associations for height remain relatively unaffected. For educational attainment, 12.6% of our estimated SNP effects flip sign after correcting for selection bias, suggesting that current GWAS methods are not sufficiently robust.
We will extend these analyses by investigating more phenotypes, conducting regular and inverse probability weighted GWASs in the UKB that incorporate all available SNPs, and comparing results. Our findings will be useful for understanding the extent to which a particular phenotype is prone to selection bias in GWAS, and our correction method provides an alternative when population-representative cohorts are not available.
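
The idea behind inverse probability weighting can be sketched with a single-SNP regression: each participant is weighted by the inverse of their participation probability, so people from under-represented groups count for more. The example below is a minimal illustration that assumes the participation probabilities are already known; the abstract's authors estimate them from Census microdata, and this is not their pipeline. All names and values are hypothetical toy data.

```python
from typing import Optional, Sequence


def weighted_slope(
    x: Sequence[float], y: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Slope of y on x from (weighted) least squares with an intercept."""
    if weights is None:
        weights = [1.0] * len(x)
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    covariance = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    variance = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    return covariance / variance


# Hypothetical toy data: genotype dosage, phenotype, and participation probability.
dosage = [0, 1, 2, 0, 1, 2]
phenotype = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
participation_prob = [0.25, 0.25, 0.25, 0.5, 0.5, 0.5]

ipw_weights = [1.0 / p for p in participation_prob]
naive_effect = weighted_slope(dosage, phenotype)
corrected_effect = weighted_slope(dosage, phenotype, ipw_weights)
print(naive_effect, round(corrected_effect, 3))  # 0.0 0.333
```

In this toy data the unweighted estimate is zero, while upweighting the under-represented first three individuals yields a positive effect, which mirrors how selection correction can change estimated SNP effects.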

Sjoerd van Alten
Vrije Universiteit Amsterdam