ASHG 2021 Sessions


The ASHG 2021 Annual Meeting was held virtually from October 18-22, 2021. In this package, you can view member-recommended sessions from the meeting.

For more information on the 2021 Annual Meeting, visit our website

  • Contains 1 Component(s)

    Speakers provide an in-depth look at the newly resolved regions of the human genome.

    Since the initial release of the human genome sequence 20 years ago, human chromosomes have remained unfinished due to large regions of highly identical repeats clustered within centromeres, regions of segmental duplication, and the acrocentric short arms of chromosomes. However, recent advances in long-read sequencing technologies and associated algorithms have now made it possible to systematically assemble these regions from native DNA for the first time. 

    In this session, we will present the first complete sequence of a human genome and provide an in-depth look at the newly resolved regions, their variation across individuals, and the resulting impact on human health, disease, and evolution. Our first speaker is a co-lead of the Telomere-to-Telomere (T2T) Consortium (https://sites.google.com/ucsc.edu/t2tworkinggroup), and he will introduce the session by unveiling the complete human genome and explaining the efforts to sequence, assemble, and validate the genome assembly. Our second speaker will present the genetic and epigenetic maps of all human centromeric regions and discuss their evolution across the hominid phylogeny over the last 25 million years. Our third speaker will focus on the segmental duplications found within the genome and discuss their transcriptional and epigenetic status. Finally, our fourth speaker will present the human methylome, with a particular focus on the epigenetic profile of newly resolved regions. At the end of the session, we will host a panel discussion to allow for a Q&A between the audience and each of our four speakers.

    Recorded session from the 2021 virtual meeting.

  • Contains 1 Component(s)

    This panel will examine workforce diversity initiatives and practices that aim to redress inequities.

    A resounding call for increased workforce diversity has been made in the genomics research community in recent years (Green et al. 2020; Channaoui et al. 2020). Recognizing the lack of diversity in both research participants and in the genomics workforce, workforce diversity initiatives strive to train and retain diverse members of the scientific community such that scientific fields are more inclusive and better represent racial, ethnic, sexual, gender minority, and differently abled groups. The NHGRI 2020 Strategic Vision, for instance, articulates how building a diverse genomics workforce will be a key priority “to promote workforce diversity, leadership in the field, and inclusion practices.” A broad literature demonstrates the lack of diversity among NIH-funded investigators, even as research has demonstrated that researchers from underrepresented groups develop novel scientific projects at higher rates (Hofstra et al. 2020).

    This panel will examine workforce diversity initiatives and practices that aim to redress inequities that have excluded underrepresented groups and communities of color from genomics leadership and the workforce more broadly. Drawing on empirical cases and the experiences and perspectives of researchers and program leaders on initiatives aimed at increasing diversity and inclusion in the field, this panel will discuss how definitions of diversity, commitments to diverse experiences, distribution of resources and infrastructures, and professional networks directly impact equitable diversification of the workforce. This panel will consider how an equity framework can be brought to bear on questions of what workforce diversity efforts can and should accomplish, who should be responsible for such initiatives, and what a sustainable, lasting commitment to workforce diversity means for the genomics community moving forward.

    Recorded session from the 2021 virtual meeting.

  • Contains 1 Component(s)

    Findings from ELSI studies examining returning clinical PRS across diverse populations.

    Many polygenic risk scores (PRS) have been published with an eye towards clinical implementation. However, little work has been done on the social and ethical considerations of calculating and returning PRS, particularly across genetic ancestral backgrounds. 

    This session reports findings from embedded ELSI studies examining social and ethical considerations of returning clinical PRS across diverse populations. The panel will advance our understanding of critical issues that must be addressed to maximize potential benefits of clinical PRS. Following an introduction to the topic, Maya Sabatello describes the views of patients, clinicians and IRB members about challenges translating PRS research into improved care and strategies to promote health equity. Broadening our understanding of variation in stakeholder views, Sabrina Suckiel highlights English- and Spanish-speaking patients’ perceptions of clinical utility of PRS, preferences regarding return of information and potential barriers to uptake.

    The format of PRS results can impact patient and provider understanding of risk and responsiveness to corresponding recommendations. Anna Lewis discusses research on stakeholder preferences regarding various formats of return and the potential impacts on use and understanding. Finally, Ellen Clayton presents data on the role of patient education to ensure researchers understand racial/ethnic minority views on clinical PRS. Following discussion, closing remarks will highlight the utility of embedded ELSI projects within large-scale PRS or genomic studies and offer recommendations for future research. These studies, embedded in the Electronic Medical Records and Genomics (eMERGE) IV Network, were designed to inform return of actionable PRS for common complex diseases to patients and their healthcare providers.

    Recorded session from the 2021 virtual meeting. 

  • Contains 1 Component(s) Recorded On: 10/20/2021

    This session will discuss ethnically diverse populations and health equity in genomic medicine.

    This session will focus on opportunities, challenges and ethical issues related to studying ethnically diverse populations to improve discovery and health equity in genomic medicine. Here we gather scientists from across the globe with experience in conducting genomic research in populations under-represented in human genetics research. 

    Speakers will address research on genetic risk factors for medical phenotypes of particular relevance to the study populations. We will also discuss ethical, legal, and social issues (ELSI) that arise when conducting genomic research in Indigenous communities, as well as ways in which we can achieve more inclusive and equitable research and ensure benefit sharing. We will have four 15-minute presentations followed by a 30-minute panel discussion.

    Our session will start with a presentation discussing studies of pharmacogenetic variation in Indigenous peoples from South America and implications for personalized medicine in these populations. Our second speaker will describe results of a multi-ethnic genome-wide association study (GWAS) of differentiated thyroid cancer (DTC) in Melanesians from New Caledonia and Polynesians from French Polynesia, two populations with the highest incidence of DTC worldwide. The speaker will illustrate the impact of genetic studies of DTC risk on community health in Oceanian populations. The next speaker will follow with the promising future for genetic discovery that can be achieved by studying African populations that have high levels of genomic and phenotypic diversity. This speaker will also illustrate how the study of ethnically diverse African populations has shed light on the genetic basis of hearing impairment, resulting in identification of multiple novel genes influencing hearing loss. Our last speaker will discuss ethical perspectives and the challenges of conducting genomic research in Indigenous populations from North America, the potential benefit for personalized medicine, and the importance of creating a partnership with Indigenous communities.

    The panel discussion, which will include the two moderators and audience participation, will focus on how studies of ethnically diverse populations are of benefit to the global medical genetics community. We will further discuss ethical issues that arise from consequences of research that stigmatizes Indigenous communities and will touch on principles of how to conduct research in minority and Indigenous populations in an ethical manner.

    Recorded session from the 2021 virtual meeting.

  • Contains 1 Component(s)

    This session will introduce recent efforts to level ancestry imbalance in genomic research.

    The success of genome-wide association studies (GWAS) in humans has yielded a wealth of clues about the molecular basis of many common human diseases. In addition, polygenic risk scores (PRS) for a variety of traits are increasingly becoming accurate enough to be useful for clinical practice, realizing the longstanding goal of personalized medicine. However, data collection continues to be predominantly imbalanced towards individuals of European ancestry, and it is abundantly clear that methods developed in one human ancestry group do not perform well in other ancestry groups, limiting their utility and exacerbating already severe health disparities. The speakers in this session will introduce recent efforts to level ancestry imbalance in genomic research, including the formation of large collaborative efforts and the development of novel statistical methods.

    Recorded session from the 2021 virtual meeting.

  • Contains 1 Component(s)

    Speakers discuss genomics in Africa.

    Platform sessions are abstract-driven sessions with 6 talks per session. These talks are 10 minutes in length and are cross-topical in nature to represent the broad discipline that our field of genetics and genomics represents. After each talk, there will be a 5-minute Q&A with each speaker. For information on each individual session, please view the "Details" tab.

    Recorded session from the 2021 virtual meeting.

    Revisiting the out of Africa event with a deep learning approach

    Anatomically modern humans evolved around 300 thousand years ago in Africa. Modern humans started to appear in the fossil record outside of Africa about 100 thousand years ago, though other hominins existed throughout Eurasia much earlier. Recently, several researchers argued in favour of a single out of Africa event for modern humans based on whole-genome sequence analyses. However, the single out of Africa model is in contrast with some of the findings from fossil records, which support two out of Africa events, and uniparental data, which propose a back to Africa movement. Here, we used a deep learning approach coupled with Approximate Bayesian Computation and Sequential Monte Carlo to revisit these hypotheses from the whole genome sequence perspective. Our results support the back to Africa model over other alternatives. We estimated that there are two successive splits between Africa and out of Africa populations happening around 60-90 thousand years ago and separated by 13-15 thousand years. One of the populations resulting from the more recent split has to a large extent replaced the older West African population while the other one has founded the out of Africa populations.

    Mayukh Mondal
    Institute of Genomics, University of Tartu

    An analysis of population copy number variation in sub-Saharan African genomes


    Introduction: Copy number variation (CNV) is responsible for a large component of normal human variation and has been implicated in the cause/genetic aetiology of several rare diseases. Population reference databases containing CNV information from all global populations are critical in disease genetics research, but current resources lack diversity, especially from the African continent. This makes such databases of limited use in studies looking at genetic diseases in African individuals. This study therefore aims to address this knowledge gap by producing a map of CNV using whole-genome data from several previously unstudied African populations.
    Methods: 1027 high-coverage whole genome sequences obtained from individuals across west, central, southern and east Africa were analysed using Manta and Graphtyper2. Additionally, 919 of the samples were analysed using Genome STRiP to detect multi-allelic CNV. Quality control specific to each tool was performed in order to achieve high quality variant call sets.
    Results: 56 816 CNVs were detected by the Manta pipeline, consisting of 44 671 deletions and 12 145 duplications. Due to the ability of Manta to detect small variants (<100 bp), we are able to describe this previously less studied class of variants in an African cohort. 25% of the variants detected by Manta were <100 bp and 40% of these were common variants at >5% allele frequency. 50% of these variants are novel compared to 27% of the remaining variants >100 bp. Overall, 32% of the variants identified were novel. A comparison between central, west, east and southern African regions yielded a number of variants unique to each region. We find deletions tend to have lower allele frequencies compared to duplications. The majority of variants were found in the non-coding genome, with only 8% of variants overlapping coding transcripts. An additional 5% of variants overlapped regulatory features. Genome STRiP detected 3991 multi-allelic variants with 99% having a copy number between 3 and 20. There were also variants with copy numbers greater than 20, some of which appear to be instances of excessive runaway duplications not previously described.
    Conclusion: The amount of novel variation found demonstrates the importance of including African individuals from multiple African regions when producing reference databases and the rich genomic diversity of African genomes. Work is currently being performed to combine the full Genome STRiP and Manta call sets to produce a robust combined dataset. The variant database produced in this study will provide a valuable resource as a reference of normal CNV for the study of diseases in African populations.

    Emma Wiener
    Division of Human Genetics, National Health Laboratory Service & School of Pathology, Faculty of Health Sciences, University of Witwatersrand

    Integrative genomic analyses identify key interethnic differences in immune response to malaria

    Host responses to infection with the malaria parasite P. falciparum vary between individuals for reasons that are poorly understood. Here we reveal metabolic perturbations as a consequence of malaria infection in children and identify an immunosuppressive role of endogenous steroid production in the context of P. falciparum infection. We perform metabolomics on matched samples from children from two ethnic groups in West Africa, before and after infection with seasonal malaria. Analyzing 306 global metabolomes, we identify 92 parasitemia-associated metabolites with impact on the host adaptive immune response. Integrative metabolomic-transcriptomic and causal mediation-moderation analyses reveal an infection-driven immunosuppressive role of parasitemia-associated pregnenolone steroids on lymphocyte function and the expression of key immunoregulatory lymphocyte genes in the Gouin ethnic group. In children from the less malaria-susceptible Fulani ethnic group we observe opposing responses upon infection, consistent with the immunosuppressive role of endogenous steroids in malaria. These findings advance our understanding of P. falciparum pathogenesis in humans and identify potential new targets for antimalarial therapeutic interventions.

    Youssef Idaghdour
    New York University Abu Dhabi


    GWAS of complex traits in a multi-population African cohort

    The diversity among present-day African populations is the result of a deep and complex history of admixture, migrations, and regional adaptations to local environments and diseases. Little is known about the impact of this evolutionary history on the genetics underlying complex traits. Here I present recent work on genetic associations for a panel of anthropometric, cardiovascular, and metabolic biomarker measurements paired with dense genotyping data. For some traits, the variation among populations is expected to reflect local adaptations, such as short stature in western Congo rainforest hunter-gatherers. The study cohort of several thousand individuals is drawn from an ancestrally diverse set of populations from western, eastern, and southern sub-Saharan Africa. Populations include current or recent hunter-gatherers, traditional agriculturalists, and semi-nomadic pastoralists, from rural regions of Cameroon, Nigeria, Ethiopia, Kenya, Tanzania, and Botswana. For many of these traits, this marks the first genotype/phenotype analysis to include these ethnic groups. The high degree of population structure presents both challenges and opportunities for genetic analysis. Genetic structure analysis indicates genetic clustering by geographic location, language family, and regional hunter-gatherer lineages. Examples include the hunter-gatherers from the Serengeti, western Congo, and Kalahari, and clusters that correlate with Niger-Congo, Afroasiatic, and Nilo-Saharan language families. We observe substantial population-level variation for many traits, such as height, skin pigmentation, and blood pressure. The proportion of the trait variance that is due to the genetic population structure varies by trait and tends to be greater for anthropometric traits like height and skin pigmentation than for metabolic biomarkers like LDL.

    From genotype/phenotype association tests we find numerous independent associations at genome-wide significance for several traits, including circulating triglyceride levels and BMI. The population structure of the total additive genetic effects is also examined. European GWAS associations replicate poorly in this African cohort, while associations discovered in the African cohort show comparatively better replication in Europeans.

    Matthew Hansen
    University of Pennsylvania

    Genotype-by-infection interactions: Single cell RNASeq profiling of in-vivo host immune response to malaria reveals cell type and infection-specific eQTLs

    The disease burden of malaria remains a significant global public health challenge. Plasmodium falciparum is responsible for more than 99% of malaria cases in Africa and for more than 400,000 malaria-related deaths per year worldwide. Inter-individual differences in susceptibility to malaria are multifactorial and have a significant heritable component, but our understanding of the effect of infection on gene regulation of immune response at the transcriptional level remains very limited. Here we use longitudinal matched sampling, single cell RNASeq profiling of PBMCs and whole-genome sequencing data of malarial children before and after natural P. falciparum infection in Banfora, Burkina Faso, West Africa. In total, we generated ~90,000 single cell RNASeq profiles and identified PBMC cell types affected by infection. Single cell RNASeq eQTL analysis revealed cell type specific eQTLs and genome-wide significant genotype-by-infection interaction effects implicating key immune genes. These results provide the first genome-wide picture of host in vivo regulatory variation events in malaria at the single cell level and highlight the implication of regulatory interaction effects in modulating host immune response in vivo.

    Odmaa Bayaraa
    New York University Abu Dhabi


    Returning secondary genetic findings: Provider perspective in Africa

    Objective: Previous research has shown that lack of resources and knowledge significantly impact the return of genomic test results. However, not much is known about the level of expertise and knowledge of clinicians providing cleft care in Africa on genetic diseases, despite the vast genetic diversity in this population.
    Methods: Providers in participating cleft-craniofacial clinics in Ethiopia, Ghana, and Nigeria were sent the link to a 63-question online survey. This survey assessed the providers' experience with genetic testing, genetics education and return of genetic results, provider knowledge, clinician comfort with returning results, available resources to assist with genomic findings, and potential barriers.
    Results: As of June 2nd, 2021, 246 providers completed the survey. Only 2% had been involved in the delivery of Exome or Genome sequencing; 78.6% had no formal genetic education, 49.6% agreed that all secondary findings should be disclosed to patients. Regarding the comfort level, 89.4% were somewhat to extremely comfortable discussing genetic risk factors with patients, and 81.8% were somewhat to extremely comfortable with returning genetic results. Sixty-three percent believed that resources were currently available to enable them to access needed genetic information.
    Conclusion: The returned responses show that providers were aware that genetic testing could help in the clinical management of diseases. However, the lack of knowledge about genomic medicine, uncertain clinical utility, and lack of available resources were cited as barriers that significantly impacted incorporating genetic testing into their practice. Data collection is ongoing and will continue until July 31st, 2021. This is the first Ethical, Legal, and Social Implications (ELSI) study to document the knowledge and comfort level of cleft providers in Africa. This study will help determine the most beneficial information to equip providers for the return of secondary genetic findings.

    Abimbola Oladayo
    University of Iowa

  • Contains 1 Component(s)

    Speakers discuss diversifying data, diagnostics, and treatment options for genetic disease.

    Platform sessions are abstract-driven sessions with 6 talks per session. These talks are 10 minutes in length and are cross-topical in nature to represent the broad discipline that our field of genetics and genomics represents. After each talk, there will be a 5-minute Q&A with each speaker. For information on each individual session, please view the "Details" tab.

    Recorded session from the 2021 virtual meeting.

    Long-Term systemic expression and cross-correction ability of HMI-203: Investigational gene therapy candidate for mucopolysaccharidosis type II or Hunter Syndrome

    Mucopolysaccharidosis type II (MPS II), or Hunter syndrome, is a rare X-linked lysosomal storage disorder caused by mutations in the iduronate-2-sulfatase (IDS) gene, resulting in loss of I2S activity leading to systemic (peripheral organs and central nervous system (CNS)) toxic lysosomal accumulation of glycosaminoglycans (GAGs). GAGs are large polysaccharides made of repeating disaccharide units responsible for providing structure and hydration to the cell. The disease results in skeletal dysplasia, joint stiffness, organomegaly, airway obstruction and, in severe cases, neurocognitive deficits. Hunter syndrome occurs in approximately 1 in 100,000 to 1 in 170,000 males, and causes significantly reduced lifespan, with the severe form leading to life expectancy of 10 to 20 years. The proposed therapeutic mechanism of gene therapy candidate HMI-203 is based on both intracellular expression and synthesis of active I2S, as well as high levels of expression and secretion of active I2S enzyme to support cross-correction. Herein, we report preclinical data where a single intravenous dose of HMI-203 delivering human IDS via a rAAVHSC vector in the MPS II murine model resulted in dose-dependent and long-term transduction, IDS expression and I2S enzymatic activity in the evaluated tissues, e.g., liver, brain and serum through 52 weeks post-dose. A significant correlation was observed between liver and serum I2S activity, suggesting that the liver was likely the major contributor to the elevated levels of active I2S in the serum. The circulating I2S protein in the serum was functionally active (i.e., 90 kDa form) and cross-correction activity via a mannose-6-phosphate receptor dependent pathway was demonstrated using an in vitro competition assay.

    The robust and broad IDS tissue expression, along with demonstrated cross-correction, significantly reduced GAG heparan sulfate (GAG-HS) to wild-type (WT) levels in all evaluated organs associated with the disease, cerebrospinal fluid (CSF) and urine. In addition, lysosomal-associated membrane protein-1 (LAMP1) levels were significantly reduced to WT-like levels in the peripheral organs and CNS tissues. Of note, positive and significant correlations were observed between reduction in GAG-HS and LAMP1 levels in the CNS and brain and CSF GAG-HS levels, suggesting that CSF GAG-HS levels could be indicative of overall brain GAG and lysosomal burden levels in the clinic. Taken together, we have demonstrated that HMI-203 combines transduction and expression with the potential for cross-correction. These HMI-203 IND-enabling studies support HMI-203 as a gene therapy candidate for the treatment of MPS II.

    Kruti Patel
    Homology Medicines, Inc.

    Tasimelteon Safely and Effectively Improves Sleep in Smith-Magenis Syndrome: Results from a Double-Blind Randomized Trial Followed by an Open-Label Extension

    Smith-Magenis Syndrome (SMS; OMIM #182290) is a rare genetic disorder that results from an interstitial deletion of 17p11.2 and, in rare cases, from a retinoic acid induced 1 (RAI1) gene variant (Slager et al 2003). Currently, the prevailing theory is that there is an underlying circadian pathophysiology causing sleep disturbances in these patients, as they exhibit low overall melatonin concentrations and abnormal timing of peak plasma melatonin concentrations. This abnormal inverted circadian rhythm is estimated to occur in 95% of individuals with SMS (Boone et al., 2011; Spruyt et al., 2016). To assess the efficacy of tasimelteon, a melatonin receptor agonist, to improve sleep in SMS, a 9-week, double-blind, randomized, two-period crossover study was conducted at four U.S. clinical centers. Genetically-confirmed SMS patients, aged 3 to 39, with sleep complaints participated in the study. Patients were assigned to treatment with tasimelteon or placebo in a 4-week crossover study with a one week washout between treatments. Eligible patients participated in an open label study and were followed for > 3 months. Improvement of sleep quality (DDSQ50) and total sleep time (DDTST50) on the worst 50% of nights were primary endpoints. Secondary measures included actigraphy and behavioral parameters. Over three years, fifty-two patients were screened and twenty-five patients completed the randomized portion of the study. DDSQ50 significantly improved over placebo (0.4, p=0.0139) and DDTST50 also improved (18.5 min, p=0.0556). Average sleep quality (0.3, p=0.0155) and actigraphy-based total sleep time (21.1 min, p=0.0134) improved significantly, consistent with the primary outcomes. Patients treated for ≥ 90 days in the open label study showed persistent efficacy. Adverse events were similar between placebo and tasimelteon. Tasimelteon safely and effectively improved sleep in SMS. 
    The 17p11.2 deletion encompasses RAI1, leading to haploinsufficiency, which is considered the primary cause for most features of SMS, including dysregulation of the molecular clock via its effect on CLOCK expression. ChIP-Chip and reporter studies suggest that RAI1 binds, directly or in a complex, to the 1st intron of CLOCK, enhancing transcriptional activity, resulting in reduced CLOCK expression in SMS patient-derived cells (Williams et al 2012). The results of this study suggest that treatment with a circadian regulator can, in part, ameliorate the circadian deficiencies caused by RAI1 haploinsufficiency, providing further evidence of a critical role for RAI1 in the regulation of circadian rhythms.

    Christos Polymeropoulos
    Vanda Pharmaceuticals Inc.

    Unravelling African genomes: Whole-genome sequencing of 1000 Nigerian samples spanning 50 tribal groups provides new insights into diversity and admixture

    The lack of adequate representation of diverse genomes in human genomics research may limit insights that can be made about variants influencing disease susceptibility and trait variability across populations. We are helping to address this gap by performing germline whole genome sequencing of a Nigerian cohort. Nigeria represents one of the most diverse and populous regions on earth, with a population of over 200 million and over 250 unique tribal groups. We coordinate data generation in Lagos with analysis by staff around the world by leveraging cloud resources and deploying a scalable, robust, portable pipeline for alignment and variant calling. We present results from an initial round of whole-genome sequencing of ~1000 subjects from 50 tribal groups in Nigeria. We describe patterns of variation across tribes including variants of different functional classes and frequencies. We survey patterns of autozygosity across groups and compare these to 1000 Genomes samples. We highlight genetic distances between tribes and reveal evidence of admixture with European and northern African populations. We compare frequencies within our dataset to those reported in publicly available data (e.g. 1000 Genomes) for specific loci of clinical utility, e.g. those associated with drug response, highlighting noteworthy differences. Lastly, we find widespread, tribe-specific differences in allele frequency for medically-relevant variation, underscoring the importance of variant discovery and replication in non-European ancestry cohorts. Our results add to the growing body of genomic data from diverse populations, investigating understudied groups and the unique opportunities for discovery that they represent. We highlight opportunities for precision medicine, and reveal insights about variants of most clinical importance within and between human populations. 

    Colm O'Dushlaine
    54gene

    NOTCH3 p.Arg1231Cys is Present in 1 in 92 Pakistani and Associated with Stroke

    Cerebral Autosomal Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy (CADASIL) is an autosomal dominant Mendelian disorder characterized by early onset of migraine with aura, recurrent stroke, and dementia. Pathogenic CADASIL variants either add or remove a cysteine (Cys+/-) residue in one of 34 epidermal growth factor like repeats (EGFR) in the extra-cellular domain (ECD) or NOTCH3. Exome-wide association analysis of 4,882 stroke cases and 6,094 controls recruited in the Pakistan Genomic Resource (PGR) from Pakistan identified one such variant, p.Arg1231Cys, associated with subcortical stroke; p value 2.18e-8, odds ratio (OR) 2.97, 95% confidence interval (CI) 2.03 to 4.35, minor allele frequency (MAF) 7.1e-3. Analyses of the larger PGR cohort comprising of 80,000 participants identified additional heterozygous and homozygous carriers of this variant; call back studies of the carriers and their family members identified a high mortality in family members and a high prevalence of stroke. The Cys allele was found to disrupt a highly conserved domain (91% overall sequence identity between human and mouse), was predicted deleterious by PolyPhen2 (score 0.843 of 1), and was risk-increasing (cases MAF 0.016, controls MAF 0.0053). The p.Arg1231Cys variant was observed at a similar MAF in other South Asian populations sequenced by Regeneron Genetics Center, and present but orders of magnitude rarer in European populations. Despite rare prevalence in Europe, p.Arg1231Cys was associated with ischemic stroke in 450 thousand UK Biobank (UKB) participants; p value 8.8e-4, OR 3.38, CI 1.65 to 6.94, MAF 2.0e-4. In addition, p.Arg1231Cys was associated with multiple brain MRI phenotypes relevant to CADASIL in a 40K subset of UKB, such as mean diffusivity in the external capsule; p value 5.41e-10, OR 1.4, CI 0.96 to 1.8, MAF 2.8e-4. 
Consistent with CADASIL pathogenicity, a burden test limited to Cys+/- variants in the NOTCH3 ECD (including p.Arg1231Cys) strengthened associations in both Pakistan (subcortical stroke p value 1.5e-10, OR 3.39, CI 2.32 to 4.91) and UKB (ischemic stroke p value 9.3e-8, OR 3.38, CI 1.74 to 2.98). In both cohorts, p.Arg1231Cys was the most common Cys+/- variant in the NOTCH3 ECD. Taken together, these findings have major implications for precision medicine in South Asia, given that an estimated 1 in 92 (over 20 million of 1.9 billion) individuals are carriers for this variant and are at approximately 3-fold elevated risk for stroke. Our estimates suggest that around 2% of strokes in Pakistan may be attributable to NOTCH3 p.Arg1231Cys. 
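
    The carrier count and the "around 2%" attributable share quoted above can be roughly reproduced with a back-of-envelope calculation. The abstract does not say how its estimates were derived, so the sketch below is only an assumption: it applies Levin's population attributable fraction formula and treats the reported 3-fold risk as a relative risk. The function and variable names are illustrative.

```python
# Back-of-envelope check of the carrier and attributable-fraction figures.
# Assumption (not stated in the abstract): Levin's formula, with the
# "approximately 3-fold" elevated risk treated as a relative risk.

def population_attributable_fraction(exposure_prevalence: float,
                                     relative_risk: float) -> float:
    """Levin's formula: PAF = p(RR - 1) / (1 + p(RR - 1))."""
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)

carrier_frequency = 1 / 92   # "1 in 92", from the abstract
population = 1.9e9           # "1.9 billion", from the abstract
relative_risk = 3.0          # "approximately 3-fold", from the abstract

carriers = carrier_frequency * population
paf = population_attributable_fraction(carrier_frequency, relative_risk)

print(f"Estimated carriers: {carriers / 1e6:.1f} million")
print(f"Attributable fraction: {paf:.1%}")
```

    Under these assumptions the result lands close to the figures in the abstract, but the authors' own estimate may rest on different inputs (for example, Pakistan-specific frequencies and stroke subtypes).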

    Juan L Rodriguez-Flores
    Regeneron Genetics Center

    A high-resolution panel for uncovering repeat expansions that cause ataxias

    The hereditary ataxias are a group of rare neurological diseases with similar symptoms. Many of these ataxic syndromes are caused by expansions of short tandem repeats (STRs) in a number of different genes. Molecular genetic testing to accurately determine the genetic cause of known ataxias is often employed to support clinical diagnoses. Advances in therapeutic strategies (e.g., antisense oligonucleotides) to target repeat expansions underscore the importance of understanding the genetic context and sequence complexity of ataxic repeat expansions. Further highlighting the importance of molecular genetic testing, several studies have shown that repeat sequence interruptions in certain ataxia expansions play important roles in modifying the penetrance of the disease and age of onset. PCR and Southern blotting assays are currently the most commonly employed methods in commercially available ataxia repeat expansion panels for clinical testing. Although these electrophoresis-based methods can detect repeat expansions above the pathogenic threshold, accurate sizing of the repeat expansion is difficult to achieve when the repeat sequence is longer than a few hundred bases. Sequence interruption information is also not available with these approaches. We have recently developed an ataxia expansion panel using the PacBio No-Amp targeted sequencing approach to capture and sequence repeat expansion loci associated with fifteen ataxia diseases. The method utilizes CRISPR-Cas9 nuclease and pairs of guide RNAs to excise DNA fragments containing the repeat sequences within ataxia genes. This approach eliminates PCR amplification artifacts and amplification bias, and preserves native DNA for base modification detection. In this study, we sequenced samples with known or unknown ataxia diagnoses with the No-Amp targeted sequencing panel utilizing PacBio highly accurate long reads (HiFi reads).
The high accuracy of HiFi reads provides confidence both in sizing the repeat expansion and in identifying repeat sequence interruptions within the expansion sequences. Sequencing results demonstrate the potential of using this repeat expansion panel for eventual genetic testing. As additional ataxias and related neurological diseases caused by STR expansions are discovered and studied, the No-Amp targeted sequencing panel could be expanded to include additional targets. The ability to multiplex samples from different patients also makes the method a potentially cost-effective option for molecular genetic screening in the future.

    Yu-Chih Tsai
    Pacific Biosciences

    Deployment of clinical whole genome sequencing in support of more than 1,000 resource-limited patients: four years of the iHope Program

    Patients with a suspected genetic disease are often unable to obtain a timely molecular diagnosis, and those in resource-limited locations face even greater challenges. Clinical whole genome sequencing (cWGS) shows promise as a comprehensive test which may shorten the diagnostic odyssey regardless of setting. The iHope Program is a philanthropic effort to provide cWGS to patients who are unable to obtain precision testing due to resource-limitations.
    From June 2016 through June 9, 2021, 1004 individuals pursued cWGS testing through the iHope Program. Cases were received from 23 partner iHope clinical sites spanning seven countries. Forty percent of cases (n=403) were received from global partners in Mexico (n=205), Peru (n=93), Italy (n=50), Democratic Republic of Congo (n=40), New Zealand (n=10), and the United Arab Emirates (n=5). Most testing was performed on duos and trios. Proband phenotypes were complex, with nervous system, head and neck, skeletal, eye, and digestive the five most frequently identified Human Phenotype Ontology root ancestor terms.
    Variants were reported in 67% (n=677) of cases, of which 40.5% (n=407) conferred a definitive molecular diagnosis. Reported variants per case ranged from 0 to 5, and in 33 cases (3.3%), multiple molecular diagnoses were observed. Variants spanned 468 unique single genes. Of 1020 reported variants, a majority were nuclear SNVs or MNVs (n=693, 67.9%), followed by CNVs (n=175, 17.2%), small indels (n=127, 12.5%), short tandem repeats (n=12, 1.2%), mitochondrial SNVs (n=10, 1%) and uniparental disomy (n=2, 0.1%). Copy number variants ranged in size from 3 kb to full aneuploidies. In fifteen individuals from eleven families, findings were suggestive of a structural chromosomal rearrangement.
    At least ninety days after cWGS report delivery, a clinical utility survey was requested of the ordering clinician to assess effects on care and management. To date, surveys have been obtained for 581 patients (58%), representing one of the largest cWGS clinical utility datasets in a pediatric outpatient population. Data collection is ongoing, but initial analysis indicates that cWGS results prompted follow-up such as imaging, laboratory or physiological testing, referral for specialty consultation or other evaluations in 40% (233/581) of patients. In 56.6% (329/581), cWGS results contributed to counseling about prognosis, recurrence risks, reproductive screening/testing options and screening/testing recommendations or options for family members. These findings suggest that deployment of cWGS in support of resource-limited patients is tractable globally and can have a substantial impact on patient management. 

    Erin Thorpe
    Illumina, Inc.


  • Contains 1 Component(s)

    Speakers discuss leveraging ancestry and family history to study polygenic risk prediction.

    Platform sessions are abstract-driven sessions with 6 talks per session. These talks are 10 minutes in length and are cross-topical in nature, representing the broad discipline of genetics and genomics. After each talk, there will be a 5-minute Q&A with the speaker. For information on each individual session, please view the "Details" tab.

    Recorded session from the 2021 virtual meeting.

    Local ancestry allows for improved genomic prediction in underrepresented and admixed populations

    Due to the paucity of methodological and computational approaches that account for their genomic complexity, admixed populations are systematically excluded from statistical genomic studies. Admixed populations make up more than a third of the US populace but are severely underrepresented in biomedical research which may contribute to health disparities. To reap the full benefits from the ongoing efforts to collect samples from underrepresented populations and from existing mixed ancestry cohorts, tools facilitating the well-calibrated research of admixed peoples are urgently needed.
    We recently developed a local ancestry-aware GWAS model, Tractor, which corrects for fine-scale population structure at the genotype level, often boosts locus discovery power, and produces ancestry-specific effect size estimates and p values. Using Tractor summary statistics from African ancestry (AFR) tracts in ~4500 admixed UK Biobank (UKB) individuals, we built polygenic risk scores (PRS) and predicted blood panel phenotypes on homogeneous African ancestry UKB individuals. We benchmarked these PRS against scores created from traditional GWAS runs on 1) the same admixed cohort, 2) a large European UKB sample, and 3) a large multi-ancestry meta-analysis of continental ancestry groups from the pan-UKB project (https://pan.ukbb.broadinstitute.org/). We also tested the accuracy of several PRS models including pruning and thresholding and PRS-CSx. We find that incorporating diverse samples and ancestry-specific estimates from admixed populations results in higher prediction accuracy for homogeneous AFR individuals. The bulk of African-descent GWAS participants are currently admixed individuals of the Americas, and some underrepresented ancestries are rarely found outside of the admixed context. Thus, building models based on ancestry-specific estimates generated from the deconvolved local ancestry tracts of admixed genomes allows for better PRS performance on many diverse populations by making better use of existing collections.
    We additionally highlight several loci which we find to have well-demonstrated effect size differences across ancestries, a phenomenon for which there are few prior examples in the literature. As our models are constructed from local ancestry components of the same admixed individuals, these results hint at genetic differences rather than environmental factors, which are often tricky to disentangle. Ultimately, our work highlights how Tractor and local ancestry allow for improved population characterization and can be leveraged to advance the understanding of complex diseases across diverse cohorts.

    Elizabeth G Atkinson
    Baylor College of Medicine

    Polygenic risk prediction of obesity across the life course and in diverse populations

    Polygenic risk scores (PRSs) for body mass index (BMI) that leverage the increasing genome-wide association study (GWAS) sample sizes may aid risk stratification and allow targeted prevention of obesity at an early age. We constructed ancestry-specific and trans-ancestral PRSs to predict obesity in adulthood, and examined their added value over and above easily measurable predictors of obesity during childhood and adolescence.
    We calculated PRSs based on summary statistics of up to 1.2 million common variants [minor allele frequency (MAF)>1%] from the GIANT consortium’s BMI GWAS meta-analysis of up to 1.6 million individuals (72% European (EUR), 16% East Asian (EAS), 6% African (AA), 4% Hispanic (HA), 2% South Asian (SAS)). Explained variance for BMI and discrimination for obesity were examined in the UK Biobank (UKB, n=437k) and Million Veteran Program (MVP, n=101k). The best performing PRS in EUR was taken forward to the Avon Longitudinal Study of Parents and Children (ALSPAC, n=5.8k), for cross-sectional and longitudinal associations with BMI across 21 time-points from birth to age 22y. We compared the predictive performance of the PRS to that of clinically available factors (maternal education, pre-pregnancy maternal BMI, household social status).
    The trans-ancestral PRS explained more of the variation in BMI than ancestry-specific PRSs in all but the EUR-populations (R2 min-max for non-EUR; UKB: 7.5-12.4% (AA/EAS); MVP: 5.7-11.1% (AA/HA); UKB-EUR (EUR-PRS): 15.8%; MVP-EUR (EUR-PRS): 13.1%). For all ancestries, maximum explained variance was roughly double that of previously published obesity PRSs. The PRSs were better at discriminating between adults with or without obesity than age, sex, or scores of genome-wide significant variants only. EUR-PRS associations with BMI were weak at birth, but increased rapidly during childhood, and remained stable from adolescence onwards (e.g., BMI-SD per PRS-SD at 13y (95%CI): 0.39 (0.37,0.42)). Consistently, longitudinal modeling of BMI trajectories using the PRS showed increasing divergence until early adolescence. When added to other factors available at birth, the PRS helped predict substantially more of BMI from 5y onwards (e.g., R2 at 5y: 13 to 18%; 11y: 11 to 21%).
    The current PRSs, based on larger GWAS sample sizes, double the previously explained variance for BMI across multiple ancestries, thereby advancing the options for prognostication in populations traditionally underrepresented in genetic research. Moreover, we find that genetic predisposition to adult obesity affects childhood growth trajectories, and shows potential to improve risk stratification for obesity at an early age. 
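
    For readers unfamiliar with how a PRS is computed from GWAS summary statistics, the basic additive form is a weighted sum of effect-allele dosages, usually standardized within the target cohort so that effects can be reported per PRS standard deviation. The sketch below is a minimal illustration only and is not the authors' pipeline: the variant IDs, weights, and genotypes are hypothetical, and real scores involve on the order of a million variants with weights adjusted for linkage disequilibrium.

```python
from typing import Dict, List

def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2) over scored variants."""
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)

def standardize(scores: List[float]) -> List[float]:
    """Convert raw scores to z-scores (mean 0, SD 1) within the cohort."""
    mean = sum(scores) / len(scores)
    sd = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    return [(s - mean) / sd for s in scores]

# Hypothetical per-variant weights (e.g. per-allele effects on BMI).
weights = {"rsA": 0.10, "rsB": -0.05, "rsC": 0.20}

# Hypothetical effect-allele dosages for three individuals.
cohort = [
    {"rsA": 0, "rsB": 2, "rsC": 1},
    {"rsA": 1, "rsB": 1, "rsC": 0},
    {"rsA": 2, "rsB": 0, "rsC": 2},
]

raw_scores = [polygenic_score(person, weights) for person in cohort]
z_scores = standardize(raw_scores)
print(raw_scores, z_scores)
```

    Standardizing within the cohort is what allows statements such as "BMI-SD per PRS-SD" to be compared across time points.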

    Roelof A.J. Smit
    The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai

    Improving Polygenic Prediction in Ancestrally Diverse Populations

    Polygenic risk scores (PRS) are less effective when ported across populations. While the scale of non-European genomic resources has been expanded in recent years, a clear attenuation of the predictive performance of PRS remains in individuals who are genetically distant from Europeans.
    In order to include data from all ancestral groups to ensure more equitable delivery of genomic prediction to global populations, we developed the first principled Bayesian PRS construction method, termed PRS-CSx, that jointly models GWAS summary statistics from multiple populations to improve cross-ancestry polygenic prediction. PRS-CSx couples genetic effects across populations via a shared continuous shrinkage prior, enabling more accurate effect size estimation by sharing information between summary statistics and leveraging linkage disequilibrium (LD) diversity across discovery samples, while inheriting computational efficiency and robustness from PRS-CS.
    PRS-CSx outperformed existing PRS methods across various simulation settings with different sample sizes, fractions of causal variants, and genetic correlations between populations. Using quantitative traits from biobanks, we showed that PRS-CSx substantially improved the prediction accuracy even if only a small non-European GWAS was included in the discovery data. For example, the median R2 increased by 76% for individuals of East Asian ancestry when the Biobank Japan samples (N=62K-159K) were added to the UK Biobank European samples (N=340K-360K) to train the PRS. Similarly, the median R2 increased by 22% for individuals of African ancestry when the PAGE study samples (N=20K-50K) were integrated with UK Biobank and Biobank Japan samples (400K-519K).
    Furthermore, by integrating GWAS summary statistics of schizophrenia from East Asian (14K-17K cases due to leave-one-out) and European (33K cases) populations, PRS-CSx more accurately predicted schizophrenia risk in individuals of East Asian ancestry, showing 52% and 97% improvement in the liability R2 relative to PRS constructed using East Asian or European summary statistics only, and approximately doubled the prediction accuracy when compared with alternative methods that can combine multiple GWAS to make prediction.
    Our method represents a much needed and critical breakthrough in PRS construction. Through joint modeling of multi-ancestry data, PRS-CSx substantially improves polygenic prediction in non-European populations. With the rapid expansion of non-European genomic resources, our method will help accelerate the equitable deployment of PRS in clinical settings and maximize its healthcare potential. 

    Yunfeng Ruan
    Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard

    A trans-ancestry polygenic test to predict severe hypercholesterolemia in diverse ancestry patients

    Approximately 7% of adults have severe hypercholesterolemia (SH; untreated low density lipoprotein (LDL-C) ≥ 190 mg/dL). SH is associated with a 6-fold increased risk of cardiovascular disease, and up to 20-fold increased risk in individuals identified with monogenic Familial Hypercholesterolemia (FH)-associated variants. Despite high frequency of cholesterol screening and awareness, individuals with SH remain undertreated, with disparities in treatment and LDL-C control observed among African American (AA) populations. Only 2.5% of individuals with SH harbor a monogenic FH-associated variant, and polygenic SH accounts for 15%-30% of clinical FH, motivating the development of a polygenic test for predicting SH in diverse populations. We obtained summary statistics for validated trans-ancestry polygenic risk scores (PRS) to predict LDL-C from the Global Lipids Genetics Consortium pre-publication. The PRS were developed from a genome-wide association study of ~1.6M trans-ethnic participants, and validated in European (EU), AA, African, Hispanic or Latino (HL), South Asian and East Asian populations. We leveraged independent genotype and phenotype data from the diverse BioMe biobank in New York City. We extracted laboratory values and medications from electronic health records for adults with an age range of 18-95 from three population groups: AA, EU, and HL (other groups were excluded due to low sample size). SH cases were defined as participants with statin-adjusted maximum LDL-C ≥ 190 mg/dL and controls with statin-adjusted maximum LDL-C < 160 mg/dL (EU: 323/4810, AA: 422/3741, and HL: 539/5780). In a model that included the covariates age, sex, and the top 10 principal components, we measured PRS discrimination (via AUC) which was 0.68 (0.65-0.71), 0.70 (0.68-0.73), and 0.72 (0.70-0.74) for EU, AA, and HL respectively; and 0.67 (0.64-0.70), 0.68 (0.66-0.71), and 0.65 (0.62-0.67) for the genomic predictor alone. 
The effect size of the PRS was 1.97 (1.74-2.24), 2.0 (1.85-2.35), and 2.02 (1.81-2.25) odds ratio (OR) per standard deviation. We established a high-risk threshold of 3%, and found effect sizes of 4.98 (3.30-7.34), 2.99 (1.89-4.61), and 3.96 (2.74-5.64) OR compared to the 97% below the threshold. We estimated prevalence-adjusted positive vs. negative predictive values of cases being classified in the high risk group, and demonstrated 0.25 vs 0.93, 0.17 vs 0.93, and 0.21 vs 0.93. In summary, we demonstrate that a PRS for LDL-C can be leveraged to predict a 3- to 5-fold increased risk of SH in diverse populations, raising the possibility that this test could be used to identify individuals predisposed to polygenic SH. 
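
    Prevalence-adjusted predictive values of the kind reported above follow from Bayes' theorem once a test's sensitivity and specificity are known. The abstract does not give the sensitivity and specificity of the 3% high-risk threshold, nor the exact adjustment used, so the inputs below are hypothetical placeholders; only the 7% prevalence comes from the abstract.

```python
from typing import Tuple

def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a binary test at a given population prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Prevalence of severe hypercholesterolemia from the abstract (~7%).
prevalence = 0.07
# Hypothetical test characteristics, chosen only for illustration.
sensitivity = 0.10
specificity = 0.97

ppv, npv = predictive_values(sensitivity, specificity, prevalence)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

    Because the high-risk group is small, NPV stays close to one minus the prevalence, which is consistent with the near-identical NPV values reported across the three groups.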

    Michael C Turchin
    The Institute for Genomic Health, Icahn School of Medicine at Mount Sinai

    Phenome-wide association study of polygenic risk for asthma in the UK Biobank highlights traits with shared genetic architecture and sex specific effects

    Polygenic risk scores (PRSs) aggregate additive effects of genetic variants to estimate individual risks for heritable diseases and can be used clinically to inform decisions on screening, therapeutic intervention, and lifestyle modification. The aim of this study was to develop a PRS for asthma using genetic information from a large, multiethnic (ME) cohort and investigate its association with 267 phenotypes in the UK Biobank (UKB). Two asthma PRS models were developed based on European (EU) (19,954 cases, 107,715 controls) and ME (23,948 cases, 118,538 controls) summary statistics from the Trans-National Asthma Genetic Consortium meta-analysis. Posterior SNP effect size estimates were generated using a Bayesian regression framework, implemented in PRS-CS. To evaluate PRS prediction for asthma, each model was applied to white British (36,065 cases, 314,781 controls) and ME (43,109 cases, 377,061 controls) subjects from UKB using logistic regressions adjusting for sex and ancestry. The EU PRS applied to the white British cohort had the strongest association with doctor-diagnosed asthma (p=1.96x10^-295, OR=1.34, 95% CI=1.32-1.35, AUC=0.582) and was most strongly associated with childhood onset asthma (COA; onset before age 12; p=1.77x10^-181, OR=1.59, 95% CI=1.54-1.64, AUC=0.624). There were significant sex-by-PRS interaction effects for COA (p=0.049) and adult onset asthma (AOA; onset after age 25; p=0.048). Given the same PRS, males had a higher risk than females for COA but females had a higher risk than males for AOA. The phenome-wide association study identified significant associations between the PRS and 27 binary and 69 quantitative traits (Bonferroni p<1.87x10^-4). The most significant association was with percent eosinophils (p=9.33x10^-298, β=0.11), a known asthma-associated trait. Other associated traits included asthma age of onset (p=4.12x10^-94) and measures of lung function (FEV1) (p=1.91x10^-117). Some associations were less expected.
For example, age at first live birth was negatively correlated with the PRS (p=8.57x10^-15, β=-0.095) and HbA1c was positively correlated with the PRS (p=4.83x10^-33, β=0.13). Sex-specific effects were observed for 5 binary and 15 quantitative traits, such as fat-free mass (p=1.71x10^-6, β=0.028 in females; p=0.42, β=7.6x10^-3 in males). Overall, our results suggest shared genetic architectures between asthma and a broad swath of pulmonary, cardiometabolic, anthropometric, and reproductive traits, many of which had not previously been linked to asthma and some with sex-specific effects. This research was conducted using the UK Biobank Resource under application number 44300.
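
    The Bonferroni threshold quoted in this abstract can be recovered from the number of phenotypes tested. The sketch below assumes a family-wise error rate of 0.05, which the abstract does not state explicitly but which matches the reported cutoff when divided by the 267 phenotypes.

```python
# Bonferroni correction: divide the family-wise alpha by the number of tests.
alpha = 0.05          # assumed family-wise error rate (not stated in the abstract)
n_phenotypes = 267    # number of phenotypes tested, from the abstract

threshold = alpha / n_phenotypes
print(f"Bonferroni threshold: {threshold:.2e}")
```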

    Yu Lin Lee
    Biological Sciences Collegiate Division, Univ Chicago

    Modelling hidden genetic risk from family history for improved polygenic risk prediction

    With many polygenic risk scores demonstrating research and clinical utility, it is worth questioning whether family history, a traditional genetic predictor, still provides valuable information.
    Family history of complex traits may be influenced by transmitted rare pathogenic variants, intra-familial shared exposures to environmental factors, as well as a common genetic predisposition. Therefore, we propose and develop a latent factor model to quantify disease risk in excess of that captured by a common SNP-based polygenic risk score, but inferable from family history. This model enables calibration of polygenic risk scores with respect to family history without fitting regression models.
    We applied our model to predict adult height for 941 children in the Avon Longitudinal Study of Parents and Children. Our predictor was able to explain ~55% of the total variance in adult height, close to the estimated heritability of height and substantially higher than the ~40% captured by a polygenic risk score for height or mid-parental height alone. For nine complex diseases, including metabolic syndromes, cardiovascular diseases, neurological disorders and several types of cancer, we used our model to improve polygenic risk prediction for >400,000 White British participants in the UK Biobank. For all nine complex diseases investigated in the UK Biobank, parental disease history brought significant improvements in the discriminative power of polygenic risk prediction. For instance, combined with age and sex, our predictor achieved an area under the receiver operating characteristic curve (AUROC) of 0.734 and an area under the precision-recall curve (AUPRC) of 0.171 in identifying individuals with type 2 diabetes, exhibiting significantly stronger discriminative power than the polygenic risk score (AUROC = 0.712; AUPRC = 0.148) or the parental disease history (AUROC = 0.707; AUPRC = 0.148) alone. Compared with a type 2 diabetes polygenic risk score alone, our predictor had a net reclassification index of 3.72% in identifying 20% of the population at an elevated risk.
    Taken together, our work showcases an innovative paradigm for risk calculation, and supports the utility of incorporating family history into polygenic risk score-based genetic risk prediction models. 

    Tianyuan Lu
    McGill University


  • Contains 1 Component(s)

    Speakers discuss global perspectives and initiatives for large-scale genomics.

    Platform sessions are abstract-driven sessions with 6 talks per session. These talks are 10 minutes in length and are cross-topical in nature, representing the broad discipline of genetics and genomics. After each talk, there will be a 5-minute Q&A with the speaker. For information on each individual session, please view the "Details" tab.

    Recorded session from the 2021 virtual meeting.

    Trans-ancestry imputation and exome sequencing of more than 1 million individuals identifies genetic variation protecting against SARS-CoV-2 infection and predicts individuals at risk for severe COVID-19 outcomes

    COVID-19 symptoms vary widely, ranging from asymptomatic in some patients to fatal in others. Elucidating the host genetics of COVID-19 holds the potential for understanding both susceptibility to SARS-CoV-2 infection as well as heterogeneity in patient presentation and outcome. Prior work focused on identifying common variants associated with COVID-19 susceptibility and severity, but little has been done to explore the entire allele frequency spectrum of genetic variation, from common to rare exonic variants. Here, we present the largest trans-ancestry exome sequencing study of COVID-19 to date in 586,713 individuals, with a larger set of 1,012,636 individuals with imputed data across 7 studies and 5 continental ancestries.
    Through exome sequencing of 21,820 COVID-19 cases and 564,893 controls, we did not identify any rare variants that remained significant after Bonferroni correction (P<9.6e-10). Burden tests identified three genes tentatively associated with COVID-19: DISP3 (P=2e-8; OR=1.8±0.3), MARK1 (P=3e-9; OR=38.4±16.9), and TLR7 (P=4e-8; OR=4.5±2.2). Despite having a 100x larger sample size, we could not replicate a previously reported role for rare variants in the interferon pathway (P=0.59).
    Our larger GWAS of 56,841 cases and 955,795 controls found 11 loci (P<5e-8). Most notably, we identified a strong protective association amongst SARS-CoV-2 infected cases for rs190509934 located 60bp upstream of ACE2, the primary cell receptor for the SARS-CoV-2 spike protein (P=4.5e-13; OR=0.6±0.08; EUR MAF=0.003). Using RNA-seq, rs190509934 reduced ACE2 expression by 39% (P=3e-8), supporting the hypothesis that reduced ACE2 expression protects against SARS-CoV-2 infection.
    Lastly, we developed a polygenic risk score (PRS) to predict hospitalization and severity of COVID-19. Among those of European ancestry, individuals with the top 10% PRSs are 1.8-fold more likely to be hospitalized (P=6e-11) and 1.58-fold more likely to be placed on a ventilator or die from COVID-19 (P=7e-10). These associations hold in other non-European populations (albeit with decreased power) and after accounting for known clinical risk factors.
    Our data represent the most comprehensive survey of common and rare exonic variation associated with COVID-19, identifying new loci and polygenic risk scores that predict severity of COVID-19.

    Jack Kosmicki
    Regeneron Genetics Center

    Rare variant analyses in 239,395 whole exome and whole genome sequenced participants of the UK Biobank reveals novel genetic associations with renal function and chronic kidney disease

    Genome-wide association studies have identified common genetic variants associated with chronic kidney disease (CKD), but the burden of rare loss-of-function (LoF) or pathogenic/likely pathogenic (P/LP) variants has not been well characterized. We performed gene-/region-based and variant association analyses for 5 renal function biomarkers (eGFR estimated from serum creatinine and/or cystatin-C, BUN, UACR) and 5 CKD endpoints (ESRD and stage 4/5 CKD, CKD defined by biomarkers and/or diagnoses from NHS data, Cystic) in 239,395 UKB participants of genetically-assessed European ancestry and with whole exome (WES, n=171,172) or whole genome sequencing (WGS, n=121,019). For each trait, we fit a genome-wide regression model and tested for association using REGENIE V2.0, adjusting for age, sex, 10 principal components of ancestry, assessment center and BMI, where appropriate. For gene-based analyses, we generated 15 models to collapse ClinVar-classified P/LP, VEP(LOFTEE)-predicted putative LoF and deleterious variants predicted by 16 in silico scores (SIFT, Polyphen, BayesDel, etc.) from dbNSFP 4.1c. The WGS data further enabled annotation of promoter/enhancer variants, which were incorporated into collapsing models for gene-based association. In participants with WES, we identified 30 and 11 genes associated with ≥2 biomarkers and ≥1 CKD endpoint across collapsing models (FDR<0.05), respectively. PKD1/2, COL4A3/4, CUBN, IFT140 were associated with both biomarkers and CKD. Association analyses also highlighted other genes including: COL4A1, CST3, LAMC1, LRP2, SLC22A2, SLC34A3, SH2B3. Variant-level analyses further informed impact on protein, e.g. the SLC22A2 association signal was mainly driven by a frameshift (rs8177505) with lowering effects on eGFR (p=1.2e-27, beta=-6.2, MAF=0.12%). Exome-wide variant analyses revealed 25 genes (e.g. PDILT-UMOD) with variant associations (p<5.0e-8) with >3 biomarkers or ≥1 endpoint, including 2 that were also implicated from the gene-based analyses (COL4A4 and CUBN). Analyses of WGS allowed for sequence level validation of exome derived findings and the identification of additional variants not captured in WES. This study provides a framework for the assessment of the genetic landscape of kidney disease. The results validated known genes and identified potential novel associations with renal function.

    Shuwei Li
    Janssen

    Novel genetic associations for rare diseases with GWAS and trans-ethnic analysis of self-reported medical data

    Nearly 7000 rare diseases are known, and though each disease affects few people, the total population prevalence of rare diseases is estimated to be 3.5-5.9%. A key challenge in the study of rare disease genetics is assembling large case cohorts for well-powered studies. Here we demonstrate use of large-scale self-reported rare disease data, combined with genetic data collected through the 23andMe direct-to-consumer platform, to study 33 rare diseases and identify genetic associations through GWAS. We developed web-based questionnaires, and gathered self-reported data on rare diseases from a cohort of over 1.6 million genotyped research-consented individuals. To reduce mis-reporting and maximize coverage, we used an autocomplete mechanism including 7000 rare diseases. We validated the approach through simulations and replication of known rare disease associations. In simulations based on genotypes from 4,957,230 European individuals, we show that GWAS can recover genome-wide significant associations in monogenic rare diseases for a variety of architectures. In rare diseases with known genetic associations, we reidentified 29 associations at a genome-wide significance level (p-value < 5e-8) with a diverse range of minor allele frequencies (minimum MAF=0.0001, maximum MAF=0.487) and effect sizes for the risk allele (minimum OR=1.24, maximum OR=273.15). We performed the first GWAS in European ancestry for Duane retraction syndrome, vestibular schwannoma and spontaneous pneumothorax, and report novel genome-wide significant associations for these diseases. For Duane retraction syndrome, an eye movement disorder, we found two independent associations near the OLIG1 and OLIG2 genes, knockdown of which causes a similar phenotype in mice. For vestibular schwannoma, we find a single association near the CDKN2A and CDKN2B genes, which are associated with many other cancers.
We found three novel associations for spontaneous pneumothorax, two of which are also associated with lung function phenotypes. We replicated these associations in the UK Biobank and found that 3 of 5 replicated with p < 0.05, and all 5 had the same direction of effect. Trans-ethnic mixed-model analyses, including individuals of all ancestries, found the same associations with comparable or increased significance. Our results show that GWAS of self-reported rare disease data is a viable method for discovering genetic associations for rare diseases. With increasing sample size and diverse imputation reference panels, we may also be able to study rare diseases more widely in multiple populations and improve our understanding of the trans-ethnic genetic architecture of these diseases.

    Suyash S Shringarpure
    23andMe

    Common and rare variant analysis of 21K psoriasis cases and 623K controls identifies novel, protective associations in several genes in the type 1 interferon pathway

    Psoriasis is a complex autoimmune disease resulting in chronic inflammation and hyperproliferation of the skin. The aberrant immune response associated with psoriasis is mediated by pathogenic T cells, which are activated, in part, by type 1 interferons (IFNs). Prior large-scale analyses of psoriasis cases focusing on common genetic variants have implicated >63 loci, including genes in the IFN signaling pathway. However, large-scale analysis of rare exonic variation is lacking.
    To study the contribution of both common and rare variants to psoriasis risk, we performed whole-exome sequencing and meta-analysis of 20,810 psoriasis cases and 623,159 controls of EUR and AFR ancestry across 6 cohorts. Common variant analysis replicated 44 significant and independent associations in known psoriasis loci, including IL23R, TYK2, IL12B, HLA-C, and DDX58, among others. Rare-variant gene-burden analysis of putative loss-of-function (pLoF) and/or predicted-deleterious missense variants (<1% AAF) identified significant and novel associations in 5 genes, including 3 genes in the IFN pathway. These include protective pLoF associations for IFIH1 (OR=0.74 [0.68, 0.81], p=4.1E-12), which encodes a pathogen sensor that activates IFN production, and TRIM65 (OR=0.63 [0.50, 0.79], p=4.8E-5), which encodes a ubiquitin ligase that binds and activates IFIH1. We find the protective TRIM65 association is driven by a rare, predicted-deleterious missense variant (rs202175254, AAF=0.1%) in the IFIH1-TRIM65 binding domain. Further, we find a nominally significant, protective association for the burden of rare pLoFs in DDX58 (OR=0.76 [0.49, 0.89], p=6.7E-3), which encodes a second pathogen sensor that activates IFNs. This DDX58 protective pLoF association helps confirm direction of effect at this known psoriasis locus.
    Consistent with inhibition of IFNs being protective in psoriasis, we also found a significant and novel gene-burden association between increased odds of psoriasis and pLoFs in ADAR (OR=2.29 [1.68, 3.12], p=1.4E-7), which encodes a protein that suppresses IFNs and in which partial LoFs have been associated with Aicardi-Goutières syndrome, an inherited disorder that features over-production of IFNs.
    Collectively, these results represent the largest rare-variant exome-sequencing analysis of psoriasis to date. Future experiments will characterize effects of these pLoFs on protein expression and/or function, and further analysis will determine whether an IFN gene signature can identify a clinically relevant subset of psoriasis patients who would therapeutically benefit from IFN inhibition.

    Julie Horowitz
    Regeneron Genetics Center
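
    The abstract above reports gene-burden results as odds ratios with confidence intervals. As a minimal sketch of how such a figure is formed, the snippet below computes an unadjusted odds ratio and a Wald (Woolf) interval from a 2x2 carrier table. The function name and the example counts are hypothetical; the study itself would have used covariate-adjusted models and meta-analysis across cohorts, which this sketch does not attempt.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    All four counts must be positive. z=1.96 gives an approximate 95% CI.
    """
    a, b, c, d = (case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers)
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_ci(10, 90, 20, 80)
print(f"OR={or_:.2f} [{lo:.2f}, {hi:.2f}]")
```

    An odds ratio below 1 with an interval that excludes 1 corresponds to the "protective" associations described in the abstract.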

    Investigating genetic and phenotypic associations for 168 blood metabolites in 120K UK Biobank participants

    In this study, we accessed the large-scale metabolomics, exome sequencing and phenomics data from the UK Biobank (UKB) to investigate gene-metabolite and metabolite-phenotype relationships. Blood metabolites (N=168) were profiled by Nightingale Health in ~120,000 UKB participants, >90% of whom had exome sequences and all had data on ~16,000 clinical traits.
    We explored genetic associations with blood metabolites by two complementary approaches: (i) single-variant analysis, and (ii) gene-level collapsing analysis, using a linear regression model, adjusted for age, sex and BMI. For the single-variant analysis, we tested ~3.2 million variants under dominant and recessive models. For the gene-level collapsing analysis, the aggregate effect of variants in each gene was tested using 11 different models, including ones that focused on rare (MAF<0.1%) missense and protein-truncating variants. We also performed a metabolite PheWAS, in which the association for each metabolite was tested with each clinical trait.
    Our analyses provide a rich catalogue of significant (p<1x10^-8) associations: 10,461 variant-metabolite, 970 gene-metabolite, and 127,947 metabolite-phenotype relationships. This includes well-established, biologically plausible associations such as variants in PAH with phenylalanine levels [beta=1.2; p<1x10^-300] and the concentration of intermediate-density lipoprotein particles with type 2 diabetes [beta=-1.5; p<1x10^-300]. These data may also provide insights into underlying biological mechanisms: for instance, the observed metabolite signature for mutations in a gene that is a known drug target (e.g., HSD17B13) can indicate the metabolic profile expected with desirable therapeutic response.
    The catalogue of genetic and phenotypic relationships for blood metabolites, which will expand further once metabolomics data becomes available in the entire UKB cohort of ~500,000 subjects, represents an excellent resource to better understand mechanisms underlying complex human diseases. 

    Abhishek Nag
    Centre for Genomics Research, AstraZeneca

    Practical implementation of polygenic risk scores and absolute risk score estimation across diverse ancestry groups

    Polygenic risk scores (PRS) have generated considerable translational interest. Yet, most validation efforts focus on assessing relative rather than absolute risk scores (ARS), even though ARS are required for clinical decision making. ARS validation experiments are typically based on a single large cohort split into training/testing and rarely incorporate PRS. While such approaches typically generate calibrated ARS within the testing dataset, they do not properly capture the complex biases inherent to each healthcare context or account for environmental differences between countries and ethnicities. Consequently, the robustness of the ARS across different contexts is largely unknown.
    To address these gaps, we derived a framework to combine ethnicity-specific disease baselines from a range of country-specific surveys, which capture social determinants of health, with ancestry-adjusted PRS (European OR per 1SD 1.87, 2.10, 1.51 and 2.09 respectively) for breast cancer, prostate cancer (PC), cardiovascular disease (CVD) and type 2 diabetes (T2D). We validated these ARS in independent datasets, computing calibration summary statistics, including the standard incidence ratio (SIR), calibration slope and intercept, and the integrated calibration index.
    We find that inclusion of an ethnicity-specific baseline captures substantial ARS variability not captured by the PRS, particularly for PC, where a UK African and Caribbean baseline results in calibration (0.99-1.34 95% CI SIR) whilst the UK average baseline results in strong miscalibration (2.24-3.02 95% CI SIR). The extent of the calibration varied, with challenges arising for T2D and CVD, whose incidence has fluctuated across time and location in the US over the last decades. For T2D, baselines date from 1997-2019 but prospective testing data date from 1987-1999, resulting in miscalibration for White males (1.35-1.62 95% CI SIR). For CVD, baselines for myocardial infarction and fatal heart disease date from 2004-2011 and ischemic stroke from 1999, but prospective testing data date from 1986-2000, resulting in miscalibration for White females and males (0.66-0.92 and 1.04-1.31 95% CI SIR respectively).
    We demonstrate that with appropriate data it is possible to translate genetic risk into clinically meaningful ARS that robustly replicate in diverse contexts. Our results also demonstrate the challenges arising from variation across ethnicity, geography and time and the need for population-relevant information on which risk prediction tools are to be applied. 

    Rachel Moore
    Genomics plc
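
    The calibration results in the abstract above are quoted as 95% confidence intervals on a standardized incidence ratio (SIR). A minimal sketch of that calculation follows, taking the SIR as observed events divided by the events the risk model expected, with Byar's approximation to the Poisson interval. The abstract does not state which interval method or which orientation of the ratio was used, so both are assumptions here, as are the function name and the example counts.

```python
import math


def sir_with_ci(observed, expected, z=1.96):
    """Standardized incidence ratio (observed / expected) with an
    approximate confidence interval from Byar's method.

    `observed` is the number of events seen in the validation cohort;
    `expected` is the sum of model-predicted absolute risks.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must be positive")
    sir = observed / expected
    # Byar's approximation to the Poisson limits for the observed count.
    lower_count = observed * (
        1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
    ) ** 3
    shifted = observed + 1
    upper_count = shifted * (
        1 - 1 / (9 * shifted) + z / (3 * math.sqrt(shifted))
    ) ** 3
    return sir, lower_count / expected, upper_count / expected


# Hypothetical counts: 100 observed events against 100 expected.
sir, lo, hi = sir_with_ci(100, 100.0)
print(f"SIR={sir:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

    Under this orientation, an interval that contains 1 is consistent with good calibration, and an interval entirely above or below 1 indicates that the model under- or over-predicted events.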

    The Kidney genome atlas reveals a novel locus on chromosome 14 associated with adult proteinuric kidney diseases

    Chronic kidney disease (CKD) affects 1 in 9 people worldwide. There is a high unmet need for drugs that extend and restore kidney function, because dialysis and organ transplantation carry substantial economic and psychological burdens. To foster drug development of genetically validated targets, we have created the Kidney Genome Atlas (KGA) by assembling ~23,000 whole genomes from 2,832 kidney disease cases, including proteinuric kidney disease cases such as focal segmental glomerulosclerosis (571 cases), minimal change disease (244 cases), nephrotic syndrome (196 cases) and idiopathic proteinuria (1,123 cases), and 19,804 controls. Following the gnomAD pipeline, we implemented a rigorous quality control procedure to obtain a high-confidence dataset for downstream analyses of proteinuric kidney diseases. Ancestries were inferred genetically based on a k-NN model trained on 1000 Genomes data, which resulted in 597 cases and 10,127 controls of European (EUR) ancestry, 513 cases and 3,805 controls of African (AFR) ancestry, and 290 cases and 754 controls of Latino/Admixed American (AMR) ancestry for association testing. Meta-analysis of common variants across ancestries showed minimal impact of potential confounders, such as ancestry or sequencing center differences (lambda=1.03). We identified a novel locus on chromosome 14 (rs11160484; effect size = -0.42, P = 2.8x10^-8) associated with proteinuric kidney disease. In addition, we confirmed the well-known association of APOL1 risk haplotypes (G1/G1, G2/G2 or G1/G2; effect size = 0.50, P = 2.4x10^-9, under recessive model) in the AFR cohort. LD-score regression analysis revealed a trend towards a weak positive genetic correlation (rg = 0.097, 90% CI [0.010, 0.18]) between proteinuric kidney diseases and CKD defined by estimated glomerular filtration rate or eGFR (Wuttke et al, 2019).
    Using summary statistics from our EUR dataset, we estimated the SNP heritability of proteinuric kidney diseases at 0.15 (95% CI [0.095, 0.20]), suggesting that there may be many more genetic contributions that are yet to be discovered. These findings advance our understanding of the genetic architecture of proteinuric kidney diseases and highlight an opportunity for novel therapies and patient stratification.

    Eva Fast
    Goldfinch Bio
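
    The abstract above cites a genomic inflation factor (lambda=1.03) as evidence of minimal confounding. One common way to compute lambda is to divide the median of the observed 1-degree-of-freedom chi-square association statistics by the median of the null chi-square distribution (about 0.4549). The sketch below assumes that definition; the abstract does not say how its lambda was derived, and the function name and example statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Genomic inflation factor: observed median chi-square statistic
    divided by the expected median under the null."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical per-variant statistics, for illustration only.
example_stats = [0.02, 0.11, 0.47, 0.90, 2.70]
print(round(genomic_inflation(example_stats), 3))
```

    Values close to 1 suggest the test statistics follow the null distribution in bulk, while values well above 1 can point to stratification or other systematic inflation.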


  • Contains 1 Component(s)

    Speakers discuss insights from large-scale studies of diverse phenotypes and populations

    Platform sessions are abstract-driven sessions with 6 talks per session. These talks are 10 minutes in length and are cross-topical in nature to represent the breadth of the field of genetics and genomics. After each talk, there will be a 5-minute Q&A with the speaker. For more information, see the Details tab.

    Recorded session from the 2021 virtual meeting.

    A global biobank study of asthma identifies novel associations, illuminates shared genetic architecture, and improves polygenic prediction across diverse ancestry groups

    Asthma is a complex and multifactorial disease that affects millions of people worldwide and varies in prevalence by an order of magnitude across geographic regions and diverse populations. However, the extent to which genetic variation contributes to these disparities is unclear, as studies probing the genetics of asthma have been primarily limited to populations of European descent. To expand our understanding of the genetic factors underlying asthma risk in different ancestral populations, we conducted the largest genome-wide association study of asthma to date (N cases=153,763 and N controls=1,647,022) via meta-analysis across 18 biobanks with harmonized phenotype definitions and spanning multiple countries and genetic ancestries, collectively called the Global Biobank Meta-analysis Initiative (GBMI). This meta-analysis discovered 180 independent genome-wide significant loci (p < 5e-8) associated with asthma, 69 of which are novel. We replicate well-known associations such as TNFRSF8 and IL1RL1, and find that the novel associations tend to have smaller effects than previously-discovered loci, highlighting our substantial increase in effective sample size and statistical power. Despite the considerable range in prevalence among biobanks, from 3% to 24%, the genetic effects of associated loci are largely consistent across biobanks, ancestries, biobank ascertainment-types, and asthma definitions. This offers insight into the potential shared biological pathways that may be differentially affected by environmental factors and contribute to variation in prevalence. To further probe the polygenic architecture of asthma, we are constructing polygenic risk scores (PRS) using multi-ancestry approaches to establish a baseline understanding of PRS performance for asthma in different populations. 
    The vast increase in the scale and diversity of GBMI yields higher predictive power for asthma across the board; for example, with the multi-ancestry GBMI cohort, we achieve 0.03 phenotypic variance explained in East Asian populations compared to the highest previously reported variance explained in an East Asian population, 0.0075. The availability of additional phenotypic information on asthma subtypes and asthma-related diseases like COPD in GBMI-participating biobanks will allow us to further tease apart the genetics underlying various aspects of the disease. In summary, we have identified novel loci associated with asthma, found remarkable consistency of genetic effects despite enormous heterogeneity in prevalence, and have quantified the relative contribution of polygenic components to asthma risk.

    Kristin Tsuo
    Department of Genetics, Harvard Medical School


    Genome-wide polygenic risk score of prostate cancer in African and European ancestry men
    Genome-wide polygenic risk scores (PRS) are reported to have higher performance than standard genome-wide significant PRS across numerous traits. We evaluated the ability of genome-wide PRS to predict prostate cancer risk compared to our recently developed and highly predictive multi-ancestry PRS of 269 established prostate cancer risk variants. Genome-wide PRS approaches included LDpred2, PRS-CSx, and EB-PRS. Models were trained using the largest and most diverse prostate cancer GWAS to date of 107,247 cases and 127,006 controls, which was previously used to develop the multi-ancestry PRS of 269 variants. Resulting models were tested in independent samples of 1,586 cases and 1,047 controls of African ancestry from the California Uganda Study and 8,045 cases and 191,835 controls of European ancestry from the UK Biobank. Among the genome-wide PRS approaches, LDpred2 had the best performance, with AUCs of 0.649 (95% CI=0.627-0.670) in African and 0.819 (95% CI=0.815-0.823) in European ancestry men. African and European ancestry men in the top PRS decile relative to men in the median 40-60% PRS category had odds ratios for prostate cancer of 3.29 (95% CI=2.47-4.40) and 2.99 (95% CI=2.78-3.23), respectively. However, the PRS constructed using 269 variants had significantly larger AUCs in both African (0.679, 95% CI=0.659-0.700) and European ancestry men (0.845, 95% CI=0.841-0.849), with African and European ancestry men in the top PRS decile having larger odds ratios for prostate cancer (3.53, 95% CI=2.66-4.69 and 4.20, 95% CI=3.89-4.53, respectively). We are currently further validating these findings in diverse men from the Million Veteran Program. This investigation suggests that genome-wide PRS may not improve the ability to discriminate prostate cancer cases from controls compared to a genome-wide significant PRS.

    Burcu F Darst
    University of Southern California
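
    The comparison in the abstract above rests on two quantities: a polygenic score per individual and the AUC of that score for separating cases from controls. As an illustrative sketch only, the snippet below forms a score as a weighted sum of risk-allele dosages and computes the AUC as the fraction of case-control pairs in which the case scores higher (ties count half). The function names and toy values are hypothetical, and methods such as LDpred2 or PRS-CSx differ mainly in how the per-variant weights are estimated, which is not shown here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy example with hypothetical weights and genotypes.
weights = [0.10, 0.20, 0.30]
cases = [polygenic_score(g, weights) for g in ([2, 1, 2], [1, 2, 1])]
controls = [polygenic_score(g, weights) for g in ([0, 0, 1], [1, 1, 0])]
print(auc(cases, controls))
```

    The pairwise loop is quadratic in sample size, so a rank-based formulation would be preferable for cohorts on the scale described in the abstract.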

    Genetic association of phenotypes derived by self-supervised deep learning of retina fundus images reveals new genes for eye development
    Although genome-wide association studies (GWAS) have achieved great success and identified thousands of genetic associations, the phenotypes in most existing GWAS are predefined. While these phenotypes encode valuable biomedical knowledge, they are also biased by current clinical practice and epidemiological studies. Moreover, because phenotype codes are greatly simplified, they are often insufficient to capture the full complexity of human physiology and pathology. Fortunately, with the medical record becoming increasingly digitized, there are new opportunities to derive phenotypes beyond expert curation, which would avoid human bias and discover new phenotypes that were previously missed. Here, leveraging breakthroughs in self-supervised deep representation learning, we propose a new approach for phenotype discovery from medical images. We use a contrastive loss function over an Inception V3 architecture to learn a representation that captures the inherent image features of individuals. Using vessel segmentation masks generated from retina fundus images as inputs, we designed a phenotyper neural network model that generates 128 phenotypes representing retinal vasculature. After training on 40,000 images from EyePACS, our model generated phenotypes from 130,967 images of 65,629 British White participants in the UK Biobank. A GWAS of these vasculature phenotypes identified 34 independent loci, at least 5 of which are associated with vessel features. Mouse knockout experiments verified the role of the WNT7B gene, a newly found locus, in retinal vessel development. Our results establish a new framework of unsupervised image-based genome-wide genotype-phenotype association studies (iGWAS). Our framework would expand the repertoire of GWAS phenotypes and enable discovery of new biology.

    Ziqian Xie
    Baylor College of Medicine

    Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases
    Genome-wide association studies (GWAS) of human complex traits or diseases often implicate genetic loci that span hundreds of significant genetic variants. However, these loci may only contain one or a handful of causal variants. Statistical fine-mapping refines a GWAS locus to a smaller set of likely causal variants (credible set). Since non-causal variants have marginally different effects across populations where LD differs, capitalizing on the genomic diversity across ancestries holds the promise to further improve the resolution of fine-mapping. However, to date, cross-population fine-mapping efforts have been limited, partly due to the lack of statistical methods that can appropriately integrate data from multiple ancestries. Building on Sum of Single Effects (SuSiE), a single-population fine-mapping model, we have developed SuSiEx, an accurate and computationally efficient method for trans-ancestry fine-mapping. SuSiEx assumes that causal variants are largely shared across populations while allowing for varying variant effect sizes across populations. Our model can integrate data from an arbitrary number of ancestries, explicitly models population-specific LD patterns, accounts for multiple causal variants in a genomic region, and can be applied to GWAS summary statistics without access to individual-level data. We showed, via simulation studies, that compared with fine-mapping 100K European samples, integrating 50K European and 50K African samples using SuSiEx enabled fine-mapping of more association signals, and dramatically increased the resolution of credible sets. Compared with PAINTOR, SuSiEx had a 37% reduction in the median size of credible sets and a 54% increase in the number of high Posterior Inclusion Probability (PIP) variants. We applied SuSiEx to 25 quantitative traits that are available from both the Taiwan Biobank (TWB, n = 92,615) and UK Biobank (UKBB, n = 361,194) to fine-map genetic loci reaching genome-wide significance.
    Compared with single-population fine-mapping in UKBB, cross-ancestry fine-mapping significantly reduced the size of credible sets and increased the PIP of the most probable variant. We additionally applied our method to schizophrenia GWAS summary statistics of East Asian and European ancestries. Compared with the published fine-mapping results from PGC using FINEMAP on the same data, SuSiEx reduced the size of credible sets in 70% of the fine-mapped loci. Manual inspection confirmed that SuSiEx provided more sensible results in many loci. As GWAS results from different ancestries accumulate, our method should become increasingly useful.

    Kai Yuan
    Analytic and Translational Genetics Unit, Massachusetts General Hospital
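
    The abstract above measures fine-mapping resolution by the size of credible sets built from posterior inclusion probabilities (PIPs). The sketch below shows the usual single-signal construction: rank variants by PIP and keep adding them until the cumulative probability reaches the target coverage. It is not the SuSiEx algorithm, which estimates the PIPs themselves across ancestries; the function name and variant labels are hypothetical, and the PIPs are normalized on the assumption that the region holds exactly one causal variant.

```python
def credible_set(pips, coverage=0.95):
    """Smallest set of variants whose normalized PIPs sum to at least
    `coverage`, assuming one causal variant in the region.

    `pips` maps variant identifiers to posterior inclusion probabilities.
    """
    total = sum(pips.values())
    if total <= 0:
        raise ValueError("PIPs must sum to a positive value")
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen = []
    cumulative = 0.0
    for variant, pip in ranked:
        chosen.append(variant)
        cumulative += pip / total
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical PIPs for four variants at one locus.
locus = {"var_a": 0.60, "var_b": 0.30, "var_c": 0.06, "var_d": 0.04}
print(credible_set(locus))
```

    A method that concentrates probability on fewer variants yields a smaller set for the same coverage, which is the sense in which the abstract reports reduced credible set sizes.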

    Analysis across Taiwan Biobank, Biobank Japan, and UK Biobank identifies hundreds of novel loci for 36 quantitative traits
    Genome-wide association studies (GWAS) have identified tens of thousands of genetic loci associated with human complex traits and diseases. However, the majority of GWAS were conducted in individuals of European (EUR) ancestry. Failure to capture global genetic diversity has limited biological discovery and impeded equitable delivery of genomic knowledge to diverse populations. Here we performed genome-wide analysis on 102,900 individuals across 36 human quantitative traits in the Taiwan Biobank (TWB), a major biobank effort that broadens the population diversity of genetic studies in East Asia (EAS). We identified 1,907 independent genome-wide significant loci (P-value < 5x10^-8) across the 36 traits, among which 1,287 loci survived Bonferroni correction for the number of traits tested (P-value < 5x10^-8/36). The number of genome-wide significant loci per trait ranged from 1 for forced expiratory volume in one second (FEV1) and FEV1 to forced vital capacity (FVC) ratio (FEV1R), to 211 for height (HT). We estimated the SNP-based heritability (h^2_g) for each trait, which ranged from 0.009 (FEV1R) to 0.384 (HT), and pairwise genetic correlations between these traits, which identified clusters of highly genetically correlated traits. Of the 1,907 genome-wide significant loci, 1,615 were fine-mapped to a total of 1,972 credible sets, each representing an independent association signal. Out of the 1,972 credible sets, 232 were mapped to a single variant with posterior inclusion probability (PIP) > 95%, among which 24 were missense variants. Leveraging GWAS summary statistics from Biobank Japan (BBJ) and UK Biobank (UKBB), we found that the genetic architecture of the quantitative traits examined was largely consistent within EAS and between EAS and EUR populations. Integrating TWB and BBJ GWAS identified a total of 2,975 genetic loci, among which 979 had not been reported in previous biobank studies.
    We also examined whether polygenic risk scores (PRS) of biomarkers can be used to predict the risk of common complex disease, and demonstrated the potential utility of biomarker GWAS in predicting disease risk (e.g., type 2 diabetes) and the promise of multi-trait cross-population polygenic prediction. Our novel findings represent a major advance in diversifying GWAS samples and the characterization of the genetic architecture of human complex traits in EAS populations. Future endeavors on increasing the sample size and phenotype coverage in TWB, and improving cross-biobank data harmonization will further facilitate genomic discovery.

    Yen-Feng Lin
    National Health Research Institutes
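
    The abstract above applies two thresholds: the conventional genome-wide level of 5x10^-8 and a Bonferroni-corrected level that divides it by the 36 traits tested. The arithmetic can be sketched as follows; the function names and the example p-values are hypothetical.

```python
GENOME_WIDE_ALPHA = 5e-8


def bonferroni_threshold(n_traits, alpha=GENOME_WIDE_ALPHA):
    """Per-test threshold after correcting for the number of traits."""
    if n_traits < 1:
        raise ValueError("n_traits must be at least 1")
    return alpha / n_traits


def count_significant(p_values, n_traits, alpha=GENOME_WIDE_ALPHA):
    """Count lead-variant p-values passing each threshold.

    Returns (genome-wide significant, Bonferroni significant).
    """
    strict = bonferroni_threshold(n_traits, alpha)
    genome_wide = sum(1 for p in p_values if p < alpha)
    corrected = sum(1 for p in p_values if p < strict)
    return genome_wide, corrected


# 36 traits gives a corrected threshold of roughly 1.39e-9.
print(bonferroni_threshold(36))
# Hypothetical lead-variant p-values for four loci.
print(count_significant([1e-8, 1e-9, 1e-10, 1e-3], n_traits=36))
```

    Every locus that passes the corrected threshold also passes the genome-wide one, which is why the abstract reports 1,287 loci as a subset of the 1,907.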

    The effects of demographic-based selection bias on GWAS results in the UK Biobank
    Genome-wide association studies (GWASs) are almost always based on a non-random sample of the underlying population, as obtaining very large sample sizes, rather than ensuring such samples are representative, has been key to their success. Selection bias in estimated genetic associations, including how it varies across traits, is poorly understood. A sample of particular interest is the widely used UK Biobank (UKB). Because of the need for very large samples, the UKB is included in almost all large GWASs as one of the largest cohorts. In addition, UKB's subsample of genotyped siblings (UKBSIB) has become a crucial resource for estimating genetic effects free of environmental confounding. Using nationally representative UK Census microdata as a reference, we document substantial non-random selection into the UKB, and even stronger for UKBSIB: individuals in the UKB and UKBSIB are more likely to be female, higher educated, and older, compared to the underlying population that received an invitation. We also show that this non-random selection leads to significant selection bias in associations between various demographic and health-related traits estimated in the UKB. We then estimate probabilities of UKB participation for each UKB participant to estimate selection-corrected GWASs for multiple traits using inverse probability weighting. Based on preliminary analyses for the top 5,000 SNPs associated with BMI, education, and height, respectively, we show that the extent to which selection-corrected GWAS results differ from those of regular GWASs is trait-specific. Genetic associations for educational attainment and BMI are the most altered after correcting for volunteer bias, whereas associations for height remain relatively unaffected. For educational attainment, 12.6% of our estimated SNP effects flip sign after correcting for selection bias, suggesting that current GWAS methods are not sufficiently robust. 
    We will extend these analyses by investigating more phenotypes, conducting regular and inverse probability weighted GWASs in the UKB that incorporate all available SNPs, and comparing results. Our findings will be useful for understanding the extent to which a particular phenotype is prone to selection bias in GWAS, and our correction method provides an alternative when population-representative cohorts are not available.

    Sjoerd van Alten
    Vrije Universiteit Amsterdam
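
    The correction described in the abstract above relies on inverse probability weighting: each participant is weighted by one over their estimated probability of taking part, so that under-represented groups count for more. The sketch below illustrates the idea with a weighted mean and a weighted least-squares slope for a single SNP. It assumes the participation probabilities are already available; estimating them from census data, as the authors describe, is not shown, and all names and toy values are hypothetical.

```python
def ipw_weights(participation_probs):
    """Inverse probability weights from participation probabilities."""
    if any(p <= 0 or p > 1 for p in participation_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return [1.0 / p for p in participation_probs]


def weighted_mean(values, weights):
    """Weighted average of `values`."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_slope(genotypes, phenotypes, weights):
    """Weighted least-squares slope of phenotype on genotype dosage."""
    g_bar = weighted_mean(genotypes, weights)
    y_bar = weighted_mean(phenotypes, weights)
    numerator = sum(
        w * (g - g_bar) * (y - y_bar)
        for g, y, w in zip(genotypes, phenotypes, weights)
    )
    denominator = sum(w * (g - g_bar) ** 2 for g, w in zip(genotypes, weights))
    if denominator == 0:
        raise ValueError("genotypes have no variance")
    return numerator / denominator


# Toy sample in which one group participates three times as often.
trait = [1, 1, 1, 0]
probs = [0.75, 0.75, 0.75, 0.25]
weights = ipw_weights(probs)
print(sum(trait) / len(trait), weighted_mean(trait, weights))
```

    In the toy sample, the unweighted mean reflects the over-represented group, while the weighted mean restores the balance implied by the participation probabilities. Passing the same weights to `weighted_slope` gives the corresponding selection-corrected effect estimate for one SNP.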