Workshop: Reproducible Analysis of Human Pangenome Data using the AnVIL
Register
- Regular Member - Free!
- Early Career Member - Free!
- Resident/Clinical Fellow Member - Free!
- Postdoctoral Fellow Member - Free!
- Graduate Student Member - Free!
- Undergraduate Student Member - Free!
- Emeritus Member - Free!
- Life Member - Free!
- Nonmember - $55
- Trainee Member - Free!
Cloud-based analysis of genomic datasets is increasingly vital for portability, reproducibility, and multi-institution collaboration, but transitioning to the cloud can be daunting. This workshop aims to remove some of the barriers to adopting these tools. Specifically, we will teach researchers how to access and use the Analysis, Visualization, and Informatics Lab-space (AnVIL), an environment that provides hosted data, reproducible tools, collaborative workspaces, and comprehensive documentation to enable users to conduct research in the cloud. The workshop will demonstrate how to access and explore data in AnVIL. Participants will also learn to search for analysis tools in Dockstore, a platform for sharing portable, container-based tools and workflows written to be interoperable across local and cloud environments. Finally, they will analyze data in a Terra workspace, a dedicated space where researchers can access and organize data and tools and run analyses.
This workshop will specifically explore and demonstrate open-access data from the Human Pangenome Reference Consortium (HPRC), an NHGRI-funded effort to create a more diverse and comprehensive reference human pangenome. We will present the data and methods produced and used within the first year of this project, which ultimately aims to release high-quality diploid genome assemblies from >350 ethnically diverse individuals over five years. Currently, raw data and assemblies from 45 individuals, along with associated Docker-based analysis workflows written in the Workflow Description Language (WDL), are available in the AnVIL for researchers to explore and use. Data and workflows will continue to be publicly released as early as possible to promote open science. These data offer an excellent starting point for working hands-on with these data types and with new workspaces and methods.
Using data and workflows from the HPRC, participants of this workshop will follow along with instructors to learn how to:
- Register for a Terra account and set up a project using $300 in free Google Cloud credits
- Set up a collaborative cloud workspace in Terra
- Access and explore Human Pangenome Data hosted by AnVIL
- Search for bioinformatics workflows in Dockstore and export them to a Terra workspace
- Configure and launch a Docker-based WDL workflow to conduct a parallel analysis
- Monitor cloud costs associated with an analysis
After completing the workshop, attendees will be able to leverage AnVIL to analyze hosted datasets and launch analyses that are reproducible and scalable. Attendees will also be familiar with Human Pangenome data and resources.
Julian Lucas
Senior Bioinformatics Systems Analyst
University of California, Santa Cruz
Julian Lucas is a Senior Bioinformatics Systems Analyst in the Computational Genomics Platform at the UC Santa Cruz Genomics Institute. He leads the data coordination for the Human Pangenome Reference Consortium (HPRC) and utilizes the NHGRI AnVIL cloud compute platform for sharing data and analysis methods for this open-access project with the research community.
Beth Sheets, MS
Program Manager, Computational Genomics Platform
University of California, Santa Cruz
Beth Sheets is a Program Manager for the Computational Genomics Platform at the UC Santa Cruz Genomics Institute. She currently works with two NIH initiatives, NHLBI BioData Catalyst and NHGRI AnVIL, which are bringing researchers to secure, collaborative, cloud-based workspaces that offer petabytes of hosted data and hundreds of scientific tools. She works with a collaborative team that builds Dockstore.org, the scientific tool-sharing repository for these two NIH initiatives, which provides researchers with features and training to publish their bioinformatics pipelines using FAIR (Findable, Accessible, Interoperable, Reusable) standards.
Trevor Pesout
PhD Candidate
University of California, Santa Cruz
Trevor is a PhD Candidate in the Computational Genomics Lab at the UCSC Genomics Institute. His work revolves around the use of third-generation long reads for phasing, polishing, and haplotyping. He has developed many of the quality control workflows for the HPRC assembly group. His contributions are available in the Human Pangenome Reference Consortium organization on Dockstore for the community to reuse in the AnVIL cloud ecosystem.
Mobin Asri
PhD Candidate
University of California, Santa Cruz
Mobin is a PhD Candidate in the Computational Genomics Lab at the UCSC Genomics Institute. He works on comparative genomics and developing tools for evaluating diploid assemblies. He has developed the assembly workflow and read-based QC workflows for the HPRC assembly group. His contributions are available in the Human Pangenome Reference Consortium organization on Dockstore for the community to reuse in the AnVIL cloud ecosystem.
Karen Miga, PhD
Assistant Professor
University of California, Santa Cruz
Karen Miga is an Assistant Professor in the Biomolecular Engineering Department at UCSC and Associate Director at the UCSC Genomics Institute. She co-leads the Telomere-to-Telomere (T2T) Consortium and is the Project Director of the Human Pangenome Reference Consortium (HPRC) production center at UCSC. Her research program combines innovative computational and experimental approaches to produce high-resolution sequence maps of human centromeric and pericentromeric DNA.