Workshop: Using Gabriella Miller Kids First’s datasets and platforms for pediatric cancer and structural birth defect research

  • Register
    • Regular Member - Free!
    • Early Career Member - Free!
    • Resident/Clinical Fellow Member - Free!
    • Postdoctoral Fellow Member - Free!
    • Graduate Student Member - Free!
    • Undergraduate Student Member - Free!
    • Emeritus Member - Free!
    • Life Member - Free!
    • Trainee Member - Free!

The Gabriella Miller Kids First Pediatric Research Program is an NIH Common Fund initiative focused on providing large-scale, clinically annotated genomic data for pediatric cancer and structural birth defect cohorts, including tumor and germline whole genome sequencing (WGS), trio-based joint genotyping, and paired RNA sequencing of somatic tissues. The Gabriella Miller Kids First Data Resource Center (Kids First DRC) is charged with generating these datasets and empowering collaborative discovery on its data resource platforms. Twenty-six Kids First studies are released on the Kids First Data Resource Portal, representing more than 21,000 participants and more than 1.25 PB of data, with additional datasets being released yearly.

At this workshop, we present an overview of the Kids First DRC’s publicly available datasets and platforms, interactive demonstrations of how to use the Kids First Data Resource Portal to build a virtual cohort of participants and files for research, and an introduction to analyzing data on CAVATICA, Kids First’s cloud-based analysis platform.

Attendees will follow along on their own devices, using three different methods for searching the Kids First DRC’s datasets on the Kids First Data Resource Portal. The Explore Data tool allows users to search for participants of interest based on harmonized clinical phenotypes and diagnoses using the HPO and MONDO ontologies. The File Repository is designed for searching harmonized genomic files associated with these participants, from source alignments to fully processed called variants and gene expression quantification. Finally, Variant Search allows users to search for germline variants present within Kids First datasets.

The Kids First Data Resource Portal is integrated with CAVATICA by Seven Bridges for running large-scale bioinformatic workflows such as alignment or variant calling. CAVATICA also supports interactive notebook-based analyses in RStudio and JupyterLab, all within the user’s browser window. The session will conclude with information about how to apply for access to these publicly available datasets, to help investigators of all types access and analyze genomics-scale pediatric data as part of their research.

      • Children born with structural birth defects are more likely to be diagnosed with cancer before they turn 18, suggesting common underlying genetic causes.
      • The publicly funded Gabriella Miller Kids First Data Resource Center produces curated clinical and genomic datasets with the goal of uncovering new insights into the biology of these conditions.
      • The Kids First Data Resource Portal is a free web tool for searching Kids First datasets and building virtual cohorts of participants based on clinical phenotypes, file metadata, or germline variants.
      • CAVATICA by Velsera is a cloud-based analysis platform that allows users to build and run large-scale bioinformatic pipelines and to perform browser-based interactive analyses for further discoveries.


Live Workshop Event
08/01/2023 at 12:00 PM (EDT)  |  Recorded On: 08/02/2023  |  160 minutes