Welcome to Yaoyi's Personal Website 😊

Hi there! 👋 I'm Yaoyi Dai, a PhD candidate in Quantitative and Computational Biology at Baylor College of Medicine in Houston, Texas.
I'm currently a Graduate Research Assistant in Dr. Wenyi Wang's Lab at MD Anderson Cancer Center, where I apply and extend tumor-specific total mRNA expression (TmS), a deconvolution metric that captures cancer transcriptomic plasticity. My thesis project focuses on using TmS to stratify triple-negative breast cancer (TNBC) patients across multi-ethnic cohorts, revealing population-specific tumor microenvironment dynamics and identifying personalized therapeutic strategies for chemotherapy-resistant patients.
I hold an M.S. in Biostatistics from Washington University in St. Louis and recently completed a summer internship at Merck Sharp & Dohme LLC, working on integrative multi-omics and drug response characterization for KRAS inhibitors.
My expertise includes programming in R/R Shiny, Python, and SAS, along with next-generation sequencing analyses (WES/WGS, bulk RNA-seq, scRNA-seq, scATAC-seq, spatial transcriptomics) and computational method development for cancer research.
When I'm not analyzing genomic data, you'll find me staying active with indoor cycling, yoga, Pilates, and HIIT workouts! I also love experimenting in the kitchen with cooking and baking – my favorite way to procrastinate! 😋

Research Interests

Triple-Negative Breast Cancer

TNBCs are defined by the lack of expression of the estrogen receptor (ER), progesterone receptor (PR), and HER2, and they tend to have a more aggressive natural history than other breast cancer subtypes.

Genomic and Transcriptomic Deconvolution

Tumor plasticity and heterogeneity present a major challenge in the clinical management of breast cancers because they affect patients' prognosis, therapeutic response, and clinical outcomes.

Allele-Specific Expression

In the context of cancer, allele-specific expression is affected by both tumor cells and the surrounding non-tumor cells. Copy number alterations are also well established as strong predictors of transcript levels.

Publications

Deciphering transcriptional activity of the tumor microenvironment for robust stratification of chemotherapy response in triple-negative breast cancer

Triple-negative breast cancer (TNBC) exhibits heterogeneous treatment responses, yet molecular subtypes based on predefined biological pathways have shown limited prognostic value. We introduce tumor-specific total mRNA expression (TmS), a pathway-agnostic deconvolution metric derived from matched RNA/DNA sequencing data, as an alternative strategy. When applied to data from 575 TNBC patients across Western and East Asian populations, TmS outperformed established subtypes in predicting chemotherapy outcomes, stratifying patients into high-TmS (favorable prognosis) and low-TmS (poor prognosis) groups. All cohorts exhibited increased immunological activity in high-TmS tumors and greater stromal enrichment in low-TmS tumors, as validated through single-cell RNA sequencing and digital pathology. East Asian patients displayed distinctive features, including stronger cell cycle-driven epithelial cells and unique B-cell activities. We identified extracellular matrix (ECM) pathways as potential therapeutic targets for chemotherapy-resistant low-TmS patients. Our findings chart a new map to conduct personalized therapy for TNBC, and provide new insights for developing population-aware therapeutic strategies.

A guide to transcriptomic deconvolution in cancer

Cancers show vast transcriptional variation in genes and pathways. Cancer tissues are heterogeneous mixtures of tumor, stromal and immune cells, where each component comprises multiple distinct cell types and/or states. Understanding the unique contributions of each cell type is crucial for advancing cancer biology, yet high-throughput expression profiles from tumor tissues only represent combined signals from all diverse cellular sources. Computational deconvolution of these mixed signals has emerged as a powerful approach to dissect both cellular composition and cell-type-specific expression patterns. Here, we provide a comprehensive guide to transcriptome deconvolution, specifically tailored for cancer researchers, presenting a systematic framework for selecting and applying deconvolution methods based on cancer-specific challenges and research objectives. Through detailed examination of nearly 40 deconvolution methods, we demonstrate how different approaches serve distinctive applications in cancer research: from understanding tumor-immune surveillance to identifying cancer subtypes, discovering prognostic biomarkers, and characterizing spatial tumor architecture. We present a practical decision framework considering the unique complexities of tumor tissues, data availability, and method assumptions. By examining the capabilities and limitations of these methods in a cancer context, we highlight emerging trends and future directions, particularly in addressing tumor cell plasticity and dynamic cell states. This guide will empower cancer researchers to effectively utilize deconvolution methods, advancing our understanding of cancer biology and informing clinical decision-making.

Estimation of tumor cell total mRNA expression in 15 cancer types predicts disease progression

Single-cell RNA sequencing studies have suggested that total mRNA content correlates with tumor phenotypes. Technical and analytical challenges, however, have so far impeded at-scale pan-cancer examination of total mRNA content. Here we present a method to quantify tumor-specific total mRNA expression (TmS) from bulk sequencing data, taking into account tumor transcript proportion, purity and ploidy, which are estimated through transcriptomic/genomic deconvolution. We estimate and validate TmS in 6,590 patient tumors across 15 cancer types, identifying significant inter-tumor variability. Across cancers, high TmS is associated with increased risk of disease progression and death. TmS is influenced by cancer-specific patterns of gene alteration and intra-tumor genetic heterogeneity as well as by pan-cancer trends in metabolic dysregulation. Taken together, our results indicate that measuring cell-type-specific total mRNA expression in tumor cells predicts tumor phenotypes and clinical outcomes.

Single-nucleus RNA-sequencing of autosomal dominant Alzheimer disease and risk variant carriers

Genetic studies of Alzheimer disease (AD) have prioritized variants in genes related to the amyloid cascade, lipid metabolism, and neuroimmune modulation. However, the cell-specific effect of variants in these genes is not fully understood. Here, we perform single-nucleus RNA-sequencing (snRNA-seq) on nearly 300,000 nuclei from the parietal cortex of AD autosomal dominant (APP and PSEN1) and risk-modifying variant (APOE, TREM2 and MS4A) carriers. Within individual cell types, we capture genes commonly dysregulated across variant groups. However, specific transcriptional states are more prevalent within variant carriers. TREM2 oligodendrocytes show a dysregulated autophagy-lysosomal pathway, MS4A microglia have dysregulated complement cascade genes, and APOEε4 inhibitory neurons display signs of ferroptosis. All cell types have enriched states in autosomal dominant carriers. We leverage differential expression and single-nucleus ATAC-seq to map GWAS signals to effector cell types including the NCK2 signal to neurons in addition to the initially proposed microglia. Overall, our results provide insights into the transcriptional diversity resulting from AD genetic architecture and cellular heterogeneity. The data can be explored on the online browser (http://web.hararilab.org/SNARE/).

Get In Touch

For project collaborations or any questions