This collection contains the files needed to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets and is available as an R package. Please contact us if you are unable to reproduce any of the analysis in our paper.
The files in this collection are the code used to run the benchmarking and the results it produced.
FILES:
Benchmarking code
The files whose names begin with "collect" execute github.com/george-hall-ucl/dawnn_paper_code/blob/main/benchmarking_utilities_as_script.R for different datasets.
collect_results_all_sim_dat.R: Script to execute benchmarking_utilities_as_script.R for simulated discrete clusters, linear trajectories, and branching trajectories datasets; then collect results in tpr_fdr_results_discrete_clusters_rerun.csv, tpr_fdr_results_linear_traj_rerun.csv, and tpr_fdr_results_branch_traj_rerun.csv, respectively. Seurat datasets read in from cells_sim_discerete_clusters_gex_seed_*.rds, cells_sim_linear_traj_gex_seed_*.rds, and cells_sim_branching_traj_gex_seed_*.rds, respectively. Simulated labels read in from benchmark_dataset_sim_discrete_clusters.csv, benchmark_dataset_sim_linear_traj.csv, and benchmark_dataset_sim_branching_traj.csv, respectively.
collecting_results_mouse.sh: Script to execute benchmarking_utilities_as_script.R for mouse gastrulation dataset (generated in 10.5522/04/22614004); then collect results in tpr_fdr_results_mouse_regen.csv. Seurat dataset read in from 10.5522/04/22614004/mouse_gastrulation_data_regen.rds and simulated labels read in from 10.5522/04/22614004/benchmark_dataset_mouse.csv.
collecting_results_skin.sh: Script to execute benchmarking_utilities_as_script.R for keratinocyte dataset (generated in 10.5522/04/22607236); then collect results in tpr_fdr_results_skin_regen.csv. Seurat dataset read in from 10.5522/04/22607236/skin_data_end_pipeline_1458110522.rds and simulated labels read in from 10.5522/04/22607236/benchmark_dataset_skin.csv.
collecting_results_organoid.sh: Script to execute benchmarking_utilities_as_script.R for organoid dataset (generated in 10.5522/04/22612576); then collect results in tpr_fdr_results_organoid_regen.csv. Seurat dataset read in from 10.5522/04/22612576/organoid_cells.RDS and simulated labels read in from 10.5522/04/22612576/benchmark_dataset_organoid_labels.csv.
collecting_results_heart.sh: Script to execute benchmarking_utilities_as_script.R for heart dataset (generated in 10.5522/04/22601260); then collect results in tpr_fdr_results_heart_regen.csv. Seurat dataset read in from 10.5522/04/22601260/heart_tissue_cells.RDS and simulated labels read in from 10.5522/04/22601260/benchmark_dataset_heart_data_type_labels.csv.
benchmarking_liver_cirrhosis_analysis.R: R code to process liver cirrhosis dataset using standard single-cell RNAseq pipeline, then run Dawnn, Milo, and DA-seq. Processing code adapted from github.com/MarioniLab/milo_analysis_2020/blob/main/notebooks/Fig5_liver_cirrhosis.Rmd#L261. Results stored in liver_cirrhosis_results_rerun.csv.
Benchmarking results
tpr_fdr_results_discrete_clusters_rerun.csv: Results from benchmarking on discrete clusters dataset (generated by collect_results_all_sim_dat.R).
tpr_fdr_results_linear_traj_rerun.csv: Results from benchmarking on linear trajectory dataset (generated by collect_results_all_sim_dat.R).
tpr_fdr_results_branch_traj_rerun.csv: Results from benchmarking on branching trajectory dataset (generated by collect_results_all_sim_dat.R).
tpr_fdr_results_mouse_regen.csv: Results from benchmarking on mouse dataset (generated by collecting_results_mouse.sh).
tpr_fdr_results_skin_regen.csv: Results from benchmarking on skin dataset (generated by collecting_results_skin.sh).
tpr_fdr_results_organoid_regen.csv: Results from benchmarking on organoid dataset (generated by collecting_results_organoid.sh).
tpr_fdr_results_heart_regen.csv: Results from benchmarking on heart dataset (generated by collecting_results_heart.sh).
liver_cirrhosis_results_rerun.csv: Results from running on cirrhotic liver dataset (generated by benchmarking_liver_cirrhosis_analysis.R).
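One way to confirm that a download of this collection is complete is to check for each of the eight results files listed above. The following is a minimal sketch and is not part of the deposited code: the helper name check_results is hypothetical, and it assumes that all of the CSV files sit together in a single directory.

```shell
#!/bin/sh
# Results files named in the listing above, one per line.
RESULT_FILES="tpr_fdr_results_discrete_clusters_rerun.csv
tpr_fdr_results_linear_traj_rerun.csv
tpr_fdr_results_branch_traj_rerun.csv
tpr_fdr_results_mouse_regen.csv
tpr_fdr_results_skin_regen.csv
tpr_fdr_results_organoid_regen.csv
tpr_fdr_results_heart_regen.csv
liver_cirrhosis_results_rerun.csv"

# Hypothetical helper (not from the Dawnn repository).
# Reports each results file as found or missing in the given directory
# (default: current directory) and returns non-zero if any are missing.
check_results() {
    dir="${1:-.}"
    missing=0
    for f in $RESULT_FILES; do
        if [ -f "$dir/$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

After sourcing the snippet, calling check_results with the path to the directory holding the downloaded files prints one line per file; the exit status is zero only when every file is present.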
Funding
NIHR Great Ormond Street Hospital Biomedical Research Centre