<p>This project is a collection of files to allow users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets. It is available as an R package <a href="https://github.com/george-hall-ucl/dawnn" target="_blank">here</a>. Please contact us if you are unable to reproduce any of the analysis in our paper.</p>
<p>The files in this collection comprise the code used to run the benchmarking and the results it produced.</p>
<p><br></p>
<p>FILES:</p>
<p><u>Benchmarking code</u></p>
<p>The files named <em>collect_*</em> execute github.com/george-hall-ucl/dawnn_paper_code/blob/main/benchmarking_utilities_as_script.R for different datasets.</p>
<ul>
<li><strong>collect_results_all_sim_dat.R</strong> Script to execute <em>benchmarking_utilities_as_script.R </em>for simulated discrete clusters, linear trajectories, and branching trajectories datasets; then collect results in <em>tpr_fdr_results_discrete_clusters_rerun.csv, tpr_fdr_results_linear_traj_rerun.csv</em>, and <em>tpr_fdr_results_branch_traj_rerun.csv</em>, respectively. Seurat datasets read in from <em>cells_sim_discerete_clusters_gex_seed_*.rds</em>, <em>cells_sim_linear_traj_gex_seed_*.rds</em>, and <em>cells_sim_branching_traj_gex_seed_*.rds</em>, respectively. Simulated labels read in from <em>benchmark_dataset_sim_discrete_clusters.csv</em>, <em>benchmark_dataset_sim_linear_traj.csv</em>, and <em>benchmark_dataset_sim_branching_traj.csv</em>, respectively.</li>
<li><strong>collecting_results_mouse.sh </strong>Script to execute <em>benchmarking_utilities_as_script.R </em>for mouse gastrulation dataset (generated in <em>10.5522/04/22614004</em>); then collect results in <em>tpr_fdr_results_mouse_regen.csv.</em> Seurat dataset read in from <em>10.5522/04/22614004/mouse_gastrulation_data_regen.rds</em> and simulated labels read in from <em>10.5522/04/22614004/benchmark_dataset_mouse.csv</em>.</li>
<li><strong>collecting_results_skin.sh </strong>Script to execute <em>benchmarking_utilities_as_script.R </em>for keratinocyte dataset (generated in <em>10.5522/04/22607236</em>); then collect results in <em>tpr_fdr_results_skin_regen.csv.</em> Seurat dataset read in from <em>10.5522/04/22607236/skin_data_end_pipeline_1458110522.rds</em> and simulated labels read in from <em>10.5522/04/22607236/benchmark_dataset_skin.csv</em>.</li>
<li><strong>collecting_results_organoid.sh </strong>Script to execute <em>benchmarking_utilities_as_script.R </em>for organoid dataset (generated in <em>10.5522/04/22612576</em>); then collect results in <em>tpr_fdr_results_organoid_regen.csv.</em> Seurat dataset read in from <em>10.5522/04/22612576/organoid_cells.RDS</em> and simulated labels read in from <em>10.5522/04/22612576/benchmark_dataset_organoid_labels.csv</em>.</li>
<li><strong>collecting_results_heart.sh </strong>Script to execute <em>benchmarking_utilities_as_script.R </em>for heart dataset (generated in <em>10.5522/04/22601260</em>); then collect results in <em>tpr_fdr_results_heart_regen.csv.</em> Seurat dataset read in from <em>10.5522/04/22601260/heart_tissue_cells.RDS</em> and simulated labels read in from <em>10.5522/04/22601260/benchmark_dataset_heart_data_type_labels.csv.</em></li>
<li><strong>benchmarking_liver_cirrhosis_analysis.R</strong> R code to process liver cirrhosis dataset using standard single-cell RNAseq pipeline, then run Dawnn, Milo, and DA-seq. Processing code adapted from <em>github.com/MarioniLab/milo_analysis_2020/blob/main/notebooks/Fig5_liver_cirrhosis.Rmd#L261</em>. Results stored in <em>liver_cirrhosis_results_rerun.csv</em>.</li>
</ul>
<p><u>Benchmarking results</u></p>
<ul>
<li><strong>tpr_fdr_results_discrete_clusters_rerun.csv</strong> Results from benchmarking on discrete clusters dataset (generated by <em>collect_results_all_sim_dat.R</em>).</li>
<li><strong>tpr_fdr_results_linear_traj_rerun.csv </strong>Results from benchmarking on linear trajectory dataset (generated by <em>collect_results_all_sim_dat.R</em>).</li>
<li><strong>tpr_fdr_results_branch_traj_rerun.csv </strong>Results from benchmarking on branching trajectory dataset (generated by <em>collect_results_all_sim_dat.R</em>).</li>
<li><strong>tpr_fdr_results_mouse_regen.csv </strong>Results from benchmarking on mouse dataset (generated by <em>collecting_results_mouse.sh</em>).</li>
<li><strong>tpr_fdr_results_skin_regen.csv </strong>Results from benchmarking on skin dataset (generated by <em>collecting_results_skin.sh</em>).</li>
<li><strong>tpr_fdr_results_organoid_regen.csv </strong>Results from benchmarking on organoid dataset (generated by <em>collecting_results_organoid.sh</em>).</li>
<li><strong>tpr_fdr_results_heart_regen.csv </strong>Results from benchmarking on heart dataset (generated by <em>collecting_results_heart.sh</em>).</li>
<li><strong>liver_cirrhosis_results_rerun.csv</strong> Results from running on cirrhotic liver dataset (generated by <em>benchmarking_liver_cirrhosis_analysis.R</em>).</li>
</ul>
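<p>As a convenience when checking a download of this collection, the mapping between result files and the scripts that generate them can be written down and verified programmatically. The following is a minimal sketch in Python, not part of the original code (which is written in R and shell). The function name <em>summarise_results</em> and the directory argument are hypothetical, and the sketch assumes only that each CSV has a header row; it does not assume any particular column names, since these are not documented here.</p>

```python
import csv
from pathlib import Path

# Result files listed above, mapped to the script that generates each one.
RESULT_FILES = {
    "tpr_fdr_results_discrete_clusters_rerun.csv": "collect_results_all_sim_dat.R",
    "tpr_fdr_results_linear_traj_rerun.csv": "collect_results_all_sim_dat.R",
    "tpr_fdr_results_branch_traj_rerun.csv": "collect_results_all_sim_dat.R",
    "tpr_fdr_results_mouse_regen.csv": "collecting_results_mouse.sh",
    "tpr_fdr_results_skin_regen.csv": "collecting_results_skin.sh",
    "tpr_fdr_results_organoid_regen.csv": "collecting_results_organoid.sh",
    "tpr_fdr_results_heart_regen.csv": "collecting_results_heart.sh",
    "liver_cirrhosis_results_rerun.csv": "benchmarking_liver_cirrhosis_analysis.R",
}


def summarise_results(results_dir="."):
    """Return one summary dict per expected result file (hypothetical helper)."""
    summaries = []
    for name, script in RESULT_FILES.items():
        path = Path(results_dir) / name
        entry = {
            "file": name,
            "generated_by": script,
            "present": path.is_file(),
            "columns": None,
            "n_rows": None,
        }
        if entry["present"]:
            with path.open(newline="") as handle:
                rows = list(csv.reader(handle))
            # Assumption: the first row is a header; its contents are reported, not interpreted.
            entry["columns"] = rows[0] if rows else []
            entry["n_rows"] = max(len(rows) - 1, 0)
        summaries.append(entry)
    return summaries


if __name__ == "__main__":
    for summary in summarise_results():
        status = "found" if summary["present"] else "missing"
        print(f'{summary["file"]} ({summary["generated_by"]}): {status}')
```

<p>Run from the directory containing the downloaded CSV files, this prints one line per expected result file, which makes it easy to spot a file that failed to download or was not regenerated.</p>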
<p>FUNDING:</p>
<p>NIHR Great Ormond Street Hospital Biomedical Research Centre</p>