Data for Gradient boosted decision trees reveal nuances of auditory discrimination behavior
Raw data for the article: Gradient boosted decision trees reveal nuances of auditory discrimination behaviour (PLOS Computational Biology).
This data repository contains the CSV files produced by extracting the raw MATLAB metadata files into pandas (Python) dataframes (helper function author: Jules Lebert). The CSV files can easily be loaded back into dataframe objects using pandas; they contain the data before the subsampling step documented in the paper (we used subsampling to keep the numbers of F0-roved and control F0 trials approximately equal).
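As a minimal sketch, the snippet below shows how one of the CSV files could be read back into a pandas dataframe and what a simple balancing step of the kind described above might look like. The file name trial_data.csv and the column name control_trial are placeholders rather than names from this repository; consult data_description_instructions_for_datasets_plos_bio.pdf and the GitHub repository for the actual file and variable names.

```python
import pandas as pd

# Load one of the extracted CSV files back into a DataFrame.
# "trial_data.csv" is a placeholder; substitute a file from this repository.
df = pd.read_csv("trial_data.csv")

# Illustrative subsampling sketch, assuming a column "control_trial"
# where 1 = control F0 trial and 0 = F0-roved trial: downsample the
# larger class so the two trial types are approximately balanced.
roved = df[df["control_trial"] == 0]
control = df[df["control_trial"] == 1]
n = min(len(roved), len(control))
balanced = pd.concat([
    roved.sample(n=n, random_state=0),
    control.sample(n=n, random_state=0),
])
```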
Link to GitHub repository to run the models on this data: https://github.com/carlacodes/boostmodels
A full description of each variable in the dataframes can be found in data_description_instructions_for_datasets_plos_bio.pdf.
Abstract: Animal psychophysics can generate rich behavioral datasets, often comprising many thousands of trials for an individual subject. Gradient-boosted models are a promising machine learning approach for analyzing such data, partly due to the tools that allow users to gain insight into how the model makes predictions. We trained ferrets to report a target word’s presence, timing, and lateralization within a stream of consecutively presented non-target words. To assess the animals’ ability to generalize across pitch, we manipulated the fundamental frequency (F0) of the speech stimuli across trials, and to assess the contribution of pitch to streaming, we roved the F0 from word token to token. We then implemented gradient-boosted regression and decision trees on the trial outcome and reaction time data to understand the behavioral factors behind the ferrets’ decision-making. We visualized model contributions with SHAP feature importance and partial dependence plots. While ferrets could accurately perform the task across all pitch-shifted conditions, our models reveal subtle effects of shifting F0 on performance, with within-trial pitch shifting elevating false alarms and extending reaction times. Our models identified a subset of non-target words that animals commonly false alarmed to. Follow-up analysis demonstrated that the spectrotemporal similarity of target and non-target words, rather than similarity in duration or amplitude waveform, was the strongest predictor of the likelihood of false alarming. Finally, we compared the results with those obtained with traditional mixed-effects models, revealing equivalent or better performance for the gradient-boosted models over these approaches.
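For orientation, the sketch below illustrates the general workflow the abstract describes: fitting a gradient-boosted decision tree model to trial data and inspecting it with SHAP feature importance. It is a hedged example, not the published pipeline (which lives in the linked GitHub repository); the library choice (LightGBM), the file name trial_data.csv, and the outcome column falsealarm are assumptions for illustration only.

```python
import lightgbm as lgb
import pandas as pd
import shap

# Hypothetical setup: "trial_data.csv" is a placeholder file name and
# "falsealarm" an assumed binary outcome column; see the data description
# PDF and the GitHub repository for the real variable names.
df = pd.read_csv("trial_data.csv")
y = df["falsealarm"]
X = df.drop(columns=["falsealarm"]).select_dtypes("number")  # numeric predictors only for this sketch

# Fit a gradient-boosted decision tree classifier (LightGBM shown here).
model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X, y)

# Explain the fitted model with SHAP values and summarize feature importance.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
```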
Funding
Selective Attention: How does Neural Response Modulation in Auditory Cortex Enable Auditory Scene Analysis?
Biotechnology and Biological Sciences Research Council
I have considered whether the data to be published may be licensed commercially before deciding to freely release it to the public. Further information and advice may be sought from UCL Business: https://www.uclb.com/about/our-people/