University College London
11 files

Machine Learning Majorite barometer

Version 3 2021-02-11, 10:38
Version 2 2021-02-01, 15:02
Version 1 2020-12-15, 09:59
model
posted on 2020-12-11, 21:38 authored by Andrew Thomson, Michael Walter, Anirudh Prabhu, Simon Kohn
A machine learning barometer (using Random Forest Regression) to calculate equilibration pressure for majoritic garnets
10/12/20:
******
The scripts below need to be run in a suitable Python 3 environment (with the pandas, numpy, matplotlib and scikit-learn packages plus their dependencies; pickle is part of the Python standard library). For inexperienced users we recommend installing the latest Anaconda Python distribution (found here https://docs.anaconda.com/anaconda/install/) and running the scripts in Spyder (a GUI scripting environment provided with Anaconda).
Note: if you are running Python 3.7 (or earlier), you will need to install the pickle5 package to use the provided barometer files, and comment/uncomment the appropriate lines in the “calculate_pressures.py” (lines 16/17) and “rfr_majbar_10122020.py” (lines 26/27) scripts.
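Rather than commenting/uncommenting lines by hand, one possible approach is to choose the pickle module at run time. This is a minimal sketch, not the code in the provided scripts; the helper name `load_pickle` is hypothetical. It assumes the pickle5 backport is installed on Python 3.7 or earlier and otherwise uses the standard-library pickle, which reads protocol-5 files natively from Python 3.8.

```python
import sys

# Python 3.8+ reads protocol-5 pickles natively; older versions need the
# pickle5 backport (e.g. conda install -c conda-forge pickle5).
if sys.version_info >= (3, 8):
    import pickle
else:
    import pickle5 as pickle


def load_pickle(path):
    """Hypothetical helper: load one serialised object (model or scaler)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Usage with the files provided in this repository:
# model = load_pickle("pickle_model_20201210.pkl")
# scaler = load_pickle("scaler_20201210.pkl")
```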
You may additionally need to download and install any required packages (pandas, numpy, matplotlib, scikit-learn and, on Python 3.7 or earlier, pickle5) that are not provided with the Anaconda distribution. A missing package is easy to spot: when run, the script will return an error similar to “No module named XXXX”. Packages can be installed either with the Anaconda package manager or from the command line / terminal with commands such as:
conda install -c conda-forge pickle5

The appropriate installation command for each required package can be found by searching Anaconda Cloud (anaconda.org).
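Before running the scripts, a quick check such as the following can list which packages are missing, instead of discovering them one “No module named” error at a time. This is a sketch, not part of the repository; the list of import names is an assumption based on the packages named above (note that scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names of the packages the scripts rely on (assumed from the text above).
REQUIRED = ["pandas", "numpy", "matplotlib", "sklearn", "pickle"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install these before running the scripts:", ", ".join(missing))
    else:
        print("All required packages found.")
```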
******

Data files included in this repository are:
• "Majorite_database_03072020.xlsm" (Excel sheet of literature majoritic garnet compositions, from both inclusions and experiments, up to date as of 03/07/2020. It includes all compositions that are close to majoritic, some of which are borderline; the filtering described in the paper accompanying this barometer is performed in the Python script before any data analysis or fitting.)
• "lit_maj_nat_030720.txt" (Python script input file of experimental literature majoritic garnet compositions, taken from the dataset above)
• "di_incs_030720.txt" (python script input file of literature compilation of majoritic garnet inclusions observed in natural diamonds - taken from the dataset above)

A Python script is provided to calculate pressures for any majoritic garnet using the barometer calibrated in Thomson et al. (2021).
• The calculate_pressures.py script takes an input file of majoritic garnet compositions (an example input file, “example_test_data.txt”, is provided; it contains inclusion compositions reported by Zedgenizov et al., 2014, Chemical Geology, 363, pp 114-124).
• It employs the published RFR model and scaler, both provided as pickle files (pickle_model_20201210.pkl, scaler_20201210.pkl).
To use it, edit the input file name in the provided .py script and then run the script in a suitable Python 3 environment (requires pandas, numpy, scikit-learn and pickle). The script first filters the data for majoritic compositions (according to the criteria used for barometer calibration) and predicts pressures for these compositions. It then writes the pressures and 2 × std_dev of the pressure estimates alongside the input data to "out_pressures_test.txt".
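The prediction step can be illustrated with the sketch below. It is not calculate_pressures.py: instead of loading the published pickle files it trains a small RandomForestRegressor and a StandardScaler on synthetic data (the scaler type is an assumption), it skips the majorite filtering (those criteria are in the paper), and it assumes, without confirmation from this description, that the reported uncertainty is twice the standard deviation of the individual trees' predictions. Column and output file names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a calibration set: 5 hypothetical compositional
# features and a pressure (GPa) that depends on them. NOT real garnet data.
features = ["f1", "f2", "f3", "f4", "f5"]
X_cal = rng.uniform(0.0, 1.0, size=(200, len(features)))
p_cal = 5 + 20 * X_cal[:, 0] + 3 * X_cal[:, 1] + rng.normal(0, 0.5, 200)

scaler = StandardScaler().fit(X_cal)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(scaler.transform(X_cal), p_cal)


def predict_with_spread(model, scaler, X):
    """Mean pressure and 2 x std-dev across the forest's individual trees."""
    Xs = scaler.transform(X)
    per_tree = np.stack([tree.predict(Xs) for tree in model.estimators_])
    return per_tree.mean(axis=0), 2 * per_tree.std(axis=0)


# "Unknown" compositions, written out alongside their predicted pressures.
new = pd.DataFrame(rng.uniform(0.0, 1.0, size=(5, len(features))), columns=features)
p_mean, p_2sd = predict_with_spread(model, scaler, new.to_numpy())
out = new.assign(P_GPa=p_mean.round(1), err_2sd_GPa=p_2sd.round(1))
out.to_csv("out_pressures_sketch.txt", sep="\t", index=False)
```

For a random forest regressor the mean of the per-tree predictions equals `model.predict`, so the spread comes at no extra modelling cost.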
*** If this script produces any errors or warnings, it is likely because the serialised pickle files provided are not compatible with the Python build being used (a common issue with serialised ML models). Please first try installing the pickle5 package and commenting/uncommenting lines 16/17. If this is unsuccessful, run the full barometer calibration script below (using the same input files as in Thomson et al. (2021), which are provided) to produce pickle files compatible with the Python build on the local machine (action 5 of the script below). Then edit the filenames called in the “calculate_pressures.py” script (lines 22 & 27) to match the new barometer calibration files and re-run the calculate-pressures script. Using the published calibration, the predicted pressures for the provided test dataset given in the output file should be similar to the following results:
P (GPa)    error (GPa)
17.0       0.4
16.6       0.3
19.5       1.3
21.8       1.3
12.8       0.3
14.3       0.4
14.7       0.4
14.4       0.6
12.1       0.6
14.6       0.5
17.0       1.0
14.6       0.6
11.9       0.7
14.0       0.5
16.8       0.8
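To check a local installation against the reference values above, a comparison like the following could be used. The tolerance (agreement within the tabulated error) is an assumption, and reading the predicted pressures from out_pressures_test.txt is left to the user because the column layout of that file is not described here.

```python
# Reference values (P, error; both GPa) for example_test_data.txt, from the table above.
REFERENCE = [
    (17.0, 0.4), (16.6, 0.3), (19.5, 1.3), (21.8, 1.3), (12.8, 0.3),
    (14.3, 0.4), (14.7, 0.4), (14.4, 0.6), (12.1, 0.6), (14.6, 0.5),
    (17.0, 1.0), (14.6, 0.6), (11.9, 0.7), (14.0, 0.5), (16.8, 0.8),
]


def mismatches(predicted, reference=REFERENCE):
    """Return (row, predicted, expected, error) for rows outside the tabulated error."""
    if len(predicted) != len(reference):
        raise ValueError(f"expected {len(reference)} pressures, got {len(predicted)}")
    return [
        (i, p, ref, err)
        for i, (p, (ref, err)) in enumerate(zip(predicted, reference))
        if abs(p - ref) > err
    ]
```

An empty list means every predicted pressure lies within the published error of the reference value.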

Full RFR barometer calibration script: rfr_majbar_10122020.py. This is the RFR barometer calibration script used and described in Thomson et al. (2021). It performs the following actions:
1) filters the input data and outputs the filtered data as a .txt file (the input expected by the RFR validation script using the R package Caret)
2) fits 1000 RFR models each using a randomly selected training dataset (70% of the input data)
3) performs leave-one-out validation
4) plots Figure 5 from Thomson et al. (2021)
5) fits a single RFR barometer using all input data (saves this and the scaler as .pkl files with a datestamp for use in the "calculate_pressures.py" script)
6) calculates the pressure for all literature inclusion compositions over 100 iterations with randomly distributed compositional uncertainties added, and writes the mean pressure and 2 standard deviations alongside the input inclusion compositions to the .txt output file "diout.txt"
7) plots the global distribution of majoritic inclusion pressures
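Steps 2, 3, 5 and 6 can be sketched with scikit-learn as follows. This is a hedged illustration rather than the code in rfr_majbar_10122020.py: it uses synthetic data, a StandardScaler (the scaler type is an assumption), hypothetical forest settings and compositional uncertainties, 20 repeated splits instead of 1000, and independent Gaussian errors for step 6. Steps 1, 4 and 7 (filtering and plotting) are omitted.

```python
import pickle
from datetime import date

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic stand-in for the filtered experimental dataset (NOT real data):
# 40 samples x 5 hypothetical compositional inputs, pressure in GPa.
X = rng.uniform(0.0, 1.0, size=(40, 5))
y = 5 + 20 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 0.5, 40)

N_SPLITS = 20   # the described script uses 1000 splits; reduced here for speed
N_TREES = 50    # hypothetical forest size


def make_model(seed):
    """Scaler + random forest; the scaler is fitted only on the training rows."""
    return make_pipeline(StandardScaler(),
                         RandomForestRegressor(n_estimators=N_TREES, random_state=seed))


# Step 2: repeated fits, each on a random 70 % training subset, scored on the rest.
split_rmse = []
for seed in range(N_SPLITS):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=seed)
    pred = make_model(seed).fit(X_tr, y_tr).predict(X_te)
    split_rmse.append(mean_squared_error(y_te, pred) ** 0.5)

# Step 3: leave-one-out validation (one prediction per held-out sample).
y_loo = cross_val_predict(make_model(0), X, y, cv=LeaveOneOut())
loo_rmse = mean_squared_error(y, y_loo) ** 0.5

# Step 5: one barometer fitted on all data; model and scaler pickled separately
# with a datestamp (file-name pattern assumed from the provided .pkl files).
stamp = date.today().strftime("%Y%m%d")
scaler = StandardScaler().fit(X)
model = RandomForestRegressor(n_estimators=N_TREES, random_state=0).fit(scaler.transform(X), y)
for name, obj in ((f"pickle_model_{stamp}.pkl", model), (f"scaler_{stamp}.pkl", scaler)):
    with open(name, "wb") as fh:
        pickle.dump(obj, fh)

# Step 6: Monte Carlo propagation of compositional uncertainty (100 draws),
# assuming independent Gaussian errors with hypothetical 1-sigma values.
sigma = np.full(X.shape[1], 0.01)
X_inc = X[:5]                      # stand-in for inclusion compositions
draws = np.stack([
    model.predict(scaler.transform(X_inc + rng.normal(0.0, sigma, X_inc.shape)))
    for _ in range(100)
])
p_mean, p_2sd = draws.mean(axis=0), 2 * draws.std(axis=0)
```

Wrapping the scaler and forest in a pipeline for steps 2 and 3 keeps each held-out sample out of the scaling fit; whether the original script does this is not stated here.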

The RFR barometer can easily be updated to include (or exclude) additional experimental compositions by modifying the literature data input files provided.
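Assuming the literature input files are tab-delimited text with a header row (their exact layout is not described here, so check the provided files first), new experimental compositions could be appended with pandas before re-running the calibration script. The helper name and the output file name in the usage comment are hypothetical.

```python
import pandas as pd


def append_compositions(input_path, new_rows, output_path):
    """Append new compositions to a tab-delimited input file (hypothetical helper).

    The columns of new_rows must match the existing header exactly; a mismatch
    raises instead of silently introducing empty columns.
    """
    existing = pd.read_csv(input_path, sep="\t")
    new_rows = pd.DataFrame(new_rows)
    if set(new_rows.columns) != set(existing.columns):
        raise ValueError("new rows must use exactly the existing column names")
    combined = pd.concat([existing, new_rows[existing.columns]], ignore_index=True)
    combined.to_csv(output_path, sep="\t", index=False)
    return combined


# Usage (output name hypothetical):
# append_compositions("lit_maj_nat_030720.txt", new_rows, "lit_maj_nat_updated.txt")
```

Writing to a new file rather than overwriting the original keeps the published calibration inputs available for comparison.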


RFR validation using Caret in R (script titled “RFR_validation_03072020.R”)
Additional validation tests of the RFR barometer were completed using the Caret package in R. The script requires the filtered experimental dataset file "data_filteredforvalidation.txt" (generated by the rfr_majbar_10122020.py script if required for a new dataset), performs bootstrap, K-fold and leave-one-out validation, and outputs validation statistics for 5, 7 and 9 input variables (elements).
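For readers without R, a broadly comparable check can be sketched in Python with scikit-learn. This is not a translation of RFR_validation_03072020.R: it uses synthetic data, hypothetical feature subsets (the first 5, 7 and 9 columns), and scikit-learn's KFold, LeaveOneOut and a simple out-of-bag bootstrap in place of Caret's resampling options.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_predict
from sklearn.utils import resample

rng = np.random.default_rng(2)

# Synthetic stand-in for data_filteredforvalidation.txt: 9 candidate inputs (NOT real data).
X_all = rng.uniform(0.0, 1.0, size=(40, 9))
y = 5 + 20 * X_all[:, 0] + 3 * X_all[:, 1] + rng.normal(0, 0.5, 40)
N_TREES = 30  # hypothetical forest size, kept small for speed


def rmse(a, b):
    return float(np.sqrt(mean_squared_error(a, b)))


def bootstrap_rmse(X, y, n_boot=10, seed=0):
    """Mean RMSE on out-of-bag rows over n_boot bootstrap resamples."""
    idx = np.arange(len(y))
    scores = []
    for b in range(n_boot):
        in_bag = resample(idx, replace=True, random_state=seed + b)
        oob = np.setdiff1d(idx, in_bag)
        if oob.size == 0:
            continue
        rf = RandomForestRegressor(n_estimators=N_TREES, random_state=b)
        rf.fit(X[in_bag], y[in_bag])
        scores.append(rmse(y[oob], rf.predict(X[oob])))
    return float(np.mean(scores))


results = {}
for n_inputs in (5, 7, 9):
    X = X_all[:, :n_inputs]  # hypothetical subset: first n_inputs columns
    rf = RandomForestRegressor(n_estimators=N_TREES, random_state=0)
    kfold = KFold(n_splits=5, shuffle=True, random_state=0)
    results[n_inputs] = {
        "kfold": rmse(y, cross_val_predict(rf, X, y, cv=kfold)),
        "loo": rmse(y, cross_val_predict(rf, X, y, cv=LeaveOneOut())),
        "bootstrap": bootstrap_rmse(X, y),
    }

for n_inputs, scores in results.items():
    print(n_inputs, {k: round(v, 2) for k, v in scores.items()})
```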

Please email Andrew Thomson (a.r.thomson@ucl.ac.uk) if you have any questions or queries.

Funding

Calcium Perovskite: the forgotten mantle phase

Natural Environment Research Council

