Machine Learning Majorite barometer model
Posted on 2021-02-11, 10:38; authored by Andrew Thomson, Michael Walter, Anirudh Prabhu, Simon Kohn
A machine learning barometer (using Random Forest Regression) to calculate equilibration pressure for majoritic garnets.
Updated 04/02/21 (21/01/21) (10/12/20):
The barometer code
Data files included in this repository are:
• "Majorite_database_04022021.xlsm": Excel workbook of literature majoritic garnet compositions, covering inclusions (up to date as of 04/02/2021) and experiments (up to date as of 03/07/2020). The data include all compositions that are close to majoritic, some of which are borderline. Filtering, as described in the paper accompanying this barometer, is performed in the Python script prior to any data analysis or fitting.
• "di_incs_040221.txt": Python script input file containing the literature compilation of majoritic garnet inclusions observed in natural diamonds, taken from the dataset above.
******
The barometer as Jupyter Notebooks, including integrated Caret validation (added 21/01/2021)
For those less familiar with Python, running the barometer as a Notebook is somewhat more intuitive than running the scripts below. It also has the benefit of including the RFR validation using Caret within a single integrated notebook. The Jupyter Notebook requires a suitable Python 3 environment (with the pandas, numpy, matplotlib, sklearn, rpy2 and pickle packages plus dependencies). We recommend installing the latest Anaconda Python distribution (found here https://docs.anaconda.com/anaconda/install/) and creating a custom environment containing the packages required to run the Jupyter Notebook (both Python 3 and R must be active in the environment). Instructions for this procedure can be found here (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html); alternatively, we have provided a copy of the environment used to produce the scripts (barom-spec-file.txt).
An identical conda environment (called myenv) can be created and used as follows:
1) Copy barom-spec-file.txt to a suitable location (e.g. your home directory)
2) Run the command
conda create --name myenv --file barom-spec-file.txt
3) Activate this environment
conda activate myenv
4) Start an instance of Jupyter Notebook by typing
jupyter notebook
Two Notebooks are provided:
• calculate_pressures_notebook.ipynb (equivalent to calculate_pressures.py described below)
*******
The barometer as scripts (10/12/2020)
The scripts below need to be run in a suitable Python 3 environment (with the pandas, numpy, matplotlib, sklearn and pickle packages plus dependencies). For inexperienced users we recommend installing the latest Anaconda Python distribution (found here https://docs.anaconda.com/anaconda/install/) and running the scripts in Spyder (a GUI scripting environment provided with Anaconda).
If any of the required packages (pandas, numpy, matplotlib, scikit-learn and pickle) are not provided with the Anaconda distribution, the user may additionally need to download and install them. This will be obvious because, when run, the script will return an error similar to “No module named 'XXXX'”.
Packages can either be installed using the anaconda package manager or in the command line / terminal via commands such as:
conda install -c conda-forge pickle5
The appropriate installation command for each required package can be found by searching Anaconda Cloud at anaconda.org.
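Rather than waiting for the “No module named” error, you can check up front which required packages are missing by asking Python whether each module can be found. The sketch below is a hypothetical helper (missing_modules is not part of this repository). It checks import names, so scikit-learn is checked as sklearn, and rpy2 is only needed for the Notebooks.

```python
import importlib.util

# Import names of the packages listed in this README (scikit-learn imports as
# "sklearn"); rpy2 is only required for the Jupyter Notebooks.
SCRIPT_MODULES = ["pandas", "numpy", "matplotlib", "sklearn", "pickle"]
NOTEBOOK_MODULES = SCRIPT_MODULES + ["rpy2"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(SCRIPT_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```

Any name it reports can then be installed with conda as described above.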
A Python script (.py) is provided to calculate pressures for any majoritic garnet using the barometer calibrated in Thomson et al. (2021):
• calculate_pressures.py script takes an input file of any majoritic garnet compositions (example input file is provided “example_test_data.txt" - which are inclusion compositions reported by Zedgenizov et al., 2014, Chemical Geology, 363, pp 114-124).
• Employs the published RFR model and scaler, both provided as pickle files (pickle_model_20201210.pkl, scaler_20201210.pkl)
The user simply edits the input file name in the provided .py script and then runs the script in a suitable Python 3 environment (requires the pandas, numpy, sklearn and pickle packages). The script first filters the data for majoritic compositions (according to the criteria used for barometer calibration) and then predicts pressures for these compositions. It writes the pressures and 2 × the standard deviation of the pressure estimates, alongside the input data, to "out_pressures_test.txt".
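The prediction step can be outlined roughly as follows. This is an illustrative sketch, not the contents of calculate_pressures.py. The helper names (load_pickle, predict_with_spread, write_output) and the output column names are hypothetical. The composition filtering step is omitted because its criteria are defined in the paper. Estimating the uncertainty as 2 × the standard deviation of the individual trees' predictions is an assumption about how such a spread might be obtained.

```python
import csv
import statistics

try:
    # pickle5 backport: only relevant on older Python 3 builds (assumed fallback)
    import pickle5 as pickle
except ImportError:
    import pickle


def load_pickle(path):
    """Load a serialised object (e.g. a fitted scaler or RFR model) from disk."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def predict_with_spread(scaler, model, rows):
    """Return (pressure, 2 x std dev) for each composition row.

    Assumes sklearn-style objects: scaler.transform(X), model.predict(X), and
    model.estimators_ holding the individual trees of a RandomForestRegressor.
    The spread of the per-tree predictions is used as the uncertainty here.
    """
    X = scaler.transform(rows)
    pressures = model.predict(X)
    per_tree = [tree.predict(X) for tree in model.estimators_]
    results = []
    for i, p in enumerate(pressures):
        tree_preds = [float(preds[i]) for preds in per_tree]
        results.append((float(p), 2 * statistics.pstdev(tree_preds)))
    return results


def write_output(path, header, rows, results):
    """Write input columns plus pressure and 2 std dev as tab-separated text."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(list(header) + ["P", "2sd"])  # hypothetical column names
        for row, (p, two_sd) in zip(rows, results):
            writer.writerow(list(row) + [f"{p:.1f}", f"{two_sd:.1f}"])


# Usage sketch with the published files (file names as given in this README):
# scaler = load_pickle("scaler_20201210.pkl")
# model = load_pickle("pickle_model_20201210.pkl")
# results = predict_with_spread(scaler, model, compositions)
```

Because the functions only rely on sklearn-style method names, they can be exercised with simple stand-in objects before loading the real pickles.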
*** If this script produces any errors or warnings, it is likely because the serialised pickle files provided are not compatible with the Python build being used (a common issue with serialised ML models). First, try installing the pickle5 package and commenting/uncommenting lines 16/17 of the script. If this is unsuccessful, run the full barometer calibration script below (using the same input files as in Thomson et al. (2021), which are provided) to produce pickle files compatible with the Python build on the local machine (action 5 of the script below). Then edit the filenames called in the “calculate_pressures.py” script (lines 22 & 27) to match the new barometer calibration files and re-run “calculate_pressures.py”. For the test dataset provided, and using the published calibration, the predicted pressures in the output file should be similar to the following results:
Pressure (GPa)   2 std dev (GPa)
17.0             0.4
16.6             0.3
19.5             1.3
21.8             1.3
12.8             0.3
14.3             0.4
14.7             0.4
14.4             0.6
12.1             0.6
14.6             0.5
17.0             1.0
14.6             0.6
11.9             0.7
14.0             0.5
16.8             0.8
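To check a local run against the reference values above, you can compare the output pressures row by row. In this sketch the tab delimiter, the column names passed to read_results and the 0.2 GPa tolerance are all assumptions. Adjust them to match the actual header of out_pressures_test.txt and the agreement you consider acceptable.

```python
import csv

# Reference (pressure, 2 std dev) pairs from the table above, in GPa.
EXPECTED = [
    (17.0, 0.4), (16.6, 0.3), (19.5, 1.3), (21.8, 1.3), (12.8, 0.3),
    (14.3, 0.4), (14.7, 0.4), (14.4, 0.6), (12.1, 0.6), (14.6, 0.5),
    (17.0, 1.0), (14.6, 0.6), (11.9, 0.7), (14.0, 0.5), (16.8, 0.8),
]


def read_results(path, p_col, sd_col):
    """Read (pressure, 2 std dev) pairs from a tab-separated output file."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return [(float(r[p_col]), float(r[sd_col])) for r in reader]


def compare_to_expected(results, expected=EXPECTED, tol=0.2):
    """List (row, got, expected) where the predicted pressure differs by > tol GPa."""
    if len(results) != len(expected):
        raise ValueError(f"expected {len(expected)} rows, got {len(results)}")
    return [
        (i, p, p_ref)
        for i, ((p, _), (p_ref, _)) in enumerate(zip(results, expected))
        if abs(p - p_ref) > tol
    ]


# Example (hypothetical column names):
# print(compare_to_expected(read_results("out_pressures_test.txt", "P", "2sd")))
```

An empty list means every predicted pressure lies within the chosen tolerance of the reference table.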
Full RFR barometer calibration script
rfr_majbar_10122020.py: the RFR barometer calibration script used and described in Thomson et al. (2021). This script performs the following actions:
1) Filters the input data and outputs the filtered data as a .txt file (the input expected by the RFR validation script using the R package Caret)
2) Fits 1000 RFR models, each using a randomly selected training dataset (70% of the input data)
3) Performs leave-one-out validation
4) Plots Figure 5 from Thomson et al. (2021)
5) Fits a single RFR barometer using all input data (saves it and the scaler as .pkl files with a datestamp, for use in the "calculate_pressures.py" script)
6) Calculates the pressure for all literature inclusion compositions over 100 iterations with randomly distributed compositional uncertainties added, and writes the mean pressure and 2 standard deviations, alongside the input inclusion compositions, to the .txt output file "diout.txt"
7) Plots the global distribution of majoritic inclusion pressures
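The resampling and uncertainty steps in actions 2, 3 and 6 can be illustrated with a library-free sketch. This is not the code in rfr_majbar_10122020.py, which uses scikit-learn. The function names are hypothetical, and fitting the RFR models themselves is left out. The Gaussian noise model, with one 1σ uncertainty per oxide, is an assumption about how the compositional uncertainties are added.

```python
import random
import statistics


def random_splits(n_samples, n_repeats, train_fraction=0.7, seed=0):
    """Yield (train_idx, test_idx) for repeated random train/test splits."""
    rng = random.Random(seed)
    n_train = round(train_fraction * n_samples)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = rng.sample(indices, n_samples)  # a random permutation
        yield sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


def leave_one_out(n_samples):
    """Yield (train_idx, test_idx) pairs, holding out one sample at a time."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


def monte_carlo_pressure(predict, composition, sigmas, n_iter=100, seed=0):
    """Mean and 2 std dev of predicted pressure under Gaussian composition noise.

    predict: any callable mapping a list of oxide values to a pressure.
    sigmas: one (assumed Gaussian) 1-sigma uncertainty per oxide.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_iter):
        noisy = [x + rng.gauss(0.0, s) for x, s in zip(composition, sigmas)]
        draws.append(predict(noisy))
    return statistics.mean(draws), 2 * statistics.stdev(draws)
```

For action 2, each split from random_splits(n, 1000) would train one model on the 70% subset and score it on the remaining 30%. For action 6, predict would wrap the scaler and the single fitted barometer from action 5.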
The RFR barometer can easily be updated to include (or exclude) additional experimental compositions by modifying the literature data input files provided.
RFR validation using Caret in R (script titled “RFR_validation_03072020.R”)
Please email Andrew Thomson (a.r.thomson@ucl.ac.uk) if you have any questions or queries.