spacekit.skopes.hst.svm.train
This module builds, trains, and evaluates an ensemble model for labeled and preprocessed SVM regression test data and alignment images. The ensemble model combines two neural networks: a multilayer perceptron (MLP, for the regression test data) and a 3D image convolutional neural network (CNN, for the alignment images). The script includes functions for the following steps:
load and prep the data and images for ML
build and train the model
compute results and save to disk
When starting from raw data, use this script (or its functions) together with spacekit.skopes.hst.svm.prep, because both the regression test dataframe for the MLP and the png images for the CNN must be created first. Once a model has been trained with this script, it is saved to disk and can be loaded again later for use with the predict script (spacekit.skopes.hst.svm.predict).
- spacekit.skopes.hst.svm.train.compute_results(ens, tv_idx, val_set=(), output_path=None)
Creates Computer objects for the test and validation sets for model evaluation and saves the calculated results to disk for later analysis. The validation set is a subset of data that the model has not seen and is necessary for measuring robustness.
- Parameters:
ens (builder.networks.Ensemble) – ensemble model builder object
tv_idx (tuple or list of Pandas Series) – test and validation indices (used for false negative/false positive (FNFP) analysis)
val_set (tuple or list of arrays) – validation set (X_val, y_val) of features and target arrays.
output_path (str, optional) – custom path for saving model, results, by default None (current working directory)
- Returns:
Test and validation Computer objects (if val_set is left empty, only a single Computer object is returned)
- Return type:
spacekit.analyzer.compute.Computer objects
- spacekit.skopes.hst.svm.train.load_ensemble_data(filename, img_path, img_size=128, dim=3, ch=3, norm=0, v=0.85, output_path=None)
Loads regression test data from a csv file and image data from png files. Splits the data into train, test and validation sets, applies normalization (if norm=1), creates a master index of the original dataset input names, and stacks the features and class targets for both data types into lists that can be used as inputs for an ensemble model.
- Parameters:
filename (str) – path to preprocessed dataframe csv file
img_path (str) – path to png images parent directory
img_size (int, optional) – image size (single value assigned to width and height), by default 128
dim (int, optional) – dimensions (or volume) of image frames per image (for 3D CNN), by default 3
ch (int, optional) – channels (rgb is 3, grayscale is 1), by default 3
norm (int, optional) – apply normalization step (1=True, 0=False), by default 0
v (float, optional) – validation set ratio for evaluating model, by default 0.85
output_path (str, optional) – where to save the outputs (defaults to current working directory), by default None
- Returns:
tv_idx, XTR, YTR, XTS, YTS, XVL, YVL: list of test-validation indices, followed by the feature (X) and target (y) numpy arrays for the train, test and validation sets.
- Return type:
list, ndarrays
- spacekit.skopes.hst.svm.train.make_ensembles(train_data, train_img, train_label, test_data, test_img, test_label, val_data=None, val_img=None, val_label=None)
Creates tupled pairs of regression test (MLP) data and image (CNN) array inputs for an ensemble model.
- Parameters:
train_data (numpy array) – training set feature data inputs
train_img (numpy array) – training set image inputs
train_label (numpy array) – training set target values
test_data (numpy array) – test set feature data inputs
test_img (numpy array) – test set image inputs
test_label (numpy array) – test set target values
val_data (numpy array, optional) – validation set feature data inputs
val_img (numpy array, optional) – validation set image inputs
val_label (numpy array, optional) – validation set target values
- Returns:
XTR, YTR, XTS, YTS, XVL, YVL: list/tuple of feature input arrays (data, img) and target values for the train, test and validation sets
- Return type:
tuples of 6 ndarrays (only 4 if validation kwargs are None)
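The return layout described above can be sketched as follows. This is a minimal illustration, not the spacekit implementation: pair_ensemble_inputs is a hypothetical stand-in, and plain Python lists replace numpy arrays so the snippet has no dependencies. It only shows that each X is a (data, img) pair and that the validation pair appears when validation inputs are supplied.

```python
def pair_ensemble_inputs(train_data, train_img, train_label,
                         test_data, test_img, test_label,
                         val_data=None, val_img=None, val_label=None):
    """Hypothetical stand-in that mimics the documented return layout."""
    # Each feature input is a (regression data, image data) pair.
    XTR, YTR = (train_data, train_img), train_label
    XTS, YTS = (test_data, test_img), test_label
    # Without a complete validation set, return only the four train/test items.
    if val_data is None or val_img is None or val_label is None:
        return XTR, YTR, XTS, YTS
    XVL, YVL = (val_data, val_img), val_label
    return XTR, YTR, XTS, YTS, XVL, YVL


# Toy inputs: two training samples and one test sample with three features
# each; strings stand in for image arrays.
outputs = pair_ensemble_inputs(
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], ["img_a", "img_b"], [0, 1],
    [[0.7, 0.8, 0.9]], ["img_c"], [1],
)
XTR, YTR, XTS, YTS = outputs
```

Passing val_data, val_img and val_label as well would yield six items instead of four, matching the return type noted above.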
- spacekit.skopes.hst.svm.train.run_training(data_file, img_path, img_size=128, norm=0, v=0.85, model_name='ensembleSVM', params=None, output_path=None, keras=True)
Main calling function to load and prep the data, train the model, compute results and save to disk.
- Parameters:
data_file (str (path)) – path to preprocessed dataframe csv file
img_path (str (path)) – path to png images parent directory
img_size (int, optional) – image size (single value assigned to width and height), by default 128
norm (int, optional) – apply normalization step (1=True, 0=False), by default 0
v (float, optional) – validation set ratio for evaluating model, by default 0.85
model_name (str, optional) – custom name to assign to model, by default “ensembleSVM”
params (dict, optional) – custom training hyperparameters dictionary, by default None
output_path (str (path), optional) – custom path for saving model, results, by default None (current working directory)
- Returns:
ensemble builder object, binary compute object, validation compute object
- Return type:
builder.networks.Ensemble, analyzer.compute.BinaryCompute, analyzer.compute.BinaryCompute
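A hedged usage sketch for run_training, based only on the signature and return values documented above. The helper names (DEFAULT_TRAIN_KWARGS, build_train_kwargs, train_from_files) and the file paths in the usage comment are hypothetical. The spacekit import is deferred to call time so the helper module can be loaded without spacekit installed.

```python
# Defaults copied from the documented run_training signature.
DEFAULT_TRAIN_KWARGS = {
    "img_size": 128,
    "norm": 0,
    "v": 0.85,
    "model_name": "ensembleSVM",
    "params": None,
    "output_path": None,
    "keras": True,  # present in the signature, not described in the parameter list
}


def build_train_kwargs(**overrides):
    """Merge caller overrides into the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULT_TRAIN_KWARGS)
    if unknown:
        raise TypeError(f"unexpected keyword(s): {sorted(unknown)}")
    return {**DEFAULT_TRAIN_KWARGS, **overrides}


def train_from_files(data_file, img_path, **overrides):
    """Call run_training and return its three documented outputs."""
    # Deferred import: spacekit is only needed when training actually runs.
    from spacekit.skopes.hst.svm.train import run_training

    ens, com, val = run_training(data_file, img_path, **build_train_kwargs(**overrides))
    return ens, com, val


# Example call (paths are placeholders, not files shipped with spacekit):
# ens, com, val = train_from_files("svm_data.csv", "img", norm=1)
```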
- spacekit.skopes.hst.svm.train.train_ensemble(XTR, YTR, XTS, YTS, model_name='ensembleSVM', params=None, output_path=None, keras=True)
Build, compile and fit an ensemble model with regression test data and image input arrays.
- Parameters:
XTR (tuple/list) – training set feature (X) tuple of regression data and image data numpy arrays.
YTR (numpy array) – training set target values
XTS (tuple/list) – test set feature (X) tuple of regression data and image data numpy arrays.
YTS (numpy array) – test set target values
model_name (str, optional) – name of model, by default “ensembleSVM”
params (dict, optional) – custom parameters for model fitting, by default None
output_path (str, optional) – custom path for saving model, results, by default None (current working directory)
- Returns:
Builder ensemble subclass model object trained on the inputs
- Return type:
spacekit.builder.networks.Ensemble model object
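The three steps listed at the top of this page (load and prep, build and train, compute results) can also be chained by hand. The sketch below wires load_ensemble_data, train_ensemble and compute_results together using only the signatures and return values documented here; it is an assumption about how the pieces fit, not a copy of run_training. The run_steps helper and its steps argument are hypothetical and exist so the wiring can be exercised with stand-in functions.

```python
def run_steps(data_file, img_path, steps=None, output_path=None):
    """Chain the documented load, train and compute steps (hypothetical helper)."""
    if steps is None:
        # Deferred import: the real functions are only needed for a real run.
        from spacekit.skopes.hst.svm import train as svm_train

        steps = (
            svm_train.load_ensemble_data,
            svm_train.train_ensemble,
            svm_train.compute_results,
        )
    load, fit, evaluate = steps

    # Step 1: load the csv and png inputs and split into train/test/validation.
    tv_idx, XTR, YTR, XTS, YTS, XVL, YVL = load(
        data_file, img_path, output_path=output_path
    )
    # Step 2: build, compile and fit the ensemble on the train and test sets.
    ens = fit(XTR, YTR, XTS, YTS, output_path=output_path)
    # Step 3: evaluate on the test and validation sets and save the results.
    com, val = evaluate(ens, tv_idx, val_set=(XVL, YVL), output_path=output_path)
    return ens, com, val


# Example call (paths are placeholders):
# ens, com, val = run_steps("svm_data.csv", "img")
```

Passing val_set explicitly matters here: as noted for compute_results, leaving it empty returns only a single Computer object, so the two-value unpacking in step 3 assumes a validation set is supplied.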