spacekit.skopes.hst.svm.train

This module builds, trains, and evaluates an ensemble model for labeled and preprocessed SVM regression test data and alignment images. The ensemble model combines two neural networks: a multilayer perceptron (MLP) for the regression test data and a 3D image convolutional neural network (CNN) for the alignment images. The script includes functions for the following steps:

  1. load and prep the data and images for ML

  2. build and train the model

  3. compute results and save to disk

If you are starting from raw data, use this script (and/or its functions) together with spacekit.skopes.hst.svm.prep, because both the regression test dataframe for the MLP and the PNG images for the CNN must be created first. Once a model has been trained with this script, it is saved to disk and can be loaded again later for use with the predict script (spacekit.skopes.hst.svm.predict).

spacekit.skopes.hst.svm.train.compute_results(ens, tv_idx, val_set=(), output_path=None)[source]

Creates Computer objects for the test and validation sets for model evaluation and saves the calculated results to disk for later analysis. The validation set is a subset of the data that the model has not seen during training; it is necessary for measuring the model's robustness.

Parameters:
  • ens (builder.networks.Ensemble) – ensemble model builder object

  • tv_idx (tuple or list of Pandas Series) – test and validation indices (used for FNFP analysis)

  • val_set (tuple or list of arrays) – validation set (X_val, y_val) of features and target arrays.

  • output_path (str, optional) – custom path for saving the model and results, by default None (current working directory)

Returns:

Test and validation Computer objects (if val_set is left empty, only a single Computer object is returned)

Return type:

spacekit.analyzer.compute.Computer objects
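The FNFP analysis that tv_idx supports amounts to mapping misclassified rows back to their dataset names. The minimal NumPy sketch below illustrates that idea only; it is not spacekit's implementation. The helper name `fnfp_names`, the 0.5 probability cutoff, and the made-up dataset names are hypothetical, and the names are assumed to be aligned row-for-row with the targets and predictions.

```python
import numpy as np


def fnfp_names(index, y_true, y_prob, threshold=0.5):
    """Hypothetical helper: return the names of false negatives and false
    positives, given names aligned row-for-row with y_true and y_prob."""
    index = np.asarray(index)
    y_true = np.asarray(y_true).astype(int).ravel()
    # Convert predicted probabilities into hard 0/1 labels (assumed cutoff)
    y_pred = (np.asarray(y_prob, dtype=float).ravel() >= threshold).astype(int)
    false_neg = [str(name) for name in index[(y_true == 1) & (y_pred == 0)]]
    false_pos = [str(name) for name in index[(y_true == 0) & (y_pred == 1)]]
    return false_neg, false_pos


# Made-up names and scores, for illustration only
names = ["dataset_a", "dataset_b", "dataset_c", "dataset_d"]
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.7, 0.2, 0.1]
fn, fp = fnfp_names(names, y_true, y_prob)
print("false negatives:", fn)  # ['dataset_c']
print("false positives:", fp)  # ['dataset_b']
```

Keeping the index separate from the feature arrays means the model never sees the names, while the results can still be traced back to specific datasets afterwards.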

spacekit.skopes.hst.svm.train.load_ensemble_data(filename, img_path, img_size=128, dim=3, ch=3, norm=0, v=0.85, output_path=None)[source]

Loads regression test data from a CSV file and image data from PNG files. Splits the data into train, test, and validation sets, applies normalization (if norm=1), creates a master index of the original dataset input names, and stacks the features and class targets for both data types into lists that can be used as inputs for an ensemble model.

Parameters:
  • filename (str) – path to preprocessed dataframe csv file

  • img_path (str) – path to png images parent directory

  • img_size (int, optional) – image size (single value assigned to width and height), by default 128

  • dim (int, optional) – number of image frames per sample (the depth dimension for the 3D CNN), by default 3

  • ch (int, optional) – number of color channels (RGB is 3, grayscale is 1), by default 3

  • norm (int or bool, optional) – apply normalization step (1=True, 0=False), by default 0

  • v (float, optional) – validation set ratio for evaluating model, by default 0.85

  • output_path (str, optional) – where to save the outputs (defaults to current working directory), by default None

Returns:

tv_idx, XTR, YTR, XTS, YTS, XVL, YVL: list of test and validation indices, followed by the train, test, and validation feature (X) and target (y) numpy arrays.

Return type:

list, ndarrays
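The split-and-stack step can be pictured with the NumPy sketch below. It is not spacekit's implementation: the helper name `split_and_stack`, the `test_frac`/`val_frac` parameters (used instead of `v`), the fixed seed, z-score normalization fitted on the training rows only, and the channels-last image layout (n, dim, img_size, img_size, ch) used by Keras Conv3D layers are all assumptions made for illustration. The returned tv_idx holds positional indices here rather than the Pandas Series of dataset names described above.

```python
import numpy as np


def split_and_stack(X_data, X_img, y, test_frac=0.15, val_frac=0.15, norm=False, seed=42):
    """Hypothetical sketch: shuffle once, split rows into train/test/val, and
    pair regression features with images as [data, img] lists per split."""
    n = len(y)
    rng = np.random.default_rng(seed)
    # One shared permutation keeps features, images, and labels aligned
    order = rng.permutation(n)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]

    if norm:
        # z-score with training statistics only, so test/val rows stay unseen
        mu = X_data[train_idx].mean(axis=0)
        sd = X_data[train_idx].std(axis=0)
        sd[sd == 0] = 1.0  # avoid division by zero for constant columns
        X_data = (X_data - mu) / sd

    def pack(idx):
        return [X_data[idx], X_img[idx]], y[idx]

    XTR, YTR = pack(train_idx)
    XTS, YTS = pack(test_idx)
    XVL, YVL = pack(val_idx)
    tv_idx = (test_idx, val_idx)
    return tv_idx, XTR, YTR, XTS, YTS, XVL, YVL


# Toy inputs: 20 samples, 5 regression features, 3 frames of 8x8 RGB images
n, dim, img_size, ch = 20, 3, 8, 3
X_data = np.random.default_rng(0).normal(size=(n, 5))
X_img = np.zeros((n, dim, img_size, img_size, ch), dtype=np.float32)
y = np.arange(n) % 2
tv_idx, XTR, YTR, XTS, YTS, XVL, YVL = split_and_stack(X_data, X_img, y, norm=True)
print(XTR[0].shape, XTR[1].shape, YTR.shape)  # (14, 5) (14, 3, 8, 8, 3) (14,)
```

Fitting the normalization statistics on the training rows alone is a common choice because it keeps the test and validation sets genuinely unseen.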

spacekit.skopes.hst.svm.train.make_ensembles(train_data, train_img, train_label, test_data, test_img, test_label, val_data=None, val_img=None, val_label=None)[source]

Creates tupled pairs of regression test (MLP) data and image (CNN) array inputs for an ensemble model.

Parameters:
  • train_data (numpy array) – training set feature data inputs

  • train_img (numpy array) – training set image inputs

  • train_label (numpy array) – training set target values

  • test_data (numpy array) – test set feature data inputs

  • test_img (numpy array) – test set image inputs

  • test_label (numpy array) – test set target values

  • val_data (numpy array, optional) – validation set feature data inputs

  • val_img (numpy array, optional) – validation set image inputs

  • val_label (numpy array, optional) – validation set target values

Returns:

XTR, YTR, XTS, YTS, XVL, YVL: list/tuple of feature input arrays (data, img) and target values for the train, test, and validation sets

Return type:

tuple of 6 elements (lists of input arrays and target ndarrays); only 4 if the validation kwargs are None
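The pairing described above can be sketched as follows. This is a hypothetical re-creation of the documented return structure, not spacekit's code; the name `make_ensembles_sketch` and the rule that a missing validation argument yields the 4-item form are assumptions.

```python
import numpy as np


def make_ensembles_sketch(train_data, train_img, train_label,
                          test_data, test_img, test_label,
                          val_data=None, val_img=None, val_label=None):
    """Hypothetical sketch of the documented return structure: pair each
    split's MLP features with its CNN images as [data, img]."""
    XTR = [train_data, train_img]
    XTS = [test_data, test_img]
    # Without a complete validation set, return only the 4 train/test items
    if val_data is None or val_img is None or val_label is None:
        return XTR, train_label, XTS, test_label
    XVL = [val_data, val_img]
    return XTR, train_label, XTS, test_label, XVL, val_label


rng = np.random.default_rng(0)
data = rng.normal(size=(10, 4))               # regression test features
img = rng.random(size=(10, 3, 8, 8, 3))       # (n, dim, height, width, ch)
labels = rng.integers(0, 2, size=10)

four = make_ensembles_sketch(data[:6], img[:6], labels[:6],
                             data[6:8], img[6:8], labels[6:8])
six = make_ensembles_sketch(data[:6], img[:6], labels[:6],
                            data[6:8], img[6:8], labels[6:8],
                            data[8:], img[8:], labels[8:])
print(len(four), len(six))  # 4 6
```

Assuming the ensemble is a two-input Keras functional model, a list such as XTR can be passed directly as `x` to `Model.fit`, with the list order matching the order of the model's Input layers.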

spacekit.skopes.hst.svm.train.run_training(data_file, img_path, img_size=128, norm=0, v=0.85, model_name='ensembleSVM', params=None, output_path=None, keras=True)[source]

Main calling function to load and prep the data, train the model, compute results and save to disk.

Parameters:
  • data_file (str (path)) – path to preprocessed dataframe csv file

  • img_path (str (path)) – path to png images parent directory

  • img_size (int, optional) – image size (single value assigned to width and height), by default 128

  • norm (int, optional) – apply normalization step (1=True, 0=False), by default 0

  • v (float, optional) – validation set ratio for evaluating model, by default 0.85

  • model_name (str, optional) – custom name to assign to model, by default “ensembleSVM”

  • params (dict, optional) – custom training hyperparameters dictionary, by default None

  • output_path (str (path), optional) – custom path for saving the model and results, by default None (current working directory)

Returns:

ensemble builder object, binary compute object, validation compute object

Return type:

builder.networks.Ensemble, analyzer.compute.BinaryCompute, analyzer.compute.BinaryCompute

spacekit.skopes.hst.svm.train.train_ensemble(XTR, YTR, XTS, YTS, model_name='ensembleSVM', params=None, output_path=None, keras=True)[source]

Build, compile and fit an ensemble model with regression test data and image input arrays.

Parameters:
  • XTR (tuple/list) – training set feature (X) tuple of regression data and image data numpy arrays.

  • YTR (numpy array) – training set target values

  • XTS (tuple/list) – test set feature (X) tuple of regression data and image data numpy arrays.

  • YTS (numpy array) – test set target values

  • model_name (str, optional) – name of model, by default “ensembleSVM”

  • params (dict, optional) – custom parameters for model fitting, by default None

  • output_path (str, optional) – custom path for saving the model and results, by default None (current working directory)

Returns:

Builder ensemble subclass model object trained on the inputs

Return type:

spacekit.builder.networks.Ensemble model object