spacekit.analyzer.compute

Inheritance diagram of spacekit.analyzer.compute

class spacekit.analyzer.compute.Computer(algorithm, res_path=None, show=False, validation=False, name='Computer', **log_kws)[source]

Bases: object

acc_loss_scores()[source]

Calculate overall accuracy and loss metrics of training and test sets.

Returns:

mean accuracy and loss scores of training and test sets (generated via Keras history)

Return type:

dictionary

builder_inputs(builder=None)[source]

Produces the same result as the inputs method, using a Builder object’s attributes instead. Allows for an automatic switch to the validation set.

Parameters:

builder (spacekit.builder.networks.Builder object, optional) – Builder object used to train the model (the Computer instantiates its own attributes from it), by default None

Returns:

Computer object updated with Builder attributes

Return type:

self

download(outputs)[source]

Saves model training results (outputs calculated by the Computer object) to local files for later retrieval and plotting/analysis.

Parameters:

outputs (dictionary) – Outputs created by their respective subclasses using the make_outputs method.

draw_plots()[source]

Generate the standard classification model plots (Keras accuracy and loss, ROC-AUC curve, Precision-Recall curve, confusion matrix).

Returns:

updated with standard plot attribute values

Return type:

Computer object

fusion_matrix(cm, classes, normalize=True, cmap='Blues')[source]

Confusion matrix. Accepts either a precomputed matrix or a (y_true, y_pred) tuple from which to create one on the fly, along with the class names for the target variables.

Parameters:
  • cm (tuple or sklearn confusion_matrix object) – (y_test, y_pred) tuple or a confusion matrix of true and false positives and negatives.

  • classes (list) – class labels (strings) to show on the axes

  • normalize (bool, optional) – Show percentages instead of raw values, by default True

  • cmap (str, optional) – Colormap, by default “Blues”

Returns:

confusion matrix figure with colorscale

Return type:

matplotlib.figure.Figure
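
For illustration, a minimal usage sketch (hypothetical: com is a Computer whose test data are loaded, and y_test/y_pred are placeholder label arrays):

    # Hypothetical sketch: y_test and y_pred are placeholder arrays of
    # ground-truth and predicted labels for a trained binary model.
    fig = com.fusion_matrix(
        (y_test, y_pred),             # a tuple builds the matrix on the fly
        ["aligned", "misaligned"],    # class labels shown on the axes
        normalize=True,               # percentages instead of raw counts
    )
    fig.savefig("fusion_matrix.png")  # standard matplotlib Figure method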

inputs(model, history, X_train, y_train, X_test, y_test, test_idx)[source]

Stores training variables as attributes. By default, a Computer object is instantiated without these; they are only needed for calculating and storing results, which Computer can then retrieve separately (without the training variables) from npz-compressed files using the upload() method.

Parameters:
  • model (object) – Keras functional model

  • history (dict) – model training history

  • X_train (Pandas dataframe or Numpy array) – training feature data

  • y_train (Pandas dataframe or Numpy array) – training target data

  • X_test (Pandas dataframe or Numpy array) – test/validation feature data

  • y_test (Pandas dataframe or Numpy array) – test/validation target data

  • test_idx (Pandas series) – test data index and ground truth values (y_test)

Returns:

updated with model attributes used for calculating results

Return type:

Computer object (self)
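
A minimal wiring sketch (all variable names here are placeholders for an existing Keras model and train/test splits):

    from spacekit.analyzer.compute import Computer

    com = Computer("clf", res_path="results/mem_bin")  # algorithm label + output path
    # history.history is the dict attached to the object returned by Keras model.fit()
    com.inputs(model, history.history, X_train, y_train, X_test, y_test, test_idx)
    com.draw_plots()  # populates the standard classification plot attributes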

keras_acc_plot()[source]

Line plot of training and test accuracy scores per epoch

Returns:

Keras history training and test set accuracy scores for each epoch

Return type:

plotly.graph_objects.Figure

keras_loss_plot()[source]

Line plot of training and test loss scores per epoch

Returns:

Keras history training and test set loss scores for each epoch

Return type:

plotly.graph_objects.Figure

make_pr_curve()[source]

Plots the Precision-Recall Curve

Returns:

Precision-Recall curve figure plot

Return type:

plotly.graph_objects.Figure

make_roc_curve()[source]

Plots the Receiver Operating Characteristic (ROC) curve with the Area Under the Curve (AUC).

Returns:

ROC-AUC interactive figure plot

Return type:

plotly.graph_objects.Figure

onehot_y(prefix='lab')[source]

Generates a one-hot encoded dataframe of categorical target class values (for multiclass classification models).

Parameters:

prefix (str, optional) – abbreviated string prefix for the target class name, by default “lab” (short for “label”)

Returns:

one-hot encoded target class labels (dummies)

Return type:

dataframe
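
The effect is equivalent to pandas-style dummy encoding; an illustrative sketch of the expected output shape (not spacekit’s internal code):

    import pandas as pd

    # A categorical target becomes one dummy column per class, with names
    # built from the prefix: "lab_0", "lab_1", ... for the default "lab".
    y = pd.Series([0, 2, 1, 0], name="label")
    dummies = pd.get_dummies(y, prefix="lab")
    print(dummies.columns.tolist())  # ['lab_0', 'lab_1', 'lab_2']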

resid_plot()[source]

Plot the residual error for a regression model.

Returns:

interactive scatter plot figure of residuals in the test set

Return type:

plotly.graph_objects.Figure

roc_plots()[source]

Calculates the ROC-AUC score and plots the Receiver Operating Characteristic (ROC) curve.

Returns:

  • float – roc_auc_score (via sklearn)

  • Figure – receiver-operator characteristic area under the curve (ROC-AUC) plot

score_y()[source]

Probability scores for classification model predictions (y_pred probabilities)

Returns:

y_scores probabilities array

Return type:

ndarray

upload()[source]

Imports model training results (outputs previously calculated by the Computer object) from npz-compressed files. These can then be used for plotting/analysis.

Returns:

model training results loaded from files on local disk.

Return type:

dictionary
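
A retrieval sketch, assuming results were previously saved under res_path via download():

    # Restore previously saved results without any of the training variables.
    com = Computer("clf", res_path="results/mem_bin")
    outputs = com.upload()  # dict of results read from npz files on local disk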

class spacekit.analyzer.compute.ComputeClassifier(algorithm='clf', classes=['2g', '8g', '16g', '64g'], res_path='results/mem_bin', show=False, validation=False, name='ComputeClassifier', **log_kws)[source]

Bases: Computer

Computer subclass with additional methods specific to classification models.

Parameters:

Computer (Class object) – spacekit.analyzer.compute.Computer object

load_results(outputs)[source]

Load a previously trained model’s results from local disk and store in a dictionary.

Parameters:

outputs (dictionary) – dictionary of results generated via the make_outputs method

Returns:

spacekit.analyzer.compute.ComputeClassifier subclass object

Return type:

self

make_outputs(dl=True)[source]

Store computed results into a dictionary, and optionally save to disk.

Parameters:

dl (bool, optional) – Download results (save as files on local disk), by default True

Returns:

outputs stored in a dictionary

Return type:

dictionary
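
A sketch of the typical save/restore cycle for classifier results (com is a hypothetical trained ComputeClassifier object):

    from spacekit.analyzer.compute import ComputeClassifier

    outputs = com.make_outputs(dl=True)  # build the results dict and save to disk

    # Later, restore into a fresh object for plotting/analysis:
    com2 = ComputeClassifier(res_path="results/mem_bin")
    com2.load_results(com2.upload())     # upload() reads the npz files back in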

print_summary()[source]

Prints an sklearn-based classification report of model evaluation metrics, along with accuracy, loss, ROC-AUC, and false-negative/false-positive (FNFP) scores, to standard out. The report is also stored as a dictionary in the Computer object’s self.report attribute.

track_fnfp()[source]

Determine the index names of false negatives and false positives from the training inputs and store them in a dictionary along with the related prediction probabilities.

Returns:

false-negative and false-positive results

Return type:

dictionary

class spacekit.analyzer.compute.ComputeBinary(builder=None, algorithm='binary', classes=['aligned', 'misaligned'], res_path='results/svm', show=False, validation=False, **log_kws)[source]

Bases: ComputeClassifier

ComputeClassifier subclass with additional methods specific to binary classification models.

Parameters:

ComputeClassifier (Subclass object) – spacekit.analyzer.compute.ComputeClassifier object

calculate_results(show_summary=True)[source]

Calculate metrics relevant to binary classification model training and assign them to the appropriate subclass attributes.

Parameters:

show_summary (bool, optional) – print the classification report and other summarized metrics to standard out, by default True

Returns:

spacekit.analyzer.compute.ComputeBinary subclass object

Return type:

self
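
An end-to-end sketch for a binary model, assuming a trained spacekit Builder object bld (hypothetical):

    from spacekit.analyzer.compute import ComputeBinary

    com = ComputeBinary(builder=bld, res_path="results/svm")
    com.calculate_results(show_summary=True)  # metrics plus printed report
    outputs = com.make_outputs(dl=True)       # persist results to local disk
    com.draw_plots()                          # ROC-AUC, PR curve, confusion matrix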

class spacekit.analyzer.compute.ComputeMulti(builder=None, algorithm='multiclass', classes=['2g', '8g', '16g', '64g'], res_path='results/mem_bin', show=False, validation=False, **log_kws)[source]

Bases: ComputeClassifier

ComputeClassifier subclass with additional methods specific to multiclass classification models.

Parameters:

ComputeClassifier (Subclass object) – spacekit.analyzer.compute.ComputeClassifier object

calculate_multi(show_summary=True)[source]

Calculate metrics relevant to multiclass classification model training and assign them to the appropriate subclass attributes.

Parameters:

show_summary (bool, optional) – print the classification report and other summarized metrics to standard out, by default True

Returns:

spacekit.analyzer.compute.ComputeMulti subclass object

Return type:

self
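
The multiclass flow mirrors the binary one, with calculate_multi in place of calculate_results (bld again a hypothetical trained Builder):

    from spacekit.analyzer.compute import ComputeMulti

    com = ComputeMulti(builder=bld, classes=["2g", "8g", "16g", "64g"])
    com.calculate_multi(show_summary=True)  # per-class metrics and report
    aucs = com.roc_auc_multi()              # one ROC-AUC score per class label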

fnfp_multi()[source]

Determine the index names of false negatives and false positives from the training inputs and store them in a dictionary along with the related prediction probabilities.

Returns:

false-negative and false-positive results

Return type:

dictionary

onehot_multi(prefix='bin')[source]

Generates a one-hot encoded dataframe of categorical target class values (for multiclass classification models).

Parameters:

prefix (str, optional) – abbreviated string prefix for target class name, by default “bin”

Returns:

one-hot encoded target class labels (dummies)

Return type:

dataframe

roc_auc_multi()[source]

Calculate the ROC-AUC scores for each label of a multiclass model.

Returns:

roc-auc scores for each class label

Return type:

list

class spacekit.analyzer.compute.ComputeRegressor(builder=None, algorithm='linreg', res_path='results/memory', show=False, validation=False, **log_kws)[source]

Bases: Computer

Computer subclass with additional methods specific to regression models.

Parameters:

Computer (parent class) – spacekit.analyzer.compute.Computer object

calculate_L2(subset=None)[source]

Calculate the L2 Normalization score of a regression model. L2 norm is the square root of the sum of the squared vector values (also known as the Euclidean norm or Euclidean distance from the origin). This metric is often used when fitting ML algorithms as a regularization method to keep the coefficients of the model small, i.e. to make the model less complex.

Returns:

L2 norm

Return type:

float
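
The L2 norm described above is simply the Euclidean length of the residual vector; a minimal numpy illustration (not spacekit’s internal code):

    import numpy as np

    residuals = np.array([0.5, -1.2, 0.3, -0.4])
    l2 = np.sqrt(np.sum(residuals ** 2))              # sqrt of sum of squares
    assert np.isclose(l2, np.linalg.norm(residuals))  # equivalent formulation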

calculate_results()[source]

Main calling function to compute regression model scores, including residuals, root mean squared error, and the L2 cost function. Uses parent class methods to save and/or load results to/from disk. Once calculated or loaded, other parent class methods can be used to generate various plots (e.g. resid_plot).

Returns:

ComputeRegressor object with calculated model evaluation metrics attributes.

Return type:

self
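
A minimal regression sketch, again assuming a trained Builder bld (hypothetical):

    from spacekit.analyzer.compute import ComputeRegressor

    com = ComputeRegressor(builder=bld, res_path="results/memory")
    com.calculate_results()  # residuals, loss scores, and L2 attributes
    fig = com.resid_plot()   # interactive residual scatter plot (plotly)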

compute_preds()[source]

Get predictions (y_pred) based on regression model test inputs (X_test).

Returns:

predicted values for y (target)

Return type:

ndarray

compute_scores(error_stats=True)[source]

Calculate overall loss metrics of the training and test sets. The regression defaults are MSE (mean squared error) and RMSE (root MSE). RMSE is a measure of how spread out the residuals are (i.e. how concentrated the data are around the line of best fit). Note: RMSE better reflects performance when dealing with large error values (it penalizes large errors), while MSE tends to be biased toward high values.

Parameters:

error_stats (bool, optional) – Include RMSE and L2 norm for the positive and negative groups of residuals in the test set (here “positive” means above the regression line (>0) and “negative” means below (<0)), by default True. This can be useful when the consequences of underestimating might be more severe than those of overestimating, or vice versa.

Returns:

model training loss scores (MSE and RMSE)

Return type:

dictionary
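
For reference, MSE and RMSE as described above in plain numpy (illustrative only, not the library’s implementation):

    import numpy as np

    y_true = np.array([2.0, 4.0, 8.0])
    y_pred = np.array([2.5, 3.0, 9.0])
    mse = np.mean((y_true - y_pred) ** 2)  # mean squared error
    rmse = np.sqrt(mse)                    # root MSE, in the target's own units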

get_resid()[source]

Calculate residual error between ground truth (y_test) and prediction values of a regression model. Residuals are a measure of how far from the regression line the data points are.

Returns:

residual error values for a given test set

Return type:

list

load_results(outputs)[source]

Load previously calculated results/scores into Compute object (for comparing to other models and/or drawing plots).

Parameters:

outputs (dict) – dictionary of results (generated via the make_outputs method)

Returns:

spacekit.analyzer.compute.ComputeRegressor subclass object updated with results attributes

Return type:

self

make_outputs(dl=True)[source]

Create a dictionary of results calculated for a regression model. Used for saving results to disk.

Parameters:

dl (bool, optional) – download (save) to files on local disk, by default True

Returns:

outputs stored in a single dictionary for convenience

Return type:

dictionary

yhat_matrix()[source]

Compare ground-truth and prediction values of a regression model side by side. Used for calculating residuals (see the get_resid method).

Returns:

Concatenation of ground truth (y_test) and prediction (y_pred) arrays.

Return type:

ndarray