spacekit.analyzer.compute
- class spacekit.analyzer.compute.Computer(algorithm, res_path=None, show=False, validation=False, name='Computer', **log_kws)[source]
Bases:
object
- acc_loss_scores()[source]
Calculate overall accuracy and loss metrics of training and test sets.
- Returns:
mean accuracy and loss scores of training and test sets (generated via Keras history)
- Return type:
dictionary
- builder_inputs(builder=None)[source]
Produces the same result as the inputs method, using a builder object’s attributes instead. Allows for an automatic switch to the validation set.
- Parameters:
builder (spacekit.builder.networks.Builder object, optional) – Builder object used to train the model (the Computer instantiates its own attributes from it), by default None
- Returns:
Computer object updated with Builder attributes
- Return type:
self
- download(outputs)[source]
Downloads model training results (outputs calculated by the Computer object) to local files for later retrieval and plotting/analysis.
- Parameters:
outputs (dictionary) – Outputs created by their respective subclasses using the make_outputs method.
- draw_plots()[source]
Generate standard classification model plots (Keras accuracy and loss, ROC-AUC curve, Precision-Recall curve, Confusion Matrix).
- Returns:
updated with standard plot attribute values
- Return type:
Computer object
- fusion_matrix(cm, classes, normalize=True, cmap='Blues')[source]
Plots a confusion matrix. Accepts either a precomputed confusion matrix or a (y_test, y_pred) tuple, in which case the matrix is created on the fly.
- Parameters:
cm (tuple or sklearn confusion_matrix object) – (y_test, y_pred) tuple or a confusion matrix of true and false positives and negatives.
classes (list) – class labels (strings) to show on the axes
normalize (bool, optional) – Show percentages instead of raw values, by default True
cmap (str, optional) – Colormap, by default “Blues”
- Returns:
confusion matrix figure with colorscale
- Return type:
matplotlib.pyplot Figure
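The two accepted inputs and the normalize option can be illustrated with a minimal standard-library sketch. This is not spacekit's implementation: the helper names `confusion_counts` and `normalize_rows` are hypothetical, integer class labels are assumed, and row-wise normalization (each true class sums to 1) is an assumption about what "percentages" means here.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, n_classes):
    """Count (true, predicted) label pairs into an n_classes x n_classes matrix.

    Rows are true classes, columns are predicted classes.
    """
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in range(n_classes)] for t in range(n_classes)]


def normalize_rows(cm):
    """Divide each row by its total so cells show the fraction of each true class."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Hypothetical labels for a two-class problem.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

cm = confusion_counts(y_true, y_pred, n_classes=2)
cm_norm = normalize_rows(cm)
```

Building the matrix from the label pair first and normalizing afterwards keeps the raw counts available when normalize is False.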
- inputs(model, history, X_train, y_train, X_test, y_test, test_idx)[source]
Instantiates training variables as attributes. By default, a Computer object is instantiated without these. They are only needed for calculating and storing results, which Computer can later retrieve separately (without the training variables) from npz compressed files using the upload() method.
- Parameters:
model (object) – Keras functional model
history (dict) – model training history
X_train (Pandas dataframe or Numpy array) – training feature data
y_train (Pandas dataframe or Numpy array) – training target data
X_test (Pandas dataframe or Numpy array) – test/validation feature data
y_test (Pandas dataframe or Numpy array) – test/validation target data
test_idx (Pandas series) – test data index and ground truth values (y_test)
- Returns:
updated with model attributes used for calculating results
- Return type:
Computer object (self)
- keras_acc_plot()[source]
Line plot of training and test accuracy scores per epoch
- Returns:
Keras history training and test set accuracy scores for each epoch
- Return type:
plotly.graph_obj Figure
- keras_loss_plot()[source]
Line plot of training and test loss scores per epoch
- Returns:
Keras history training and test set loss scores for each epoch
- Return type:
plotly.graph_obj Figure
- make_pr_curve()[source]
Plots the Precision-Recall Curve
- Returns:
Precision-Recall curve figure plot
- Return type:
plotly.graph_obj Figure
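For readers unfamiliar with what the curve plots, the following is a minimal standard-library sketch of how precision and recall are computed at each decision threshold. It is not spacekit's code; the function name `precision_recall_points` and the sample data are hypothetical, and binary 0/1 labels are assumed.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) for every distinct score used as a threshold."""
    points = []
    for threshold in sorted(set(scores)):
        # A sample is predicted positive when its score reaches the threshold.
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
        # tp + fp >= 1 because the threshold is itself one of the scores.
        precision = tp / (tp + fp)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        points.append((threshold, precision, recall))
    return points


# Hypothetical ground truth labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = precision_recall_points(y_true, scores)
```

Plotting recall on the x-axis against precision on the y-axis for these points gives the Precision-Recall curve.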
- make_roc_curve()[source]
Plots the Receiver Operating Characteristic (ROC) curve along with its area under the curve (AUC).
- Returns:
ROC-AUC interactive figure plot
- Return type:
plotly.graph_obj Figure
- onehot_y(prefix='lab')[source]
Generates onehot-encoded dataframe of categorical target class values (for multiclass classification models).
- Parameters:
prefix (str, optional) – abbreviated string prefix for target class name (“lab” is an abbreviation of “label”), by default “lab”
- Returns:
one-hot encoded target class labels (dummies)
- Return type:
dataframe
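The effect of the prefix argument can be shown with a minimal standard-library sketch. This is not spacekit's implementation, which returns a dataframe: here a plain dict of columns stands in for it, the function name `onehot` is hypothetical, and the `prefix_class` column naming is an assumption modeled on the convention used by pandas `get_dummies`.

```python
def onehot(labels, prefix="lab"):
    """Build one 0/1 indicator column per distinct class, named '<prefix>_<class>'."""
    classes = sorted(set(labels))
    return {
        f"{prefix}_{c}": [1 if y == c else 0 for y in labels]
        for c in classes
    }


# Hypothetical target values for a three-class problem.
y = [0, 2, 1, 2]
dummies = onehot(y, prefix="lab")
```

Each row has exactly one column set to 1, which is the form expected when comparing against per-class prediction probabilities.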
- resid_plot()[source]
Plot the residual error for a regression model.
- Returns:
interactive scatter plot figure of residuals in the test set
- Return type:
plotly.graph_obj Figure
- roc_plots()[source]
Calculates the ROC-AUC score and plots the Receiver Operating Characteristic (ROC) curve.
- Returns:
float – roc_auc_score (via sklearn)
Figure – receiver-operator characteristic area under the curve (ROC-AUC) plot
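To make the score itself concrete, here is a minimal standard-library sketch of ROC-AUC computed through its rank interpretation: the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one, with ties counted as one half. The method above relies on sklearn's roc_auc_score; this sketch is a swapped-in equivalent for illustration only, and the function name `roc_auc` and the sample data are hypothetical.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
```

The pairwise loop is quadratic in the number of samples, so it is suited to explanation rather than large test sets.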
- class spacekit.analyzer.compute.ComputeClassifier(algorithm='clf', classes=['2g', '8g', '16g', '64g'], res_path='results/mem_bin', show=False, validation=False, name='ComputeClassifier', **log_kws)[source]
Bases:
Computer
Computer subclass with additional methods specific to classification models.
- Parameters:
Computer (Class object) – spacekit.analyzer.compute.Computer object
- load_results(outputs)[source]
Load a previously trained model’s results from local disk and store in a dictionary.
- Parameters:
outputs (dictionary) – outputs stored in a dictionary
- Returns:
spacekit.analyzer.compute.ComputeClassifier subclass object
- Return type:
self
- make_outputs(dl=True)[source]
Store computed results into a dictionary, and optionally save to disk.
- Parameters:
dl (bool, optional) – Download results (save as files on local disk), by default True
- Returns:
outputs stored in a dictionary
- Return type:
dictionary
- class spacekit.analyzer.compute.ComputeBinary(builder=None, algorithm='binary', classes=['aligned', 'misaligned'], res_path='results/svm', show=False, validation=False, **log_kws)[source]
Bases:
ComputeClassifier
ComputeClassifier subclass with additional methods specific to binary classification models.
- Parameters:
ComputeClassifier (Subclass object) – spacekit.analyzer.compute.ComputeClassifier object
- calculate_results(show_summary=True)[source]
Calculate metrics relevant to binary classification model training and assign to the appropriate subclass attributes.
- Parameters:
show_summary (bool, optional) – print the classification report and other summarized metrics to standard out, by default True
- Returns:
spacekit.analyzer.compute.ComputeBinary subclass object
- Return type:
self
- class spacekit.analyzer.compute.ComputeMulti(builder=None, algorithm='multiclass', classes=['2g', '8g', '16g', '64g'], res_path='results/mem_bin', show=False, validation=False, **log_kws)[source]
Bases:
ComputeClassifier
ComputeClassifier subclass with additional methods specific to multi-classification models.
- Parameters:
ComputeClassifier (Subclass object) – spacekit.analyzer.compute.ComputeClassifier object
- calculate_multi(show_summary=True)[source]
Calculate metrics relevant to multi-classification model training and assign to the appropriate subclass attributes.
- Parameters:
show_summary (bool, optional) – print the classification report and other summarized metrics to standard out, by default True
- Returns:
spacekit.analyzer.compute.ComputeMulti subclass object
- Return type:
self
- fnfp_multi()[source]
Determine index names of false negatives and false positives from the training inputs and store in a dictionary along with related prediction probabilities.
- Returns:
false-negative false-positive results
- Return type:
dictionary
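A minimal standard-library sketch of the idea, under stated assumptions: for each class, a false negative is a sample of that class predicted as something else, and a false positive is a sample of another class predicted as that class. This is not spacekit's code; the function name `fnfp_by_class`, the layout of the returned dictionary, and the sample data are hypothetical, and the prediction probabilities mentioned above are left out for brevity.

```python
def fnfp_by_class(index, y_true, y_pred, classes):
    """Collect index names of false negatives and false positives for each class."""
    result = {}
    for c in classes:
        rows = list(zip(index, y_true, y_pred))
        # Truly class c, but predicted as another class.
        fn = [i for i, t, p in rows if t == c and p != c]
        # Predicted as class c, but truly another class.
        fp = [i for i, t, p in rows if t != c and p == c]
        result[c] = {"fn": fn, "fp": fp}
    return result


# Hypothetical index names, ground truth and predictions for three classes.
index = ["a", "b", "c", "d"]
y_true = [0, 1, 2, 1]
y_pred = [0, 2, 2, 0]
errors = fnfp_by_class(index, y_true, y_pred, classes=[0, 1, 2])
```

Note that in the multiclass case every misclassified sample appears twice: once as a false negative for its true class and once as a false positive for its predicted class.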
- onehot_multi(prefix='bin')[source]
Generates onehot-encoded dataframe of categorical target class values (for multiclass classification models).
- Parameters:
prefix (str, optional) – abbreviated string prefix for target class name, by default “bin”
- Returns:
one-hot encoded target class labels (dummies)
- Return type:
dataframe
- class spacekit.analyzer.compute.ComputeRegressor(builder=None, algorithm='linreg', res_path='results/memory', show=False, validation=False, **log_kws)[source]
Bases:
Computer
Computer subclass with additional methods specific to regression models.
- Parameters:
Computer (parent class) – spacekit.analyzer.compute.Computer object
- calculate_L2(subset=None)[source]
Calculate the L2 norm score of a regression model. The L2 norm is the square root of the sum of the squared vector values (also known as the Euclidean norm, or Euclidean distance from the origin). It is also often used as a regularization term when fitting ML algorithms, to keep the coefficients of the model small, i.e. to make the model less complex.
- Returns:
L2 norm
- Return type:
- calculate_results()[source]
Main calling function to compute regression model scores, including residuals, root mean squared error and L2 cost function. Uses parent class method to save and/or load results to/from disk. Once calculated or loaded, other parent class methods can be used to generate various plots (e.g. resid_plot).
- Returns:
ComputeRegressor object with calculated model evaluation metrics attributes.
- Return type:
self
- compute_preds()[source]
Get predictions (y_pred) based on regression model test inputs (X_test).
- Returns:
predicted values for y (target)
- Return type:
ndarray
- compute_scores(error_stats=True)[source]
Calculate overall loss metrics of training and test sets. Default for regression is MSE (mean squared error) and RMSE (root MSE). RMSE is a measure of how spread out the residuals are (i.e. how concentrated the data is around the line of best fit). Note: both metrics penalize large errors, because residuals are squared before averaging; RMSE is expressed in the same units as the target, which makes it easier to interpret than MSE when error values are large.
- Parameters:
error_stats (bool, optional) – Include RMSE and L2 norm for the positive and negative groups of residuals in the test set. Here “positive” means above the regression line (>0) and “negative” means below it (<0). This can be useful when the consequences of underestimating are more severe than those of overestimating, or vice versa. By default True
- Returns:
model training loss scores (MSE and RMSE)
- Return type:
dictionary
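The scores described above can be sketched with the standard library alone. This is not spacekit's implementation: the function names, the dictionary keys and the sample data are hypothetical, and the residual sign convention (ground truth minus prediction, so a positive residual means the model underestimated) is an assumption.

```python
import math


def residuals(y_true, y_pred):
    """Residual = ground truth minus prediction (assumed sign convention)."""
    return [t - p for t, p in zip(y_true, y_pred)]


def mse(res):
    """Mean of the squared residuals."""
    return sum(r * r for r in res) / len(res)


def l2_norm(res):
    """Square root of the sum of squared values (Euclidean norm)."""
    return math.sqrt(sum(r * r for r in res))


def score_summary(y_true, y_pred):
    """MSE, RMSE and L2 norm overall, plus RMSE and L2 for positive and negative residuals."""
    res = residuals(y_true, y_pred)
    summary = {"mse": mse(res), "rmse": math.sqrt(mse(res)), "l2": l2_norm(res)}
    groups = {
        "pos": [r for r in res if r > 0],  # model underestimated
        "neg": [r for r in res if r < 0],  # model overestimated
    }
    for name, group in groups.items():
        if group:  # skip empty groups to avoid dividing by zero
            summary[f"rmse_{name}"] = math.sqrt(mse(group))
            summary[f"l2_{name}"] = l2_norm(group)
    return summary


# Hypothetical test targets and predictions.
y_true = [10.0, 12.0, 9.0, 7.0]
y_pred = [7.0, 16.0, 9.0, 7.0]
scores = score_summary(y_true, y_pred)
```

Reporting the positive and negative groups separately shows whether the model's larger errors come from underestimating or overestimating, which the overall RMSE hides.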
- get_resid()[source]
Calculate residual error between ground truth (y_test) and prediction values of a regression model. Residuals are a measure of how far from the regression line the data points are.
- Returns:
residual error values for a given test set
- Return type:
- load_results(outputs)[source]
Load previously calculated results/scores into Compute object (for comparing to other models and/or drawing plots).
- Parameters:
outputs (dict) – dictionary of results (generated via the make_outputs method)
- Returns:
spacekit.analyzer.compute.ComputeRegressor subclass object updated with results attributes
- Return type:
self
- make_outputs(dl=True)[source]
Create a dictionary of results calculated for a regression model. Used for saving results to disk.
- Parameters:
dl (bool, optional) – download (save) to files on local disk, by default True
- Returns:
outputs stored in a single dictionary for convenience
- Return type:
dictionary