spacekit.analyzer.scan

spacekit.analyzer.scan.decode_categorical(df, decoder_key)

Add decoded column (using “{column}_key” suffix) to dataframe.

Parameters:

df (pandas DataFrame) – dataframe with encoded categorical column
decoder_key (dict) – nested dict mapping each encoded column name to key-value pairs of encoding integers and strings (see examples below)

Returns:

dataframe with additional categorical column (object dtype) decoded back to strings based on encoding pairs passed in decoder_key.

Return type:

pandas DataFrame

decoder_key examples:

instrument_key = {"instr": {0: "acs", 1: "cos", 2: "stis", 3: "wfc3"}}
detector_key = {"det": {0: "hrc", 1: "ir", 2: "sbc", 3: "uvis", 4: "wfc"}}
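A minimal sketch of the decoding step, assuming the nested decoder_key layout shown above (outer key = encoded column name, inner dict = integer code to string). It uses pandas Series.map on a copy of the dataframe. The function name decode_categorical_sketch is hypothetical, and this is not necessarily how spacekit implements decode_categorical.

```python
import pandas as pd


def decode_categorical_sketch(df, decoder_key):
    """Hypothetical helper: return a copy of df with a "{column}_key" column per decoder entry."""
    out = df.copy()
    for column, pairs in decoder_key.items():
        # Series.map looks up each integer code in the dict; codes missing from the dict become NaN
        out[f"{column}_key"] = out[column].map(pairs)
    return out


instrument_key = {"instr": {0: "acs", 1: "cos", 2: "stis", 3: "wfc3"}}
df = pd.DataFrame({"instr": [0, 3, 1]})
decoded = decode_categorical_sketch(df, instrument_key)
print(decoded["instr_key"].tolist())
```

Because Series.map returns NaN for unmapped codes instead of raising, it is worth checking the new column for missing values after decoding.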

spacekit.analyzer.scan.import_dataset(filename=None, kwargs={'index_col': 'ipst'}, decoder_key=None)

Loads a dataset from a CSV file. Optionally decodes an encoded categorical feature back into strings.

Parameters:

filename (str, optional) – path to dataframe csv file, by default None
kwargs (dict, optional) – keyword args to pass into the pandas read_csv method, by default dict(index_col=“ipst”)
decoder_key (dict, optional) – nested dict of column and key-value pairs for decoding a categorical feature into strings, by default None

Returns:

dataframe loaded from csv file

Return type:

pandas DataFrame
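The load-then-decode flow can be sketched as follows, assuming the decoding works like the decode_categorical sketch (a "{column}_key" column per decoder entry). The function name import_dataset_sketch and the CSV rows are hypothetical; an in-memory io.StringIO buffer stands in for a file on disk so the example runs anywhere.

```python
import io

import pandas as pd


def import_dataset_sketch(filename, kwargs=None, decoder_key=None):
    """Hypothetical helper: read a CSV with pandas, then optionally add decoded columns."""
    # None instead of a dict literal avoids sharing one mutable default across calls
    if kwargs is None:
        kwargs = {"index_col": "ipst"}
    df = pd.read_csv(filename, **kwargs)
    if decoder_key:
        for column, pairs in decoder_key.items():
            df[f"{column}_key"] = df[column].map(pairs)
    return df


# Hypothetical rows standing in for a saved dataset file
csv_text = "ipst,instr,det\nid1,0,4\nid2,3,3\n"
df = import_dataset_sketch(
    io.StringIO(csv_text),
    decoder_key={"instr": {0: "acs", 1: "cos", 2: "stis", 3: "wfc3"}},
)
print(df)
```

The documented signature uses a dict literal as the default for kwargs; the sketch uses None instead, which is the usual Python idiom when a default argument is mutable.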

class spacekit.analyzer.scan.MegaScanner(perimeter='data/20??-*-*-*', primary=-1, name='MegaScanner', **log_kws)

Bases: object

Scans the local disk for Compute object datasets and results files, then loads them as attributes for use in plotting, EDA, and model evaluation.

Parameters:

perimeter (str, optional) – glob search pattern for dataset directories, by default “data/20??-*-*-*”
primary (int, optional) – index of primary dataset to use for EDA in sorted list of those found, by default -1
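A sketch of how a perimeter glob and a primary index can work together, using only the standard library. The helper scan_perimeter and the timestamped directory names are hypothetical; the example builds them in a temporary directory so it runs without any real data.

```python
import glob
import os
import tempfile


def scan_perimeter(root, perimeter="data/20??-*-*-*", primary=-1):
    """Hypothetical helper: return (sorted matches, match selected by primary) under root."""
    found = sorted(glob.glob(os.path.join(root, perimeter)))
    if not found:
        raise FileNotFoundError(f"no paths match {perimeter!r} under {root!r}")
    # ISO-style date prefixes sort chronologically, so primary=-1 picks the most recent run
    return found, found[primary]


with tempfile.TemporaryDirectory() as root:
    # Hypothetical iteration directories; "2022-02-14" lacks a third dash-separated part and is skipped
    for name in ("2022-01-10-1641800000", "2022-02-14-1644850390", "2022-02-14"):
        os.makedirs(os.path.join(root, "data", name))
    found, latest = scan_perimeter(root)
    print([os.path.basename(p) for p in found])
    print(os.path.basename(latest))
```

Sorting the matches before indexing is what makes primary=-1 mean "most recent", since glob itself returns paths in arbitrary order.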

acc_loss_subplots()

Side-by-side grouped barplots of accuracy and loss metrics for multiple model training iterations.

Returns: plot figure traces and layout for side-by-side accuracy and loss grouped barplots
Return type: plotly.subplots object

accuracy_bars()

Barplots of training and test set accuracy scores loaded from a Pandas dataframe

Returns: Grouped barplot figure data of training and test set accuracy scores.
Return type: plotly.graph_objs.Figure

compare_scores(metric='acc_loss')

Create a dictionary of model scores for multiple training iterations. Score type depends on the type of model: classifiers typically use “acc_loss”; regression models typically use “loss”.

Parameters:

metric (str, optional) – metric used by model (clf=acc_loss, reg=loss), by default “acc_loss”

Returns:

model evaluation metrics scores (accuracy/loss by default) for each model training iteration

Return type:

pandas DataFrame
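A sketch of combining per-iteration scores into one table, assuming each training iteration yields a dict of metric name to value. The metric keys, the version labels, and the numbers are hypothetical placeholders; for a regression model ("loss") the inner dicts would hold only loss entries.

```python
import pandas as pd


def compare_scores_sketch(scores_by_version):
    """Hypothetical helper: rows are metrics, columns are training iterations."""
    # orient="columns": outer keys become columns, inner keys become the row index
    return pd.DataFrame.from_dict(scores_by_version, orient="columns")


# Placeholder scores for two hypothetical iterations, not real results
scores = {
    "v0": {"train_acc": 0.91, "train_loss": 0.24, "test_acc": 0.88, "test_loss": 0.31},
    "v1": {"train_acc": 0.94, "train_loss": 0.17, "test_acc": 0.90, "test_loss": 0.27},
}
df = compare_scores_sketch(scores)
print(df)
```

One column per iteration keeps the table directly usable as the source for grouped barplots, with one bar group per metric row.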

load_compute_object(Com=<class 'spacekit.analyzer.compute.ComputeMulti'>, alg='clf', res_path='results', validation=False)

Loads a single compute object of any type with results from one iteration.

Parameters:

Com (spacekit.analyzer.compute.Computer class, optional) – Compute subclass, by default ComputeMulti
alg (str, optional) – algorithm type, by default “clf”
res_path (str, optional) – path to results directory, by default “results”
validation (bool, optional) – validation data results (no training history), by default False

Returns:

Results from the given path loaded as attributes into a Compute class object

Return type:

spacekit.analyzer.compute.Computer object

load_dataframe()

loss_bars()

Barplots of training and test set loss scores loaded from a Pandas dataframe

Returns: Grouped barplot figure data of training and test set loss scores.
Return type: plotly.graph_objs.Figure

make_barplots(metric='acc_loss')

make_clf_plots(target='mem_bin')

make_mega()

Instantiate an empty nested dictionary of results files for each timestamp.

Returns: self.mega nested dictionary for storing results
Return type: dict

select_dataset(primary=None)

Select which dataset file (if there are multiple timestamps) to use, e.g. for performing EDA.

Parameters: primary (int, optional) – index of primary dataset to use in sorted list of those found, by default None (-1 or most recent timestamp)
Returns: path to csv file of saved dataframe according to the primary index key of datasets found.
Return type: str
Raises: IndexError – primary index key must be a value between zero and the last index of the list of datasets.
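A sketch of the selection and bounds check described above, assuming the datasets are already in a sorted list. The helper name and the CSV paths are hypothetical. Negative indices are accepted here because the documented default (-1) is negative.

```python
def select_dataset_sketch(datasets, primary=None):
    """Hypothetical helper: pick one path from a sorted list; None means the most recent."""
    if primary is None:
        primary = -1
    # Allow Python-style negative indices, but raise a clearer IndexError when out of range
    if not -len(datasets) <= primary < len(datasets):
        raise IndexError(
            f"primary must be between 0 and {len(datasets) - 1} (or a negative index), got {primary}"
        )
    return datasets[primary]


# Hypothetical saved-dataframe paths, sorted oldest to newest
datasets = [
    "data/2022-01-10-1641800000/example.csv",
    "data/2022-02-14-1644850390/example.csv",
]
print(select_dataset_sketch(datasets))
```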

single_cmx(cmx, subtitles='v0', zmin=0.0, zmax=1.0, cmx_type='normalized')

Confusion matrix plot for a single model training iteration

Parameters:

cmx (2D numpy array) – confusion matrix
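subtitles (str or tuple, optional) – text to place above each plot as a subtitle, by default “v0”
zmin (int or float, optional) – minimum value for the colorscale, typically 0 or 0.0; by default 0.0
zmax (int or float, optional) – maximum value for the colorscale, typically 1 (if normalized) or 100; by default 1.0
cmx_type (str, optional) – “normalized” displays a normalized confusion matrix, otherwise raw numeric values are displayed; by default “normalized”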

Returns:

interactive confusion matrix plot

Return type:

plotly figure factory annotated heatmap figure
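A sketch of preparing the matrix before plotting, assuming "normalized" means dividing each row (true class) by its total, the same convention as scikit-learn's confusion_matrix(normalize="true"). Whether spacekit normalizes by rows is an assumption. Normalized values fall in [0, 1], which matches the default zmin=0.0 and zmax=1.0; raw counts would need a larger zmax. The function name prepare_cmx is hypothetical.

```python
import numpy as np


def prepare_cmx(cmx, cmx_type="normalized"):
    """Hypothetical helper: row-normalized fractions, or raw counts unchanged."""
    cmx = np.asarray(cmx, dtype=float)
    if cmx_type == "normalized":
        # Each row sums to 1; rows with no samples stay all zeros instead of dividing by zero
        row_totals = cmx.sum(axis=1, keepdims=True)
        return np.divide(cmx, row_totals, out=np.zeros_like(cmx), where=row_totals != 0)
    return cmx


raw = [[8, 2], [1, 9]]
print(prepare_cmx(raw))
print(prepare_cmx(raw, cmx_type="raw"))
```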

triple_cmx(cmx, cmx_type)

Plot three confusion matrices side by side

Parameters: cmx_type (str) – “normalized” will return a normalized CMX (percentage of FNFPs), otherwise raw numeric values are displayed.
Returns: three interactive confusion matrices side by side as a subplot
Return type: plotly figure factory annotated heatmap subplots

class spacekit.analyzer.scan.HstCalScanner(perimeter='data/20??-*-*-*', primary=-1, **log_kws)

Bases: MegaScanner

MegaScanner subclass for HST calibration model training iteration analysis

Parameters: perimeter (str, optional) and primary (int, optional) – same as for the parent class MegaScanner

load_com_objects(dpath)

Loads the multiclass classifier and regression compute objects (three in total) for a single iteration of results.

Parameters: dpath (str) – dataset subdirectory path, e.g. “data/2022-02-03/results”
Returns: tuple of mem_bin, memory, wallclock compute objects for one iteration
Return type: tuple

scan_results()

Scans local disk for Computer object-generated results files and stores them as new Compute objects (according to the model type) in a nested dictionary.

Returns: dictionary of model training results for each iteration found.
Return type: HstCalScanner.mega dictionary attribute
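A sketch of filling a nested results dictionary, one entry per iteration directory, with the three compute objects that load_com_objects returns (mem_bin, memory, wallclock). The "res" key, the "results" subdirectory join, and the loader stub are assumptions for illustration; the real scan_results may structure self.mega differently.

```python
import os


def scan_results_sketch(dataset_dirs, load_com_objects):
    """Hypothetical helper: {iteration_name: {"res": {model_name: compute_object}}}."""
    mega = {}
    for dpath in sorted(dataset_dirs):
        version = os.path.basename(os.path.normpath(dpath))
        mem_bin, memory, wallclock = load_com_objects(os.path.join(dpath, "results"))
        mega[version] = {"res": {"mem_bin": mem_bin, "memory": memory, "wallclock": wallclock}}
    return mega


def fake_loader(res_path):
    # Hypothetical stand-in for HstCalScanner.load_com_objects; returns labels, not Compute objects
    return (f"{res_path}:mem_bin", f"{res_path}:memory", f"{res_path}:wallclock")


mega = scan_results_sketch(
    ["data/2022-02-14-1644850390", "data/2022-01-10-1641800000"], fake_loader
)
print(list(mega))
```

Passing the loader in as an argument keeps the sketch independent of spacekit; an HstSvmScanner variant would unpack a two-item (test, validation) tuple instead.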

class spacekit.analyzer.scan.HstSvmScanner(perimeter='data/20??-*-*-*', primary=-1, **log_kws)

Bases: MegaScanner

MegaScanner subclass for HST Single Visit Mosaic alignment model training iteration analysis

Parameters: perimeter (str, optional) and primary (int, optional) – same as for the parent class MegaScanner

load_com_objects(dpath)

Loads binary classifier compute objects for a single iteration of test and validation results.

Parameters: dpath (str) – dataset subdirectory path, e.g. “data/2022-02-03/results”
Returns: tuple of test and validation compute objects for one iteration
Return type: tuple

scan_results()

Scans local disk for Computer object-generated results files and stores them as new Compute objects (according to the model type) in a nested dictionary.

Returns: dictionary of model training results for each iteration found.
Return type: HstSvmScanner.mega dictionary attribute