spacekit.analyzer.scan

spacekit.analyzer.scan.decode_categorical(df, decoder_key)[source]

Add decoded column (using “{column}_key” suffix) to dataframe.

Parameters:
  • df (pandas DataFrame) – dataframe with encoded categorical column

  • decoder_key (dict) – key-value pairs of encoding integers and strings

Returns:

dataframe with additional categorical column (object dtype) decoded back to strings based on encoding pairs passed in decoder_key.

Return type:

pandas DataFrame

decoder_key examples:

instrument_key = {"instr": {0: "acs", 1: "cos", 2: "stis", 3: "wfc3"}}
detector_key = {"det": {0: "hrc", 1: "ir", 2: "sbc", 3: "uvis", 4: "wfc"}}
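
A minimal sketch of how such a nested key could be applied, written in plain Python so it runs without dependencies. The helper `decode_column` and the list-of-dicts row layout are hypothetical stand-ins, not spacekit's implementation; with a real DataFrame, `Series.map` on the inner dict would be the usual equivalent.

```python
# Hypothetical stand-in for a dataframe: a list of row dicts.
rows = [{"instr": 0}, {"instr": 3}, {"instr": 1}]

instrument_key = {"instr": {0: "acs", 1: "cos", 2: "stis", 3: "wfc3"}}


def decode_column(rows, decoder_key):
    """Add a "{column}_key" entry to each row, decoded via decoder_key."""
    decoded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for column, pairs in decoder_key.items():
            # Codes missing from the key decode to None in this sketch.
            new_row[f"{column}_key"] = pairs.get(row[column])
        decoded.append(new_row)
    return decoded


decoded_rows = decode_column(rows, instrument_key)
print([r["instr_key"] for r in decoded_rows])
```

The outer key names the encoded column and the inner dict maps each integer code to its string label, which is why the decoder is passed as a nested dict.
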

spacekit.analyzer.scan.import_dataset(filename=None, kwargs={'index_col': 'ipst'}, decoder_key=None)[source]

Imports and loads dataset from csv file. Optionally decodes an encoded feature back into strings.

Parameters:
  • filename (str, optional) – path to dataframe csv file, by default None

  • kwargs (dict, optional) – keyword args to pass into pandas read_csv method, by default dict(index_col="ipst")

  • decoder_key (dict, optional) – nested dict of column and key-value pairs for decoding a categorical feature into strings, by default None

Returns:

dataframe loaded from csv file

Return type:

Pandas DataFrame
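
To illustrate what the default `index_col="ipst"` implies, here is a dependency-free sketch using the standard `csv` module instead of pandas. The CSV contents, the column names other than `ipst`, and the helper `load_indexed` are all assumptions made for illustration; the real function delegates to pandas `read_csv`.

```python
import csv
import io

# Hypothetical CSV contents; only the "ipst" column name comes from the docs.
csv_text = "ipst,instr,n_files\nidio03010,0,12\nlc6o02020,1,4\n"


def load_indexed(handle, index_col="ipst"):
    """Read CSV rows into a dict keyed by the index column."""
    table = {}
    for row in csv.DictReader(handle):
        key = row.pop(index_col)  # the index value is removed from the row
        table[key] = row
    return table


table = load_indexed(io.StringIO(csv_text))
print(table["idio03010"])
```

Each row is then addressable by its `ipst` value, which is the role the index plays in the loaded dataframe. Note that the `csv` module leaves every value as a string, whereas pandas infers numeric dtypes.
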

class spacekit.analyzer.scan.MegaScanner(perimeter='data/20??-*-*-*', primary=-1, name='MegaScanner', **log_kws)[source]

Bases: object

Scans local disk for Compute object datasets and results files then loads them as attributes for use in plotting, EDA, and model evaluation.

Parameters:
  • perimeter (str, optional) – glob search pattern

  • primary (int, optional) – index of primary dataset to use for EDA in sorted list of those found, by default -1
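
The default perimeter can be tried out with the standard `glob` module. This sketch builds a throwaway directory tree and shows which names the pattern picks up; the directory names, including the trailing timestamp format, are assumptions rather than spacekit's documented layout.

```python
import glob
import os
import tempfile

# Hypothetical directory names; the suffix after the date is an assumption.
names = [
    "2021-11-04-1636048291",
    "2022-02-03-1643853600",
    "2022-02-14-1644848448",
    "notes",  # does not match the pattern
]

with tempfile.TemporaryDirectory() as root:
    for name in names:
        os.makedirs(os.path.join(root, "data", name))
    # Escape the temp root so only the perimeter part acts as a pattern.
    pattern = os.path.join(glob.escape(root), "data", "20??-*-*-*")
    found = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(found)
```

Because the pattern requires three hyphens after the four-character year, a directory named only by date (for example `2022-02-03`) would not match it.
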

acc_loss_subplots()[source]

Side by side grouped barplots of accuracy and loss metrics for multiple model training iterations.

Returns:

plot figure traces and layout for side by side Accuracy and Loss grouped barplots

Return type:

plotly.subplots object

accuracy_bars()[source]

Barplots of training and test set accuracy scores loaded from a Pandas dataframe

Returns:

Grouped barplot figure data of training and test set accuracy scores.

Return type:

plotly.graph_objs.Figure

compare_scores(metric='acc_loss')[source]

Create a dictionary of model scores for multiple training iterations. Score type depends on the type of model: classifiers typically use “acc_loss”; regression models typically use “loss”.

Parameters:
  • target (str, optional) – y target class label, by default “mem_bin”

  • score_type (str, optional) – metric used by model (clf=acc_loss, reg=loss), by default “acc_loss”

Returns:

model evaluation metrics scores (accuracy/loss by default) for each model training iteration

Return type:

Pandas dataframe

load_compute_object(Com=<class 'spacekit.analyzer.compute.ComputeMulti'>, alg='clf', res_path='results', validation=False)[source]

Loads a single compute object of any type with results from one iteration.

Parameters:
  • Com (spacekit.analyzer.compute.Computer class, optional) – Compute subclass, by default ComputeMulti

  • alg (str, optional) – algorithm type, by default “clf”

  • res_path (str, optional) – path to results directory, by default “results”

  • validation (bool, optional) – validation data results (no training history), by default False

Returns:

Results from the given path loaded as attributes into a Compute class object

Return type:

spacekit.analyzer.compute.Computer object

load_dataframe()[source]

loss_bars()[source]

Barplots of training and test set loss scores loaded from a Pandas dataframe

Returns:

Grouped barplot figure data of training and test set loss scores.

Return type:

plotly.graph_objs.Figure

make_barplots(metric='acc_loss')[source]

make_clf_plots(target='mem_bin')[source]

make_mega()[source]

Instantiate an empty nested dictionary of results files for each timestamp.

Returns:

self.mega nested dictionary for storing results

Return type:

dict
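
A rough sketch of what an empty nested results dictionary might look like. The "v0"-style version keys, the inner key names ("date", "time", "res"), and the directory naming are all hypothetical; the actual structure of `self.mega` is not specified on this page.

```python
import os

# Hypothetical dataset directories, named as <date>-<unix timestamp>.
datapaths = ["data/2021-11-04-1636048291", "data/2022-02-03-1643853600"]


def make_mega_sketch(paths):
    """Build an empty results slot for each dataset directory."""
    mega = {}
    for i, path in enumerate(sorted(paths)):
        name = os.path.basename(path)
        date, timestamp = name.rsplit("-", 1)
        # "date", "time", and "res" are illustrative key names only.
        mega[f"v{i}"] = {"date": date, "time": int(timestamp), "res": {}}
    return mega


mega = make_mega_sketch(datapaths)
print(mega["v0"])
```

The empty "res" slots stand in for the place where a later scan step would store loaded results for each iteration.
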

select_dataset(primary=None)[source]

Select which dataset file (if there are multiple timestamps) to use, e.g. for performing EDA.

Parameters:

primary (int, optional) – index of primary dataset to use in sorted list of those found, by default None (-1 or most recent timestamp)

Returns:

path to csv file of saved dataframe according to the primary index key of datasets found.

Return type:

str

Raises:

IndexError – primary index key must be a value between zero and the last index of the list of datasets.
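
The selection behavior described above can be sketched with ordinary list indexing. The function name `select_dataset_sketch`, the file paths, and the error message are assumptions for illustration, not the library's code.

```python
# Hypothetical sorted list of dataset files, oldest first.
datasets = [
    "data/2021-11-04-1636048291/dataset.csv",
    "data/2022-02-03-1643853600/dataset.csv",
    "data/2022-02-14-1644848448/dataset.csv",
]


def select_dataset_sketch(datasets, primary=None):
    """Return the dataset path at index `primary` (most recent if None)."""
    if primary is None:
        primary = -1  # last item of the sorted list, i.e. the latest timestamp
    try:
        return datasets[primary]
    except IndexError:
        raise IndexError(
            f"primary must index one of the {len(datasets)} datasets found"
        ) from None


print(select_dataset_sketch(datasets))
print(select_dataset_sketch(datasets, primary=0))
```

Passing an index past the end of the list, such as `primary=5` here, raises `IndexError`.
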

single_cmx(cmx, subtitles='v0', zmin=0.0, zmax=1.0, cmx_type='normalized')[source]

Confusion matrix plot for a single model training iteration

Parameters:
  • cmx (2D numpy array) – confusion matrix

  • zmin (int or float) – typically 0 or 0.0 (minimum value for colorscale)

  • zmax (int) – typically 1 (if normalized) or 100 (max value for colorscale)

  • classes (list of strings) – target class labels

  • subtitles (tuple, optional) – text to place above each plot as a subtitle, by default (“v0”)

Returns:

interactive confusion matrix plot

Return type:

plotly figure factory annotated heatmap figure
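
To show why `zmin=0.0` and `zmax=1.0` suit a normalized matrix, the sketch below normalizes a small confusion matrix by row. Row-wise normalization (each row of true-class counts divided by its total) is an assumption about what "normalized" means here, and nested lists stand in for the 2D numpy array the method expects.

```python
def normalize_rows(cmx):
    """Scale each row of a confusion matrix so that it sums to 1."""
    normalized = []
    for row in cmx:
        total = sum(row)
        # An empty row (no samples of that class) stays all zeros.
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


raw = [[8, 2], [1, 9]]  # hypothetical counts for a two-class model
norm = normalize_rows(raw)
print(norm)
```

Every normalized cell falls between 0 and 1, so the colorscale bounds can stay fixed across iterations, whereas raw counts need a `zmax` that fits the largest cell.
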

triple_cmx(cmx, cmx_type)[source]

Plot three confusion matrices side by side

Parameters:

cmx_type (str) – “normalized” will return a normalized CMX (percentage of FNFPs), otherwise raw numeric values are displayed.

Returns:

three interactive confusion matrices side by side as a subplot

Return type:

plotly figure factory annotated heatmap subplots

class spacekit.analyzer.scan.HstCalScanner(perimeter='data/20??-*-*-*', primary=-1, **log_kws)[source]

Bases: MegaScanner

MegaScanner subclass for HST calibration model training iteration analysis

Parameters:

MegaScanner (object) – Parent class object

load_com_objects(dpath)[source]

Loads Multi classifier and Regression compute objects (3 total) for a single iteration of results

Parameters:

dpath (str) – dataset subdirectory path, e.g. “data/2022-02-03/results”

Returns:

tuple of mem_bin, memory, wallclock compute objects for one iteration

Return type:

tuple

scan_results()[source]

Scans local disk for Computer object-generated results files and stores them as new Compute objects (according to the model type) in a nested dictionary.

Returns:

dictionary of model training results for each iteration found.

Return type:

HstCalScanner.mega dictionary attribute

class spacekit.analyzer.scan.HstSvmScanner(perimeter='data/20??-*-*-*', primary=-1, **log_kws)[source]

Bases: MegaScanner

MegaScanner subclass for HST Single Visit Mosaic alignment model training iteration analysis

Parameters:

MegaScanner (parent class object) – MegaScanner object

load_com_objects(dpath)[source]

Load Binary classifier compute objects for a single iteration of test and validation results

Parameters:

dpath (str) – dataset subdirectory path, e.g. “data/2022-02-03/results”

Returns:

tuple of test and validation compute objects for one iteration

Return type:

tuple

scan_results()[source]

Scans local disk for Computer object-generated results files and stores them as new Compute objects (according to the model type) in a nested dictionary.

Returns:

dictionary of model training results for each iteration found.

Return type:

HstSvmScanner.mega dictionary attribute