spacekit.analyzer.scan
- spacekit.analyzer.scan.decode_categorical(df, decoder_key)[source]
Add decoded column (using “{column}_key” suffix) to dataframe.
- Parameters:
df (pandas DataFrame) – dataframe with encoded categorical column
decoder_key (dict) – key-value pairs of encoding integers and strings
- Returns:
dataframe with additional categorical column (object dtype) decoded back to strings based on encoding pairs passed in decoder_key.
- Return type:
pandas DataFrame
decoder_key examples:
instrument_key = {"instr": {0: "acs", 1: "cos", 2: "stis", 3: "wfc3"}}
detector_key = {"det": {0: "hrc", 1: "ir", 2: "sbc", 3: "uvis", 4: "wfc"}}
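The decoding step can be sketched in plain Python, with rows held as dictionaries standing in for a DataFrame. This is an illustration only, not spacekit's implementation: the helper name `decode_rows` is hypothetical, and the output column name assumes the “{column}_key” suffix described above.

```python
def decode_rows(rows, decoder_key):
    """Return copies of rows with a decoded "<column>_key" field added.

    Hypothetical stand-in for decode_categorical: rows are plain dicts
    rather than a pandas DataFrame.
    """
    decoded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for column, pairs in decoder_key.items():
            # Codes missing from the mapping decode to None in this sketch.
            new_row[f"{column}_key"] = pairs.get(row[column])
        decoded.append(new_row)
    return decoded


instrument_key = {"instr": {0: "acs", 1: "cos", 2: "stis", 3: "wfc3"}}
rows = [{"instr": 0}, {"instr": 3}]
decoded_rows = decode_rows(rows, instrument_key)
```

The outer key of decoder_key names the encoded column, and the inner dict maps each integer code to its string label.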
- spacekit.analyzer.scan.import_dataset(filename=None, kwargs={'index_col': 'ipst'}, decoder_key=None)[source]
Imports and loads dataset from csv file. Optionally decodes an encoded feature back into strings.
- Parameters:
filename (str, optional) – path to dataframe csv file, by default None
kwargs (dict, optional) – keyword args to pass into pandas read_csv method, by default dict(index_col=“ipst”)
decoder_key (dict, optional) – nested dict of column and key-value pairs for decoding a categorical feature into strings, by default None
- Returns:
dataframe loaded from csv file
- Return type:
Pandas DataFrame
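The effect of the default index_col=“ipst” can be illustrated without pandas. The sketch below uses only the standard library csv module and is not how import_dataset is implemented; the helper name `load_indexed_csv` and the sample row labels are made up for the example.

```python
import csv
import io


def load_indexed_csv(text, index_col="ipst"):
    """Parse CSV text into {index value: remaining columns}.

    Hypothetical helper that mimics reading a CSV with an index column;
    the csv module returns every value as a string.
    """
    reader = csv.DictReader(io.StringIO(text))
    table = {}
    for row in reader:
        key = row.pop(index_col)  # the index column becomes the lookup key
        table[key] = row
    return table


# Made-up sample data: two rows indexed by the "ipst" column.
csv_text = "ipst,instr\nexample_a,0\nexample_b,3\n"
table = load_indexed_csv(csv_text)
```

Values stay as strings here, whereas pandas read_csv infers numeric dtypes, so integer codes would need converting before being looked up in a decoder_key.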
- class spacekit.analyzer.scan.MegaScanner(perimeter='data/20??-*-*-*', primary=-1, name='MegaScanner', **log_kws)[source]
Bases:
object
Scans local disk for Compute object datasets and results files then loads them as attributes for use in plotting, EDA, and model evaluation.
- Parameters:
- acc_loss_subplots()[source]
Side by side grouped barplots of accuracy and loss metrics for multiple model training iterations.
- Returns:
plot figure traces and layout for side by side Accuracy and Loss grouped barplots
- Return type:
plotly.subplots object
- accuracy_bars()[source]
Barplots of training and test set accuracy scores loaded from a Pandas dataframe
- Returns:
Grouped barplot figure data of training and test set accuracy scores.
- Return type:
plotly.graph_objs.Figure
- compare_scores(metric='acc_loss')[source]
Create a dictionary of model scores for multiple training iterations. Score type depends on the type of model: classifiers typically use “acc_loss”; regression models typically use “loss”.
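- Parameters:
metric (str, optional) – score type to compare, typically “acc_loss” or “loss”, by default “acc_loss”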
- Returns:
model evaluation metrics scores (accuracy/loss by default) for each model training iteration
- Return type:
Pandas dataframe
- load_compute_object(Com=<class 'spacekit.analyzer.compute.ComputeMulti'>, alg='clf', res_path='results', validation=False)[source]
Loads a single compute object of any type with results from one iteration.
- Parameters:
Com (spacekit.analyzer.compute.Computer class, optional) – Compute subclass, by default ComputeMulti
alg (str, optional) – algorithm type, by default “clf”
res_path (str, optional) – path to results directory, by default “results”
validation (bool, optional) – validation data results (no training history), by default False
- Returns:
Results from the given path loaded as attributes into a Compute class object
- Return type:
spacekit.analyzer.compute.Computer object
- loss_bars()[source]
Barplots of training and test set loss scores loaded from a Pandas dataframe
- Returns:
Grouped barplot figure data of training and test set loss scores.
- Return type:
plotly.graph_objs.Figure
- make_mega()[source]
Instantiate an empty nested dictionary of results files for each timestamp.
- Returns:
self.mega nested dictionary for storing results
- Return type:
dict
- select_dataset(primary=None)[source]
Select which dataset file (if there are multiple timestamps) to use, e.g. for performing EDA.
- Parameters:
primary (int, optional) – index of primary dataset to use in sorted list of those found, by default None (-1 or most recent timestamp)
- Returns:
path to csv file of saved dataframe according to the primary index key of datasets found.
- Return type:
str
- Raises:
IndexError – primary index key must be a value between zero and the last index of the list of datasets.
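The selection logic described above can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the method's actual code: `pick_dataset` is a hypothetical helper, the directory names are invented, and the sketch assumes timestamped names sort chronologically as strings.

```python
from fnmatch import fnmatchcase

PERIMETER = "data/20??-*-*-*"  # default pattern from the MegaScanner signature


def pick_dataset(candidates, primary=None, pattern=PERIMETER):
    """Return one matching path, chosen by index in sorted order.

    Hypothetical helper: primary=None falls back to -1, the last
    (most recent) entry.
    """
    matches = sorted(p for p in candidates if fnmatchcase(p, pattern))
    index = -1 if primary is None else primary
    try:
        return matches[index]
    except IndexError:
        raise IndexError(
            f"primary={index} is out of range for {len(matches)} dataset(s)"
        ) from None


# Invented directory names; "data/notes" does not match the pattern.
candidates = [
    "data/2022-02-14-1644848448",
    "data/2021-11-04-1636048291",
    "data/notes",
]
latest = pick_dataset(candidates)
oldest = pick_dataset(candidates, primary=0)
```

Because Python lists accept negative indices, this sketch also allows values such as -2; a stricter version would check the range explicitly.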
- single_cmx(cmx, subtitles='v0', zmin=0.0, zmax=1.0, cmx_type='normalized')[source]
Confusion matrix plot for a single model training iteration
- Parameters:
cmx (2D numpy array) – confusion matrix
zmin (int or float) – minimum value for colorscale, typically 0 or 0.0
zmax (int or float) – maximum value for colorscale, typically 1 (if normalized) or 100
classes (list of strings) – target class labels
subtitles (tuple, optional) – text to place above each plot as a subtitle, by default (“v0”)
- Returns:
interactive confusion matrix plot
- Return type:
plotly figure factory annotated heatmap figure
- triple_cmx(cmx, cmx_type)[source]
Plot three confusion matrices side by side
- Parameters:
cmx_type (str) – “normalized” will return a normalized CMX (percentage of false negatives and false positives), otherwise raw numeric values are displayed.
- Returns:
three interactive confusion matrices side by side as a subplot
- Return type:
plotly figure factory annotated heatmap subplots
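The difference between a raw and a normalized confusion matrix can be shown with a small worked example. The sketch assumes “normalized” means dividing each row by its row total (the share of each true class assigned to each predicted class), which the text above does not confirm; `normalize_rows` is a hypothetical helper.

```python
def normalize_rows(cmx):
    """Divide each row of a confusion matrix by its row total.

    Assumption: rows are true classes, columns are predicted classes.
    Rows that sum to zero are returned as all zeros.
    """
    normalized = []
    for row in cmx:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


raw_cmx = [[8, 2], [1, 9]]  # made-up counts for a two-class problem
normalized_cmx = normalize_rows(raw_cmx)
```

With values scaled this way, every entry falls between 0 and 1, which matches the zmin=0.0 and zmax=1.0 defaults of single_cmx.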
- class spacekit.analyzer.scan.HstCalScanner(perimeter='data/20??-*-*-*', primary=-1, **log_kws)[source]
Bases:
MegaScanner
MegaScanner subclass for HST calibration model training iteration analysis
- Parameters:
MegaScanner (object) – Parent class object
- load_com_objects(dpath)[source]
Loads Multi classifier and Regression compute objects (3 total) for a single iteration of results
- scan_results()[source]
Scans local disk for Computer object-generated results files and stores them as new Compute objects (according to the model type) in a nested dictionary.
- Returns:
dictionary of model training results for each iteration found.
- Return type:
HstCalScanner.mega dictionary attribute
- class spacekit.analyzer.scan.HstSvmScanner(perimeter='data/20??-*-*-*', primary=-1, **log_kws)[source]
Bases:
MegaScanner
MegaScanner subclass for HST Single Visit Mosaic alignment model training iteration analysis
- Parameters:
MegaScanner (parent class object) – MegaScanner object
- load_com_objects(dpath)[source]
Load Binary classifier compute objects for a single iteration of test and validation results
- scan_results()[source]
Scans local disk for Computer object-generated results files and stores them as new Compute objects (according to the model type) in a nested dictionary.
- Returns:
dictionary of model training results for each iteration found.
- Return type:
HstSvmScanner.mega dictionary attribute