spacekit.extractor.load
- spacekit.extractor.load.load_datasets(filenames, index_col='index', column_order=None, verbose=1)[source]
Import one or more dataframes from CSV files and merge them along axis 0 (stacking rows). Assumes the datasets use the same index_col name and identical column names (although this is not strictly required), since this function does not handle missing data or NaNs.
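The row-wise merge described above can be sketched with plain pandas. This is a minimal illustration, not the spacekit implementation: the helper name `concat_csvs`, the file names and the toy data are all hypothetical.

```python
import os
import tempfile

import pandas as pd


def concat_csvs(filenames, index_col="index"):
    """Hypothetical helper: read each CSV with a shared index column and stack rows."""
    frames = [pd.read_csv(f, index_col=index_col) for f in filenames]
    # axis=0 appends rows. Columns are aligned by name, so a column missing
    # from one file would come back as NaN (which is not handled here).
    return pd.concat(frames, axis=0)


# Toy CSV files written to a temporary directory (illustrative data only).
tmpdir = tempfile.mkdtemp()
paths = []
for name, rows in [("a.csv", "v1,1.0,2.0\nv2,3.0,4.0\n"), ("b.csv", "v3,5.0,6.0\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as fh:
        fh.write("index,x,y\n" + rows)
    paths.append(path)

merged = concat_csvs(paths)
print(merged)
```

Because columns are aligned by name, files with differing columns would still load, but the result would contain NaNs that need separate handling.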
- spacekit.extractor.load.stratified_splits(df, target='label', v=0.85)[source]
Splits Pandas dataframe into feature (X) and target (y) train, test and validation sets.
- Parameters:
df (Pandas dataframe) – preprocessed SVM regression test dataset
target (str, optional) – target class label for alignment model predictions, by default “label”
test_size (float, optional) – size of the test set, by default 0.2
val_size (float, optional) – create a validation set separate from train/test, by default 0.1
- Returns:
data, labels: features (X) and targets (y) split into train, test, validation sets
- Return type:
tuples of Pandas dataframes
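To illustrate what a stratified train/test/validation split does, here is a small standard-library sketch. It is an assumption-laden illustration rather than the spacekit code: the function name `stratified_indices`, the `seed` argument and the per-class rounding are hypothetical choices, and the fractions simply reuse the documented defaults of 0.2 and 0.1.

```python
import random
from collections import defaultdict


def stratified_indices(labels, test_size=0.2, val_size=0.1, seed=0):
    """Hypothetical helper: return (train, test, val) index lists, split per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, test, val = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Take the same fraction from every class so proportions are preserved.
        n_test = round(len(idxs) * test_size)
        n_val = round(len(idxs) * val_size)
        test.extend(idxs[:n_test])
        val.extend(idxs[n_test:n_test + n_val])
        train.extend(idxs[n_test + n_val:])
    return train, test, val


# Toy imbalanced labels: 80 negatives, 20 positives.
labels = [0] * 80 + [1] * 20
train, test, val = stratified_indices(labels)
print(len(train), len(test), len(val))
```

Each split keeps the 80/20 class balance of the input, which is the point of stratifying on the target column.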
- spacekit.extractor.load.read_channels(channels, w, h, d, exp=None, color_mode='rgb')[source]
Loads PNG image data and converts to 3D arrays.
- Parameters:
channels (tuple) – image frames (original, source, gaia)
w (int) – image width
h (int) – image height
d (int) – depth (number of image frames)
exp (int, optional) – expand array dimensions, i.e. reshape to (exp, w, h, 3), by default None. SVM predictions require exp=3; set to None for training.
color_mode (str, optional) – RGB (3 channel images) or grayscale (1 channel), by default “rgb”
- Returns:
image pixel values as array
- Return type:
numpy array
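The (exp, w, h, 3) shape mentioned for exp can be pictured with a short NumPy sketch. It assumes each of the three frames has already been decoded into an RGB array of shape (w, h, 3); the zero-filled arrays are stand-ins for real PNG data, and this is not necessarily how read_channels builds the array internally.

```python
import numpy as np

w, h = 128, 128

# Stand-ins for the three decoded RGB frames (original, source, gaia).
frames = [np.zeros((w, h, 3), dtype=np.float32) for _ in range(3)]

# Stacking on a new leading axis yields a single array of shape (3, w, h, 3),
# i.e. (exp, w, h, 3) with exp=3.
stacked = np.stack(frames, axis=0)
print(stacked.shape)
```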
- class spacekit.extractor.load.ImageIO(img_path, format='png', data=None, name='ImageIO', **log_kws)[source]
Bases:
object
Parent Class for image file input/output operations
- check_format(format)[source]
Checks the format type of img_path (png, jpg or npz) and initializes the format attribute accordingly.
- load_multi_npz(i='img_index.npz', X='img_data.npz', y='img_labels.npz')[source]
Load numpy arrays from individual feature/image data, label and index compressed files on disk. As the counterpart function to save_multi_npz, keys within each file are expected to be named as follows:
i: “train_idx”, “test_idx”, “val_idx”
X: “X_train”, “X_test”, “X_val”
y: “y_train”, “y_test”, “y_val”
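The three-file layout and key names can be reproduced with NumPy alone. The sketch below writes toy arrays under those keys and reads them back; the helper `load_split` and the toy data are hypothetical, and the ImageIO methods themselves are not called.

```python
import os
import tempfile

import numpy as np

tmpdir = tempfile.mkdtemp()
i_path = os.path.join(tmpdir, "img_index.npz")
X_path = os.path.join(tmpdir, "img_data.npz")
y_path = os.path.join(tmpdir, "img_labels.npz")

# Write one compressed file each for index, image data and labels (toy data).
np.savez_compressed(
    i_path,
    train_idx=np.array(["a", "b"]),
    test_idx=np.array(["c"]),
    val_idx=np.array(["d"]),
)
np.savez_compressed(
    X_path, X_train=np.zeros((2, 4)), X_test=np.zeros((1, 4)), X_val=np.zeros((1, 4))
)
np.savez_compressed(
    y_path, y_train=np.array([0, 1]), y_test=np.array([1]), y_val=np.array([0])
)


def load_split(path, keys):
    """Hypothetical helper: return the arrays stored under the given keys."""
    with np.load(path) as npz:
        return tuple(npz[k] for k in keys)


train_idx, test_idx, val_idx = load_split(i_path, ("train_idx", "test_idx", "val_idx"))
X_train, X_test, X_val = load_split(X_path, ("X_train", "X_test", "X_val"))
y_train, y_test, y_val = load_split(y_path, ("y_train", "y_test", "y_val"))
print(X_train.shape, y_train.shape, train_idx.shape)
```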
- load_npz(npz_file=None, keys=['index', 'images', 'labels'])[source]
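Load arrays from a single .npz compressed numpy file.
- Parameters:
npz_file (str, optional) – path to the .npz file, by default None
keys (list, optional) – names of the arrays to read from the file, by default [“index”, “images”, “labels”]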
- Returns:
If three keys are passed into the keyword arg keys, a tuple of 3 arrays matching these keys is returned. If only 2 keys are passed, returns 2 arrays matching the 2 keys.
- Return type:
arrays or tuple of arrays
- split_arrays_from_npz(v=0.85)[source]
Loads images (X), labels (y) and index (i) from a single .npz compressed numpy file. Splits into train, test, val sets using 70-20-10 ratios.
- Returns:
train, test, val tuples of numpy arrays. Each tuple consists of an index, feature data (X, for images these are the actual pixel values) and labels (y).
- Return type:
tuples
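A 70-20-10 split of index, image and label arrays might look like the following NumPy sketch. It is only an illustration under stated assumptions: the helper `split_70_20_10` is hypothetical, the arrays are toy data, the split is taken in order without shuffling, and how the real method uses v is not shown here.

```python
import numpy as np


def split_70_20_10(arr):
    """Hypothetical helper: cut an array into train/test/val pieces in order."""
    n = len(arr)
    # Integer arithmetic avoids floating point surprises at the boundaries.
    train_end = n * 7 // 10
    test_end = n * 9 // 10
    return np.split(arr, [train_end, test_end])


# Toy arrays standing in for the contents of a single .npz file.
index = np.arange(10)
images = np.zeros((10, 8, 8, 3))
labels = np.array([0, 1] * 5)

i_train, i_test, i_val = split_70_20_10(index)
X_train, X_test, X_val = split_70_20_10(images)
y_train, y_test, y_val = split_70_20_10(labels)

# Each split is a tuple of (index, X, y), mirroring the return value described above.
train = (i_train, X_train, y_train)
test = (i_test, X_test, y_test)
val = (i_val, X_val, y_val)
print(len(i_train), len(i_test), len(i_val))
```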
- class spacekit.extractor.load.SVMImageIO(img_path, w=128, h=128, d=9, inference=True, format='png', data=None, target='label', v=0.85, **log_kws)[source]
Bases:
ImageIO
Subclass for loading Single Visit Mosaic total detection .png images from local disk into numpy arrays and performing initial preprocessing and labeling for training a CNN or generating predictions on unlabeled data.
- Parameters:
ImageIO (class) – ImageIO parent class
Instantiates an SVMImageIO object.
- Parameters:
img_path (string) – path to local directory containing png files
w (int, optional) – image pixel width, by default 128
h (int, optional) – image pixel height, by default 128
d (int, optional) – channel depth, by default 9
inference (bool, optional) – determines how to load images (set to False for training), by default True
format (str, optional) – format type of image file(s), png, jpg or npz, by default “png”
data (dataframe, optional) – used to load mlp data inputs and split into train/test/validation sets, by default None
target (str, optional) – name of the target column in dataframe, by default “label”
v (float, optional) – size ratio for validation set, by default 0.85
- detector_prediction_images(X_data, exp=3)[source]
Load image files from pngs into numpy arrays. Image arrays are reshaped into the appropriate dimensions for generating predictions in a pre-trained image CNN (no data augmentation is performed).
- Parameters:
X_data (Pandas dataframe) – input data (assumes index values are the image filenames)
exp (int, optional) – expand image array shape into its constituent frame dimensions, by default 3
- Returns:
image name index, arrays of image pixel values
- Return type:
Pandas Index, numpy array
- detector_training_images(X_data, exp=None)[source]
Load image files from class-labeled folders containing pngs into numpy arrays. Image arrays are not reshaped since this assumes data augmentation will be performed at training time.
- get_labeled_image_paths(i)[source]
Creates lists of negative and positive image filepaths, assuming the image files are in subdirectories named according to the class labels, e.g. “0” and “1” (similar to how Keras flow_from_directory works). Note: this method expects 3 images in the subdirectory, two of which have the suffixes _source and _gaia appended, and a very specific path format: {img_path}/{label}/{i}/{i}_{suffix}.png, where i is typically the full name of the visit. This may be made more flexible in future versions, but for now it is more or less hardcoded for SVM images generated by the spacekit.skopes.hst.svm.prep or corrupt modules.
- Parameters:
i (str) – image filename
- Returns:
image filenames for each image type (original, source, gaia)
- Return type:
tuples
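The directory layout expected by get_labeled_image_paths can be made concrete with a small path-building sketch. It assumes the original frame carries no suffix while the other two end in _source and _gaia; the helper `image_paths` and the visit name "visit_abc" are hypothetical.

```python
from pathlib import PurePosixPath


def image_paths(img_path, label, i, suffixes=("", "_source", "_gaia")):
    """Hypothetical helper: build {img_path}/{label}/{i}/{i}{suffix}.png for each frame."""
    base = PurePosixPath(img_path) / str(label) / i
    return tuple(str(base / f"{i}{suffix}.png") for suffix in suffixes)


# Paths for one positive-class visit (original, source, gaia).
paths = image_paths("img", 1, "visit_abc")
for p in paths:
    print(p)
```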
- load_from_data_splits(X_train, X_test, X_val)[source]
Read in train/test files and produce X-y data splits.
- Parameters:
X_train (numpy.ndarray) – training image inputs
X_test (numpy.ndarray) – test image inputs
X_val (numpy.ndarray) – validation image inputs
- Returns:
train, test, val nested lists each containing an index of the visit names and png image data as numpy arrays.
- Return type:
nested lists
- spacekit.extractor.load.save_dct_to_txt(data_dict)[source]
Saves the key-value pairs of a dictionary to text files on local disk, with each key as a filename and its value(s) as the contents of that file.
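The key-per-file behaviour can be sketched with the standard library. This is not the spacekit implementation: the helper `dict_to_txt`, the `.txt` extension, the output directory argument and the one-item-per-line layout are all assumptions made for illustration.

```python
import os
import tempfile


def dict_to_txt(data_dict, out_dir):
    """Hypothetical helper: write each key's value(s) to <out_dir>/<key>.txt."""
    written = []
    for key, value in data_dict.items():
        path = os.path.join(out_dir, f"{key}.txt")
        # List-like values are written one item per line; scalars as a single line.
        items = value if isinstance(value, (list, tuple)) else [value]
        with open(path, "w") as fh:
            fh.write("\n".join(str(v) for v in items) + "\n")
        written.append(path)
    return written


out_dir = tempfile.mkdtemp()
files = dict_to_txt({"pos": ["a", "b"], "count": 2}, out_dir)
print(files)
```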