spacekit.extractor.load

Inheritance diagram of spacekit.extractor.load

spacekit.extractor.load.load_datasets(filenames, index_col='index', column_order=None, verbose=1)

Import one or more dataframes from CSV files and concatenate them along axis 0 (i.e., append rows). Although this is not strictly required, the datasets should use the same index_col name and identical column names, since this function does not handle missing data or NaNs.

Parameters:
  • filenames (list) – path(s) to csv files of saved dataframes.

  • index_col (str, optional) – name of the index column to set, by default “index”

Returns:

Labeled dataframe loaded from csv file(s).

Return type:

DataFrame
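The pandas sketch below illustrates the row-wise concatenation described above. It reads two small CSVs, sets the shared index column, and stacks the rows along axis 0. The helper load_and_concat and the sample data are hypothetical, and spacekit's own implementation may differ (for example, in how column_order or verbose are used).

```python
import io

import pandas as pd


def load_and_concat(sources, index_col="index"):
    """Hypothetical helper: read each CSV, set its index, stack the rows (axis 0)."""
    frames = [pd.read_csv(src, index_col=index_col) for src in sources]
    # axis=0 appends rows; columns are aligned by name, and a column missing
    # from one input is filled with NaN in the result rather than raising.
    return pd.concat(frames, axis=0)


# In-memory CSVs stand in for files on disk (toy data).
csv_a = io.StringIO("index,feature,label\nvisit_a,0.5,0\nvisit_b,1.2,1\n")
csv_b = io.StringIO("index,feature,label\nvisit_c,0.9,1\n")

df = load_and_concat([csv_a, csv_b])
print(df.shape)        # (3, 2)
print(list(df.index))  # ['visit_a', 'visit_b', 'visit_c']
```

The NaN filling in the comment is why matching column names matter: pandas does not fail on mismatched columns, it pads them.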

spacekit.extractor.load.stratified_splits(df, target='label', v=0.85)

Splits Pandas dataframe into feature (X) and target (y) train, test and validation sets.

Parameters:
  • df (Pandas dataframe) – preprocessed SVM regression test dataset

  • target (str, optional) – target class label for alignment model predictions, by default “label”

  • test_size (float, optional) – fraction of samples assigned to the test set, by default 0.2

  • val_size (float, optional) – fraction of samples held out as a validation set, separate from train/test, by default 0.1

Returns:

data, labels: features (X) and targets (y) split into train, test, validation sets

Return type:

tuples of Pandas dataframes
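Here is a minimal sketch of a stratified train/test/validation split that uses only pandas, with the 0.2 test and 0.1 validation fractions listed above. Each class is sampled separately, so class proportions carry over to every split. The function stratified_split, its test_size/val_size/seed parameters, and the toy data are hypothetical; this is not spacekit's implementation.

```python
import pandas as pd


def stratified_split(df, target="label", test_size=0.2, val_size=0.1, seed=42):
    """Hypothetical sketch returning (X_train, X_test, X_val), (y_train, y_test, y_val)."""
    # Sample each class separately so class ratios are preserved in every split.
    test = df.groupby(target, group_keys=False).sample(frac=test_size, random_state=seed)
    rest = df.drop(test.index)
    # Rescale so val_size is a fraction of the full dataset, not of the remainder.
    val_frac = val_size / (1.0 - test_size)
    val = rest.groupby(target, group_keys=False).sample(frac=val_frac, random_state=seed)
    train = rest.drop(val.index)
    splits = (train, test, val)
    data = tuple(s.drop(columns=target) for s in splits)
    labels = tuple(s[target] for s in splits)
    return data, labels


# Toy dataset (assumption): 60 rows of class 0 and 40 rows of class 1.
df = pd.DataFrame({"feature": range(100), "label": [0] * 60 + [1] * 40})
(X_train, X_test, X_val), (y_train, y_test, y_val) = stratified_split(df)
print(len(X_train), len(X_test), len(X_val))  # 70 20 10
```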

spacekit.extractor.load.read_channels(channels, w, h, d, exp=None, color_mode='rgb')

Loads PNG image data and converts it to a 3D array, or a 4D array when exp is set.

Parameters:
  • channels (tuple) – image frames (original, source, gaia)

  • w (int) – image width

  • h (int) – image height

  • d (int) – depth (number of image frames)

  • exp (int, optional) – expand array dimensions, i.e. reshape to (exp, w, h, 3), by default None. SVM predictions require exp=3; set to None for training.

  • color_mode (str, optional) – RGB (3-channel images) or grayscale (1 channel), by default “rgb”.

Returns:

image pixel values as array

Return type:

numpy array
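This numpy sketch makes the shapes concrete. It assumes three RGB frames (original, source, gaia) of w-by-h pixels, with random arrays standing in for decoded PNGs, since no image library is used. It shows the channel-last (w, h, d) layout with d = 9 and the per-frame (exp, w, h, 3) layout with exp = 3. The sketch illustrates the documented shapes only and is not spacekit's code.

```python
import numpy as np

w, h = 128, 128
rng = np.random.default_rng(0)
# Three RGB frames (original, source, gaia); random pixels stand in for decoded PNGs.
frames = [rng.integers(0, 256, size=(w, h, 3), dtype=np.uint8) for _ in range(3)]

# Channel-last layout matching the (w, h, d) shape: d = 3 frames x 3 channels = 9.
stacked = np.concatenate(frames, axis=-1)
print(stacked.shape)   # (128, 128, 9)

# Per-frame layout matching (exp, w, h, 3) with exp=3: RGB channels kept last.
expanded = np.stack(frames, axis=0)
print(expanded.shape)  # (3, 128, 128, 3)

# Converting between layouts needs a transpose before the reshape; reshaping
# (3, w, h, 3) straight to (w, h, 9) would scramble pixel positions.
restacked = expanded.transpose(1, 2, 0, 3).reshape(w, h, 3 * 3)
print(np.array_equal(restacked, stacked))  # True
```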

class spacekit.extractor.load.ImageIO(img_path, format='png', data=None, name='ImageIO', **log_kws)

Bases: object

Parent class for image file input/output operations.

check_format(format)

Checks the format type of img_path (png, jpg or npz) and initializes the format attribute accordingly.

Parameters:

format (str) – image format: png, jpg or npz

Returns:

the format type: png, jpg or npz

Return type:

str
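Below is one way such a check might look. It assumes the format is taken from img_path's file extension and falls back to the declared format when img_path is a directory. infer_format and SUPPORTED_FORMATS are hypothetical names, and check_format itself may work differently.

```python
import os

# Hypothetical whitelist mirroring the formats listed above.
SUPPORTED_FORMATS = ("png", "jpg", "npz")


def infer_format(img_path, fmt="png"):
    """Hypothetical check: use img_path's extension if it has one, else fmt."""
    ext = os.path.splitext(img_path)[1].lstrip(".").lower()
    # A directory of images has no extension, so fall back to the declared format.
    candidate = ext or fmt.lower()
    if candidate not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {candidate!r}; expected one of {SUPPORTED_FORMATS}")
    return candidate


print(infer_format("data/img_data.npz"))   # npz
print(infer_format("images/", fmt="png"))  # png
```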

load_multi_npz(i='img_index.npz', X='img_data.npz', y='img_labels.npz')

Load numpy arrays from individual feature/image data, label and index compressed files on disk. As the counterpart function to save_multi_npz, keys within each file are expected to be named as follows:

  • i: “train_idx”, “test_idx”, “val_idx”

  • X: “X_train”, “X_test”, “X_val”

  • y: “y_train”, “y_test”, “y_val”

Parameters:
  • i (str, optional) – image index filename, by default “img_index.npz”

  • X (str, optional) – image data filename, by default “img_data.npz”

  • y (str, optional) – image labels filename, by default “img_labels.npz”

Returns:

train, test, val tuples of arrays

Return type:

tuples of arrays
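The file and key layout that load_multi_npz expects can be sketched with numpy's savez_compressed and load. Only the file names and key names come from the description above. The toy splits, array shapes, and temporary directory are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
sizes = {"train": 7, "test": 2, "val": 1}
# Toy splits (assumption): visit-name index, image arrays and 0/1 labels per split.
splits = {
    name: (
        np.array([f"{name}_{k}" for k in range(n)]),
        rng.random((n, 3, 8, 8, 3), dtype=np.float32),
        rng.integers(0, 2, size=n),
    )
    for name, n in sizes.items()
}

with tempfile.TemporaryDirectory() as data_path:
    i_file = os.path.join(data_path, "img_index.npz")
    X_file = os.path.join(data_path, "img_data.npz")
    y_file = os.path.join(data_path, "img_labels.npz")
    # One file per component, one key per split, using the documented key names.
    np.savez_compressed(i_file, **{f"{s}_idx": splits[s][0] for s in sizes})
    np.savez_compressed(X_file, **{f"X_{s}": splits[s][1] for s in sizes})
    np.savez_compressed(y_file, **{f"y_{s}": splits[s][2] for s in sizes})

    # Reassemble an (index, X, y) tuple for each split while the files still exist.
    with np.load(i_file) as i, np.load(X_file) as X, np.load(y_file) as y:
        train, test, val = [(i[f"{s}_idx"], X[f"X_{s}"], y[f"y_{s}"]) for s in sizes]

print(train[1].shape)  # (7, 3, 8, 8, 3)
```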

load_npz(npz_file=None, keys=['index', 'images', 'labels'])

Load index, image and label arrays from a single compressed .npz file.

Parameters:
  • npz_file (str, optional) – path-like string to the saved file if different from self.img_path, by default None

  • keys (list, optional) – keys identifying each array component, by default [“index”, “images”, “labels”]

Returns:

If three keys are passed via keys, returns a tuple of three arrays matching those keys; if only two keys are passed, returns two arrays matching them.

Return type:

arrays or tuple of arrays
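The following sketch loads a single .npz file by key. It assumes the file was written with numpy's savez_compressed under the default key names. load_npz_arrays is a hypothetical stand-in for load_npz that always returns a tuple, so passing two keys unpacks into two arrays.

```python
import os
import tempfile

import numpy as np


def load_npz_arrays(npz_file, keys=("index", "images", "labels")):
    """Hypothetical stand-in for load_npz: one array per key, returned as a tuple."""
    with np.load(npz_file) as npz:
        return tuple(npz[k] for k in keys)


with tempfile.TemporaryDirectory() as tmp:
    npz_file = os.path.join(tmp, "img_data.npz")
    # Toy contents (assumption) saved under the default key names.
    np.savez_compressed(
        npz_file,
        index=np.array(["visit_a", "visit_b"]),
        images=np.zeros((2, 8, 8, 9), dtype=np.uint8),
        labels=np.array([0, 1]),
    )
    idx, X, y = load_npz_arrays(npz_file)  # three keys -> three arrays
    idx2, X2 = load_npz_arrays(npz_file, keys=["index", "images"])  # two keys -> two arrays

print(X.shape, y.tolist())  # (2, 8, 8, 9) [0, 1]
```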

save_multi_npz(train, test, val, data_path='data')

save_npz(i, X, y, npz_file='data/img_data.npz')

Store the index (i), feature data (X) and label (y) arrays in a single compressed .npz file on disk.

split_arrays(data, t=0.6, v=0.85)

Split arrays into test and validation sample groups.

Parameters:
  • data (pd.DataFrame or np.array) – training data

  • t (float, optional) – test sample size as a fraction of 1, by default 0.6

  • v (float, optional) – validation sample size as a fraction of 1, by default 0.85

Returns:

split sampled arrays

Return type:

arrays

split_arrays_from_npz(v=0.85)

Loads images (X), labels (y) and index (i) from a single .npz compressed numpy file. Splits into train, test, val sets using 70-20-10 ratios.

Returns:

train, test, val tuples of numpy arrays. Each tuple consists of an index, feature data (X, for images these are the actual pixel values) and labels (y).

Return type:

tuples
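Here is a minimal numpy sketch of a 70-20-10 split that keeps index, features, and labels aligned. It shuffles once, then cuts at 70% and 90% of the rows with np.split. The arrays are toy data, and the sketch hardcodes the documented ratios rather than modeling the v parameter.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
# Toy data (assumption): visit-name index, feature rows and 0/1 labels.
idx = np.array([f"visit_{k:02d}" for k in range(n)])
X = rng.random((n, 4))
y = rng.integers(0, 2, size=n)

# Shuffle once with a shared permutation so idx, X and y stay aligned.
order = rng.permutation(n)
idx, X, y = idx[order], X[order], y[order]

# Integer cut points at 70% and 90% of the rows give 70-20-10 groups.
cuts = [n * 7 // 10, n * 9 // 10]
train, test, val = zip(np.split(idx, cuts), np.split(X, cuts), np.split(y, cuts))

print([len(part[0]) for part in (train, test, val)])  # [35, 10, 5]
```

Integer cut points avoid floating-point rounding surprises when n * 0.7 lands just below a whole number.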

split_df_from_arrays(train, test, val, target='label')

class spacekit.extractor.load.SVMImageIO(img_path, w=128, h=128, d=9, inference=True, format='png', data=None, target='label', v=0.85, **log_kws)

Bases: ImageIO

Subclass for loading Single Visit Mosaic total detection .png images from local disk into numpy arrays and performing initial preprocessing and labeling for training a CNN or generating predictions on unlabeled data.

Instantiates an SVMImageIO object.

Parameters:
  • img_path (str) – path to the local directory containing PNG files

  • w (int, optional) – image pixel width, by default 128

  • h (int, optional) – image pixel height, by default 128

  • d (int, optional) – channel depth, by default 9

  • inference (bool, optional) – determines how to load images (set to False for training), by default True

  • format (str, optional) – format type of image file(s), png, jpg or npz, by default “png”

  • data (dataframe, optional) – used to load MLP data inputs and split them into train/test/validation sets, by default None

  • target (str, optional) – name of the target column in dataframe, by default “label”

  • v (float, optional) – size ratio for validation set, by default 0.85

detector_prediction_images(X_data, exp=3)

Loads PNG image files into numpy arrays and reshapes them into the dimensions expected by a pre-trained image CNN for generating predictions (no data augmentation is performed).

Parameters:
  • X_data (Pandas dataframe) – input data (assumes index values are the image filenames)

  • exp (int, optional) – expand image array shape into its constituent frame dimensions, by default 3

Returns:

image name index, arrays of image pixel values

Return type:

Pandas Index, numpy array

detector_training_images(X_data, exp=None)

Loads PNG image files from class-labeled folders into numpy arrays. The arrays are not reshaped, since data augmentation is assumed to be performed at training time.

Parameters:
  • X_data (Pandas dataframe) – input data (assumes index values are the image filenames)

  • exp (int, optional) – expand image array shape into its constituent frame dimensions, by default None

Returns:

index, image input array, image class labels: (idx, X, y)

Return type:

tuple

get_labeled_image_paths(i)

Creates lists of negative and positive image filepaths, assuming the image files are in subdirectories named after the class labels, e.g. “0” and “1” (similar to how Keras flow_from_directory works).

Note: this method expects 3 images in each subdirectory, two of which have the suffixes _source and _gaia appended, and a very specific path format: {img_path}/{label}/{i}/{i}_{suffix}.png, where i is typically the full name of the visit. This may be made more flexible in future versions, but for now it is more or less hardcoded for SVM images generated by the spacekit.skopes.hst.svm.prep or corrupt modules.

Parameters:

i (str) – image filename

Returns:

image filenames for each image type (original, source, gaia)

Return type:

tuples
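The documented directory layout can be sketched with pathlib. This assumes the labels are “0” (negative) and “1” (positive), and that the original frame has no suffix, i.e. {i}.png. labeled_image_paths and the visit name are hypothetical.

```python
from pathlib import Path

# Assumption: the original frame has no suffix, i.e. {i}.png.
SUFFIXES = ("", "_source", "_gaia")


def labeled_image_paths(img_path, i):
    """Hypothetical helper: (original, source, gaia) paths for visit i under labels 0 and 1."""
    root = Path(img_path)
    neg = tuple(root / "0" / i / f"{i}{s}.png" for s in SUFFIXES)
    pos = tuple(root / "1" / i / f"{i}{s}.png" for s in SUFFIXES)
    return neg, pos


neg, pos = labeled_image_paths("svm_images", "visit_001")
print(neg[1].as_posix())  # svm_images/0/visit_001/visit_001_source.png
print(pos[0].as_posix())  # svm_images/1/visit_001/visit_001.png
```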

load()

load_from_data_splits(X_train, X_test, X_val)

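Read in train/test/validation data and produce X-y data splits.

Parameters:
  • X_train (Pandas dataframe) – training set input data

  • X_test (Pandas dataframe) – test set input data

  • X_val (Pandas dataframe) – validation set input data

Returns: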

train, test, val nested lists each containing an index of the visit names and png image data as numpy arrays.

Return type:

nested lists

spacekit.extractor.load.save_dct_to_txt(data_dict)

Saves the key-value pairs of a dictionary to text files on local disk, with each key as a filename and its value(s) as the contents of that file.

Parameters:

data_dict (dict) – dictionary containing keys as filenames and values as the contents to be saved to a text file.

Returns:

list of paths to each file saved to local disk.

Return type:

list
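Here is a minimal sketch of the key-to-filename behavior described above, assuming list values are written one item per line. save_dict_to_txt and its out_dir parameter are hypothetical additions for illustration; save_dct_to_txt itself takes only data_dict.

```python
import os
import tempfile


def save_dict_to_txt(data_dict, out_dir="."):
    """Hypothetical sketch: write each value to a file named after its key."""
    paths = []
    for key, value in data_dict.items():
        path = os.path.join(out_dir, key)
        # Assumption: list/tuple values are written one item per line.
        items = value if isinstance(value, (list, tuple)) else [value]
        with open(path, "w") as f:
            f.writelines(f"{item}\n" for item in items)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    saved = save_dict_to_txt({"visits.txt": ["visit_a", "visit_b"], "status.txt": "done"}, tmp)
    with open(saved[0]) as f:
        visits = f.read()

print(visits)
```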

spacekit.extractor.load.save_dict(data_dict, df_key=None)

spacekit.extractor.load.save_json(data, name)

spacekit.extractor.load.save_dataframe(df, df_key, index_col='ipst')

spacekit.extractor.load.zip_subdirs(top_path, zipname='models.zip')

spacekit.extractor.load.is_within_directory(directory, target)

spacekit.extractor.load.safe_extract(tar, fpath, expath='.', members=None, *, numeric_owner=False)

spacekit.extractor.load.extract_file(fpath, dest='.')

spacekit.extractor.load.save_multitype_data(data_dict, output_path, **npz_kwargs)

spacekit.extractor.load.load_multitype_data(input_path, index_names=['index', 'ipst'])