spacekit.preprocessor.encode

class spacekit.preprocessor.encode.HstCalEncoder(data, fkeys=['DETECTOR', 'SUBARRAY', 'DRIZCORR', 'PCTECORR'], names=['detector', 'subarray', 'drizcorr', 'pctecorr'], keypair_file=None, encoding_pairs=None, **log_kws)[source]

Categorical encoding class for HST Calibration in the Cloud Reprocessing inputs.

Instantiates a CalEncoder class object.

Parameters:
  • data (dataframe) – input data containing features (columns) to be encoded

  • fkeys (list) – categorical-type column names (str) to be encoded

  • names (list) – new names to assign columns of the encoded versions of categorical data

class spacekit.preprocessor.encode.HstSvmEncoder(data, fkeys=['category', 'detector', 'wcstype'], names=['cat', 'det', 'wcs'], drop=False, rename=False, keypair_file=None, encoding_pairs=None, **log_kws)[source]

Categorical encoding class for HST Single Visit Mosiac regression test data inputs.

Instantiates an HstSvmEncoder class object.

Parameters:
  • data (dataframe) – input data containing features (columns) to be encoded

  • fkeys (list) – categorical-type column names (str) to be encoded

  • names (list) – new names to assign columns of the encoded versions of categorical data

encode_categories(cname='category', sep=';')[source]

Transforms the raw string inputs from MAST target category naming conventions into an abbreviated form. For example, CLUSTER OF GALAXIES;GRAVITATIONA becomes GC for galaxy cluster; and STELLAR CLUSTER;GLOBULAR CLUSTER becomes SC for stellar cluster. This serves to group similar but differently named objects into a discrete set of 8 possible categorizations. The 8 categories will then be encoded into integer values in the final encoding step (machine learning inputs must be numeric).

Returns:

original dataframe with category input feature values encoded.

Return type:

dataframe

init_categories()[source]

Assigns abbreviated character code as key-pair value for each type of target category classification (as determined by data on MAST archive).

Returns:

key-pair values for image target category classification.

Return type:

dict

make_keypairs()[source]

Instantiates key-pair dictionaries for each of the categorical features listed in fkeys. Except for the target classification “category” feature, each string value is assigned an integer in alphabetical and increasing order, respectively. For the image target category feature, an integer is assigned to each abbreviated version of strings collected from the MAST archive). The extra abbreviation step is done to allow for debugging and analysis purposes (value-count of abbreviated versions are printed to stdout before the final encoding).

Returns:

key-pair values for image target category classification (category), detectors and wcstype.

Return type:

dict

spacekit.preprocessor.encode.encode_target_data(y_train, y_test)[source]

Label encodes target class training and test data for multi-classification models.

Parameters:
  • y_train (dataframe or ndarray) – training target data

  • y_test (dataframe or ndarray) – test target data

Returns:

y_train, y_test

Return type:

ndarrays