spacekit.preprocessor.encode
- class spacekit.preprocessor.encode.HstCalEncoder(data, fkeys=['DETECTOR', 'SUBARRAY', 'DRIZCORR', 'PCTECORR'], names=['detector', 'subarray', 'drizcorr', 'pctecorr'], keypair_file=None, encoding_pairs=None, **log_kws)
Categorical encoding class for HST Calibration in the Cloud Reprocessing inputs.
Instantiates an HstCalEncoder class object.
- class spacekit.preprocessor.encode.HstSvmEncoder(data, fkeys=['category', 'detector', 'wcstype'], names=['cat', 'det', 'wcs'], drop=False, rename=False, keypair_file=None, encoding_pairs=None, **log_kws)
Categorical encoding class for HST Single Visit Mosaic regression test data inputs.
Instantiates an HstSvmEncoder class object.
- Parameters:
- encode_categories(cname='category', sep=';')
Transforms the raw string inputs from MAST target category naming conventions into an abbreviated form. For example, “CLUSTER OF GALAXIES;GRAVITATIONA” becomes “GC” (galaxy cluster), and “STELLAR CLUSTER;GLOBULAR CLUSTER” becomes “SC” (stellar cluster). This groups similar but differently named objects into a discrete set of 8 possible categories, which are then encoded as integer values in the final encoding step (machine learning inputs must be numeric).
- Returns:
original dataframe with category input feature values encoded.
- Return type:
dataframe
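The abbreviation step can be illustrated with a minimal pandas sketch. It is not spacekit's implementation: the lookup table holds only the two example mappings quoted above (the real method covers 8 categories), whole strings are matched rather than split on the sep parameter, and the fallback code "U" for unrecognized strings is a hypothetical placeholder.

```python
import pandas as pd

# Hypothetical lookup table: only these two mappings come from the docstring;
# the real method handles 8 categories.
CATEGORY_ABBREVIATIONS = {
    "CLUSTER OF GALAXIES;GRAVITATIONA": "GC",  # galaxy cluster
    "STELLAR CLUSTER;GLOBULAR CLUSTER": "SC",  # stellar cluster
}


def abbreviate_categories(df, cname="category", fallback="U"):
    """Return a copy of df with raw MAST category strings replaced by short codes.

    Strings missing from the lookup table map to `fallback`, a hypothetical
    placeholder code that is not taken from spacekit.
    """
    out = df.copy()
    out[cname] = out[cname].map(CATEGORY_ABBREVIATIONS).fillna(fallback)
    return out


raw = pd.DataFrame({"category": [
    "CLUSTER OF GALAXIES;GRAVITATIONA",
    "STELLAR CLUSTER;GLOBULAR CLUSTER",
    "SOMETHING ELSE",
]})
abbr = abbreviate_categories(raw)
print(abbr["category"].tolist())  # ['GC', 'SC', 'U']
```

Working on a copy keeps the raw strings available for the kind of debugging the abbreviation step is meant to support.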
- init_categories()
Assigns an abbreviated character code as the key-pair value for each type of target category classification (as determined by data from the MAST archive).
- Returns:
key-pair values for image target category classification.
- Return type:
- make_keypairs()
Instantiates key-pair dictionaries for each of the categorical features listed in fkeys. Except for the target classification “category” feature, the string values of each feature are assigned integers in increasing order, sorted alphabetically. For the image target category feature, an integer is assigned to each abbreviated version of the strings collected from the MAST archive. The extra abbreviation step is done for debugging and analysis purposes (value counts of the abbreviated versions are printed to stdout before the final encoding).
- Returns:
key-pair values for the image target category classification (category), detector, and wcstype features.
- Return type:
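The alphabetical integer assignment described for the non-category features can be sketched as follows. This is an illustration under stated assumptions, not spacekit's code: the sample rows are made up, counting starts at 0, and each fkeys column is assumed to be encoded into a new column named by the matching entry in names. The category feature, which uses its own abbreviation-based mapping, is left out.

```python
import pandas as pd


def alphabetical_keypairs(values):
    """Assign each distinct string an integer in increasing alphabetical order.

    Starting the count at 0 is an assumption of this sketch.
    """
    return {value: i for i, value in enumerate(sorted(set(values)))}


# Made-up sample rows; only the column names come from the HstSvmEncoder defaults.
df = pd.DataFrame({
    "detector": ["wfc", "ir", "uvis"],
    "wcstype": ["a priori", "default", "a priori"],
})

# Hypothetical pairing: each fkey column is encoded into a new column named by `names`.
fkeys = ["detector", "wcstype"]
names = ["det", "wcs"]

keypairs = {key: alphabetical_keypairs(df[key]) for key in fkeys}
for key, name in zip(fkeys, names):
    df[name] = df[key].map(keypairs[key])

print(keypairs["detector"])        # {'ir': 0, 'uvis': 1, 'wfc': 2}
print(df[names].values.tolist())   # [[2, 0], [0, 1], [1, 0]]
```

Keeping the dictionaries separate from the encoded columns makes it easy to save them and reuse the same mapping on new data.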
- spacekit.preprocessor.encode.encode_target_data(y_train, y_test)
Label-encodes the target class training and test data for multi-class classification models.
- Parameters:
y_train (dataframe or ndarray) – training target data
y_test (dataframe or ndarray) – test target data
- Returns:
y_train, y_test
- Return type:
ndarrays
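A common way to label-encode train and test targets is to learn the class-to-integer mapping from the training labels and reuse it for the test labels. The NumPy-only sketch below does that; label_encode_targets is a hypothetical helper, not the source of encode_target_data, and whether spacekit fits on y_train alone is not stated here. scikit-learn's LabelEncoder (fit on y_train, then transform y_test) provides the same behaviour if that dependency is acceptable.

```python
import numpy as np


def label_encode_targets(y_train, y_test):
    """Map class labels to integer indices, using classes found in y_train.

    Raises ValueError if y_test contains a label absent from y_train.
    """
    y_train = np.asarray(y_train).ravel()
    y_test = np.asarray(y_test).ravel()
    # np.unique returns the sorted classes and each training label's index into them
    classes, train_encoded = np.unique(y_train, return_inverse=True)
    # classes is sorted, so searchsorted gives each test label's candidate index
    test_idx = np.clip(np.searchsorted(classes, y_test), 0, len(classes) - 1)
    if not np.all(classes[test_idx] == y_test):
        raise ValueError("y_test contains labels not present in y_train")
    return train_encoded, test_idx


y_train_enc, y_test_enc = label_encode_targets(
    ["star", "galaxy", "star", "nebula"], ["galaxy", "star"]
)
print(y_train_enc.tolist(), y_test_enc.tolist())  # [2, 0, 2, 1] [0, 2]
```

Raising on unseen test labels avoids silently assigning them a wrong class index.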