spacekit.skopes.hst.cal.predict

Spacekit HST Calibration Dataset Reprocessing Resource Prediction

Step 1: SCRAPE inputs from S3 text file (import data)
Step 2: SCRUB inputs (preprocessing)
Step 3: PREDICT resource requirements (inference)
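A minimal sketch of the SCRAPE step, assuming the feature file holds a JSON object that maps feature names to values. The file layout, the function name `scrape_features`, and the feature names below (`n_files`, `total_mb`, `detector`) are hypothetical illustrations, not the module's actual schema. In the Lambda, the text would be read from the S3 object rather than from a string literal.

```python
import json


def scrape_features(text: str) -> dict:
    """Hypothetical SCRAPE step: parse raw feature text into a dict.

    Assumes the text is a JSON object, e.g. '{"n_files": 3, "detector": "UVIS"}'.
    """
    features = json.loads(text)
    if not isinstance(features, dict):
        raise ValueError("expected a JSON object of feature name -> value")
    return features


# Hypothetical feature names and values, for illustration only.
raw = '{"n_files": 3, "total_mb": 120.5, "detector": "UVIS"}'
features = scrape_features(raw)
print(features["detector"])  # UVIS
```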

Examples:

df = run_preprocessing("home/singlevisits")

df = run_preprocessing("home/syntheticdata", fname="synth2", crpt=1, draw=0)

This module loads pre-trained neural networks (ANNs) to predict job resource requirements for HST:

# 1 - load job metadata inputs from a text file in S3
# 2 - encode strings as int/float values in a NumPy array
# 3 - load models and generate predictions
# 4 - return predictions as JSON to the parent Lambda function
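Steps 2 and 4 can be sketched as follows, assuming categorical features arrive as strings that must be mapped to integer codes before inference. The `CATEGORY_CODES` mapping, the feature names, and the JSON field names (`memBin`, `clockTime`, `memVal`) are hypothetical placeholders; the real encodings and response keys are defined by the module, not here.

```python
import json

import numpy as np

# Hypothetical string -> integer encodings; the real categories and codes differ.
CATEGORY_CODES = {
    "detector": {"IR": 0, "UVIS": 1},
}


def encode_features(features: dict, order: list) -> np.ndarray:
    """Step 2 sketch: turn a feature dict into a 1 x n float array in a fixed column order."""
    row = []
    for name in order:
        value = features[name]
        if isinstance(value, str):
            # A KeyError here flags an unknown category rather than silently encoding it.
            value = CATEGORY_CODES[name][value]
        row.append(float(value))
    return np.array([row], dtype=np.float32)


def to_json(bin_idx, wallclock_s, memory_gb) -> str:
    """Step 4 sketch: package predictions for the parent Lambda (hypothetical keys)."""
    # Cast to built-in types so NumPy scalars returned by a model serialize cleanly.
    return json.dumps(
        {"memBin": int(bin_idx), "clockTime": float(wallclock_s), "memVal": float(memory_gb)}
    )


X = encode_features(
    {"n_files": 3, "total_mb": 120.5, "detector": "UVIS"},
    ["n_files", "total_mb", "detector"],
)
print(X.shape)  # (1, 3)
```

Fixing the column order explicitly keeps the array layout identical to whatever order the models were trained on, which a plain `dict.values()` call would not guarantee across differently ordered inputs.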

MEMORY BIN: A classifier predicts which of four memory bins is most likely to be needed to process an HST dataset (ipppssoot) successfully. The probability of each bin is written to CloudWatch logs, and the bin with the highest probability is returned to the Calcloud Job Submit lambda that invokes this one. Bin sizes are as follows:

Memory Bins:
0: < 2GB
1: 2-8GB
2: 8-16GB
3: > 16GB
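The bin scheme above can be expressed as a small lookup: mapping a measured memory footprint to its bin label, and choosing the predicted bin as the one with the highest classifier probability. This is a sketch; the function names are hypothetical, and placing values that sit exactly on a boundary (2, 8, 16 GB) into the higher bin is an assumption, since the listed ranges overlap at their edges.

```python
import numpy as np

# Boundaries between bins in GB: 0: <2, 1: 2-8, 2: 8-16, 3: >16.
# Assumption: a value exactly on a boundary falls into the higher bin.
BIN_EDGES_GB = [2.0, 8.0, 16.0]


def memory_to_bin(memory_gb: float) -> int:
    """Return the bin index (0-3) that a memory footprint in GB falls into."""
    return int(np.digitize(memory_gb, BIN_EDGES_GB))


def most_likely_bin(probabilities) -> int:
    """Return the index of the bin with the highest classifier probability."""
    probs = np.asarray(probabilities, dtype=float)
    return int(np.argmax(probs))


print(memory_to_bin(5.0))                          # 1
print(most_likely_bin([0.10, 0.70, 0.15, 0.05]))   # 1
```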

WALLCLOCK REGRESSION: A regression model uses the same input data to estimate the number of seconds needed to process the dataset. Calcloud then triples this estimate to add a buffer of overhead, so that larger jobs are not killed unnecessarily.
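The buffering step described above reduces to simple arithmetic. This sketch mirrors what the text says Calcloud does with the estimate; the function name `wallclock_limit` and the rounding up to whole seconds are assumptions for illustration.

```python
import math

BUFFER_FACTOR = 3  # Calcloud triples the regression estimate


def wallclock_limit(predicted_seconds: float, factor: int = BUFFER_FACTOR) -> int:
    """Turn a wallclock regression estimate into a kill-time limit with headroom.

    Assumption: the result is rounded up to a whole number of seconds.
    """
    if predicted_seconds < 0:
        raise ValueError("predicted wallclock must be non-negative")
    return math.ceil(predicted_seconds * factor)


print(wallclock_limit(600))  # 1800
```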

MEMORY REGRESSION: A third regression model estimates the actual amount of memory needed for the job. This value is mainly used for logging and future analysis; it is not currently used to allocate memory in Calcloud jobs.

spacekit.skopes.hst.cal.predict.lambda_handler(event, context)

Predict resource allocation requirements for memory (GB) and maximum execution kill time / wallclock (seconds) using three pre-trained neural networks. This lambda is invoked by the Job Submit lambda, which passes (serialized with json.dumps) the S3 bucket and key of the file containing the job input parameters. The key of the text file in S3 is assumed to have the following format: control/ipppssoot/ipppssoot_MemModelFeatures.txt.
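A sketch of how an invoking payload could be unpacked and the key format checked, assuming the payload carries `Bucket` and `Key` fields. Those field names, the helper names `features_key` and `parse_event`, and the sample values (`my-bucket`, `iabc01xyq`) are hypothetical. When a Python Lambda is invoked with a JSON payload, the runtime normally hands the handler an already-decoded dict; accepting a raw string as well is only a defensive convenience for local testing.

```python
import json

KEY_TEMPLATE = "control/{ipppssoot}/{ipppssoot}_MemModelFeatures.txt"


def features_key(ipppssoot: str) -> str:
    """Build the S3 key of the feature file for one dataset."""
    return KEY_TEMPLATE.format(ipppssoot=ipppssoot)


def parse_event(event) -> tuple:
    """Return (bucket, key, ipppssoot) from the invoking payload.

    Assumes hypothetical "Bucket" and "Key" fields; accepts a dict or a JSON string.
    """
    if isinstance(event, str):
        event = json.loads(event)
    bucket, key = event["Bucket"], event["Key"]
    parts = key.split("/")
    # The key must match control/<ipppssoot>/<ipppssoot>_MemModelFeatures.txt exactly.
    if len(parts) != 3 or key != features_key(parts[1]):
        raise ValueError(f"unexpected key layout: {key}")
    return bucket, key, parts[1]


payload = json.dumps({"Bucket": "my-bucket", "Key": features_key("iabc01xyq")})
print(parse_event(payload))
# ('my-bucket', 'control/iabc01xyq/iabc01xyq_MemModelFeatures.txt', 'iabc01xyq')
```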

spacekit.skopes.hst.cal.predict.local_handler(dataset, **kwargs)

Handles non-lambda (local) invocations.