How it Works
The Dockerfile is used to build the ECR image that the training instance runs. It installs the following important dependencies:
- TensorFlow with GPU support
- Python 2 and 3
- The Coral retraining scripts
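A minimal sketch of what such a Dockerfile might look like (the base image tag, package names, and script paths here are assumptions for illustration, not the actual file):

```dockerfile
# Hypothetical sketch -- the real Dockerfile may differ in base image and layout.
FROM tensorflow/tensorflow:1.15.0-gpu-py3

# Python 2 alongside the Python 3 already in the base image
RUN apt-get update && \
    apt-get install -y --no-install-recommends python python-pip && \
    rm -rf /var/lib/apt/lists/*

# Coral retraining scripts and the conversion helpers
COPY scripts/ /opt/program/
WORKDIR /opt/program

# SageMaker runs the executable named "train" inside the container
ENTRYPOINT ["./train"]
```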
Images should be labelled in Supervisely and downloaded as JPEG + JSON in a single tar file. When the user calls estimator.fit("s3://bucket"), SageMaker automatically downloads the contents of that folder/bucket to /opt/ml/input/data/training inside the training instance.
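That path follows a SageMaker convention: data for each input channel lands under /opt/ml/input/data/&lt;channel&gt;, and "training" is the default channel name. A small sketch of how a container-side script might locate the uploaded tar (the helper names are illustrative, not from the actual scripts):

```python
from pathlib import Path

# SageMaker convention: each input channel becomes a subdirectory of this base.
INPUT_BASE = Path("/opt/ml/input/data")


def channel_dir(channel: str = "training") -> Path:
    """Directory where SageMaker placed the data for the given channel."""
    return INPUT_BASE / channel


def find_tar(channel: str = "training"):
    """Return the single tar in the channel directory, or None if absent."""
    d = channel_dir(channel)
    tars = sorted(d.glob("*.tar")) if d.is_dir() else []
    return tars[0] if len(tars) == 1 else None
```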
The tar_to_record.sh script converts the tar into the two .record files and the .pbtxt label map used by the retraining script. It automatically finds the only tar in the specified folder and extracts it, then uses json_to_csv.py to convert the JSON annotations into two large CSV files.
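As an illustration of what json_to_csv.py plausibly does, the sketch below flattens Supervisely rectangle annotations ("objects" entries with a "classTitle" and exterior corner points) into CSV rows. The column names follow the common TensorFlow object-detection CSV convention; the real script may differ in details:

```python
import csv
import json
from pathlib import Path

# Column layout assumed here; the actual script's output may differ.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def annotation_rows(json_path: Path):
    """Yield one CSV row per annotated object in a Supervisely JSON file."""
    ann = json.loads(json_path.read_text())
    w, h = ann["size"]["width"], ann["size"]["height"]
    image_name = json_path.name.replace(".json", "")
    for obj in ann.get("objects", []):
        (x1, y1), (x2, y2) = obj["points"]["exterior"]
        yield [image_name, w, h, obj["classTitle"],
               min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)]


def jsons_to_csv(json_dir: Path, out_csv: Path):
    """Write all annotations found in json_dir into a single CSV file."""
    with out_csv.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for p in sorted(json_dir.glob("*.json")):
            writer.writerows(annotation_rows(p))
```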
generate_tfrecord.py then converts the CSV files into the .record files. Finally, parse_meta.py parses the meta.json file to create the .pbtxt file, which is the label map.
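As a sketch of that last step: Supervisely's meta.json lists the project's classes, and a label map assigns each one a 1-based id in TensorFlow's .pbtxt format. The field names below ("classes", "title") follow Supervisely's meta format; the real parse_meta.py may differ:

```python
import json


def meta_to_pbtxt(meta_json: str) -> str:
    """Render a TensorFlow object-detection label map from meta.json text.

    Illustrative sketch; assumes Supervisely's "classes"/"title" meta layout.
    """
    meta = json.loads(meta_json)
    entries = []
    for idx, cls in enumerate(meta["classes"], start=1):
        entries.append(
            "item {\n"
            f"  id: {idx}\n"
            f"  name: '{cls['title']}'\n"
            "}\n"
        )
    return "\n".join(entries)
```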
At the moment, the only hyperparameter you can change is the number of training steps. The dict specified in the notebook is written to /opt/ml/input/config/hyperparameters.json in the training instance. It is parsed by hyper.py and used when train calls ./retrain_....sh.
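A sketch of what hyper.py plausibly does (the key name "num_training_steps" and the default value are assumptions). One detail worth noting: SageMaker serialises every hyperparameter value as a JSON string, so the value has to be converted back to an int:

```python
import json
from pathlib import Path

# SageMaker writes every hyperparameter value as a string into this file.
HYPERPARAMS_PATH = Path("/opt/ml/input/config/hyperparameters.json")


def training_steps(path: Path = HYPERPARAMS_PATH, default: int = 500) -> int:
    """Read the number of training steps, falling back to a default.

    The key name and default here are illustrative assumptions.
    """
    if not path.is_file():
        return default
    params = json.loads(path.read_text())
    return int(params.get("num_training_steps", default))
```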
estimator.fit(...) runs the train script inside the training instance. train downloads the pretrained checkpoints, creates the records, runs the retraining, converts the result to .tflite, and uploads it to S3.
output.tflite is moved to /opt/ml/model/output.tflite, from where SageMaker automatically uploads it to an S3 bucket it generates. You can find exactly where it was uploaded by opening the completed training job in the SageMaker console. The file will be inside of a tar, inside of a tar.