How it Works

Dockerfile

The Dockerfile builds the image that is pushed to ECR and used by the training instance. It installs the following important dependencies:

  • TensorFlow for GPU

  • Python 2 and 3

  • Coral retraining scripts

  • WPILib scripts

Data

Images should be labelled in Supervisely and downloaded as JPEG + JSON in a single tar file. When the user calls estimator.fit("s3://bucket"), SageMaker automatically downloads the contents of that bucket/folder to /opt/ml/input/data/training inside the training instance.
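As an illustration, a minimal notebook sketch for kicking off a job might look like the following (SageMaker Python SDK v2 names; the image URI, role, instance type, and hyperparameter key are placeholders rather than values from this project):

    # Minimal sketch of launching a training job from the notebook.
    # The image URI, IAM role, instance type, and hyperparameter key are
    # placeholders, not values taken from this project.
    import sagemaker
    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<repo>:latest",  # ECR image built from the Dockerfile
        role=sagemaker.get_execution_role(),
        instance_count=1,
        instance_type="ml.p3.2xlarge",                                       # any GPU instance type
        hyperparameters={"train_steps": 500},                                 # see Hyperparameters below
    )

    # SageMaker copies everything under this S3 prefix to
    # /opt/ml/input/data/training inside the training container.
    estimator.fit("s3://bucket")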

The tar_to_record.sh script converts the tar into the two .record files and the .pbtxt file used by the retraining script. It automatically finds the only tar in the specified folder and extracts it, then uses json_to_csv.py to convert the JSON annotations into two large CSV files. generate_tfrecord.py converts those CSV files into .record files. Finally, parse_meta.py parses the meta.json file to create the .pbtxt file, which is a label map.
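As a rough illustration of the final step only, a sketch of turning meta.json into a label map might look like this (the field names are assumptions based on Supervisely's usual meta.json layout, not code taken from parse_meta.py):

    # Read the class names from Supervisely's meta.json and write a
    # TensorFlow Object Detection label map (.pbtxt). The "classes" and
    # "title" fields are assumptions about the meta.json layout.
    import json

    with open("meta.json") as f:
        meta = json.load(f)

    with open("label_map.pbtxt", "w") as out:
        for idx, cls in enumerate(meta["classes"], start=1):  # label map IDs start at 1
            out.write("item {\n")
            out.write("  id: %d\n" % idx)
            out.write("  name: '%s'\n" % cls["title"])
            out.write("}\n")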

Hyperparameters

At the moment, the only hyperparameter you can change is the number of training steps. The dict specified in the notebook is written to /opt/ml/input/config/hyperparameters.json inside the training instance, where hyper.py parses it; the value is used when the train script calls ./retrain_....sh.
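A minimal sketch of the parsing side, assuming a key named train_steps (SageMaker writes every hyperparameter value as a string, so it has to be cast back to an int):

    # Sketch of reading the hyperparameters SageMaker writes into the container.
    # The key name "train_steps" and the default value are assumptions.
    import json

    with open("/opt/ml/input/config/hyperparameters.json") as f:
        hyperparameters = json.load(f)

    train_steps = int(hyperparameters.get("train_steps", "500"))
    print("Training for %d steps" % train_steps)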

Training

estimator.fit(...) runs the train script inside the training instance. The script downloads checkpoints, creates the records, trains, converts the trained model to .tflite, and uploads it to S3.
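A schematic sketch of that sequence, not the actual train script, might look like the following (every script name and flag here is a hypothetical placeholder except tar_to_record.sh, and retrain.sh merely stands in for the retrain_....sh mentioned above):

    # Schematic orchestration sketch only. Script names, paths, and flags are
    # hypothetical placeholders; tar_to_record.sh is the only name taken from
    # this document, and its arguments are assumed.
    import subprocess

    def run(cmd):
        # Abort the SageMaker job if any step fails.
        subprocess.run(cmd, shell=True, check=True)

    run("./download_checkpoints.sh")                       # hypothetical: fetch pretrained checkpoints
    run("./tar_to_record.sh /opt/ml/input/data/training")  # build the .record files and label map
    run("./retrain.sh --num_training_steps 500")           # hypothetical stand-in for retrain_....sh
    run("./convert_to_tflite.sh")                          # hypothetical: produce output.tflite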

Output

The resulting output.tflite is moved to /opt/ml/model/output.tflite, which SageMaker then automatically uploads to an S3 bucket it generates. You can find exactly where it was uploaded by opening the completed training job in the SageMaker console; the file is nested inside a tar, inside a tar.
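Once the artifact has been downloaded from that S3 location, a minimal sketch of unpacking it might look like this (the archive and directory names are placeholders):

    # Unpack the nested archives SageMaker produces. Archive and directory
    # names are placeholders; check the training job page for the real ones.
    import tarfile

    with tarfile.open("output.tar.gz") as outer:
        outer.extractall("artifacts")                      # contains another tar

    with tarfile.open("artifacts/model.tar.gz") as inner:
        inner.extractall("artifacts/model")                # output.tflite ends up here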