Implement model versioning with Amazon Redshift ML

November 2, 2023


Amazon Redshift ML allows data analysts, developers, and data scientists to train machine learning (ML) models using SQL. In previous posts, we demonstrated how you can use the automatic model training capability of Redshift ML to train classification and regression models. Redshift ML allows you to create a model using SQL and specify your algorithm, such as XGBoost. You can use Redshift ML to automate data preparation, preprocessing, and selection of your problem type (for more information, refer to Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML). You can also bring a model previously trained in Amazon SageMaker into Amazon Redshift via Redshift ML for local inference. For local inference on models created in SageMaker, the ML model type must be supported by Redshift ML. However, remote inference is available for model types that are not natively available in Redshift ML.

Over time, ML models grow old, and even if nothing drastic happens, small changes accumulate. Common reasons why ML models need to be retrained or audited include:

Data drift – Because your data has changed over time, the prediction accuracy of your ML models may begin to decrease compared to the accuracy exhibited during testing
Concept drift – The ML algorithm that was initially used may need to be changed due to different business environments and other changing needs

You may need to refresh the model on a regular basis, automate the process, and reevaluate the accuracy of the refreshed model. As of this writing, Amazon Redshift doesn’t support versioning of ML models. In this post, we show how you can use the bring your own model (BYOM) functionality of Redshift ML to implement versioning of Redshift ML models.

We use local inference to implement model versioning as part of operationalizing ML models. We assume that you have a good understanding of your data and the problem type that is most applicable for your use case, and have created and deployed models to production.

Solution overview

In this post, we use Redshift ML to build a regression model that predicts the number of people that may use the city of Toronto’s bike sharing service at any given hour of a day. The model accounts for various aspects, including holidays and weather conditions, and because we need to predict a numerical outcome, we use a regression model. We use data drift as a reason for retraining the model, and use model versioning as part of the solution.

After a model is validated and is being used on a regular basis for running predictions, you can create versions of the model, which requires you to retrain the model using an updated training set and possibly a different algorithm. Versioning serves two main purposes:

You can refer to prior versions of a model for troubleshooting or audit purposes. This enables you to ensure that your model still retains high accuracy before switching to a newer model version.
You can continue to run inference queries on the current version of a model during the model training process of the new version.

At the time of this writing, Redshift ML doesn’t have native versioning capabilities, but you can still achieve versioning by implementing a few simple SQL techniques that use the BYOM capability. BYOM was introduced to support pre-trained SageMaker models to run your inference queries in Amazon Redshift. In this post, we use the same BYOM technique to create a version of an existing model built using Redshift ML.

The following figure illustrates this workflow.

In the following sections, we show you how to create a version from an existing model and then perform model retraining.

Prerequisites

As a prerequisite for implementing the example in this post, you need to set up a Redshift cluster or Amazon Redshift Serverless endpoint. For the preliminary steps to get started and set up your environment, refer to Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML.

We use the regression model created in the post Build regression models with Amazon Redshift ML. We assume that it has already been deployed and use this model to create new versions and retrain the model.

Create a version from the existing model

The first step is to create a version of the existing model (which means saving developmental changes of the model) so that a history is maintained and the model is available for comparison later on.

The following code is the generic format of the CREATE MODEL command syntax; in the next step, you get the information needed to use this command to create a new version:

CREATE MODEL model_name
    FROM ('job_name' | 's3_path' )
    FUNCTION function_name ( data_type [, ...] )
    RETURNS data_type
    IAM_ROLE { default }
    [ SETTINGS (
      S3_BUCKET 'bucket', | --required
      KMS_KEY_ID 'kms_string') --optional
    ];

Next, we collect the input parameters and apply them to the preceding CREATE MODEL code. We need the job name and the data types of the model input and output values. We collect these by running the SHOW MODEL command on our existing model. Run the following command in Amazon Redshift Query Editor v2:

show model predict_rental_count;

Note the values for AutoML Job Name, Function Parameter Types, and the Target Column (trip_count) from the model output. We use these values in the CREATE MODEL command to create the version.

The following CREATE MODEL statement creates a version of the current model using the values collected from our SHOW MODEL command. We append the date (the example format is YYYYMMDD) to the end of the model and function names to track when this new version was created.

CREATE MODEL predict_rental_count_20230706 
FROM 'redshiftml-20230706171639810624' 
FUNCTION predict_rental_count_20230706 (int4, int4, int4, int4, int4, int4, int4, numeric, numeric, int4)
RETURNS float8 
IAM_ROLE default
SETTINGS (
S3_BUCKET '<<your S3 Bucket>>');
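
If you create versions regularly, the dated name can be generated rather than typed by hand. The following is a minimal sketch, assuming the base name predict_rental_count and the YYYYMMDD convention described above; the column alias versioned_name is only illustrative. Copy the result into the model and function names of the CREATE MODEL statement.

```sql
-- Build a dated name of the form predict_rental_count_YYYYMMDD from today's date.
-- The alias versioned_name is an illustrative choice, not part of Redshift ML.
SELECT 'predict_rental_count_' || TO_CHAR(GETDATE(), 'YYYYMMDD') AS versioned_name;
```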

This command may take a few minutes to complete. When it’s complete, run the following command:

show model predict_rental_count_20230706;

We can observe the following in the output:

AutoML Job Name is the same as the original version of the model
Function Name shows the new name, as expected
Inference Type shows Local, which designates this is BYOM with local inference

You can run inference queries using both versions of the model to validate the inference outputs.

The following screenshot shows the output of the model inference using the original version.

The following screenshot shows the output of model inference using the version copy.

As you can see, the inference outputs are the same.
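
The same check can be expressed as a single query instead of comparing screenshots by eye. The following is a sketch under stated assumptions: the table validation_data and the ten input column names are taken from memory of the post Build regression models with Amazon Redshift ML and may differ in your environment, and the tolerance of 0.000001 is an arbitrary illustrative value.

```sql
-- Score every row with both the original model and its version copy,
-- then count the rows where the two predictions differ.
-- Assumed names: validation_data and its columns (adjust to your schema).
SELECT COUNT(*) AS rows_compared,
       SUM(CASE WHEN ABS(original_prediction - copy_prediction) <= 0.000001
                THEN 0 ELSE 1 END) AS mismatched_rows
FROM (
    SELECT predict_rental_count(
               trip_hour, trip_day, trip_month, trip_year, trip_quarter,
               trip_month_week, trip_week_day, temp_c, precip_amount_mm,
               is_holiday) AS original_prediction,
           predict_rental_count_20230706(
               trip_hour, trip_day, trip_month, trip_year, trip_quarter,
               trip_month_week, trip_week_day, temp_c, precip_amount_mm,
               is_holiday) AS copy_prediction
    FROM validation_data
) AS predictions;
```

Because both functions are backed by the same AutoML job, mismatched_rows would be expected to be 0; rows where either prediction is NULL are counted as mismatches.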

You have now learned how to create a version of a previously trained Redshift ML model.

Retrain your Redshift ML model

After you create a version of an existing model, you can retrain the existing model by simply creating a new model.

You can create and train a new model using the same CREATE MODEL command but with different input parameters, datasets, or problem types as applicable. For this post, we retrain the model on newer datasets. We append _new to the model name so it’s similar to the existing model for identification purposes.

In the following code, we use the CREATE MODEL command with a new dataset available in the training_data table:

CREATE MODEL predict_rental_count_new
FROM training_data
TARGET trip_count
FUNCTION predict_rental_count_new
IAM_ROLE 'arn:aws:iam::<accountid>:role/RedshiftML'
PROBLEM_TYPE regression
OBJECTIVE 'mse'
SETTINGS (s3_bucket 'redshiftml-<your-account-id>',
          s3_garbage_collect off,
          max_runtime 5000);

Run the following command to check the status of the new model:

show model predict_rental_count_new;

Replace the existing Redshift ML model with the retrained model

The last step is to replace the existing model with the retrained model. We do this by dropping the original version of the model and recreating a model using the BYOM technique.

First, check your retrained model to ensure the MSE/RMSE scores are staying stable between model training runs. To validate the models, you can run inferences with each of the model functions on your dataset and compare the results. We use the inference queries provided in Build regression models with Amazon Redshift ML.
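
One way to compare the two models on the same data is to compute RMSE for each function in a single pass. This is a sketch, not the query from the referenced post: it assumes a held-out table named validation_data with a trip_count label and the ten input columns listed below, all of which are assumptions to adapt to your schema.

```sql
-- Compute RMSE for the existing and the retrained model over the same rows.
-- Assumed names: validation_data, trip_count, and the input columns.
SELECT SQRT(AVG(POWER(trip_count - existing_prediction, 2)))  AS rmse_existing,
       SQRT(AVG(POWER(trip_count - retrained_prediction, 2))) AS rmse_retrained
FROM (
    SELECT trip_count,
           predict_rental_count(
               trip_hour, trip_day, trip_month, trip_year, trip_quarter,
               trip_month_week, trip_week_day, temp_c, precip_amount_mm,
               is_holiday) AS existing_prediction,
           predict_rental_count_new(
               trip_hour, trip_day, trip_month, trip_year, trip_quarter,
               trip_month_week, trip_week_day, temp_c, precip_amount_mm,
               is_holiday) AS retrained_prediction
    FROM validation_data
) AS scored;
```

For the comparison to be fair, the rows in validation_data should not have been used to train either model.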

After validation, you can replace your model.

Start by collecting the details of the predict_rental_count_new model.

Note the AutoML Job Name value, the Function Parameter Types values, and the Target Column name in the model output.
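
These details can be listed with the same SHOW MODEL command used earlier for the original model:

```sql
-- List the metadata of the retrained model, including its AutoML job name.
SHOW MODEL predict_rental_count_new;
```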

Replace the original model by dropping it and then re-creating it under the original model and function names, using the AutoML job name noted from predict_rental_count_new, so that existing references to the model and function names continue to work:

drop model predict_rental_count;
CREATE MODEL predict_rental_count
-- Use the AutoML Job Name noted from predict_rental_count_new here
FROM 'redshiftml-20230706171639810624'
FUNCTION predict_rental_count(int4, int4, int4, int4, int4, int4, int4, numeric, numeric, int4)
RETURNS float8
IAM_ROLE default
SETTINGS (
S3_BUCKET '<<your S3 Bucket>>');

The model creation should complete in a few minutes. You can check the status of the model by running the following command:

show model predict_rental_count;

When the model status is ready, the newer version predict_rental_count of your existing model is available for inference and the original version of the ML model predict_rental_count_20230706 is available for reference if needed.
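
If the retrained model later turns out to perform worse, the same BYOM technique could in principle be used to roll back. The following is a sketch only, assuming the artifacts of the original AutoML job (the job name recorded for predict_rental_count_20230706) are still available; this post does not cover rollback, so verify the behavior in a test environment first.

```sql
-- Hypothetical rollback: point the production name back at the original job.
DROP MODEL predict_rental_count;
CREATE MODEL predict_rental_count
-- AutoML Job Name recorded for the archived version predict_rental_count_20230706
FROM 'redshiftml-20230706171639810624'
FUNCTION predict_rental_count (int4, int4, int4, int4, int4, int4, int4, numeric, numeric, int4)
RETURNS float8
IAM_ROLE default
SETTINGS (
S3_BUCKET '<<your S3 Bucket>>');
```

Until the new model is ready, queries can call predict_rental_count_20230706 directly, because it is a separate model and function.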

Refer to this GitHub repository for sample scripts to automate model versioning.

Conclusion

In this post, we showed how you can use the BYOM feature of Redshift ML to do model versioning. This allows you to keep a history of your models so that you can compare model scores over time, respond to audit requests, and run inferences while training a new model.

For more information about building different models with Redshift ML, refer to Amazon Redshift ML.

About the Authors

Rohit Bansal is an Analytics Specialist Solutions Architect at AWS. He specializes in Amazon Redshift and works with customers to build next-generation analytics solutions using other AWS Analytics services.

Phil Bates is a Senior Analytics Specialist Solutions Architect at AWS. He has more than 25 years of experience implementing large-scale data warehouse solutions. He is passionate about helping customers through their cloud journey and using the power of ML within their data warehouse.