1. Introduction
In the first part of this blog series, we discussed how to use Intel®'s OpenVINO™ toolkit to accelerate inference of the Big Transfer (BiT) model for computer vision tasks. We covered the process of importing the BiT model into the OpenVINO environment, leveraging hardware optimizations, and benchmarking performance. Our results showcased significant performance gains and reduced inference latency for BiT when using OpenVINO compared to the original TensorFlow implementation. With this strong baseline in place, there is still room for further optimization. In this second part, we will further enhance BiT model inference with the help of OpenVINO, the Neural Network Compression Framework (NNCF), and low-precision (INT8) inference. NNCF provides sophisticated tools for neural network compression through quantization, pruning, and sparsity techniques tailored for deep learning inference. This allows BiT models to become viable for power- and memory-constrained environments where the original model size may be prohibitive. The techniques presented will be applicable to many deep learning models beyond BiT.
2. Model Quantization
Model quantization is an optimization technique that reduces the precision of weights and activations in a neural network. It converts 32-bit floating-point representations (FP32) to lower bit-widths such as 16-bit floats (FP16), 8-bit integers (INT8), or 4-bit integers (INT4). The key benefit is improved efficiency: smaller model size and faster inference time. These improvements not only boost efficiency on server platforms but, more importantly, also enable deployment onto resource-constrained edge devices. So, while server platform performance is improved, the bigger impact is opening all-new deployment opportunities. Quantization transforms models from being restricted to data centers to being deployable even on low-power devices with limited compute or memory. This massively expands the reach of AI to the true edge.
Below are a few of the key model quantization concepts; a minimal numeric sketch follows the list:
- Precision reduction: Decreases the number of bits used to represent weights and activations. Common bit-widths: INT8, FP16. Enables smaller models.
- Efficiency: Compressed models are smaller and faster, leading to efficient use of system resources.
- Trade-offs: Balancing model compression, speed, and accuracy for the target hardware. The goal is to optimize across all fronts.
- Methods: Post-training quantization and quantization-aware training. The latter bakes resilience to lower precision into the model.
- Schemes: Quantization strategies such as weight-only, activation, or combined approaches strike a balance between compressing models and preserving accuracy.
- Preserving accuracy: Fine-tuning, calibration, and retraining maintain model quality on real-world data.
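To make the precision-reduction idea concrete, here is a minimal, self-contained NumPy sketch of symmetric per-tensor INT8 quantization. This is an illustration only, not how NNCF implements quantization internally:

import numpy as np

# Quantize FP32 weights to INT8 with one scale for the whole tensor.
w = np.random.randn(4, 4).astype(np.float32)
scale = np.abs(w).max() / 127.0                      # map max |w| to 127
w_int8 = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale        # values inference "sees"
print("max abs rounding error:", np.abs(w - w_dequant).max())

Storage drops 4x (8 bits instead of 32 per value) at the cost of a bounded rounding error, which is exactly the trade-off described in the list above.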
3. Neural Network Compression Framework (NNCF)
NNCF is a powerful tool for optimizing deep learning models, such as the Big Transfer (BiT) model, to achieve improved performance on various hardware, ranging from the edge to the data center. It provides a comprehensive set of features and capabilities for model optimization, making it easy for developers to optimize models for low-precision inference. Some of the key capabilities include:
- Support for a variety of post-training and training-time algorithms with minimal accuracy drop.
- Seamless combination of pruning, sparsity, and quantization algorithms.
- Support for a variety of models: NNCF can be used to optimize models from a variety of frameworks, including TensorFlow, PyTorch, ONNX, and OpenVINO.
NNCF provides samples that demonstrate the usage of compression algorithms for different use cases and models. See the compression results achievable with the NNCF-powered samples on the Model Zoo page. For more details, refer to this GitHub repo.
4. BiT Classification Model Optimization with OpenVINO™
Note: Before proceeding with the following steps, ensure you have a conda environment set up. Refer to this blog post for detailed instructions on setting up the conda environment.
4.1. Download the BiT_M_R50x1_1 TF classification model:
wget https://tfhub.dev/google/bit/m-r50x1/1?tf-hub-format=compressed -O bit_m_r50x1_1.tar.gz
mkdir -p bit_m_r50x1_1 && tar -xvf bit_m_r50x1_1.tar.gz -C bit_m_r50x1_1
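Optionally, verify that the extracted SavedModel loads correctly before converting it. A quick check, assuming TensorFlow is installed in the active conda environment:

import tensorflow as tf

# Load the extracted SavedModel and list its serving signatures.
model = tf.saved_model.load("./bit_m_r50x1_1")
print(list(model.signatures.keys()))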
4.2. OpenVINO™ Model Optimization:
Execute the command below inside the conda environment to generate the OpenVINO IR model files (.xml and .bin) for the bit_m_r50x1_1 model. These model files will be used for further optimization and for inference accuracy validation in subsequent sections.
ovc ./bit_m_r50x1_1 --output_model ./bit_m_r50x1_1/ov/fp32/bit_m_r50x1_1 --compress_to_fp16 False
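The same conversion can also be run from Python. A short sketch, assuming OpenVINO 2023.1 or newer, where openvino.convert_model and openvino.save_model are available:

import openvino as ov

# TF SavedModel -> ov.Model, then save IR without FP16 weight compression.
ov_model = ov.convert_model("./bit_m_r50x1_1")
ov.save_model(ov_model, "./bit_m_r50x1_1/ov/fp32/bit_m_r50x1_1.xml",
              compress_to_fp16=False)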
5. Data Preparation
To evaluate the accuracy impact of quantization on our BiT model, we need a suitable dataset. For this, we leverage the ImageNet 2012 validation set, which contains 50,000 images across 1,000 classes. The ILSVRC2012 validation ground truth is used for cross-referencing model predictions during accuracy measurement.
By testing our compressed models on established data like the ImageNet validation set, we can better understand the real-world utility of our optimizations. Maintaining maximal accuracy while minimizing resource usage is crucial for edge deployment. This dataset provides a rigorous and unbiased means to validate these trade-offs effectively.
Note: Accessing and downloading the ImageNet dataset requires registration.
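Once downloaded, a quick sanity check of the layout assumed by the appendix script can save debugging time later: a flat directory of *.JPEG validation images plus a ground-truth file with one integer label per line, in validation-image order. A sketch, with placeholder paths:

import glob

images = sorted(glob.glob("./ilsvrc2012_val_ds/*.JPEG"))
with open("./ground_truth_ilsvrc2012_val.txt") as f:
    labels = [int(line) for line in f]
assert len(images) == len(labels) == 50000  # full ILSVRC2012 validation set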
6. Quantization Using NNCF
In this section, we will delve into the specific steps involved in quantizing the BiT model using NNCF. The quantization process entails preparing a calibration dataset and applying 8-bit quantization to the model, followed by accuracy evaluation.
6.1. Preparing the Calibration Dataset:
At this step, create an instance of the nncf.Dataset class to represent the calibration dataset. The nncf.Dataset class can be a wrapper over the framework dataset object used for model training or validation. Below is a sample code snippet of the nncf.Dataset() call with transformed data samples.
# TF dataset split for NNCF calibration
img2012_val_split = get_val_data_split(tf_dataset_,
                                       train_split=0.7,
                                       val_split=0.3,
                                       shuffle=True,
                                       shuffle_size=50000)
img2012_val_split = img2012_val_split.map(nncf_transform).batch(BATCH_SIZE)
calibration_dataset = nncf.Dataset(img2012_val_split)
The transformation function takes a sample from the dataset and returns data that can be passed to the model for inference. Below is the code snippet of the data transform.
# Data transform function for NNCF calibration
def nncf_transform(image, label):
    image = tf.io.decode_jpeg(tf.io.read_file(image), channels=3)
    image = tf.image.resize(image, IMG_SIZE)
    return image
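In practice the calibration pass rarely needs the whole split: the NNCF documentation suggests that a few hundred samples (around 300) are typically sufficient. A sketch, reusing the mapped split from above:

# Calibrate on a small slice of the split; ~300 samples is usually enough.
calibration_dataset = nncf.Dataset(img2012_val_split.take(300))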
6.2. NNCF Quantization (FP32 to INT8):
Once the calibration dataset is prepared and the model object is instantiated, the next step involves applying 8-bit quantization to the model. This is achieved by using the nncf.quantize() API, which takes the OpenVINO FP32 model generated in the previous steps along with the calibration dataset to initiate the quantization process. While nncf.quantize() provides numerous advanced configuration knobs, in many cases like this one it just works out of the box or with minor adjustments. Below is a sample code snippet of the nncf.quantize() API call.
ov_quantized_model = nncf.quantize(ov_model,
                                   calibration_dataset,
                                   fast_bias_correction=False)
For further details, the official documentation provides a comprehensive guide on the basic quantization flow, including setting up the environment, preparing the calibration dataset, and calling the quantization API to apply 8-bit quantization to the model.
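After quantization completes, the INT8 model can be written out as OpenVINO IR files for deployment; the appendix script does exactly this with ov.serialize(). A sketch, with an example output path:

import openvino.runtime as ov

# Persist the quantized model as OpenVINO IR (.xml/.bin).
ov.serialize(ov_quantized_model, "./bit_m_r50x1_1/ov/int8/bit_m_r50x1_1_int8.xml")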
6.3. Accuracy Evaluation
As a result of the NNCF model quantization process, an OpenVINO INT8 quantized model is generated. To evaluate the impact of quantization on model accuracy, we perform a comprehensive benchmarking comparison between the original FP32 model and the quantized INT8 model. This comparison involves measuring the accuracy of the BiT model (m-r50x1/1) on the ImageNet 2012 validation dataset. The accuracy evaluation results are shown in Table 1.
Table 1: Classification accuracy of the BiT_m_r50x1_1 model on the ImageNet 2012 validation dataset.
With TensorFlow (FP32) to OpenVINO™ (FP32) model optimization, the classification accuracy remained consistent at 0.70154, confirming that conversion to the OpenVINO™ model representation does not affect accuracy. Furthermore, with NNCF quantization to an 8-bit integer model, accuracy was only marginally impacted, by less than 0.03%, demonstrating that the quantization process did not compromise the model's classification abilities.
Refer to Appendix A for the Python script bit_ov_model_quantization.py, which includes data preparation, model optimization, NNCF quantization tasks, and accuracy evaluation.
The usage of the bit_ov_model_quantization.py script is as follows:
$ python bit_ov_model_quantization.py --help
usage: bit_ov_model_quantization.py [-h] [--inp_shape INP_SHAPE] --dataset_dir DATASET_DIR --gt_labels GT_LABELS --bit_m_tf BIT_M_TF --bit_ov_fp32 BIT_OV_FP32
                                    [--bit_ov_int8 BIT_OV_INT8]

BiT Classification model quantization and accuracy measurement

required arguments:
  --dataset_dir DATASET_DIR
                        Directory path to ImageNet2012 validation dataset
  --gt_labels GT_LABELS
                        Path to ImageNet2012 validation ds gt labels file
  --bit_m_tf BIT_M_TF   Path to BiT TF fp32 model file
  --bit_ov_fp32 BIT_OV_FP32
                        Path to BiT OpenVINO fp32 model file

optional arguments:
  -h, --help            show this help message and exit
  --inp_shape INP_SHAPE
                        N,W,H,C
  --bit_ov_int8 BIT_OV_INT8
                        Path to save BiT OpenVINO INT8 model file
7. Conclusion
The results emphasize the efficacy of OpenVINO™ and NNCF in optimizing model efficiency while minimizing computational requirements. The ability to achieve remarkable performance and accuracy retention, particularly when compressing models to INT8 precision, demonstrates the practicality of leveraging OpenVINO™ for deployment in a range of environments, including resource-constrained ones. NNCF proves to be a valuable tool for practitioners seeking to balance model size and computational efficiency without substantial compromise on classification accuracy, opening avenues for enhanced model deployment across diverse hardware configurations.
Notices & Disclaimers
Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details.
No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
More Resources
Appendix A
- ILSVRC2012 ground truth: ground_truth_ilsvrc2012_val.txt
- See bit_ov_model_quantization.py below for the BiT model quantization pipeline with NNCF described in this blog.
"""
Copyright (c) 2022 Intel Company Licensed below the Apache License, Model 2.0 (the "License");
chances are you'll not use this file besides in compliance with the License.
You might acquire a duplicate of the License at
http://www.apache.org/licenses/LICENSE-2.0
Until required by relevant legislation or agreed to in writing, software program
distributed below the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, both specific or implied.
See the License for the precise language governing permissions and
limitations below the License.
"""
"""
This script is examined with TensorFlow v2.12.1 and OpenVINO v2023.1.0
Utilization Instance beneath (with required parameters):
python bit_ov_model_quantization.py
--gt_labels ./<path_to>/ground_truth_ilsvrc2012_val.txt
--dataset_dir ./<path-to-dataset>/ilsvrc2012_val_ds/
--bit_m_tf ./<path-to-tf>/mannequin
--bit_ov_fp32 ./<path-to-ov>/fp32_ir_model
"""
import os
import sys
import argparse
import re
import logging
logging.basicConfig(level=logging.ERROR)

import numpy as np
import pandas as pd
import nncf
import openvino.runtime as ov
from openvino.runtime import Core
import tensorflow as tf
import tensorflow_hub as hub
from PIL import Image
from sklearn.metrics import accuracy_score

ie = Core()
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# For top-1 labels.
MAX_PREDS = 1
BATCH_SIZE = 1
IMG_SIZE = (224, 224)  # Default ImageNet image size
NUM_CLASSES = 1000     # For the ImageNet dataset
# Data transform function for NNCF calibration
def nncf_transform(image, label):
    image = tf.io.decode_jpeg(tf.io.read_file(image), channels=3)
    image = tf.image.resize(image, IMG_SIZE)
    return image
# Data transform function for ImageNet ds validation
def val_transform(image_path, label):
    image = tf.io.decode_jpeg(tf.io.read_file(image_path), channels=3)
    image = tf.image.resize(image, IMG_SIZE)
    img_reshaped = tf.reshape(image, [IMG_SIZE[0], IMG_SIZE[1], 3])
    image = tf.image.convert_image_dtype(img_reshaped, tf.float32)
    return image, label
# Validation dataset split
def get_val_data_split(tf_dataset_, train_split=0.7, val_split=0.3,
                       shuffle=True, shuffle_size=50000):
    # Shuffle with a fixed seed for reproducibility; keep original order otherwise.
    ds = tf_dataset_.shuffle(shuffle_size, seed=12) if shuffle else tf_dataset_
    train_size = int(train_split * shuffle_size)
    val_size = int(val_split * shuffle_size)
    val_ds = ds.skip(train_size).take(val_size)
    return val_ds
# OpenVINO IR model inference validation
def ov_infer_validate(model: ov.Model,
                      val_loader: tf.data.Dataset) -> list:
    model.reshape([1, IMG_SIZE[0], IMG_SIZE[1], 3])  # In case the IR was converted with a dynamic batch
    compiled_model = ov.compile_model(model)
    output = compiled_model.outputs[0]
    ov_predictions = []
    for img, label in val_loader:
        pred = compiled_model(img)[output]
        ov_result = tf.reshape(pred, [-1])
        top_label_idx = np.argsort(ov_result)[-MAX_PREDS:][::-1]
        ov_predictions.append(top_label_idx)
    return ov_predictions
# OpenVINO IR model NNCF quantization
def quantize(ov_model, calibration_dataset):
    print("Started NNCF quantization process")
    ov_quantized_model = nncf.quantize(ov_model, calibration_dataset, fast_bias_correction=False)
    return ov_quantized_model
# OpenVINO FP32 IR model inference
def ov_fp32_predictions(ov_fp32_model, validation_dataset):
    # Load and compile the OV model
    ov_model = ie.read_model(ov_fp32_model)
    print("Starting OV FP32 Model Inference...!!!")
    ov_fp32_pred = ov_infer_validate(ov_model, validation_dataset)
    return ov_fp32_pred
def nncf_quantize_int8_pred_results(ov_fp32_model, calibration_dataset,
                                    validation_dataset, ov_int8_model):
    # Load the OV FP32 model
    ov_model = ie.read_model(ov_fp32_model)
    # NNCF quantization of the OpenVINO IR model
    int8_ov_model = quantize(ov_model, calibration_dataset)
    ov.serialize(int8_ov_model, ov_int8_model)
    print("NNCF Quantization Process completed..!!!")
    ov_int8_model = ie.read_model(ov_int8_model)
    print("Starting OV INT8 Model Inference...!!!")
    ov_int8_pred = ov_infer_validate(ov_int8_model, validation_dataset)
    return ov_int8_pred
def tf_inference(tf_saved_model_path, val_loader: tf.data.Dataset):
    tf_model = tf.keras.models.load_model(tf_saved_model_path)
    print("Starting TF FP32 Model Inference...!!!")
    tf_predictions = []
    for img, label in val_loader:
        tf_result = tf_model.predict(img, verbose=0)
        tf_result = tf.reshape(tf_result, [-1])
        top_label_idx = np.argsort(tf_result)[-MAX_PREDS:][::-1]
        tf_predictions.append(top_label_idx)
    return tf_predictions
"""
Module: bit_classificaiton
Description: API to run BiT classificaiton OpenVINO IR mannequin INT8 Quantization on utilizing NNCF and
perfom accuracy metrics for TF FP32, OV FP32 and OV INT8 on ImageNet2012 Validation dataset
"""
def bit_classification(args):
    ip_shape = args.inp_shape
    if isinstance(ip_shape, str):
        ip_shape = [int(i) for i in ip_shape.split(",")]
    if len(ip_shape) != 4:
        sys.exit("Input shape error. Set shape 'N,W,H,C'. For example: '1,224,224,3'")

    # ImageNet2012 validation dataset used for TF and OV FP32 accuracy testing,
    # e.g. dataset_dir = ../dataset/ilsvrc2012_val/1.0/
    dataset_dir = args.dataset_dir + "*.JPEG"
    tf_dataset = tf.data.Dataset.list_files(dataset_dir)

    gt_labels = open(args.gt_labels)
    val_labels = []
    for l in gt_labels:
        val_labels.append(str(l))

    # Generating ImageNet 2012 validation dataset pairs (img, label)
    val_images = []
    val_labels_in_img_order = []
    for i, v in enumerate(tf_dataset):
        img_path = str(v.numpy())
        id = int(img_path.split('/')[-1].split('_')[-1].split('.')[0])
        val_images.append(img_path[2:-1])
        val_labels_in_img_order.append(int(re.sub(r'\n', '', val_labels[id - 1])))

    val_df = pd.DataFrame(data={'images': val_images, 'label': val_labels_in_img_order})

    # Converting the ImageNet2012 val pairs into a tf.data.Dataset
    tf_dataset_ = tf.data.Dataset.from_tensor_slices((list(val_df['images'].values), val_df['label'].values))
    imgnet2012_val_dataset = tf_dataset_.map(val_transform).batch(BATCH_SIZE)

    # TF dataset split for NNCF calibration
    img2012_val_split_for_calib = get_val_data_split(tf_dataset_, train_split=0.7,
                                                     val_split=0.3, shuffle=True,
                                                     shuffle_size=50000)
    img2012_val_split_for_calib = img2012_val_split_for_calib.map(nncf_transform).batch(BATCH_SIZE)

    # TF model inference
    tf_model_path = args.bit_m_tf
    print(f"Tensorflow FP32 Model {args.bit_m_tf}")
    tf_p = tf_inference(tf_model_path, imgnet2012_val_dataset)
    acc_score = accuracy_score(tf_p, val_labels_in_img_order)
    print(f"Accuracy of FP32 TF model = {acc_score}\n")

    # OpenVINO model inference
    print(f"OpenVINO FP32 IR Model {args.bit_ov_fp32}")
    ov_fp32_p = ov_fp32_predictions(args.bit_ov_fp32, imgnet2012_val_dataset)
    acc_score = accuracy_score(ov_fp32_p, val_labels_in_img_order)
    print(f"Accuracy of FP32 IR model = {acc_score}\n")

    print("Starting NNCF dataset calibration....!!!")
    calibration_dataset = nncf.Dataset(img2012_val_split_for_calib)

    # OpenVINO IR FP32 to INT8 model quantization with NNCF and
    # INT8 prediction results on the validation dataset
    ov_int8_p = nncf_quantize_int8_pred_results(args.bit_ov_fp32, calibration_dataset,
                                                imgnet2012_val_dataset, args.bit_ov_int8)
    print(f"OpenVINO NNCF Quantized INT8 IR Model {args.bit_ov_int8}")
    acc_score = accuracy_score(ov_int8_p, val_labels_in_img_order)
    print(f"Accuracy of INT8 IR model = {acc_score}\n")

    # Optional cross-model agreement checks:
    # acc_score = accuracy_score(tf_p, ov_fp32_p)
    # print(f"TF vs OV FP32 agreement = {acc_score}")
    # acc_score = accuracy_score(ov_fp32_p, ov_int8_p)
    # print(f"OV FP32 vs OV INT8 agreement = {acc_score}")
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="BiT Classification model quantization and accuracy measurement")
    optional = parser._action_groups.pop()
    required = parser.add_argument_group("required arguments")
    optional.add_argument("--inp_shape", type=str, help="N,W,H,C", default="1,224,224,3", required=False)
    required.add_argument("--dataset_dir", type=str, help="Directory path to ImageNet2012 validation dataset", required=True)
    required.add_argument("--gt_labels", type=str, help="Path to ImageNet2012 validation ds gt labels file", required=True)
    required.add_argument("--bit_m_tf", type=str, help="Path to BiT TF fp32 model file", required=True)
    required.add_argument("--bit_ov_fp32", type=str, help="Path to BiT OpenVINO fp32 model file", required=True)
    optional.add_argument("--bit_ov_int8", type=str, help="Path to save BiT OpenVINO INT8 model file",
                          default="./bit_m_r50x1_1/ov/int8/saved_model.xml", required=False)
    parser._action_groups.append(optional)
    args = parser.parse_args()
    bit_classification(args)