Quantization คืออะไร Post-Training Quantization มีประโยชน์อย่างไร กับ Deep Neural Network บนอุปกรณ์ Embedded Device, IoT, Edge, มือถือ Mobile - tflite ep.2

ถ้าข้างนอกมีฝนตกอยู่ เราอาจจะไม่ได้ต้องการทราบว่าฝนกำลังตกกี่เม็ดต่อวินาที เราต้องการทราบแค่เพียงว่า ฝนตกหนัก ฝนตกปานกลาง หรือฝนตกเล็กน้อย

เช่นเดียวกับการพยากรณ์ของ Neural Network บ่อยครั้งที่เราไม่ได้ต้องการความแม่นยำขนาด ตัวเลยทศนิยม Floating Point 32 Bit หรือแม้กระทั่ง 16 Bit และในหลาย ๆ งานใช้แค่จำนวนเต็ม Integer 8 Bit ก็เพียงพอแล้ว

Quantization คืออะไร

Quantization error. The simplest way to quantize a signal is to choose the digital amplitude value closest to the original analog amplitude. Credit https://en.wikipedia.org/wiki/File:Quantization_error.png

Quantization คือ เทคนิคที่ใช้จำนวนเต็ม Integer 8 Bit ให้การประมาณค่าของจำนวนตัวเลขทศนิยม Floating Point ระหว่างค่าสูงสุด และต่ำสุดในช่วงที่กำหนด เทคนิคนี้เป็นที่นิยมในงาน Signal Processing ใช้ในการแปลง สัญญาณ Analog เป็น Digital

ประโยชน์ของ Post-Training Quantization

Quantization เป็นเครื่องมือสำคัญ ที่จะลดต้นทุนในการทำงานของ Neural Network ลดขนาดโมเดล ลด Memory ที่ใช้ ลดพลังงานที่ใช้ ลดเวลาในการตอบสนอง (Latency) โดยเฉพาะอย่างยิ่งสำหรับงานบนมือถือ และ Embedded ตัวอย่างเช่น เราสามารถทำ Quantization ให้โมเดล Inception ย่อขนาดโมเดลจาก 91 MB เหลือเพียง 23MB หรือ 25% จากต้นฉบับเท่านั้น

EfficientNet-EdgeTPU model provides new levels of performance that balance low latency with high accuracy on the Edge TPU. It comes in three sizes (small, medium, and large), offering increasing levels of accuracy with trade-offs in inference latency. https://coral.ai/technology/

การใช้ตัวเลขจำนวนเต็ม Integer แทนตัวเลขทศนิยม Floating Point เป็นการประหยัด Hardware ประมวลผลที่จำเป็นต้องใช้เป็นอย่างมาก และในกรณีที่ GPU รองรับ Int8 สามารถเพิ่มความเร็วได้มากสุดถึง 25 เท่าเลยทีเดียว ถ้ารับความแม่นยำที่ลดลงได้

ตัวอย่าง Quantization 2 Bit

2-bit resolution with four levels of quantization, compared with an analog sine wave. Credit https://en.wikipedia.org/wiki/File:2-bit_resolution_analog_comparison.png

ตัวอย่าง Quantization 3 Bit

3-bit resolution with eight levels of quantization, compared with an analog sine wave. Credit https://en.wikipedia.org/wiki/File:3-bit_resolution_analog_comparison.png

Optimization options

มีตัวเลือกในการทำ Post-Training Quantization หลายแบบ มีประโยชน์ และข้อจำกัดที่แตกต่างกัน ดังตารางด้านล่าง

Technique	Benefits	Hardware
Weight quantization	4x smaller, 2-3x speedup, accuracy	CPU
Full integer quantization	4x smaller, 3x+ speedup	CPU, Edge TPU, etc.
Float16 quantization	2x smaller, potential GPU acceleration	CPU/GPU

Summary table of the choices and the benefits Post-Training Quantization Optimization options

ขั้นแรก เราจะ Optimize โมเดล ด้วยการทำ Quantize ลดจำนวน Bit ของ Parameter (Weight quantization) ในโมเดลลง

และถ้าเรามีข้อมูลตัวอย่างเพียงพอ นอกเหนือจาก Parameter เราสามารถ Quantize ข้อมูล และ Activation ได้อีก โดยการให้ชุดข้อมูลตัวอย่าง รันผ่านโมเดล เพื่อเก็บสถิติ Representative Dataset วัด Dynamic Range ของข้อมูล และ Activation สร้าง Input Data Generator เพื่อส่งให้กับ Converter ใช้ในการทำ Integer Quantization ต่อไป

This decision tree can help determine which post-training quantization method is best for your use case. Credit https://www.tensorflow.org/lite/performance/post_training_quantization

โมเดลที่แปลงแล้ว จะยังรับ Input / Output เป็น Float เหมือนเดิมเพื่อความสะดวก จะได้ไม่ต้องแก้โปรแกรม

และ ในโมเดล ถ้า Ops ไหน ที่ไม่มี Quantized Implementation ก็จะใช้เป็น Floating Point Implementation เหมือนเดิม แต่ Float16 quantization ลดขนาดจาก Float32 เป็น Float16 แบบนี้จะทำให้การ Convert โมเดลทำได้อย่างราบรื่น แต่ก็จะจำกัดให้รันได้เฉพาะ Hardware ที่รองรับ Floating Point

ในการเลือกเทคนิค Quantization ควรพิจารณาจาก Hardware ที่จะนำโมเดลไป Deploy และ Ops ที่ใช้ในโมเดล ว่ามี Quantized Implementation ครบทุก Ops หรือไม่

Credit

แชร์ให้เพื่อน:

Surapong Kanoktipsatharporn

CTO at Bua Labs

The ultimate test of your knowledge is your capacity to convey it to another.

Quantization คืออะไร Post-Training Quantization มีประโยชน์อย่างไร กับ Deep Neural Network บนอุปกรณ์ Embedded Device, IoT, Edge, มือถือ Mobile – tflite ep.2

Quantization คืออะไร