Shuffling Data in a DataLoader with a Random Sampler and Collate to Feed the Model When Training Machine Learning - Neural Network ep.7

In each epoch of training a deep neural network, we should not feed the model examples in the same order every time. In this ep we will build a new version of DataLoader that shuffles the examples before feeding them to the model. This keeps the model from "memorizing the exam", helps it generalize better, and reduces the model's variance.

When we split the data into mini-batches, for example Batch Size = 32, we feed 32 (x, y) examples at a time through the model's feedforward pass. If we do not shuffle and simply use the data in the order we received it, that ordering may happen to make a particular batch of 32 examples too hard or too easy, and the batch will be exactly the same in every epoch, which makes it harder for the model to learn. Shuffling the data randomly fixes this.
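To make the idea concrete, here is a minimal sketch in plain Python (no deep learning library) of a sampler that produces a new random order every epoch and a collate function that assembles each mini-batch. The class and function names (`RandomSampler`, `collate`, `DataLoader`), the toy dataset, and the seed are assumptions made for illustration only; this is not the implementation of any particular library.

```python
import random


class RandomSampler:
    """Yield dataset indices in a fresh random order on every iteration."""

    def __init__(self, n, seed=None):
        self.n = n
        self.rng = random.Random(seed)  # private RNG so runs can be reproduced

    def __iter__(self):
        idxs = list(range(self.n))
        self.rng.shuffle(idxs)  # a new permutation each time we start an epoch
        return iter(idxs)


def collate(samples):
    """Turn a list of (x, y) pairs into one (xs, ys) pair of lists."""
    xs, ys = zip(*samples)
    return list(xs), list(ys)


class DataLoader:
    """Group sampled examples into mini-batches and collate each batch."""

    def __init__(self, dataset, sampler, batch_size=32, collate_fn=collate):
        self.dataset = dataset
        self.sampler = sampler
        self.batch_size = batch_size
        self.collate_fn = collate_fn

    def __iter__(self):
        batch = []
        for i in self.sampler:
            batch.append(self.dataset[i])
            if len(batch) == self.batch_size:
                yield self.collate_fn(batch)
                batch = []
        if batch:  # last, possibly smaller, batch
            yield self.collate_fn(batch)


# Toy dataset (hypothetical): x = i, y = i squared.
dataset = [(i, i * i) for i in range(100)]
loader = DataLoader(dataset, RandomSampler(len(dataset), seed=0), batch_size=32)

# Iterating twice simulates two epochs; each one sees a different order.
epoch1 = [xs for xs, _ in loader]
epoch2 = [xs for xs, _ in loader]
```

Every example still appears exactly once per epoch; only the order, and therefore the composition of each mini-batch, changes from one epoch to the next.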

An advanced two-handed flourish. Credit https://en.wikipedia.org/wiki/Cardistry#/media/File:Display_Card_Flourish.jpg

And if we shuffle before splitting the data into the Training Set, Validation Set, and Test Set, it also helps reduce skew between the train, validation, and test sets.
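As a rough illustration of shuffling before the split, the sketch below permutes the indices once and then cuts them into three parts. The function name `shuffle_split`, the 80/10/10 fractions, the seed, and the toy data sorted by label are all assumptions chosen for the example.

```python
import random


def shuffle_split(samples, valid_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle once, then cut the shuffled data into train/valid/test."""
    rng = random.Random(seed)
    idxs = list(range(len(samples)))
    rng.shuffle(idxs)  # shuffle BEFORE splitting

    n_test = int(len(samples) * test_frac)
    n_valid = int(len(samples) * valid_frac)

    test = [samples[i] for i in idxs[:n_test]]
    valid = [samples[i] for i in idxs[n_test:n_test + n_valid]]
    train = [samples[i] for i in idxs[n_test + n_valid:]]
    return train, valid, test


# Toy data (hypothetical) that arrives sorted by label: 50 of class 0, then 50 of class 1.
samples = [(i, 0) for i in range(50)] + [(i, 1) for i in range(50, 100)]
train, valid, test = shuffle_split(samples)
```

Without the shuffle, cutting this sorted data in order would give a test set containing only class 0 and leave the last slice dominated by class 1, which is the kind of skew the shuffle is meant to avoid.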

Let's get started.

Last updated: 28/02/2024 04:27:02

Surapong Kanoktipsatharporn

Solutions Architect at Bua Labs

Published by Surapong Kanoktipsatharporn
