What is an N-Gram? Sentiment Classification of IMDB Movie Reviews with N-Grams (Trigram, Bigram, Unigram) using Naive Bayes and Logistic Regression - NLP ep.6

In the previous ep, Sentiment Classification: analyzing IMDB movie reviews as positive or negative with Naive Bayes and Logistic Regression, we used 1 token per word. This is called a Unigram.

In this ep, we will learn how to use N-Grams for Sentiment Classification, with the same algorithms as in the previous ep.

What is an N-Gram

Thai Airways check-in counters at Suvarnabhumi International Airport passenger terminal, Thailand. Credit https://en.wikipedia.org/wiki/File:VTBS-Thai_Airways_Check-in_counters.JPG

An N-Gram is a single token made up of n consecutive units in sequence. The units can be phonemes, syllables, letters, or words; for example, a Bigram is 2 consecutive words, a Trigram is 3 consecutive words, and so on.
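The definition above boils down to a sliding window of width n over a sequence. Below is a minimal sketch of that idea in plain Python; the helper name `ngrams` is hypothetical (it is not the code from the article's notebook), and it works the same whether the units are letters or words.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def ngrams(items: Sequence[T], n: int) -> list[tuple[T, ...]]:
    """Return every run of n consecutive items (a sliding window of width n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L yields L - n + 1 windows (none if L < n).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Character level: the units are letters.
char_trigrams = ["".join(g) for g in ngrams("check", 3)]
print(char_trigrams)  # ['che', 'hec', 'eck']

# Word level: the units are words.
word_trigrams = [" ".join(g) for g in ngrams("to be or not to be".split(), 3)]
print(word_trigrams)  # ['to be or', 'be or not', 'or not to', 'not to be']
```

A sequence of 6 words therefore gives 6 - 3 + 1 = 4 trigrams, which matches the first group of examples from the dataset shown later in this post.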

This lets the model learn from words that appear in sequence and together take on a new meaning, such as “check in”, “piss off”, “ask somebody out”, “blow up”, “break down”, “count on somebody”, “grow apart”, “hang in”, “warm something up”, instead of seeing only single words as before, such as “check”, “in”, “warm”, “up”, “break”, “down”.
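A quick way to see why this matters: with unigrams alone, a bag of words throws away word order, so “check in” and “in check” look identical. The sketch below assumes simple whitespace tokenization and a hypothetical helper `bag_of_ngrams`; adding bigrams turns “check in” into a feature of its own.

```python
from collections import Counter


def bag_of_ngrams(text: str, n_max: int) -> Counter:
    """Count every word n-gram of length 1..n_max (simple whitespace tokenization)."""
    words = text.lower().split()
    bag = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            bag[" ".join(words[i:i + n])] += 1
    return bag


a, b = "check in", "in check"

# Unigrams only: both become {'check': 1, 'in': 1}, so the model cannot tell them apart.
print(bag_of_ngrams(a, 1) == bag_of_ngrams(b, 1))  # True

# Unigrams + bigrams: "check in" becomes a feature of its own, so the bags differ.
print(bag_of_ngrams(a, 2) == bag_of_ngrams(b, 2))  # False
print(bag_of_ngrams(a, 2)["check in"])  # 1
```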

Trigram examples from the Dataset

to be or
be or not
or not to
not to be

he does just
he does n't
he does not
he does nothing
he does or
he does show
he does so
he does this

the supporting
the supporting actors
the supporting cast
the supporting character
the supporting characters
the supporting performances
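The groups above look like alphabetically sorted slices of an n-gram vocabulary that mixes lengths (note the bigram “the supporting” sitting next to trigrams). Here is a minimal sketch of how such a listing could be produced, assuming a vocabulary of all 1- to 3-word n-grams built from a few made-up reviews (not IMDB data). `build_vocab` and `with_prefix` are hypothetical helpers, and the whitespace split used here does not break “doesn't” into “does n't” the way the tokenizer behind the lists above evidently does.

```python
import bisect


def build_vocab(texts: list[str], n_max: int = 3) -> list[str]:
    """Return the sorted set of all word n-grams of length 1..n_max."""
    vocab = set()
    for text in texts:
        words = text.lower().split()
        for n in range(1, n_max + 1):
            for i in range(len(words) - n + 1):
                vocab.add(" ".join(words[i:i + n]))
    return sorted(vocab)


def with_prefix(vocab: list[str], prefix: str) -> list[str]:
    """List entries starting with prefix; in a sorted list they form one contiguous run."""
    start = bisect.bisect_left(vocab, prefix)
    out = []
    for entry in vocab[start:]:
        if not entry.startswith(prefix):
            break
        out.append(entry)
    return out


# Made-up example reviews, not taken from IMDB.
reviews = [
    "the supporting cast is great",
    "i liked the supporting actors",
    "the supporting performances are weak",
]
vocab = build_vocab(reviews, n_max=3)
print(with_prefix(vocab, "the supporting"))
# ['the supporting', 'the supporting actors', 'the supporting cast', 'the supporting performances']
```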

Advantages of N-Grams

The advantage of N-Grams is that the model can better learn the relationships between words that appear next to each other, including word order, which unigrams discard.
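To see this effect on a classifier, here is a from-scratch multinomial Naive Bayes with add-one (Laplace) smoothing, trained on six made-up reviews. This is toy data, not IMDB, and a simplified sketch rather than the code in the article's GitHub notebook; `features`, `train_nb`, and `predict_nb` are hypothetical helper names. With unigrams only, “not bad” is classified as negative because “bad” occurs mostly in negative reviews; adding bigrams gives the model the feature “not bad”, which flips the prediction to positive.

```python
import math
from collections import Counter


def features(text: str, n_max: int) -> list[str]:
    """All word n-grams of length 1..n_max (whitespace tokenization)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def train_nb(docs: list[tuple[str, str]], n_max: int):
    """Count n-gram features per class for multinomial Naive Bayes."""
    counts = {label: Counter() for _, label in docs}
    doc_counts = Counter(label for _, label in docs)
    for text, label in docs:
        counts[label].update(features(text, n_max))
    vocab = set()
    for c in counts.values():
        vocab.update(c)
    return counts, doc_counts, vocab


def predict_nb(model, text: str, n_max: int) -> str:
    """Pick the class with the highest log prior + log likelihood (add-one smoothing)."""
    counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values())
        score = math.log(doc_counts[label] / total_docs)
        for f in features(text, n_max):
            if f in vocab:  # features never seen in training are skipped
                score += math.log((c[f] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy reviews (not IMDB).
train = [
    ("not bad", "pos"), ("not bad at all", "pos"), ("good", "pos"),
    ("bad", "neg"), ("not good", "neg"), ("bad bad", "neg"),
]

for n_max in (1, 2):
    model = train_nb(train, n_max)
    print(n_max, predict_nb(model, "not bad", n_max))
# 1 neg
# 2 pos
```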

Disadvantages of N-Grams

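The drawback is that the vocab (dictionary) grows very quickly: with a base vocabulary of |V| words there are up to |V|^n possible n-grams, so the number of candidates grows exponentially with n. And if the N-Grams are too long, many of them appear only once in the whole dataset, which makes the data even sparser. That is why we cannot make the N-Grams as long as we would like.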
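Both effects can be measured on a tiny corpus. The sketch below uses a made-up 18-word text (not IMDB) and a hypothetical helper `ngram_stats`. It counts the distinct n-grams actually observed, compares them with the |V|^n possible combinations, and counts how many n-grams occur exactly once; the share of singletons rises from 2 of 7 for unigrams to 10 of 13 for trigrams.

```python
from collections import Counter


def ngram_stats(words: list[str], n: int) -> tuple[int, int]:
    """Return (distinct n-grams, n-grams that occur exactly once)."""
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    singletons = sum(1 for count in grams.values() if count == 1)
    return len(grams), singletons


# A tiny made-up corpus; real review corpora show the same trend at a far larger scale.
text = ("the movie was good the acting was good "
        "the movie was not good the plot was not good").split()
vocab_size = len(set(text))  # 7 distinct words

for n in (1, 2, 3):
    distinct, once = ngram_stats(text, n)
    print(f"n={n}: {distinct} distinct (of {vocab_size ** n} possible), {once} seen only once")
# n=1: 7 distinct (of 7 possible), 2 seen only once
# n=2: 10 distinct (of 49 possible), 4 seen only once
# n=3: 13 distinct (of 343 possible), 10 seen only once
```

In a short text the number of observed n-grams is capped by the number of windows, so it is the space of possible n-grams (and with it the sparsity) that explodes as n grows.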

Let's get started.

Check it out on GitHub. Last updated: 28/02/2024 04:27:02

Surapong Kanoktipsatharporn

Solutions Architect at Bua Labs

The ultimate test of your knowledge is your capacity to convey it to another.
