Jaccard Coefficient
1. Given A = {1,2,3,4}, B = {1,2,4} and C = {1,2,4,5}, what are Jaccard(A,B), Jaccard(B,C) and Jaccard(A,C)?
2. Next, consider a query and a set of documents. Suppose we have:
query : ideas of march
doc 1 : caesar died in march
doc 2 : the long march
Find the Jaccard coefficient between the query and doc 1, and between the query and doc 2.
3. Given three documents:
d1 : "Jack London traveled to Oakland"
d2 : "Jack London traveled to the city of Oakland"
d3 : "Jack traveled from Oakland to London"
What are the Jaccard coefficients J(d1,d2) and J(d1,d3) when computed with n-gram analysis with n = 2 (bigrams)?
ANSWERS:
1. Jaccard(A,B)
| A | = 4
| B | = 3
| A ∩ B | = 3
| A U B | = (|A| + |B| - | A ∩ B |) = 4 + 3 – 3 = 4
Jaccard(A,B) = | A ∩ B | / | A U B | = 3/4 = 0.75
Jaccard(B,C)
| B | = 3
| C | = 4
| B ∩ C | = 3
| B U C | = (|B| + |C| - | B ∩ C |) = 3 + 4 – 3 = 4
Jaccard(B,C) = | B ∩ C | / | B U C | = 3/4 = 0.75
Jaccard(A,C)
| A | = 4
| C | = 4
| A ∩ C | = 3
| A U C | = (|A| + |C| - | A ∩ C |) = 4 + 4 – 3 = 5
Jaccard(A,C) = | A ∩ C| / | A U C | = 3/5 = 0.6
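The set arithmetic in answer 1 can be checked with a short Python sketch. The helper name `jaccard` is an assumption made for this illustration, and returning 0.0 for two empty sets is a convention chosen here, not something the exercise specifies.

```python
def jaccard(x, y):
    """Return |x ∩ y| / |x ∪ y| for two collections treated as sets."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        # Assumed convention: two empty sets get a coefficient of 0.0.
        return 0.0
    return len(x & y) / len(union)


A = {1, 2, 3, 4}
B = {1, 2, 4}
C = {1, 2, 4, 5}

print(jaccard(A, B))  # 0.75
print(jaccard(B, C))  # 0.75
print(jaccard(A, C))  # 0.6
```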
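2. Jaccard(Q, DOC1)
| Q | = 3 (ideas, of, march)
| DOC1 | = 4 (caesar, died, in, march)
| Q ∩ DOC1 | = 1 (march)
| Q U DOC1 | = (|Q| + |DOC1| - | Q ∩ DOC1 |) = 3 + 4 – 1 = 6
Jaccard(Q,DOC1) = | Q ∩ DOC1 | / | Q U DOC1 | = 1/6 ≈ 0.17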
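Jaccard(Q,DOC2)
| Q | = 3 (ideas, of, march)
| DOC2 | = 3 (the, long, march)
| Q ∩ DOC2 | = 1 (march)
| Q U DOC2 | = (|Q| + |DOC2| - | Q ∩ DOC2 |) = 3 + 3 – 1 = 5
Jaccard(Q,DOC2) = | Q ∩ DOC2 | / | Q U DOC2 | = 1/5 = 0.2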
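For the query/document case, each text is first turned into a set of words. The sketch below assumes the simplest possible tokenization (lowercase, split on whitespace); the helper names `jaccard` and `word_set` are made up for this example.

```python
def jaccard(x, y):
    """Return |x ∩ y| / |x ∪ y| for two sets (0.0 if both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def word_set(text):
    # Assumed tokenization: lowercase, then split on whitespace.
    return set(text.lower().split())


query = "ideas of march"
doc1 = "caesar died in march"
doc2 = "the long march"

print(round(jaccard(word_set(query), word_set(doc1)), 2))  # 0.17
print(round(jaccard(word_set(query), word_set(doc2)), 2))  # 0.2
```

With this measure doc 2 scores slightly higher than doc 1 simply because it is shorter, so the single shared word "march" makes up a larger share of the union.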
3. Jaccard(D1,D2)
| D1 | = 4 (Jack London, London traveled, traveled to, to Oakland)
| D2 | = 7 (Jack London, London traveled, traveled to, to the, the city, city of, of Oakland)
| D1 ∩ D2 | = 3 (Jack London, London traveled, traveled to)
| D1 U D2 | = (|D1| + |D2| - | D1 ∩ D2 |) = 4 + 7 – 3 = 8
Jaccard(D1,D2) = | D1 ∩ D2 | / | D1 U D2 | = 3/8 = 0.375
Jaccard(D1,D3)
| D1 | = 4 (Jack London, London traveled, traveled to, to Oakland)
| D3 | = 5 (Jack traveled, traveled from, from Oakland, Oakland to, to London)
| D1 ∩ D3 | = 0
| D1 U D3 | = (|D1| + |D3| - | D1 ∩ D3 |) = 4 + 5 – 0 = 9
Jaccard(D1,D3) = | D1 ∩ D3 | / | D1 U D3 | = 0/9 = 0
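The bigram version works the same way, except that each document becomes a set of adjacent word pairs instead of single words. This is a rough sketch that assumes whitespace tokenization and case-sensitive matching; the helper names `jaccard` and `bigrams` are invented for the illustration.

```python
def jaccard(x, y):
    """Return |x ∩ y| / |x ∪ y| for two sets (0.0 if both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def bigrams(text):
    # Assumed tokenization: split on whitespace, keep the original case.
    words = text.split()
    # Pair each word with the word that follows it.
    return set(zip(words, words[1:]))


d1 = "Jack London traveled to Oakland"
d2 = "Jack London traveled to the city of Oakland"
d3 = "Jack traveled from Oakland to London"

print(jaccard(bigrams(d1), bigrams(d2)))  # 0.375
print(jaccard(bigrams(d1), bigrams(d3)))  # 0.0
```

Note that d1 and d3 share four of their words, yet they share no bigram at all, which is why word order drives the score down to 0 here.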