Jaccard Coefficient

1. Given A = {1,2,3,4}, B = {1,2,4}, and C = {1,2,4,5}, what are Jaccard(A,B), Jaccard(B,C), and Jaccard(A,C)?

2. Next, a query-and-document case. Suppose we have:

query : ideas of march

doc 1 : caesar died in march

doc 2 : the long march

Find the Jaccard coefficient between the query and doc 1, and between the query and doc 2.

3. Given 3 documents:

d1 : "Jack London traveled to Oakland"

d2 : "Jack London traveled to the city of Oakland"

d3 : "Jack traveled from Oakland to London"

What are the values of the Jaccard coefficients J(d1,d2) and J(d1,d3) when computed with n-gram analysis using n=2 (bigrams)?

ANSWER:

1. Jaccard(A,B)

| A | = 4

| B | = 3

| A ∩ B | = 3

| A ∪ B | = (|A| + |B| - | A ∩ B |) = 4 + 3 - 3 = 4

Jaccard(A,B) = | A ∩ B | / | A ∪ B | = 3/4 = 0.75

Jaccard(B,C)

| B | = 3

| C | = 4

| B ∩ C | = 3

| B ∪ C | = (|B| + |C| - | B ∩ C |) = 3 + 4 - 3 = 4

Jaccard(B,C) = | B ∩ C | / | B ∪ C | = 3/4 = 0.75

Jaccard(A,C)

| A | = 4

| C | = 4

| A ∩ C | = 3

| A ∪ C | = (|A| + |C| - | A ∩ C |) = 4 + 4 - 3 = 5

Jaccard(A,C) = | A ∩ C | / | A ∪ C | = 3/5 = 0.6
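The three results above can be checked with a few lines of Python. This is only a minimal sketch: the function name `jaccard` is an illustrative choice, and returning 0.0 for two empty sets is an assumption, since the exercise does not cover that case.

```python
def jaccard(a, b):
    """Return |a ∩ b| / |a ∪ b| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are given a coefficient of 0.0.
        return 0.0
    return len(a & b) / len(union)


A = {1, 2, 3, 4}
B = {1, 2, 4}
C = {1, 2, 4, 5}

print(jaccard(A, B))  # 0.75
print(jaccard(B, C))  # 0.75
print(jaccard(A, C))  # 0.6
```

Using Python's built-in set operators `&` and `|` avoids computing the union size by hand with |A| + |B| - |A ∩ B|.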

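2. Jaccard(Q, DOC1)

Q = {ideas, of, march}, so | Q | = 3

DOC1 = {caesar, died, in, march}, so | DOC1 | = 4

| Q ∩ DOC1 | = 1 (march)

| Q ∪ DOC1 | = (|Q| + |DOC1| - | Q ∩ DOC1 |) = 3 + 4 - 1 = 6

Jaccard(Q, DOC1) = | Q ∩ DOC1 | / | Q ∪ DOC1 | = 1/6 ≈ 0.17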

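Jaccard(Q, DOC2)

Q = {ideas, of, march}, so | Q | = 3

DOC2 = {the, long, march}, so | DOC2 | = 3

| Q ∩ DOC2 | = 1 (march)

| Q ∪ DOC2 | = (|Q| + |DOC2| - | Q ∩ DOC2 |) = 3 + 3 - 1 = 5

Jaccard(Q, DOC2) = | Q ∩ DOC2 | / | Q ∪ DOC2 | = 1/5 = 0.2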
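The query-versus-document comparison can be sketched the same way, treating each text as a set of words. The helper name `jaccard_terms` is hypothetical, and splitting on whitespace after lowercasing is an assumed tokenization; the exercise itself does not say how words should be extracted.

```python
def jaccard_terms(text_a, text_b):
    """Jaccard coefficient between the word sets of two texts."""
    # Assumption: tokens are lowercase, whitespace-separated words.
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


query = "ideas of march"
doc1 = "caesar died in march"
doc2 = "the long march"

print(round(jaccard_terms(query, doc1), 2))  # 0.17
print(round(jaccard_terms(query, doc2), 2))  # 0.2
```

Both documents share only the word "march" with the query, so doc 2 scores higher simply because it is shorter and its union with the query is smaller.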

3. Jaccard(D1,D2)

| D1 | = 4 (Jack London, London traveled, traveled to, to Oakland)

| D2 | = 7 (Jack London, London traveled, traveled to, to the, the city, city of, of Oakland)

| D1 ∩ D2 | = 3 (Jack London, London traveled, traveled to)

| D1 ∪ D2 | = (|D1| + |D2| - | D1 ∩ D2 |) = 4 + 7 - 3 = 8

Jaccard(D1,D2) = | D1 ∩ D2 | / | D1 ∪ D2 | = 3/8 = 0.375

Jaccard(D1,D3)

| D1 | = 4 (Jack London, London traveled, traveled to, to Oakland)

| D3 | = 5 (Jack traveled, traveled from, from Oakland, Oakland to, to London)

| D1 ∩ D3 | = 0

| D1 ∪ D3 | = (|D1| + |D3| - | D1 ∩ D3 |) = 4 + 5 - 0 = 9

Jaccard(D1,D3) = | D1 ∩ D3 | / | D1 ∪ D3 | = 0/9 = 0
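The bigram version can be checked with a short sketch as well. The helper names `word_bigrams` and `jaccard_bigrams` are illustrative, and the sketch assumes case-sensitive, whitespace-separated words, with each bigram being a pair of adjacent words.

```python
def word_bigrams(text):
    """Return the set of adjacent word pairs (bigrams) in a text."""
    words = text.split()  # assumption: whitespace tokenization, case kept
    return set(zip(words, words[1:]))


def jaccard_bigrams(text_a, text_b):
    """Jaccard coefficient between the word-bigram sets of two texts."""
    a, b = word_bigrams(text_a), word_bigrams(text_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


d1 = "Jack London traveled to Oakland"
d2 = "Jack London traveled to the city of Oakland"
d3 = "Jack traveled from Oakland to London"

print(sorted(word_bigrams(d1)))
print(jaccard_bigrams(d1, d2))  # 0.375
print(jaccard_bigrams(d1, d3))  # 0.0
```

Note that d1 and d3 contain almost the same words, yet their bigram sets do not overlap at all, which is why word order matters once n-grams are used.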
