Schoolwork
Exercise 9.1
Question:
In Rocchio’s algorithm, what weight setting for α/β/γ does a “Find pages like this one”
search correspond to?
Answer:
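Rocchio’s algorithm (Equation 9.3) computes the modified query as
$$\vec{q}_m = \alpha \vec{q}_0 + \beta \frac{1}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \frac{1}{|D_{nr}|} \sum_{\vec{d}_j \in D_{nr}} \vec{d}_j$$
where q0 is the original query, Dr is the set of known relevant documents, and Dnr is the set of known nonrelevant documents.
Reasonable values might be α = 1, β = 0.75, γ = 0.15.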
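In this exercise, the page the user selects is the only judged document and it counts as relevant, so there is no negative feedback:
α = 1, β = 1, γ = 0
This is purely positive relevance feedback. If the original query should be dropped entirely, so that only the chosen page drives the search, then α = 0 instead.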
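To make the weight settings concrete, here is a minimal Python sketch of the Rocchio update. The function name `rocchio`, the three-term vectors, and the clipping of negative weights to 0 are assumptions chosen for illustration; they are not part of the exercise.

```python
def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Modified query vector for plain-list vectors (illustrative helper)."""
    def centroid(docs):
        # Mean vector of a document set; the zero vector if the set is empty.
        if not docs:
            return [0.0] * len(q0)
        return [sum(col) / len(docs) for col in zip(*docs)]

    mu_r = centroid(relevant)
    mu_nr = centroid(nonrelevant)
    # Assumption: negative term weights are clipped to 0.
    return [max(0.0, alpha * q + beta * r - gamma * n)
            for q, r, n in zip(q0, mu_r, mu_nr)]


# "Find pages like this one": the chosen page is the only judged document.
q0 = [1.0, 0.0, 1.0]    # hypothetical original query
page = [0.0, 2.0, 1.0]  # hypothetical page the user selected

keep_query = rocchio(q0, [page], [], alpha=1.0, beta=1.0, gamma=0.0)
page_only = rocchio(q0, [page], [], alpha=0.0, beta=1.0, gamma=0.0)
print(keep_query)  # [1.0, 2.0, 2.0]
print(page_only)   # [0.0, 2.0, 1.0]
```

With γ = 0 the nonrelevant set plays no role. Setting α = 0 reduces the new query to the selected page itself.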
Exercise 9.3
Question:
Under what conditions would the modified query qm in Equation 9.3 be the same as the
original query q0? In all other cases, is qm closer than q0 to the centroid of the relevant
documents?
Answer:
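The modified query qm equals the original query q0 when α = 1 and the two feedback terms cancel, that is, when β times the centroid of Dr equals γ times the centroid of Dnr. The simplest cases are β = γ = 0, or no judged documents at all.
Users often enter imperfect queries (misspellings, cross-language terms, vocabulary mismatch), so a modified query helps them search more effectively.
In Equation 9.3, α, β, and γ are the weights attached to q0, Dr, and Dnr. With reasonable values such as α = 1, β = 0.75, γ = 0.15, the positive term usually dominates and qm typically moves closer than q0 to the centroid of the relevant documents. This is not guaranteed for every setting: if γ is large relative to β, the negative term can push qm away from that centroid.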
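A small numeric check can illustrate both parts of this question. The two-dimensional vectors and the helper names `rocchio_raw` and `dist` below are hypothetical, chosen only so the arithmetic is easy to follow; negative weights are deliberately not clipped here so the geometry stays visible.

```python
import math


def rocchio_raw(q0, mu_r, mu_nr, alpha, beta, gamma):
    # Rocchio update applied to precomputed centroids, without clipping.
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(q0, mu_r, mu_nr)]


def dist(u, v):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


q0 = [1.0, 0.0]     # hypothetical original query
mu_r = [2.0, 1.0]   # hypothetical centroid of the relevant documents
mu_nr = [4.0, 2.0]  # hypothetical centroid of the nonrelevant documents

no_feedback = rocchio_raw(q0, mu_r, mu_nr, 1.0, 0.0, 0.0)
cancelling = rocchio_raw(q0, mu_r, mu_nr, 1.0, 0.5, 0.25)  # beta*mu_r == gamma*mu_nr
typical = rocchio_raw(q0, mu_r, mu_nr, 1.0, 0.75, 0.15)
negative_only = rocchio_raw(q0, mu_r, mu_nr, 1.0, 0.0, 1.0)

print(no_feedback == q0, cancelling == q0)             # True True
print(dist(typical, mu_r) < dist(q0, mu_r))            # True
print(dist(negative_only, mu_r) < dist(q0, mu_r))      # False
```

In this toy setup the typical weights move the query toward the relevant centroid, while purely negative feedback moves it away, so "closer" depends on the weights and the documents rather than holding in every case.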
Exercise 9.5
Question:
Suppose that a user’s initial query is cheap CDs cheap DVDs extremely cheap CDs. The
user examines two documents, d1 and d2. She judges d1, with the content CDs cheap
software cheap CDs relevant and d2 with content cheap thrills DVDs nonrelevant. Assume
that we are using direct term frequency (with no scaling and no document frequency).
There is no need to length-normalize vectors. Using Rocchio relevance feedback as in
Equation (9.3), what would the revised query vector be after relevance feedback? Assume
α = 1, β = 0.75, γ = 0.25.
Answer:
The query is:
cheap CDs cheap DVDs extremely cheap CDs
The relevant document (d1) is:
CDs cheap software cheap CDs
The nonrelevant document (d2) is:
cheap thrills DVDs
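The terms are:
(CDs, cheap, DVDs, extremely, software, thrills)
Using raw term frequencies, as the question requires, the vectors are:
q = (2, 3, 1, 1, 0, 0)
d1 = (2, 2, 0, 0, 1, 0)
d2 = (0, 1, 1, 0, 0, 1)
Given α = 1, β = 0.75, γ = 0.25:
qm = q + 0.75 * d1 - 0.25 * d2 = (3.5, 4.25, 0.75, 1, 0.75, -0.25)
Negative weights are set to 0, so the revised query vector is
qm = (3.5, 4.25, 0.75, 1, 0.75, 0)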
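As a cross-check, the calculation can be sketched in Python using raw term counts, as the question specifies. The term order, the whitespace tokenization, and the clipping of negative weights to 0 are assumptions of this sketch. Because each judged set holds a single document, each centroid is just that document's vector.

```python
from collections import Counter

query = "cheap CDs cheap DVDs extremely cheap CDs"
d1 = "CDs cheap software cheap CDs"  # judged relevant
d2 = "cheap thrills DVDs"            # judged nonrelevant

# Assumed term order for the vector space.
terms = ["CDs", "cheap", "DVDs", "extremely", "software", "thrills"]


def tf_vector(text):
    # Raw term frequencies: no scaling, no document frequency, no normalization.
    counts = Counter(text.split())
    return [counts[t] for t in terms]


q, rel, nonrel = tf_vector(query), tf_vector(d1), tf_vector(d2)
alpha, beta, gamma = 1.0, 0.75, 0.25

raw = [alpha * a + beta * b - gamma * c for a, b, c in zip(q, rel, nonrel)]
qm = [max(0.0, w) for w in raw]  # assumption: negative weights become 0

print(dict(zip(terms, qm)))
# {'CDs': 3.5, 'cheap': 4.25, 'DVDs': 0.75, 'extremely': 1.0, 'software': 0.75, 'thrills': 0.0}
```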
Exercise 9.7
Question:
If A is simply a Boolean cooccurrence matrix, then what do you get as the entries in C?
Answer:
Matrix A is the term-document matrix.
Matrix C is the term-term matrix.
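Matrix A has one row per term (t1, t2, t3, ..., tn) and one column per document (d1, d2, d3, ..., dn).
Matrix C has one row and one column per term (t1, t2, t3, ..., tn).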
Here the entries of A are binary: an entry is 1 if the term occurs in the document, and 0
otherwise.
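C = AA^T, so with a Boolean A each entry C_{u,v} counts the documents in which terms u and v both occur, and each diagonal entry C_{u,u} is the number of documents containing term u. C_{u,v} therefore acts as a similarity score between terms u and v, with a larger number being
better.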
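The product can be checked on a toy example. The three terms, three documents, and the helper name `cooccurrence` below are hypothetical and serve only to show what the entries of C look like when A is Boolean.

```python
# Hypothetical Boolean term-document matrix: rows are terms, columns are documents.
terms = ["cheap", "CDs", "thrills"]
A = [
    [1, 1, 1],  # cheap occurs in d1, d2, d3
    [1, 0, 0],  # CDs occurs in d1 only
    [0, 1, 0],  # thrills occurs in d2 only
]


def cooccurrence(matrix):
    # C = A * A^T: entry (u, v) is the dot product of the rows for terms u and v.
    return [[sum(a * b for a, b in zip(row_u, row_v)) for row_v in matrix]
            for row_u in matrix]


C = cooccurrence(A)
for term, row in zip(terms, C):
    print(term, row)
# cheap [3, 1, 1]
# CDs [1, 1, 0]
# thrills [1, 0, 1]
```

The diagonal holds each term's document frequency, and the off-diagonal entries count the documents shared by a pair of terms. "CDs" and "thrills" never co-occur here, so their entry is 0.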