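Schoolwork

Exercise 9.1
Question: In Rocchio's algorithm, what weight setting for α/β/γ does a "Find pages like this one" search correspond to?
Answer: The Rocchio algorithm (Equation 9.3) computes the modified query as

$$\vec{q}_m = \alpha \vec{q}_0 + \beta \frac{1}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \frac{1}{|D_{nr}|} \sum_{\vec{d}_j \in D_{nr}} \vec{d}_j$$

where q0 is the original query, Dr is the set of known relevant documents and Dnr is the set of known nonrelevant documents. Reasonable values in general might be α = 1, β = 0.75, γ = 0.15. A "Find pages like this one" search has no typed query: the selected page is the only relevant document, and there are no nonrelevant documents. It therefore corresponds to α = 0, β = 1, γ = 0, so the new query is exactly the vector of that page. In other words, it is relevance feedback based on a single relevant document.

Exercise 9.3
Question: Under what conditions would the modified query qm in Equation 9.3 be the same as the original query q0? In all other cases, is qm closer than q0 to the centroid of the relevant documents?
Answer: Relevance feedback is useful because users' initial queries are often imperfect (misspellings, cross-language retrieval, vocabulary mismatch), so a modified query can help them find what they are looking for. In Equation 9.3 the weights α, β and γ are attached to q0, Dr and Dnr respectively. Writing μ(D) for the centroid of a document set D, qm equals q0 exactly when (α − 1)·q0 + β·μ(Dr) − γ·μ(Dnr) = 0 (ignoring the clipping of negative weights). The usual case is α = 1 and β = γ = 0, i.e. the feedback is ignored; qm also equals q0 when α = 1 and the feedback terms cancel, β·μ(Dr) = γ·μ(Dnr).
In all other cases qm is not necessarily closer than q0 to the relevant centroid. The γ term subtracts the nonrelevant centroid, which can pull qm away from μ(Dr). For example, if q0 already coincides with μ(Dr), then qm = (α + β)·μ(Dr) − γ·μ(Dnr), which in general points in a different direction from μ(Dr), so its cosine similarity to the relevant centroid falls below 1. Typical weights such as α = 1, β = 0.75, γ = 0.15 usually move qm toward the relevant documents in practice, but they do not guarantee it.

Exercise 9.5
Question: Suppose that a user's initial query is cheap CDs cheap DVDs extremely cheap CDs. The user examines two documents, d1 and d2. She judges d1, with the content CDs cheap software cheap CDs, relevant and d2, with content cheap thrills DVDs, nonrelevant. Assume that we are using direct term frequency (with no scaling and no document frequency). There is no need to length-normalize vectors. Using Rocchio relevance feedback as in Equation 9.3, what would the revised query vector be after relevance feedback? Assume α = 1, β = 0.75, γ = 0.25.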
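Equation 9.3 can be checked mechanically against the hand computations in Exercises 9.1, 9.3 and 9.5. The following is a minimal Python sketch, assuming sparse term-to-weight dictionaries, raw term frequencies from whitespace tokenization (as Exercise 9.5 specifies), and clipping of negative weights to 0. The helper names (`tf_vector`, `centroid`, `rocchio`, `cosine`) are illustrative only, not part of any library.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: term -> weight


def tf_vector(text: str) -> Vector:
    """Raw term-frequency vector (no scaling, no idf, no length normalization)."""
    return {term: float(n) for term, n in Counter(text.split()).items()}


def centroid(docs: List[Vector]) -> Vector:
    """(1/|D|) * sum of the vectors in D; an empty set contributes nothing."""
    if not docs:
        return {}
    total: Vector = {}
    for d in docs:
        for term, w in d.items():
            total[term] = total.get(term, 0.0) + w
    return {term: w / len(docs) for term, w in total.items()}


def rocchio(q0: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float, beta: float, gamma: float) -> Vector:
    """Equation 9.3, with negative weights clipped to 0."""
    mu_r, mu_nr = centroid(relevant), centroid(nonrelevant)
    terms = set(q0) | set(mu_r) | set(mu_nr)
    return {
        t: max(0.0, alpha * q0.get(t, 0.0) + beta * mu_r.get(t, 0.0) - gamma * mu_nr.get(t, 0.0))
        for t in terms
    }


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


# Exercise 9.5 data
q0 = tf_vector("cheap CDs cheap DVDs extremely cheap CDs")
d1 = tf_vector("CDs cheap software cheap CDs")  # judged relevant
d2 = tf_vector("cheap thrills DVDs")            # judged nonrelevant

vocab = ["CDs", "cheap", "DVDs", "extremely", "software", "thrills"]
qm = rocchio(q0, [d1], [d2], alpha=1.0, beta=0.75, gamma=0.25)
print([qm.get(t, 0.0) for t in vocab])  # [3.5, 4.25, 0.75, 1.0, 0.75, 0.0]

# Exercise 9.1: the page is the only relevant document, no typed query, no nonrelevant docs
like_this = rocchio({}, [d1], [], alpha=0.0, beta=1.0, gamma=0.0)
print(like_this == d1)  # True

# Exercise 9.3: start from a query equal to the relevant centroid
mu_r = centroid([d1])
q_start = dict(mu_r)
q_new = rocchio(q_start, [d1], [d2], alpha=1.0, beta=0.75, gamma=0.25)
print(cosine(q_start, mu_r), cosine(q_new, mu_r))  # 1.0, then a value slightly below 1
```

The last comparison illustrates the Exercise 9.3 answer: subtracting the nonrelevant centroid tilts the query away from the relevant centroid even though feedback was applied.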
Answer:
Query q0: cheap CDs cheap DVDs extremely cheap CDs
Relevant document d1: CDs cheap software cheap CDs
Nonrelevant document d2: cheap thrills DVDs
Terms (in this order): (CDs, cheap, DVDs, extremely, software, thrills)
Using raw term frequencies:
q0 = (2, 3, 1, 1, 0, 0)
d1 = (2, 2, 0, 0, 1, 0)
d2 = (0, 1, 1, 0, 0, 1)
With α = 1, β = 0.75, γ = 0.25, and one document in each judged set (so each centroid is just that document):
qm = q0 + 0.75 · d1 − 0.25 · d2 = (3.5, 4.25, 0.75, 1, 0.75, −0.25)
Negative weights are set to 0, so the revised query vector is (3.5, 4.25, 0.75, 1, 0.75, 0).

Exercise 9.7
Question: If A is simply a Boolean cooccurrence matrix, then what do you get as the entries in C?
Answer: A is the M × N term-document matrix (rows t1 … tM, columns d1 … dN), and C = AA^T is the M × M term-term matrix (rows and columns t1 … tM). If A is Boolean, A_{t,d} = 1 when term t occurs in document d and 0 otherwise. Then

$$C_{u,v} = \sum_{d} A_{u,d} \, A_{v,d}$$

counts the documents in which both u and v occur. So Cu,v is a co-occurrence count that serves as a similarity score between terms u and v, with a larger number meaning more similar; the diagonal entry Cu,u is the document frequency of term u.
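The entries of C = AA^T for a Boolean A can be illustrated with a minimal sketch, assuming A is stored as a list of rows (one row per term) of 0/1 entries. The 3 × 4 matrix below is a made-up example, not data from the exercise, and `term_term_matrix` is a hypothetical helper name.

```python
from typing import List

Matrix = List[List[int]]


def term_term_matrix(A: Matrix) -> Matrix:
    """C = A A^T: C[u][v] = sum over documents d of A[u][d] * A[v][d]."""
    return [[sum(a * b for a, b in zip(row_u, row_v)) for row_v in A] for row_u in A]


# Hypothetical Boolean term-document matrix: rows t1..t3, columns d1..d4
A = [
    [1, 1, 0, 1],  # t1 occurs in d1, d2, d4
    [1, 0, 0, 1],  # t2 occurs in d1, d4
    [0, 1, 1, 0],  # t3 occurs in d2, d3
]
C = term_term_matrix(A)
for row in C:
    print(row)
# [3, 2, 1]
# [2, 2, 0]
# [1, 0, 2]
```

The diagonal (3, 2, 2) is each term's document frequency, and each off-diagonal entry counts the documents two terms share; for example, t2 and t3 never co-occur, so C[1][2] is 0.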