Evaluation in IR in the Context of the Web
Evaluating IR (Web) Systems
• Study of Information Seeking & IR
• Pragmatics of IR experimentation
• The dynamic Web
• Cataloging & understanding Web docs
• Web site characteristics
Study of Info seeking & retrieval
- Well known authors (useful for research papers)
• Real life studies (not TREC)
- User context of questions
- Questions (structure & classification)
- Searcher (cognitive traits & decision making)
- Information Items
• Different searches with the same question
• Relevant items
• “models, measures, methods, procedures and
statistical analyses” p 175
• Beyond common sense and anecdotes
Study 2
• Is there ever enough user research?
• A good set of elements to include in an IR
system evaluation
• How do you test for real life situations?
- Questions the users actually have
- Expertise in subject (or not)
- Intent
- User’s computers, desks & materials
• What’s a search strategy?
- Tactics, habits, previous knowledge
• How do you collect search data?
Study 3
• How do you ask questions?
- General knowledge test
- Specific search terms
• Learning Style Inventory
- NOT the best way to understand users
- Better than nothing
- Choose your questions like your users
• Let users choose their questions?
• Let users work together on searches
• Effectiveness Measures
- Recall, precision, relevance
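These effectiveness measures have standard set-based definitions. A minimal Python sketch, assuming retrieved and relevant results are represented as sets of document IDs (the numbers below are made up for illustration):

    # Set-based effectiveness measures: precision and recall.
    def precision_recall(retrieved, relevant):
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    # Example: 3 of 5 retrieved documents are relevant, and 3 of 6 relevant found.
    p, r = precision_recall({1, 2, 3, 4, 5}, {2, 3, 5, 8, 9, 10})
    # p == 0.6, r == 0.5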
Study 4
• Measuring efficiency
- Time on tasks
- Task completion
• Correct answer
• Any answer?
- Worthwhile?
• Counting correct answers
• Statistics
- Clicks, commands, pages, results
- Not just computer time, but the overall process
- Start with the basics, then get advanced
- Regression analysis (dependencies for large studies)
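To illustrate "start with the basics, then get advanced": a small Python sketch that summarizes hypothetical per-session logs and then fits a simple regression. The session data and field meanings are invented for the example, and statistics.linear_regression requires Python 3.10 or later:

    import statistics

    # Hypothetical per-session measurements: (clicks, seconds on task).
    sessions = [(4, 95), (9, 180), (2, 60), (7, 150), (5, 110)]
    clicks = [c for c, _ in sessions]
    seconds = [s for _, s in sessions]

    # Basics first: descriptive statistics.
    print("mean time:", statistics.mean(seconds))
    print("stdev time:", statistics.stdev(seconds))

    # Then more advanced: does time on task depend on the number of clicks?
    slope, intercept = statistics.linear_regression(clicks, seconds)
    print(f"predicted seconds ~ {intercept:.1f} + {slope:.1f} * clicks")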
Let’s design an experiment
• User Selection
- Searcher (cognitive traits & decision making)
- User context of questions
• Environment
• Questions (structure & classification)
• Information Items
- Successful answers
- Successful/Worthwhile sessions
• Measurement
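One way to make this checklist concrete is to write the plan down as a structured record before collecting any data. A hypothetical sketch only; every field name and value below is invented:

    from dataclasses import dataclass, field

    @dataclass
    class ExperimentPlan:
        users: str            # who searches (cognitive traits, context)
        environment: str      # lab, home, searcher's own computer?
        questions: list       # question set with structure/classification
        answer_criteria: str  # what counts as a successful answer/session
        measures: list = field(default_factory=list)

    plan = ExperimentPlan(
        users="undergraduates with no subject expertise",
        environment="campus lab, searchers' own accounts",
        questions=["fact-finding", "open-ended"],
        answer_criteria="expert-judged correct answer within 10 minutes",
        measures=["precision", "recall", "time on task", "clicks"],
    )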
Pragmatics of IR experimentation
• The entire IR evaluation must be planned
• Controls are essential
• Working with what you can get
- Expert defined questions & answers
- Specific systems
• Fast, cheap, informal tests
- Not always, but could be pre-tests
- Quick results for broad findings
Pragmatic Decision 1
• Testing at all?
- Purpose of test
- Pull data from previous tests
• Repeat old test
- Old test with new system
- Old test with new database
• Same test, many users
- Same system
- Same questions (data)
Pragmatic Decision 2
• What kind of test?
• Everything at once?
- System (help, no help?)
- Users (types of)
- Questions (open-ended?)
• Facts
- Answers with numbers
- Words the user knows
• General knowledge
- Found more easily
- Ambiguity goes both ways
Pragmatic Decision 3
• Understanding the Data
• What are your variables? (p 207)
• Working with initial goals of study
• Study size determines measurement methods
- Lots of users
- Many questions
- All system features, competing system features
• What is acceptable/passable performance?
- Time, correct answers, clicks?
- Which are controlled?
Pragmatic Decision 4
• What database?
- The Web (no control)
- Smaller dataset (useful to user?)
• Very similar questions, small dataset
- Web site search vs. whole Web search
- Prior knowledge of subject
- Comprehensive survey of possible results
beforehand
• Differences other than content?
Pragmatic Decision 5
• Where do queries/questions come from?
- Content itself
- User pre-interview (pre-tests)
- Other studies
• What are the search terms (used or given)?
- Single terms
- Advanced searching
- Results quantity
Pragmatic Decisions 6, 7, 8
• Analyzing queries
- Scoring system
- Logging use
• What’s a winning query (treatment of units)
- User success, expert answer
- Time, performance
- Different queries with the same answer?
• Collect the data
- Logging and asking users
- Consistency (software, questionnaires, scripts)
Pragmatic Decisions 9 & 10
• Analyzing Data
- Dependent on the dataset
- Compare to other studies
- Basic statistics first
• Presenting Results
- Work from plan
- Purpose
- Measurement
- Models
- Users
- Matching other studies
Keeping Up with the Changing Web
• Building Indices is difficult enough in theory
• What about a continuously changing huge volume of
information?
• Is old information good?
• What does up-to-date mean anymore?
• Is Knowledge a depreciating commodity?
- Correctness + Value over time
• Different information changes at different rates
- Really it’s new information
• How do you update an index with constantly
changing information?
Changing Web Properties
• Known distributions for information change
• Sites and pages may have easily identifiable
patterns of update
- 4% change on every observation
- Some don’t ever change (links too)
• If you check and a page hasn’t changed, what
is the probability it will ever change?
• Rate of change is related to rate of attention
- Machines vs. Users
- Measures can be compared along with information
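The question above about a page that never seems to change is often approached by treating changes as a memoryless (Poisson-style) process and estimating the change rate from repeated visits. A rough Python sketch; the observation history and the Poisson assumption are illustrative, not from the reading:

    import math

    # Hypothetical crawl history: 1 = page changed since last visit, 0 = unchanged.
    observations = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
    days_between_visits = 7

    # Per-visit change probability, converted to a daily rate.
    p_change = sum(observations) / len(observations)
    daily_rate = -math.log(1 - p_change) / days_between_visits

    # Probability the page changes at least once in the next 30 days.
    p_30 = 1 - math.exp(-daily_rate * 30)
    print(f"per-visit change prob: {p_change:.2f}, 30-day change prob: {p_30:.2f}")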
Dynamic Maintenance of Indexes with Landmarks
• Web Crawlers do the work in gathering pages
• Incremental crawling means incrementally updated indices
- Rebuild the whole index more frequently
- Devise a scheme for updates (and deletions)
- Use supplementary indices (e.g. date)
• New documents
• Changed documents
• 404 documents
Landmarks for Indexing
• Difference-based method
• Documents that don’t change are landmarks
- Relative addressing
- Clarke: block-based
- Glimpse: chunking
• Only update pointers to pages
• Tags and document properties are
landmarked
• Broader pointers mean fewer updates
• Faster indexing – Faster access?
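A toy illustration of the landmark idea: postings store positions relative to the nearest landmark, so an edit that shifts later text only touches the landmark offset table, not every posting. This is a sketch of the general difference-based scheme, not the specific Clarke (block-based) or Glimpse (chunking) implementation:

    # Toy landmark index: term -> list of (landmark_id, offset relative to landmark).
    postings = {"retrieval": [("L1", 3), ("L2", 10)]}

    # Landmark table: landmark_id -> absolute position in the document.
    landmarks = {"L1": 0, "L2": 500}

    def absolute_positions(term):
        return [landmarks[lm] + off for lm, off in postings.get(term, [])]

    print(absolute_positions("retrieval"))   # [3, 510]

    # If 40 characters are inserted before the second landmark, only the
    # landmark table changes; the postings themselves stay untouched.
    landmarks["L2"] += 40
    print(absolute_positions("retrieval"))   # [3, 550]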
Yahoo! Cataloging the Web
• How do information professionals build an “index” of
the Web?
• Cataloging applies to the Web
• Indexing with synonyms
• Browsing indexes vs searching them
• Comprehensive index not the goal
- Quality
- Information Density
• Yahoo’s own ontology – points to site for full info
• Subject Trees with aliases (@) to other locations
• “More like this” comparisons as checksums
Yahoo uses tools for indexing
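A tiny sketch of what a subject tree with "@" aliases could look like as a data structure. The categories are invented and greatly simplified compared to Yahoo's actual ontology:

    # Hypothetical subject tree; a key ending in "@" is an alias to another path.
    tree = {
        "Science": {
            "Computer Science": {
                "Information Retrieval": {},
            },
        },
        "Computers & Internet": {
            "Information Retrieval@": "Science/Computer Science/Information Retrieval",
        },
    }

    def resolve(path):
        # Walk the tree, chasing an "@" alias if one is found along the way.
        node = tree
        for part in path.split("/"):
            if part + "@" in node:
                return resolve(node[part + "@"])
            node = node[part]
        return node

    print(resolve("Computers & Internet/Information Retrieval"))  # -> {}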
Investigation of Documents from the WWW
• What properties do Web documents have?
• What structure and formats do Web
documents use?
• What properties do Web documents have?
- Size – 4K avg.
- Tags – ratio and popular tags
- MIME types (file extensions)
- URL properties and formats
- Links – internal and external
- Graphics
- Readability
WWW Documents Investigation
• How do you collect data like this?
- Web Crawler
• URL identifier, link follower
- Index-like processing
• Markup parser, keyword identifier
• Domain name translation (and caching)
• How do these facts help with indexing?
• Have general characteristics changed?
• (This would be a great project to update.)
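A minimal sketch of gathering such statistics for a single page with only the Python standard library; a real crawler would add a URL frontier, politeness rules, and DNS caching. The URL is a placeholder:

    from collections import Counter
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class TagCounter(HTMLParser):
        def __init__(self):
            super().__init__()
            self.tags = Counter()
            self.links = []
        def handle_starttag(self, tag, attrs):
            self.tags[tag] += 1
            if tag == "a":
                self.links += [v for k, v in attrs if k == "href"]

    # Placeholder URL; in practice this comes from the crawler's queue.
    with urlopen("http://example.com/") as resp:
        body = resp.read()
        mime = resp.headers.get_content_type()

    parser = TagCounter()
    parser.feed(body.decode("utf-8", errors="replace"))

    print("size (bytes):", len(body))
    print("MIME type:", mime)
    print("most common tags:", parser.tags.most_common(5))
    print("links found:", len(parser.links))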
Properties of Highly-Rated Web Sites
• What about whole Web sites?
• What is a Web site?
- Sub-sites?
- Specific contextual, subject-based parts of a Web
site?
- Links from other Web pages: on the site and off
- Web site navigation effects
• Will experts (like Yahoo catalogers) like a
site?
Properties
• Links & formatting
• Graphics – one, but not too many
• Text formatting – 9 pt. with normal style
• Page (layout) formatting – min. colors
• Page performance (size and access)
• Site architecture (pages, nav elements)
- More links within and external
- Interactive (search boxes, menus)
• Consistency within a site is key
• How would a user or index builder make use
of these?
Extra Discussion
• Little Words, Big Difference
- The difference that makes a difference
- Singular and plural noun identification can change indices and retrieval results (see the sketch after this list)
- Language use differences
• Decay and Failures
- Dead links
- Types of errors
- Huge number of dead links (PageRank effective)
• 28% in 1995-1999 Computer & CACM
• 41% in 2002 articles
• Better than the average Web page?
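To make the singular/plural point concrete, here is a deliberately naive Python sketch of how collapsing plurals changes what an index stores and what a query matches; a real system would use a proper stemmer rather than this crude suffix rule:

    def naive_singular(term):
        # Crude plural stripping, for illustration only.
        if term.endswith("ies"):
            return term[:-3] + "y"
        if term.endswith("s") and not term.endswith("ss"):
            return term[:-1]
        return term

    docs = {1: "query logs and user queries", 2: "a single query log"}

    # Two tiny indexes: raw terms vs. singular-normalized terms.
    raw, norm = {}, {}
    for doc_id, text in docs.items():
        for word in text.split():
            raw.setdefault(word, set()).add(doc_id)
            norm.setdefault(naive_singular(word), set()).add(doc_id)

    print(raw.get("queries"))                   # {1}     -- misses doc 2
    print(norm.get(naive_singular("queries")))  # {1, 2}  -- both documents match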
Break!
Topic Discussions Set
• Leading WIRED Topic Discussions
- About 20 minutes reviewing issues from the week’s
readings
• Key ideas from the readings
• Questions you have about the readings
• Concepts from readings to expand on
- PowerPoint slides
- Handouts
- Extra readings (at least a few days before class) –
send to wired listserv
Web IR Evaluation
- 5 page written evaluation of a Web IR System
- technology overview (how it works)
• Not an eval of a standard search engine
• Only main determinable difference is content
- a brief overview of the development of this type of
system (why it works better)
- intended uses for the system (who, when, why)
- (your) examples or case studies of the system in
use and its overall effectiveness
Projects and/or Papers Overview
• How can (Web) IR be better?
- Better IR models
- Better User Interfaces
• More to find vs. easier to find
• Web documents sampling
• Web cataloging work
- Metadata & IR
- Who watches the catalogers?
• Scriptable applications
- Using existing IR systems in new ways
- RSS & IR
Project Ideas
• Searchable Personal Digital Library
• Browser hacks for searching
• Mozilla keeps all the pages you surf so you
can search through them later
- Mozilla hack
- Local search engines
• Keeping track of searches
• Monitoring searches
Paper Ideas
• New datasets for IR
• Search on the Desktop – issues, previous
research and ideas
• Collaborative searching – advantages and
potential, but what about privacy?
• Collaborative Filtering literature review
• Open source and IR systems history &
discussion