* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Generating Meaningful Test Databases
Survey
Document related concepts
Transcript
Multi-RQP Generating Test Databases for the Functional Testing of OLTP Applications Carsten Binnig Joint work with: Donald Kossmann, Eric Lo DBTest Workshop, SIGMOD 2008, Vancouver Motivation • Today’s testing techniques are not very efficient – 20-70% of the costs of a SW project are spent for testing Costs caused by software errors in the US in 2000: ~$20-$60 bn • Test automation is not a trivial problem – Writing “test programs” which verify the application behavior Maintainability / Quality of “test programs” is not easy • Test automation for DB apps is even harder – Testing a certain behavior needs a particular database state Existing tools generate test databases independent from test cases How to generate relevant test databases for OLTP apps? (other work: [ICDE07], [SIGMOD07]) 2 Example: Online-Library (1) Use Case: „Reservation of a book“ 1. User enters the inventory number of the book 2. System shows the details of the book • Exception 1: Book belongs to the closed stacks 3. User enter her account information 4. System checks account info and finishes reservation • • Exception 2: User account is disabled Exception 3: User account that has overdue fines 3 Example: Online-Library (2) Test Case (Expected Behavior) + Test Database • Test Case 1 (Exception 1): – A book that belongs to the closed stack • Test Case 2 (Exception 2): – A book that does not belong to the closed stacks + – A user account that is disabled • Test Case 3 (Exception 3) : … • Test Case 4 (Successful Reservation) : – A book that does not belong to the closed stacks + – A valid user account with no overdue fines 4 Outline • Introduction • State of the Art / Requirements • (Multi-) Reverse Query Processing (RQP) • Conclusions / Future Work 5 State of the Art • General-purpose Database Generators => Random data over database schema (size of tables, data distributions) – Low test coverage => Data does not enable execution of all test cases – High maintenance costs => Manual adaptation of data necessary • Script-based Database Generators => Application-specific data (e.g., a bunch of SQL INSERT statements) – High initial costs => Writing code to generate test database – Hard to extend => Hard to analyze side-effects • Data Extractors => Extract data from existing applications – Test coverage strongly depends on existing data – High initial effort => Test data needs to be anonymized 6 Requirements for the Generation of Test Databases • Specify a test database individually per test case – High test coverage is possible – Good extensibility for new test cases • Allow a declarative specification of the test database – Maintainability of the specification is good (automatic evolution?) – Data generation can be optimized (runtime, amount of data) • Specify only relevant the data for each test case – Initial costs to specify the test database are low – Changeability of specification is good • Enable logical data independence of test data specification – Database schema can be changed without changing the specification of the test database 7 Outline • Introduction • State of the Art / Requirements • (Multi-) Reverse Query Processing (RQP) • Conclusions / Future Work 8 Reverse Query Processing (RQP) • Problem Statement: – Given: SQL Query Q, Result R, Database Schema S – Output: Database D with Q(D)=R and D satisfies S • Example (Test Case 1): There exists at least one book which does belong to the closed stacks Application Test Case 1 S: CREATE TABLE book ( b_id INTEGER PRIMARY KEY, b_title VARCHAR (20), b_closed BOOLEAN NOT NULL, ... ) Q: SELECT COUNT(*) AS cnt FROM book WHERE b_closed = 1 R: {< >=1 >} 9 b_title b_closed … 1 Title A 1 … 2 Title B 0 … D: b_id RQP RQP Basic Idea Query Processing: Reverse Query Processing: • Input: Database D, Query Q • Output: Result R • Input: Query Q, Result R • Output: Database D Result Result Query Processor Query Reverse Query Processor Query Database Database => RQP can generate many different databases 10 Query Processing (Simplified) SQL Query: χCOUNT(*) Q: SELECT COUNT(*) FROM book WHERE b_closed = 1 Query Result: Query Plan (Relational Algebra): R: 1 as cnt σb_closed = 1 book 11 cnt D: b_id b_title b_closed … 1 Title A 1 … b_id b_title b_closed … 1 Title A 1 … 2 Title B 0 … Reverse Query Processing (Query Compilation) Reverse Query Plan (Reverse Relational Algebra): χ-1COUNT(*) Q: SELECT COUNT(*) AS cnt FROM book WHERE b_closed = 1 as cnt σ-1b_closed book 12 = 1 Data flow SQL Query: Reverse Query Processing (Top-Down Data Generation) R: {< >=1 >} R: cnt 1 χ-1COUNT(*) as cnt σ-1b_closed = 1 Data flow Q: book b_id b_title b_closed … 1 Title A 1 … b_title b_closed … 1 Title A 1 … 2 Title B 0 … D: b_id S: CREATE TABLE book ( b_id INTEGER PRIMARY KEY, b_title VARCHAR (20) NOT NULL, b_closed BOOLEAN NOT NULL, ... ) 13 Multi Reverse Query Processing • Problem: – One query + result are often not sufficient to specify a test database for more complex test cases – Multiple queries + result are necessary • Example (Test Case 4): – Book that does not belong to the closed stacks (Q1 und R1) and – Valid user account without overdue fines (Q2 und R2) • Idea: Restrict input query classes such that … – MRQP can be solved efficiently by using RQP – User can still specify any test database 14 RQP-disjoint Queries • Idea: Q1/R1 and Q2/R2 specify disjoint data sets • Example (Test Case 4): – Book that does not belong to the closed stacks (Q1 and R1) and – Valid user account without overdue fines (Q2 and R2) Q1: SELECT COUNT(*) AS cnt FROM book WHERE b_closed=0 Q2: SELECT COUNT(*) AS cnt FROM user WHERE u_fines=0 R1: {< 1 >} R2: {< 1 >} D1: b_id b_title b_closed … 1 Title A 0 … D2: Table ‘book’ u_id u_name u_fines … 1 User A 0 … Table ‘user’ 15 Query-Refinement • Idea: Q1/R1 specifies a „subset“ of Q2/R2 • Example (Test Case for Use Case “Book Search”): – Ten books of author ‘Grisham’ (Q1 and R1) – One of these books should belong to the closed stacks (Q2 and R2) QQ 1: SELECT COUNT(*) AS cnt 1’: FROM book WHERE b_author='Grisham' AND b_closed <> 1 10>} >} RR 1’: 1: {< 9 Q2: SELECT COUNT(*) AS cnt FROM book WHERE b_author='Grisham' AND b_closed = 1 R2: {< 1 >} D1’: D2: b_id b_author b_closed … 2 Grisham 0 … … … … … 10 Grisham 0 … Table ‘book’ b_id b_author b_closed … 1 Grisham 1 … Table ‘book’ 16 Outline • Introduction • State of the Art / Requirements • (Multi-) Reverse Query Processing (RQP) • Conclusions / Future Work 17 Conclusions / Future Work • Problems of existing test database generators – Test databases are generated independent of test cases – Low test coverage, high maintainability costs, … • (M)RQP to specify and generate test databases – Test data specification: declarative, minimal, … – Data generation based on database techniques (e.g., algebra operators, …) • Open Research Problems – Evolution of test databases – Study usability of MRQP – … 18 Questions ? snoitseuQ ? 19