Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Mobile Data Management Sanjay Kumar Madria Department of Computer Science University of Missouri-Rolla Rolla, MO 65401 1 Wireless Technologies Wireless local area networks (WaveLan, Aironet – Possible Transmission error Cellular wireless – Low bandwidth Packet radio (Metricom) -Low Bandwidth Satellites (Inmarsat, Iridium) – Long Latency 2 Mobility Constraints CPU Power Bandwidth Delay tolerance Physical size Constraints on peripherals and GUIs (modality of interaction) Locations change dynamically 3 Why Mobile Data Mgmt? Wireless Connectivity and use of PDA’s , handheld computing devices on the rise Workforces will carry extracts of corporate databases with them Need central database repositories to serve these work groups and keep them fairly upto-date and consistent 4 Applications Sales Force Automation - especially in pharmaceutical industry, consumer goods, parts Financial Consulting and Planning Insurance and Claim Processing - Auto, General, and Life Insurance Real Estate/Property Management, Maintenance and Building Contracting 5 Data Processing Scenario One server or many servers (corporate data, inventory, HR, orders/billing) Shared Data Some Local Data per client , mostly subset of global data Need for accurate, up-to-date information Limitations Short connect time per session Infrequent connections Clients may remain dormant for extended periods of time Clients not reachable from servers at all times 6 What is Mobility? A device that moves – Between different geographical locations – Between different networks A person who moves – – – – Between different geographical locations Between different networks Between different communication devices Between different applications 7 Device mobility Plug in laptop at home/work on Ethernet – Occasional long breaks in network access – Wired network access only (connected => wellconnected) – Network address changes – Only one type of network interface – May want access to information when no network is available: hoard information locally Cell phone with access to cellular network – Continuous connectivity – Phone # remains the same (high-level network address) – Network performance may vary from place to place 8 Device mobility…. Can we achieve best of both worlds? – Continuous connectivity of wireless access – Performance of better networks when available Laptop moves between Ethernet, WaveLAN and Metricom networks – Wired and wireless network access – Potentially continuous connectivity, but may be breaks in service – Network address changes – Radically different network performance on different networks 9 People mobility Phone available at home or at work – Multiple phone numbers to reach me – Breaks in my reachability when I’m not in Cell phone – Only one number to reach me – Continuously reachable – Sometimes poor quality and expensive connectivity Cell phone, networked PDA, etc. – Multiple numbers/addresses for best quality connection – Continuous reachability – Best choice of address may depend on sender’s device or message content 10 Mobility means changes How does it affect the following? Hardware – Lighter – More robust – Lower power Wireless communication – Can’t tune for stationary access Network protocols – Name changes – Delay changes – Error rate changes 11 Changes…... Fidelity – High fidelity may not be possible Data consistency – Strong consistency no longer possible Location/transparency awareness – Transparency not always desirable Names/addresses – Names of endpoints may change Security – Lighter-weight algorithms – Endpoint authentication harder – Devices more vulnerable 12 Changes…... Performance – Network, CPU all constrained – Delay and delay variability Operating systems – New resources to track and manage: energy Applications – Name changes – Changes in connectivity – Changes in quality of resources People – Introduces new complexities, failures, devices 13 Example changes Addresses – Phone numbers, IP addresses Network performance – Bandwidth, delay, bit error rates, cost, connectivity Network interfaces – PPP, eth0, strip Between applications – Different interfaces over phone & laptop Within applications – Loss of bandwidth triggers change from color to B&W Available resources – Files, printers, displays, power, even routing 14 Most RDBMS vendors support the ISDB scenario - but no design and optimization aids Specialized Environments for ISDB apps: Sybase Remote Server Synchrologic iMOBILE Microsoft SQL server - mobile app support Oracle Lite Xtnd-Connect-Server (Extended Technologies) Scoutware (Riverbed Technologies) 15 Personal Communication System (PCS) Wireless Components MSC (MTSO) BS MS MS Wireless component Cell 16 Personal Communication System (PCS) Mobile cells Metropolitan area Metropolitan area BS Base Station Coverage area in one cell BS BS Coverage area in three cells Large cells. Low density Small cells. High density Smaller cells. Higher density 17 Personal Communication System (PCS) Mobile cells The entire coverage area is a group of a number of cells. The size of cell depends upon the power of the base stations. MSC PSTN 18 Personal Communication System (PCS) Frequency reuse A 2 7 3 7 4 5 2 3 1 1 6 A 2 6 4 5 D7 3 1 6 A A A 4 5 A A D 3N R D = distance between cells using the same frequency R = cell radius N = reuse pattern (the cluster size, which is 7). Thus, for a 7-cell group with cell radius R = 3 miles, the frequency reuse distance D is 13.74 miles. 19 Personal Communication System (PCS) Problems with cellular structure How to maintain continuous communication between two parties in the presence of mobility? Solution: Handoff How to maintain continuous communication between two parties in the presence of mobility? Solution: Roaming How to locate of a mobile unit in the entire coverage area? Solution: Location management 20 Personal Communication System (PCS) Handoff A process, which allows users to remain in touch, even while breaking the connection with one BS and establishing connection with another BS. MSC MSC New BS Old BS Old BS MSC MSC Old BS New BS New BS Old BS New BS 21 Personal Communication System (PCS) Handoff To keep the conversation going, the Handoff procedure should be completed while the MS (the bus) is in the overlap region. Cell overlap region G Old BS New BS 22 Personal Communication System (PCS) Handoff types with reference to the network Intra-system handoff or Inter-BS handoff The new and the old BSs are connected to the same MSC. MSC Old BS New BS 23 Personal Communication System (PCS) Intra-system handoff or Inter-BS handoff Steps 1. The MU (MS) momentarily suspends conversation and initiates the handoff procedure by signaling on an idle (currently free) channel in the new BS. Then it resumes the conversation on the old BS. MSC Old BS New BS 24 Personal Communication System (PCS) Intra-system handoff or Inter-BS handoff 2. Upon receipt of the signal, the MSC transfers the encryption information to the selected idle channel of the new BS and sets up the new conversation path to the MS through that channel. The switch bridges the new path with the old path and informs the MS to transfer from the old channel to the new channel. MSC Old BS New BS 25 Personal Communication System (PCS) Intra-system handoff or Inter-BS handoff 3. After the MS has been transferred to the new BS, it signals the network and resumes conversation using the new channel. MSC Old BS New BS 26 Personal Communication System (PCS) Intra-system handoff or Inter-BS handoff 4. Upon the receipt of the handoff completion signal, the network removes the bridge from the path and releases resources associated with the old channel. MSC Old BS New BS 27 Personal Communication System (PCS) Handoff types with reference to the network Intersystem handoff or Inter-MSC handoff The new and the old BSs are connected to different MSCs. PSTN BS1 MSC A Trunk BS2 MSC B BS1 MSC A Trunk BS2 MSC B 28 Personal Communication System (PCS) Roaming Administrative constraints Billing. Subscription agreement. Call transfer charges. User profile and database sharing. Any other policy constraints. 29 Personal Communication System (PCS) Roaming Technical constraints Bandwidth mismatch. For example, European 900MHz band may not be available in other parts of the world. Service providers must be able to communicate with each other. Needs some standard. Mobile station constraints. 30 Personal Communication System (PCS) Roaming Two basic operations in roaming management are Registration (Location update): The process of informing the presence or arrival of a MU to a cell. Location tracking: the process of locating the desired MU. 31 Personal Communication System (PCS) Registration Two-Tier Scheme HLR: Home Location Register A HLR stores user geographical location. profile and the VLR: Visitor Location Register A VLR stores user profile and the current location who is a visitor to a different cell that its home cell. 32 Personal Communication System (PCS) Registration Two-Tier Scheme steps. MU1 moves to cell 2. Cell 1 Cell 2 MU1 MU1 33 Personal Communication System (PCS) Registration Steps 1. MU1 moves to cell 2. The MSC of cell 2 launches a registration query to its VLR 2. 2. VLR2 sends a registration message containing MU’s identity (MIN), which can be translated to HLR address. 3. After registration, HLR sends an acknowledgment back to VLR2. 4. HLR sends a deregistration message to VLR1 (of cell 1) to delete the record of MU1 (obsolete). VLR1 acknowledges the cancellation. 34 Personal Communication System (PCS) Location tracking Steps 1. VLR of cell 2 is searched for MU1’s profile. 2. If it is not found, then HLR is searched. 3. Once the location of MU1 is found, then the information is sent to the base station of cell 1. 4. Cell 1 establishes the communication. 35 Personal Communication System (PCS) Location tracking Two-Tier Scheme steps location search 5 Dest HLS Id HLS Dest Dest-HLS 3 Id LS Dest Dest-ls - 6 4 Source ls 9 Id MSS Dest Dest-mss - 8 Dest ls 7 10 2 Source-mss 1 36 Src Dest Personal Communication System (PCS) Location tracking Two-Tier Scheme steps location update 5 Id LS MU New-ls Id MSS MU New-mss - HLS 6 10 4 9 New-ls Old-ls 3 Id HLS MU HLS - 2 7 8 New-mss 1 37 MU Mobile Database Systems (MDS) A Reference Architecture (Client-Server model) PSTN DB DB DBS DBS HLR VLR MSC MSC BSC BSC Fixed host Fixed host BS MU MU MU BS BS MU MU 38 Data Processing Issues Processing at the Server Processing at the Client Update Propagation and Installation Consistency Management Less Serious: – Concurrent Transactions – Client Data Recovery 39 40 Database Issues in Mobile Computing Query and Transaction Processing Replication Management Location Management Limitations – – – – Data Distribution, Mobility Management and Scalability Role of wireless medium in info distribution Dealing with short battery life Dealing with prolonged disconnection Periods – Bandwidth Management 41 Mobility Management and Scalability Location management Changing topologies Handoffs Resource finding Replication Resource sharing 42 Bandwidth Management Clients assumed to have weak and/or unreliable communication capabilities Broadcast--scalable but high latency On-demand--less scalable and requires more powerful client, but better response Client caching allows bandwidth conservation 43 Energy Management Battery life expected to increase by only 20% in the next 10 years Reduce the number of messages sent Doze modes Power aware system software Power aware microprocessors Indexing wireless data to reduce tuning time 44 Large Impact Distributed data management Querying wireless data Handling/representing fast-changing data Scale Tariff-driven query optimization Security User interfaces 45 Query Processing New Issues – Energy Efficient Query Processing – Location Dependent Query Processing Old Issues - New Context – Cost Model 46 Location Management New Issues – Tracking Mobile Users Old Issues - New Context – Managing Update Intensive Location Information – Providing Replication to Reduce Latency for Location Queries – Consistent Maintenance of Location Information 47 Transaction Processing New Issues – Recovery of Mobile Transactions – Lock Management in Mobile Transaction Old Issues - New Context Extended Transaction Models – Partitioning Objects while Maintaining Correctness 48 Dissemination-based Data Delivery Using Broadcast Disks Broadcast Disk Proposes a mechanism called Broadcast Disks to provide database access to mobile clients. Server continuously and repeatedly broadcasts data to a mobile client as it goes by. Multiple disks of different sizes are superimposed on the broadcast medium. Exploits the client storage resources for caching data. 50 Server Broadcast Programs Data server must construct a broadcast “program” to meet the needs of the client population. Server would take the union of required items and broadcast the resulting set cyclically. Single additional layer in a client’s memory hierarchy - flat broadcast. In a flat broadcast the expected wait for an item on the broadcast is the same for all items. Server Broadcast Programs 52 Server Broadcast Programs Broadcast Disks are an alternative to flat broadcasts. Broadcast is structured as multiple disks of varying sizes, each spinning at different rates. Server Broadcast Programs Flat Broadcast Skewed Broadcast Multi-disk Broadcast Server Broadcast Programs For uniform access probabilities a flat disk has the best expected performance. For increasingly skewed access probabilities, non-flat disk programs perform better. Multi-disk programs perform better than the skewed programs. Server Broadcast Programs Generating a multi-disk broadcast Number of disks (num_disks) determine the number of different frequencies with which pages will be broadcast. For each disk, the number of pages and the relative frequency of broadcast (rel_freq(i)) are specified. Server Broadcast Programs 57 Client Cache Management Improving the broadcast for one probability access distribution will hurt the performance of other clients with different access distributions. Therefore the client machines need to cache pages obtained from the broadcast. 58 Client Cache Management With traditional caching clients cache the data most likely to be accessed in the future. With Broadcast Disks, traditional caching may lead to poor performance if the server’s broadcast is poorly matched to the clients access distribution. 59 Client Cache Management In the Broadcast Disk system, clients cache the pages for which the local probability of access is higher than the frequency of broadcast. This leads to the need for cost-based page replacement. 60 Client Cache Management One cost-based page replacement strategy replaces the page that has the lowest ratio between its probability of access (P) and its frequency of broadcast (X) - PIX PIX requires the following: 1 Perfect knowledge of access probabilities. 2 Comparison of PIX values for all cache resident pages at cache replacement time. 61 Client Cache Management Another page replacement strategy adds the frequency of broadcast to an LRU style policy. This policy is known as LIX. LIX maintains a separate list of cacheresident pages for each logical disk Each list is ordered based on an approximation of the access probability (L) for each page. A LIX value is computed by dividing L by X, the frequency of broadcast. The page with the lowest LIX value is replaced. 62 Prefetching An alternative approach to obtaining pages from the broadcast. Goal is to improve the response time of clients that access data from the broadcast. Methods of Prefetching: Tag Team Caching Prefetching Heuristic 63 Prefetching Tag Team Caching - Pages continually replace each other in the cache. For example two pages x and y, being broadcast, the client caches x as it arrives on the broadcast. Client drops x and caches y when y arrives on the broadcast. 64 Prefetching Simple Prefetching Heuristic Performs a calculation for each page that arrives on the broadcast based on the probability of access for the page (P) and the amount of time that will elapse before the page will come around again (T). If the PT value of the page being broadcast is higher than the page in cache with the lowest PT value, then the page in cache is replaced. 65 Read/Write Case With dynamic broadcast there are three different changes that have to be handled. 1 Changes to the value of the objects being broadcast. 2 Reorganization of the broadcast. 3 Changes to the contents of the broadcast. 66 Conclusion Broadcast Disks project investigates the use of data broadcast and client storage resources to provide improved performance, scalability and availability in networked applications with asymmetric capabilities. 67 68 Mobility System conguration is no longer static: the center of activity, the topology, the system load, and locality, change dynamically need to search to locate objects various forms of heterogeneity 69 Wireless Communications offer less bandwidth more expensive less reliable Consequently, connectivity is weak and often intermittent 70 Portable Devices light and small to be easily carried around Such considerations, in conjunction with a given cost and level of technology ) mobile elements with less resources (e.g., memory, screen size and disk capacity) reliance on battery can be more easily accidentally damaged, stolen, or lost, thus, less secure and reliable 71 Mobile units are still characterized as: unreliable and prone to hard failures, i.e., theft, loss or accidental damage, resource-poor relative to static hosts. Examples: InfoPad [16] and ParcTab [28] projects 72 Adaptability A mobile system is presented with resources of varying number and quality: Connectivity conditions vary from total disconnections to full connectivity Available resources are not static either, for instance a docked" mobile computer may have access to a larger display or memory. the location of mobile elements changes and so does the network conguration and the center of computational activity 73 Example: during disconnection, a mobile host may work autonomously, while during periods of strong connectivity, depend heavily on the xed network sparing its scarce local resources 74 Disconnections: disconnected operation autonomous operation of a mobile host during disconnection. Weak connectivity: Operation should be tuned for communication environments characterized by low bandwidth, high latency, and expensive prices. Mobility: Basic support such as as establishing new communication links as well as advanced support such as migrating executing processes and database transactions in progress. Failure recovery: Since mobile elements are prone to hard failures, methods for failure handling and recovery are important. 75 Data Dissemination by Broadcast Pull-based data delivery or on demand data delivery: A client explicitly requests data items from the server. Push-based data delivery: The server repetitively broadcasts data to a client population without a specic request. Clients monitor the broadcast and retrieve the data items they need as they arrive. 76 Applications: Dissemination-based: information feeds such as stock quotes and sport tickets, electronic newsletters, mailing lists, traffic and weather information systems, cable TV on the Internet Commercial Products for example: the AirMedia's Live Internet broadcast network [6] Hughes Network Systems' DirectPC [26] Teletext and Videotex systems [11, 28] 77 The Datacycle project [16] at Bellcore: a database circulates on a high bandwidth network (140 Mbps). Users query the database by ltering information via special massively parallel transceivers. The Boston Community Information System (BCIS) [18]: broadcast news and information over an FM channel to clients with personal computers equipped with radio receivers 78 Hybrid Delivery Push vs Pull Push suitable when information is transmitted to a large number of clients with overlapping interests the server saves several messages the server is prevented from being overwhelmed by client requests. Push is scalable: performance does not depend on the number of clients Pull cannot scale beyond the capacity of the server or the network. In push, access is only sequential; Thus, access latency degrades with the volume of data In pull, clients play a more active role 79 Hybrid Delivery clients are provided with an uplink channel, called backchannel, to send messages to the server. Sharing the channel : if the same channel is used for both broadcast delivery and for the transmission of the replies to on demand requests 80 Use of the backchannel - to provide feedback and prole information to the server - to directly request data Which pages? to avoid overwhelming the server Page i not in cache and the number of items scheduled to appear before i on the broadcast is greater than a threshold parameter 81 Selective Broadcast Broadcast an appropriately selected subset of items and provide the rest on demand In [25], the broadcast is used as an air-cache for storing frequently requested data. The broadcast content continuously adjusts to match the hot-spot of the database. The hot-spot is calculated by observing broadcast misses indicated by explicit requests for data not on the broadcast. In [19]: the database is partitioned into: a \publication group" that is broadcast and an \on demand" group. The criterion for partitioning is to minimize the backchannel requests while constraining the response time below a predened upper limit. 82 On Demand Broadcast the server chooses the next item to broadcast on every broadcast tick based on the requests for data it has received Various strategies [28]: broadcast the pages in the order they are requested (FCFS), or the page with the maximum number of pending requests. A parameterized algorithm for large-scale data broadcast based only on the current queue of pending requests [7 83 Organization of Broadcast Data Access time: average time elapsed from the moment a client expresses its interest to an item to the receipt of the item on the broadcast channel Tuning time: the amount of time spent listening to the broadcast channel Organize the broadcast to minimize access and tuning time 84 Efficient Concurrency Control for Broadcast Jayavel Shanmugasundaram Environments Arvind Nithrakashyap Rajendran Sivasankaran 85 Outline Broadcast environments Inapplicability of existing techniques Suitable correctness criterion Mechanisms Performance Results Conclusion 86 Why Broadcast Data? Millions of clients that need to see current and consistent data Server handling all client requests ==> not scalable More scalable solution: Periodically broadcast all data items Clients read items off broadcast Datacycle [Herman], Broadcast Disks [Acharya] 87 Example: eAuctions Numerous potential clients Only a small fraction contact server to offer bids Need access to current and consistent data 88 Broadcast Environment Characteristics Large number of clients Mobile clients with scarce power resource ==> Low client to server bandwidth Plentiful server to client bandwidth ==> Asymmetric communication medium 89 Mutually Consistent Reads TrBegin R(x) R(y) R(z) TrEnd time (broadcast cycles) Are x, y, and z mutually consistent? 90 Outline Broadcast environments Inapplicability of existing techniques Suitable correctness criterion Mechanisms Performance Results Conclusion 91 Why Not Traditional CC Techniques? Approach 1: Dynamic conflict resolution – Excessive communication – e.g., locking: • acquiring read locks by client transactions • server swamped with lock requests • client uses precious uplink bandwidth Approach 2: Avoid potential serializability conflicts 92 Schedules Server ClientA W2(x) C2 W4(y) C4 R1(x) R1(y) time 93 Serialization Orders Server W2(x) C2 W4(y) C4 T2 T4 ClientA R1(x) R1(y) T4T1T2 94 Serialization Orders Server W2(x) C2 W4(y) C4 T2 T4 ClientA R1(x) R1(y) T4T1T2 ClientB R3(x) R3(y) Even if ClientB does not exist, ClientA will have to abort transaction T1 95 Serializability? Serializability - a global property All read only transactions: – Required to see same serial order of update transactions, even if executing at different clients – Required to be serializable w.r.t. all update transactions, even if updates do not affect values read Inappropriate for broadcast environments 96 Outline Broadcast environments Inapplicability of existing techniques Suitable correctness criterion Mechanisms Performance Results Conclusion 97 Broadcast Data Requirements Mutual consistency – server maintains mutually consistent data – clients read mutually consistent data Currency – clients see data that is current 98 A Sufficient Criterion • All update transactions are serializable. • Each read-only transaction is serializable with respect to the update transactions it (directly or indirectly) reads from. Server W2(x) C2 W4(y) C4 T2T4 ClientA R1(y) R1(x) T4T1 ClientB T2T3 R3(x) R3(y) 99 A Sufficient Criterion • All update transactions are serializable. • Each read-only transaction is serializable with respect to the update transactions it (directly or indirectly) reads from. external consistency [Weihl 87] update consistency [Bober and Carey 92] 100 Implications Decoupled correctness criterion – Clients need not communicate with server or other clients Weaker correctness criterion – Reduces unnecessary aborts 101 Outline Broadcast environments Inapplicability of existing techniques Suitable correctness criterion Mechanisms Performance Results Conclusion 102 The Algorithm F-Matrix Server functionality Client functionality Nature of Control Information – broadcast by the server with the data – helps clients determine consistency of reads Client read-only validation protocol 103 Server Functionality Ensures conflict serializability of update transactions Broadcasts during each cycle – Committed values of data items at start of cycle – Control matrix Incrementally maintains control matrix as updates occur 104 Client Functionality Read consult control information transmitted during that cycle to determine whether the read operation can proceed if read operation cannot proceed the transaction is aborted. Write Commit performed on a local copy of the data item in the client. no checks are made update tr : (write set + values) along with (read set + cycle numbers) sent to server read tr : commit succeeds 105 Control Matrix: Intuition W2(x) C2 Server Client R1(x) Server Client W4(y) C4 R1(y) W2(x) C2 R1(x) T is currently reading y T had read x Did any tr that affected y R4(x) W4(y) C4 R1(y) change x after T read it? 106 Control Matrix Objects: n objects all initialized at cycle 0 C: n x n control matrix C(x,y) = max( cycle in which T commits ), where T affects the latest committed value of y and also writes to x 107 Precond. for Consistent Reads T previously read x from broadcast cycle b RT = set of (x ,b) pairs C is the matrix at the beginning of current cycle read y iff read-condition(y) holds: forall (x,b) in RT, C(x,y) < b i.e., no transaction that affected y wrote x after t read x 108 Smaller Control Matrix Partition objects into groups Control matrix: n x numgroups SC(x,s) = max y in s C(x, y) Updating an object in s = update to any object in s Fewer entries to transmit compared to C group 1 read-condition(y): forall (x , b) in RT group2 SC(i , s) < b T is currently reading y T had read x No tr that affected any object in y ‘s group changed x after T read it 109 Group Size Increasing size of group => more unnecessary conflicts Reducing size of group => increased control information overhead. – n groups => F-Matrix – one group => Datacycle • achieves serializability Read-condition for Datacycle : no previously read object has been updated 110 R-Matrix To achieve Mutual Consistency Read condition: objects previously read have not been updated by other transactions or the object being read has not been updated since the beginning of the transaction 111 Outline Broadcast environments Inapplicability of existing techniques Suitable correctness criterion Mechanisms Performance Results Conclusion 112 Effect of Client Tr. Length F-Matrix -- has best perf. -- scales very well Datacycle R-Matrix F-Matrix F-Matrix-ideal 113 Summary of Results F-Matrix > R-Matrix > Datacycle – Weaker abort condition leads to better response times F-Matrix is highly scalable with respect to – Client/Server transaction length – Server transaction rate – Number of Objects/Size of Objects R-Matrix better only at very small object sizes In many cases F-Matrix is very close to F-Matrix-ideal 114 Conclusion Need for mutual consistency + currency Efficient mechanism - F-matrix R-matrix is a low overhead alternative F-matrix delivers! In Paper: Caching to exploit weak currency requirements 115 Examples Data Server Business data, e.g., Vitria, Tibco Election coverage data Stock related data Traffic information Electronic auctions 117