Privacy and Integrity Preserving in
Distributed Systems
Presented for Ph.D. Qualifying Examination
Fei Chen
Michigan State University
August 25th, 2009
Introduction
Data collection and publishing is a core operation in
many distributed systems
Outsourced database systems
• Organizations outsource their databases to service providers
• Organizations can focus on their core tasks without considering
the management of their database
Two-tiered wireless sensor networks
• Storage nodes gather data from nearby sensors and process
queries from the sink
• Sensors save power and storage, and query processing becomes more
efficient
Outsourced database systems
[Figure: An outsourced database system. Organizations outsource their databases to a service provider; customers send queries to the service provider and receive results.]
Outsourcing databases offers many advantages
Significantly reduce the management cost of organizations
Service Providers have higher bandwidths and lower latencies
Having multiple service providers helps to avoid the organizations
being a single point of failure
Two-tiered sensor networks
[Figure: A two-tiered sensor network. Sensors send their data to a storage node; the sink sends queries to the storage node and receives results.]
Benefits
Power saving for sensors
Memory saving for sensors
Query processing becomes more efficient
Comparison between two distributed systems
Similarity
Three common parties
• Data owners, i.e., organizations and sensors
• Data publishers, i.e., service providers and storage nodes
• Users, i.e., customers and the sink
The two distributed systems can be modeled as
[Figure: A data owner outsources its data to a data publisher; a user sends queries to the data publisher and receives results.]
• There may be multiple publishers
• For outsourced database systems, there may be multiple users
• For two-tiered sensor networks, there are multiple data owners
Difference
For outsourced database systems, users may not be fully trusted by data owners
For two-tiered sensor networks, the sink is fully trusted by sensors
Security Challenges
Due to the important role of data publishers, there
are two security challenges
Preserve privacy of the data stored in a data publisher
[Figure: The data owner outsources encrypted data to an untrusted data publisher; the user sends queries to the data publisher and receives results.]
How can a data publisher search the query result over the encrypted data?
Preserve integrity of a query result from a data publisher
[Figure: The data owner outsources its database to an untrusted data publisher. The publisher may manipulate the results it returns to the user: (1) forge data, or (2) return only a portion of the result.]
How can we prevent the misbehavior of data publishers?
Problem Statement
Design the storage scheme and query protocol in a
privacy and integrity preserving manner
Data and query privacy
• Publishers cannot figure out the original data
• Publishers cannot figure out queries
Queries over data
• Data publishers can search query results over the encrypted data,
e.g., range queries.
Query result integrity
• Users can detect whether a query result contains forged data or
misses some legitimate data.
Efficiency
• e.g. communication and computation cost
The Proposed Approaches
To preserve the privacy of the data, the data owner encrypts the data
To enable the searching operation for data publishers, the data owner encodes
the private data in a format which supports the searching operation
To preserve the integrity of query results, the data owner computes verification
objects (VOs) for all possible queries
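Let {t1, t2, …, tm} denote the data of a data owner. The basic idea is as follows:
[Figure: The data owner holds {t1, …, tm} and sends {encrypt(t1), …, encrypt(tm)}, search(t1, …, tm), and VOs(t1, …, tm) to the data publisher. The user sends search(query) to the data publisher, which returns encrypt(ti1), …, encrypt(tig) together with the query's verification object VO(ti1, …, tig).]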
Previous Work
Outsourced database systems
Preserving Privacy
• Bucket Partition [Hacigumus et al., SIGMOD 2002]
• A Public-key system [Boneh and Waters, TCC 2007]
Preserving Integrity
• Merkle hash trees [Devanbu et al., Journal of Computer Security 2003]
• Signature aggregation and chaining techniques [Narasimha and Tsudik,
DASFAA 2006]
• Spatial data structures [Chen et al., ESORICS 2008]
Two-tiered sensor networks
• S&L scheme [Infocom 2008]
• The optimized version of S&L scheme [Infocom 2009, Mobihoc 2009]
Privacy in outsourced database systems
Bucket Partition [Hacigumus et al., SIGMOD 2002]
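The data owner and the user share key Ki
[Figure: The data owner's database {2,5,9,15,20,23,34,40} is partitioned into buckets with boundaries 0, 12, 32, 40, 50 (bucket IDs 1 to 4) and outsourced to the data publisher as {2,5,9}Ki, {15,20,23}Ki, {34,40}Ki. The user's query [35,45] is translated to bucket IDs 3, 4. Result: {34,40}Ki, which returns more data than the query asks for.]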
Privacy in outsourced database systems
Bucket Partition
Drawbacks
A query result may contain false positives
It allows data publishers to obtain a reasonable estimate
of the actual values of data items and queries
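The Bucket Partition example above can be sketched in a few lines. This is only an illustrative sketch, not the scheme from the cited paper: encryption with Ki is omitted, bucket j is assumed to cover the interval (b[j-1], b[j]] (the slide does not say which side of a boundary is inclusive), and the names `bucket_id`, `partition`, and `query_bucket_ids` are hypothetical.

```python
from bisect import bisect_left

# Bucket boundaries taken from the slide's example.
# Assumption: bucket j covers (BOUNDARIES[j-1], BOUNDARIES[j]].
BOUNDARIES = [0, 12, 32, 40, 50]

def bucket_id(value):
    """Map a value in (0, 50] to its 1-based bucket ID."""
    if not BOUNDARIES[0] < value <= BOUNDARIES[-1]:
        raise ValueError("value outside the bucketed domain")
    return bisect_left(BOUNDARIES, value)

def partition(data):
    """Data owner: group items by bucket (each group would be encrypted with Ki)."""
    buckets = {}
    for item in data:
        buckets.setdefault(bucket_id(item), []).append(item)
    return buckets

def query_bucket_ids(lo, hi):
    """User: translate the range [lo, hi] into the bucket IDs sent to the publisher."""
    return list(range(bucket_id(lo), bucket_id(hi) + 1))

data = [2, 5, 9, 15, 20, 23, 34, 40]
buckets = partition(data)
ids = query_bucket_ids(35, 45)

# Publisher: return every stored bucket whose ID was requested.
returned = [x for j in ids for x in buckets.get(j, [])]
# User: decrypt (omitted here) and discard the false positives.
exact = [x for x in returned if 35 <= x <= 45]
```

Here the publisher returns the whole bucket holding 34 and 40 although only 40 satisfies the query, which is the false-positive drawback listed above. The publisher also learns that the query falls within buckets 3 and 4, which is the estimation drawback.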
Privacy in outsourced database systems
A Public key system [Boneh and Waters, TCC 2007]
Hidden Vector Encryption
• Using bilinear groups to produce tokens for searching conjunctive,
subset, and range queries on an encrypted database.
Drawback
Computationally expensive
• Public key cryptography
• Requires the data owner to perform O(zD) encryptions for each
tuple, where z is the number of dimensions and D is the domain
size
Integrity in outsourced database systems
Merkle hash trees [Devanbu et al., Journal of Computer
Security 2003]
H1 = h((d1)ki)
H12 = h(H1|H2)
H14 = h(H12|H34)
H18 = h(H14|H58)
[Figure: A Merkle hash tree over the encrypted data items (d1)ki, …, (d8)ki, with leaves H1, …, H8, internal nodes H12, H34, H56, H78, H14, H58, and root H18.]
Integrity in outsourced database systems
Merkle hash trees
[Figure: The Merkle hash tree over (2)ki, (5)ki, (9)ki, (15)ki, (20)ki, (23)ki, (34)ki, (40)ki, marking the query result and the verification object for the query [10, 30].]
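The following is a minimal sketch of how a user might check a range-query result against a Merkle hash tree. Several things are assumptions rather than details from the slides: SHA-256 stands in for h, plaintext bytes stand in for the encrypted items (d)ki, the number of leaves is a power of two, the root is assumed to reach the user authentically (for example, signed by the data owner), and the returned range is assumed to include the two boundary items 9 and 34. The function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash of a complete binary tree; len(leaves) must be a power of two."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def build_vo(leaves, lo, hi):
    """Publisher: hashes of the maximal subtrees lying outside leaves[lo:hi]."""
    vo = {}
    def collect(start, size):
        if start + size <= lo or start >= hi:
            vo[(start, size)] = merkle_root(leaves[start:start + size])
        elif size > 1:
            half = size // 2
            collect(start, half)
            collect(start + half, half)
    collect(0, len(leaves))
    return vo

def root_from_result(result, lo, n, vo):
    """User: recompute the root from the returned leaves plus the VO hashes."""
    def visit(start, size):
        if (start, size) in vo:
            return vo[(start, size)]
        if size == 1:
            if not lo <= start < lo + len(result):
                raise ValueError("leaf neither returned nor covered by the VO")
            return h(result[start - lo])
        half = size // 2
        return h(visit(start, half) + visit(start + half, half))
    return visit(0, n)

def verify(result, lo, n, vo, trusted_root):
    try:
        return root_from_result(result, lo, n, vo) == trusted_root
    except ValueError:
        return False

items = [2, 5, 9, 15, 20, 23, 34, 40]
leaves = [str(x).encode() for x in items]
root = merkle_root(leaves)

# Query [10, 30]: the result 15, 20, 23 plus boundary items 9 and 34 is leaves[2:7].
vo = build_vo(leaves, 2, 7)
ok = verify(leaves[2:7], 2, len(leaves), vo, root)
```

In this sketch the verification object consists of the hashes for the subtree over the first two leaves and for the last leaf, which correspond to H12 and H8 in the tree above. Replacing or dropping a returned item changes the recomputed root, so `verify` returns False.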
Integrity in outsourced database systems
[Figure: The Merkle hash tree over (2)ki, (5)ki, (9)ki, (15)ki, (20)ki, (23)ki, (34)ki, (40)ki, marking the query result and the verification object for the query [10, 14].]
Drawbacks
A query result contains false positives (boundary data items outside the query range)
It is hard to extend Merkle hash trees to verify the integrity for multidimensional data
Integrity in outsourced database systems
Signature Aggregation and Chaining
It aggregates multiple individual signatures into one unified signature
• Verifying the unified signature is equivalent to verifying all individual signatures
It presents a signature chain that links a signature of a data item with
the signatures of the data item’s neighbors
The signature of t5 is Sig(t5) = (h(h(t5) | h(t6) | h(t2) | h(t7)))k
Drawbacks
A query result contains false positives (boundary data items outside the query range)
It is computationally expensive to verify the integrity of multidimensional data
Integrity in outsourced database systems
Spatial Data Structures [Chen et al., ESORICS 2008]
Chen et al. proposed a Canonical Range Tree (CRT) to count the
number of data items in access control areas and query spaces.
Advantages
No false positives; boundary data items do not need to be provided
It can be used to perform access control
Drawbacks
Can only be applied to range queries, while SQL includes other types of
queries
Privacy and Integrity in Two-tiered sensor
networks
S&L scheme [Infocom 2008]
Sensor Si and the sink share key Ki
[Figure: Sensor Si collects {1, 4, 5, 7, 9}. With bucket boundaries 0, 4, 5, 9, 10 (bucket IDs 1 to 4), it sends {1,4}Ki, {5}Ki, {7,9}Ki, and h(i||4||t||Ki) to the storage node. The sink's query [9,10] is translated to bucket IDs 3, 4. Result: {7,9}Ki, where 7 is out of the range, and h(i||4||t||Ki), which proves that bucket 4 is empty.]
Two major drawbacks
Storage nodes can estimate data items and queries fairly accurately
Power and space consumption grows exponentially
with the number of dimensions.
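The empty-bucket proof h(i||4||t||Ki) in the example can be sketched as follows. This is a hedged illustration, not the S&L implementation: SHA-256 stands in for h, the byte encoding of i||j||t||Ki is an assumption, ciphertexts are treated as opaque bytes (decryption and range filtering are omitted), and `empty_proof` and `check_reply` are hypothetical names.

```python
import hashlib
import hmac

def empty_proof(sensor_id, bucket_id, epoch, key: bytes) -> bytes:
    """Sensor: h(i || j || t || Ki) for a bucket j that holds no data in epoch t."""
    msg = f"{sensor_id}||{bucket_id}||{epoch}||".encode() + key
    return hashlib.sha256(msg).digest()

def check_reply(reply, queried_ids, sensor_id, epoch, key: bytes) -> bool:
    """Sink: every queried bucket must come back as data or as a valid emptiness proof.

    reply maps bucket ID -> ("data", ciphertext) or ("empty", proof).
    """
    for j in queried_ids:
        if j not in reply:
            return False  # the storage node silently dropped a bucket
        kind, payload = reply[j]
        if kind == "empty":
            expected = empty_proof(sensor_id, j, epoch, key)
            if not hmac.compare_digest(payload, expected):
                return False  # the emptiness proof was forged
    return True

key = b"Ki-shared-by-sensor-and-sink"  # hypothetical key material
sensor_id, epoch = 1, 7                # hypothetical sensor ID and time slot

# Query [9,10] maps to buckets 3 and 4; bucket 4 is empty for this sensor.
honest = {
    3: ("data", b"<ciphertext of {7,9}>"),
    4: ("empty", empty_proof(sensor_id, 4, epoch, key)),
}
dropped = {3: honest[3]}
forged = {3: honest[3], 4: ("empty", b"\x00" * 32)}
```

Because the storage node does not know Ki, it cannot produce a valid proof for a bucket it wants to hide, so under these assumptions `check_reply` accepts `honest` and rejects both `dropped` and `forged`.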
Privacy and Integrity in Two-tiered sensor
networks
Optimized version of S&L scheme
For one-dimensional data [Infocom 2009]
• Embed relationships among data collected by each sensor
• Define a vector where each bit indicates whether the node has data in the
corresponding bucket or not
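[Figure: Sensor 1 holds 2,5,9 / 15,20,23 / 34,40 in its first three buckets and nothing in the fourth, so its bucket vector is V1 = 1110. Sensor 1 sends V1 to Sensor 2 and Sensor 3. The neighboring sensors embed V1 with their own data before sending it to the storage node, e.g., {3, 1110}Ki and {18, 1110}Ki.]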
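A short sketch of the bucket vector follows. The slide does not give the bucket boundaries for this example, so the boundaries 0, 12, 32, 40, 50 from the earlier Bucket Partition example are assumed, with bucket j covering (b[j-1], b[j]]. The check in `missing_buckets` is one plausible reading of how the sink could use an embedded copy of V1, and both function names are hypothetical.

```python
from bisect import bisect_left

# Assumed boundaries; bucket j covers (BOUNDARIES[j-1], BOUNDARIES[j]].
BOUNDARIES = [0, 12, 32, 40, 50]

def bucket_vector(values, boundaries=BOUNDARIES):
    """One bit per bucket: 1 if the sensor has data in that bucket, else 0."""
    bits = [0] * (len(boundaries) - 1)
    for v in values:
        j = bisect_left(boundaries, v)  # 1-based bucket ID
        if not 1 <= j <= len(bits):
            raise ValueError("value outside the bucketed domain")
        bits[j - 1] = 1
    return "".join(map(str, bits))

def missing_buckets(vector, queried_ids, returned_ids):
    """Sink: queried buckets that the vector marks as non-empty but that were not returned."""
    return [j for j in queried_ids
            if vector[j - 1] == "1" and j not in returned_ids]

v1 = bucket_vector([2, 5, 9, 15, 20, 23, 34, 40])
honest_gap = missing_buckets(v1, [3, 4], {3})
cheating_gap = missing_buckets(v1, [3, 4], set())
```

With these assumed boundaries the computed vector matches the 1110 shown on the slide. If the storage node withholds bucket 3, the copy of V1 embedded in a neighbor's data exposes the omission.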
Privacy and Integrity in Two-tiered sensor
networks
For Multi-dimensional data [Mobihoc 2009]
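[Figure: Multi-dimensional example with Sensor 1, Sensor 2, Sensor 3, and the storage node. Sensor 1's encrypted data items (12,5)ki, (15,6)ki, (23,4)ki, (45,3)ki fall into a two-dimensional grid of buckets, and a bit matrix V1 marks which buckets contain data.]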
These two schemes are less secure than S&L’s scheme
They inherit the same weakness of allowing storage nodes to
estimate the original data and queries
The optimization technique allows a compromised sensor to easily
compromise the integrity verification functionality of the network
• Send falsified bit maps to sensors and storage nodes.
Future Research Directions
For outsourced database systems
No complete solution exists for preserving both privacy and
integrity in outsourced database systems
Preserving privacy and integrity for multi-dimensional data
has not been well studied
For two-tiered sensor networks
Prevent a storage node from estimating data and queries
Multi-dimensional data
Efficiency
Questions
Thank you!