CPT-S 580-06
Advanced Databases
Yinghui Wu
EME 49
ADB (ln29)
CPT-S 580-08 Advanced Databases
DBMS: privacy and security in the Cloud
Data security and privacy
Security and privacy in cloud
Data confidentiality
Research Challenges
Adapted from “Secure and Privacy-preserving Database Services in the Cloud”,
Divy Agrawal et al., ICDE 2013 tutorial
Database systems: security & privacy issues
Access Control [Bertino et al. TDSC’05]
Problem statement: authorizing data access scopes (relations,
attributes, tuples) for users of a DBMS
Discretionary access control
– Authorization administration policies, i.e., granting and
revoking authorizations (centralized, ownership-based, etc.)
– Content-based access control, using views and query rewriting
for fine-grained control
– Role-based access control: a role is a function with a set of
permitted actions, and has users as its members
Mandatory access control
– Object and subject classification (e.g., top secret, secret,
unclassified)
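The role-based model above can be sketched in a few lines. This is only an illustration: the role names, relation name, and users are hypothetical, and a real DBMS would enforce this through its own GRANT/REVOKE machinery.

```python
# Hypothetical roles, permissions, and users, for illustration only.
ROLE_ACTIONS = {
    "nurse": {("patients", "select")},
    "doctor": {("patients", "select"), ("patients", "update")},
}
USER_ROLES = {"alice": {"doctor"}, "bob": {"nurse"}}


def is_authorized(user, relation, action):
    """True if any role the user belongs to permits the action."""
    return any(
        (relation, action) in ROLE_ACTIONS[role]
        for role in USER_ROLES.get(user, set())
    )


def revoke(role, relation, action):
    """Revoking from a role affects every member of that role at once."""
    ROLE_ACTIONS[role].discard((relation, action))
```

Permissions attach to roles rather than to individual users, so one revocation on a role changes what all of its members may do.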
Data Anonymization
Problem: protecting personally identifiable information (PII) and the
associated sensitive attributes
Quasi-identifier                 Sensitive
DOB        Gender    Zipcode     Disease
1/21/76    Male      53715       Heart Disease
4/13/86    Female    53715       Hepatitis
2/28/76    Male      53703       Bronchitis
1/21/76    Male      53703       Broken Arm
4/13/86    Female    53706       Flu
2/28/76    Female    53706       Hang Nail
Quasi-identifiers need to be generalized or suppressed
Quasi-identifiers are sets of attributes that can be linked
with external data to uniquely identify an individual
Solution: k-Anonymity
[Samarati et al. TR’98]
Quasi-identifiers indistinguishable among k individuals
Implemented by building generalization hierarchy or partitioning
multi-dimensional data space
Equivalence class: records that share the same QI
Homogeneity attack
Background knowledge attack
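A minimal sketch of the k-anonymity check, using the six-row patient table from the Data Anonymization slide. The generalization rules here (birth year only, suppressed gender, 3-digit zipcode prefix) are illustrative assumptions, not the hierarchy from the cited report.

```python
from collections import Counter

# Rows: (DOB, gender, zipcode, disease), taken from the patient table.
ROWS = [
    ("1/21/76", "Male", "53715", "Heart Disease"),
    ("4/13/86", "Female", "53715", "Hepatitis"),
    ("2/28/76", "Male", "53703", "Bronchitis"),
    ("1/21/76", "Male", "53703", "Broken Arm"),
    ("4/13/86", "Female", "53706", "Flu"),
    ("2/28/76", "Female", "53706", "Hang Nail"),
]


def raw_qi(row):
    """Quasi-identifier exactly as stored."""
    return row[:3]


def generalized_qi(row):
    """Assumed generalization: year of birth, suppressed gender, zip prefix."""
    dob, _gender, zipcode, _disease = row
    year = "19" + dob.split("/")[-1]
    return (year, "*", zipcode[:3] + "**")


def min_class_size(rows, qi):
    """Size of the smallest equivalence class; the table is k-anonymous for k up to this."""
    return min(Counter(qi(r) for r in rows).values())


raw_k = min_class_size(ROWS, raw_qi)
generalized_k = min_class_size(ROWS, generalized_qi)
```

Every raw quasi-identifier is unique, so `raw_k` is 1. Under the assumed generalization the smallest class has two records, so the generalized table is 2-anonymous.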
Enhanced Solution: l-Diversity
[Machanavajjhala et al. ICDE’06]
• At least l distinct values for the sensitive attribute in each
equivalence class
Similarity attack
A 3-diverse patient table
Zipcode    Age    Salary    Disease
476**      2*     20K       Gastric Ulcer
476**      2*     25K       Gastritis
476**      2*     30K       Stomach Cancer
4790*      ≥40    50K       Gastritis
4790*      ≥40    100K      Flu
4790*      ≥40    70K       Bronchitis
476**      3*     60K       Bronchitis
476**      3*     80K       Pneumonia
476**      3*     90K       Stomach Cancer
Skewness attack
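The l-diversity condition can be checked by counting distinct sensitive values per equivalence class. The sketch below uses the rows of the 3-diverse patient table, with disease as the sensitive attribute, and implements only the simple "distinct values" reading of the definition.

```python
from collections import defaultdict

# Rows: (zipcode, age, disease), taken from the 3-diverse patient table.
ROWS = [
    ("476**", "2*", "Gastric Ulcer"),
    ("476**", "2*", "Gastritis"),
    ("476**", "2*", "Stomach Cancer"),
    ("4790*", ">=40", "Gastritis"),
    ("4790*", ">=40", "Flu"),
    ("4790*", ">=40", "Bronchitis"),
    ("476**", "3*", "Bronchitis"),
    ("476**", "3*", "Pneumonia"),
    ("476**", "3*", "Stomach Cancer"),
]


def diversity(rows):
    """Smallest number of distinct sensitive values in any equivalence class."""
    classes = defaultdict(set)
    for *qi, sensitive in rows:
        classes[tuple(qi)].add(sensitive)
    return min(len(values) for values in classes.values())


l_value = diversity(ROWS)
```

The table is 3-diverse, yet all three diseases in the first class are stomach-related, which is the kind of leak the similarity attack label refers to.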
Enhanced Solution: t-Closeness
[Li et al. ICDE’07]
• Distance between the overall distribution of sensitive attribute values and
the distribution of sensitive attribute values in an equivalence class is
bounded by t
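A rough sketch of the t-closeness test on the same nine-row patient table. It assumes the variational (total variation) distance as the distance measure, chosen here for simplicity; the cited paper defines t-closeness with the Earth Mover's Distance, so treat this as a stand-in.

```python
from collections import Counter

# Rows: (zipcode, age, disease), taken from the 3-diverse patient table.
ROWS = [
    ("476**", "2*", "Gastric Ulcer"),
    ("476**", "2*", "Gastritis"),
    ("476**", "2*", "Stomach Cancer"),
    ("4790*", ">=40", "Gastritis"),
    ("4790*", ">=40", "Flu"),
    ("4790*", ">=40", "Bronchitis"),
    ("476**", "3*", "Bronchitis"),
    ("476**", "3*", "Pneumonia"),
    ("476**", "3*", "Stomach Cancer"),
]


def distribution(values):
    """Empirical probability of each value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two distributions (assumed metric)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_class_distance(rows):
    """Largest distance between any class distribution and the overall one."""
    overall = distribution(r[-1] for r in rows)
    classes = {}
    for *qi, sensitive in rows:
        classes.setdefault(tuple(qi), []).append(sensitive)
    return max(
        variational_distance(distribution(vals), overall)
        for vals in classes.values()
    )


worst = max_class_distance(ROWS)
```

Under this assumed metric the table satisfies t-closeness for any t of at least `worst`, which works out to 4/9 for these rows.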
Differential Privacy for Statistical Data
[Dwork ICALP’06]
Strong privacy guarantees while querying a database
[Figure: the same query is posed over databases A and A'; after perturbation, the answers P(A) and P(A') are indistinguishable]
A randomized function K gives ε-differential privacy iff for all datasets
D1 and D2 differing on at most one element, and all S ⊆ Range(K),
\[ \ln \frac{\Pr[K(D_1) \in S]}{\Pr[K(D_2) \in S]} \le \varepsilon \]
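One common way to obtain this guarantee for a counting query is the Laplace mechanism, which is not shown on the slide and is used here only as an illustration. The sketch assumes a count query (adding or removing one row changes the answer by at most 1) and adds Laplace noise with scale 1/ε; the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng):
    """Noisy count; a count query has sensitivity 1, so the noise scale is 1/epsilon."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 58, 62, 19, 44]
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
```

A smaller ε means a larger noise scale, so answers are more private but less accurate.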
Secure Devices for Privacy
[Anciaux et al. SIGMOD’07]
Problem: protecting private data during queries involving both private (hidden) and
public (visible) data
Solution: carry private data in a secure USB key, ensure private data never leaves
the USB key, and only public data flows to the key
Query optimization for small RAM USB key
4/11/2013
ICDE 2013 Tutorial
Database security & privacy in the cloud
Cloud – A Tempting Attack Target
Why the cloud?
– Ubiquitous access to consolidated data.
– Shared infrastructure economies of scale
– A lot of small and medium businesses
Why attack?
– Target one service provider, attack multiple companies
– Financial gain from trading sensitive information
Cloud Provides Novel Attack Opportunities
Co-residence attack [Ristenpart et al. CCS’09]
– Adversary: non-provider-affiliated malicious parties
– Map and identify location of target VM
– Place attacker VM co-resident with target VM
– Cross-VM side-channel attacks (due to sharing of physical
resources), e.g., inferring the number of visitors to a page, or
keystroke attacks for password retrieval
Signature wrapping attack [Somorovsky et al. CCSW’11]
– Control interface compromise by capturing a SOAP message
– Manipulate the SOAP message with arbitrary XML fragments
– Use an XML signature vulnerability to pass authentication
– Take control of a victim’s account
A Barrier to Conquer
Security and privacy – a barrier to cloud adoption
Data (sensitive data) – a key concern
Need to solve data security and privacy problems in the cloud
Problems Amplified by the Cloud
Data confidentiality
– Attacks
• Unauthorized accesses, side-channel attacks
– Solutions
• Encryption, querying encrypted data
• Trusted computing
Access privacy
– Attacks
• Inferences on access patterns or query results
– Solutions
• Private information retrieval
• Query obfuscation
[Figure: a user sends a query to the cloud servers holding the data and receives an answer]
Challenges: Conflicting Goals
[Figure: trade-off chart. One axis: functionality / performance (low to high); other axis: confidentiality / privacy (low to high). Labeled regions: existing services, many crypto systems/protocols, ideal state.]
Data confidentiality
Database as a Service
[Hacigümüs et al. ICDE’02]
Protects data from being stolen, but plaintext data can still be seen
on the server
Write – encrypt before storing
– insert into lineitem (discount) values (encrypt(10, key))
Read – decrypt before access
– select decrypt(discount, key) from lineitem where custid = 300
Encryption alternatives
– Software-level vs. hardware-level (cryptographic
coprocessor) encryption
– Granularity: field, row, page
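The write and read statements above can be tried end to end with SQLite user-defined functions. This is a sketch under loud assumptions: `encrypt` and `decrypt` are hypothetical UDFs backed by a toy XOR keystream that is NOT secure, the key is made up, and SQLite stands in for the service provider's DBMS.

```python
import hashlib
import sqlite3

KEY = b"demo-key"  # hypothetical key, for illustration only


def _keystream(key, n):
    """Toy keystream built from SHA-256 over a counter. NOT secure."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def encrypt(value, key):
    """Field-level 'encryption': XOR the value's text form with the keystream."""
    data = str(value).encode()
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def decrypt(blob, key):
    """Inverse of encrypt; returns the value's text form."""
    return bytes(a ^ b for a, b in zip(blob, _keystream(key, len(blob)))).decode()


conn = sqlite3.connect(":memory:")
conn.create_function("encrypt", 2, encrypt)
conn.create_function("decrypt", 2, decrypt)
conn.execute("create table lineitem (custid integer, discount blob)")

# Write: encrypt before storing.
conn.execute(
    "insert into lineitem (custid, discount) values (?, encrypt(?, ?))",
    (300, 10, KEY),
)

# Read: decrypt before access.
stored = conn.execute("select discount from lineitem").fetchone()[0]
plain = conn.execute(
    "select decrypt(discount, ?) from lineitem where custid = 300", (KEY,)
).fetchone()[0]
```

Because `decrypt` runs inside the database engine, the engine handles both the key and the plaintext, which is the limitation noted at the top of this slide.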
Partition and Identification Index
[Hacigümüs et al. SIGMOD’02]
E(tuple): encrypted-tuple, {attribute-index}
Attribute-index: attribute value partition ids
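[Figure: attribute domain 0 to 1000 split into five partitions]
Range:          0-200    200-400    400-600    600-800    800-1000
Partition id:   2        7          5          1          4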
Partition and Identification Index
Client knows a map function, Map(val) = id of the partition
containing val
Random mapping
Range:          0-200    200-400    400-600    600-800    800-1000
Partition id:   2        7          5          1          4
Order-preserving mapping
Range:          0-200    200-400    400-600    600-800    800-1000
Partition id:   1        2          4          5          7
Mapping Predicate Conditions
• Map(< val) : ids of the partitions that could contain values < val
• E.g. Map(eid < 280) = {2, 7} for random mapping
• Map(> val) : ids of the partitions that could contain values > val
• Map(Ai = Aj): pairs of ids of the partitions that could have equal
Ai and Aj values
• Decryption and processing on the client
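The mapping rules above can be sketched as follows, using the random mapping from the previous slide (ids 2, 7, 5, 1, 4 over the ranges 0-200 through 800-1000). Several things are assumptions made for illustration: partitions are treated as half-open intervals [lo, hi), the employee ids are made up, and `encrypt`/`decrypt` are placeholders rather than a real cipher.

```python
BOUNDS = [0, 200, 400, 600, 800, 1000]
RANDOM_IDS = [2, 7, 5, 1, 4]
PARTITIONS = [
    (BOUNDS[i], BOUNDS[i + 1], pid) for i, pid in enumerate(RANDOM_IDS)
]


def map_value(val):
    """Map(val): id of the partition containing val (assumed [lo, hi))."""
    for lo, hi, pid in PARTITIONS:
        if lo <= val < hi:
            return pid
    raise ValueError("value outside the partitioned domain")


def map_less_than(val):
    """Map(< val): ids of partitions that could contain values < val."""
    return {pid for lo, hi, pid in PARTITIONS if lo < val}


def map_greater_than(val):
    """Map(> val): ids of partitions that could contain values > val."""
    return {pid for lo, hi, pid in PARTITIONS if hi > val}


def encrypt(eid):
    return ("E", eid)  # PLACEHOLDER: stands in for a real encrypted tuple


def decrypt(etuple):
    return etuple[1]  # PLACEHOLDER: inverse of the stand-in above


# The server stores (encrypted tuple, partition id) pairs only.
EIDS = [23, 150, 260, 310, 480, 860]  # hypothetical employee ids
SERVER_TABLE = [(encrypt(eid), map_value(eid)) for eid in EIDS]


def server_select(ids):
    """Server-side filter: it sees partition ids, not plaintext values."""
    return [etuple for etuple, pid in SERVER_TABLE if pid in ids]


def client_query_less_than(val):
    """Client rewrites eid < val, then decrypts and filters the superset."""
    superset = [decrypt(e) for e in server_select(map_less_than(val))]
    return [eid for eid in superset if eid < val]
```

For `eid < 280` the client asks for partitions {2, 7}, matching the example above. The server returns four candidate tuples, and the client discards 310 after decryption.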
Mapping Predicate Conditions
emp.did = mrg.did
Partition / Bucketization Review
Pros
– Efficient computation on the server
Cons
– Data updates are hard (may need re-distribution)
– Filtering the superset answer could be time consuming,
depending on the partition sizes
– Might reveal the value distribution through relative partition
changes during dynamic data updates
CryptDB [Popa et al. SOSP’11]
Supports a wide range of SQL queries over encrypted data
Server fully evaluates queries on encrypted data, and client does
not perform query processing
SQL-aware encryption
– leverage provable practical techniques for different SQL
operators over encrypted data
Adjustable query-based encryption
– Dynamically adjust the encryption level of data items according
to user’s queries
Onion of encryptions
– From weaker forms of encryption that allow certain computation
to stronger forms of encryption that reveal no information
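The adjustable, query-driven peeling of onion layers can be sketched as a small state machine. No cryptography is performed here; the layer names follow the SQL-aware onion layers listed in these slides, while the class, the operator table, and the column usage are hypothetical.

```python
# Layers listed from outermost (strongest) to innermost (most functionality).
ONIONS = {
    "eq": ["RND", "DET", "JOIN"],
    "ord": ["RND", "OPE", "OPE-JOIN"],
}

# Which onion and layer each operator needs (hypothetical operator labels).
REQUIRED = {
    "=": ("eq", "DET"),
    "equi-join": ("eq", "JOIN"),
    "<": ("ord", "OPE"),
    "inequality-join": ("ord", "OPE-JOIN"),
}


class ColumnOnions:
    """Tracks how far each onion of one column has been peeled."""

    def __init__(self):
        self.level = {name: 0 for name in ONIONS}

    def current_layer(self, onion):
        return ONIONS[onion][self.level[onion]]

    def adjust(self, operator):
        """Peel just enough layers for `operator`; return the layers removed."""
        onion, needed = REQUIRED[operator]
        target = ONIONS[onion].index(needed)
        peeled = ONIONS[onion][self.level[onion]:target]
        # Layers are never re-added once peeled in this sketch.
        self.level[onion] = max(self.level[onion], target)
        return peeled


col = ColumnOnions()
first = col.adjust("=")          # peels RND, exposing DET
second = col.adjust("equi-join")  # peels DET, exposing JOIN
third = col.adjust("=")          # nothing left to peel for equality
```

A column that is never compared stays at RND, so only the data items that queries actually touch are lowered to weaker encryption.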
SQL-Aware Onion Encryption
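Equality onion (any value), outermost to innermost layer:
RND: no functionality
DET: equality selection
JOIN: equality join
Order onion (any value), outermost to innermost layer:
RND: no functionality
OPE: comparison
OPE-JOIN: inequality join
Search onion (text value):
SEARCH: word selection (only for text fields)
Addition onion (int value):
HOM: sum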
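The HOM layer lets the server add integers without decrypting them. The CryptDB paper uses the Paillier cryptosystem for this; below is a toy Paillier sketch with tiny hard-coded primes, which is an assumption for readability and is NOT secure. Function names are hypothetical.

```python
import math
import random

# Toy parameters: real deployments use primes of 1024 bits or more.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid because the generator is fixed to N + 1


def encrypt(m, rng):
    """Paillier encryption of an integer 0 <= m < N."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Paillier decryption: L(c^lambda mod N^2) * mu mod N, with L(x) = (x - 1) // N."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def homomorphic_sum(ciphertexts):
    """Server-side SUM: multiplying ciphertexts adds the plaintexts."""
    total = 1
    for c in ciphertexts:
        total = (total * c) % N2
    return total


rng = random.Random(1)
discounts = [10, 25, 7]
encrypted = [encrypt(d, rng) for d in discounts]
recovered = decrypt(homomorphic_sum(encrypted))
```

The server only ever multiplies ciphertexts; whoever holds `LAM` and `MU` can decrypt the product and recover the sum, as long as it stays below `N`.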
CryptDB System
[Figure: CryptDB architecture, with annotations “for sending certain onion layer keys” and “for performing cryptographic operations”]
Open problems
Open Research Problems
Encryption for processing range/join database queries on
encrypted data
Improve performance of querying encrypted data for use in
practical OLTP applications
– Pre-computation
– Parallel calculation
End to end security in the cloud
– Need information flow control and auditing in addition to
cryptography or trusted computing based approaches
Concluding Remarks
Cloud security and privacy is not a completely new problem.
Some issues are amplified by the cloud.
Protecting data confidentiality and access privacy
Maintaining practical functionality and performance while
achieving security and privacy
References
[Bertino et al. TDSC’05] E. Bertino et al. Database security-concepts, approaches, and
challenges. In IEEE TDSC, 2(1), 2005.
[Samarati et al. TR’98] P. Samarati et al. Protecting privacy when disclosing information:
k-anonymity and its enforcement through generalization and suppression. TR 1998.
[Machanavajjhala et al. ICDE’06] A. Machanavajjhala et al. l-diversity: privacy beyond
k-anonymity. In ICDE 2006.
[Li et al. ICDE’07] N. Li et al. t-closeness: privacy beyond k-anonymity and l-diversity. In
ICDE 2007.
[Dwork ICALP’06] C. Dwork. Differential privacy. In ICALP(2) 2006.
[Verykios et al. SIGMOD’04] V. S. Verykios et al. State-of-the-art in privacy preserving data
mining. In SIGMOD 2004.
[Agrawal et al. SIGMOD’00] R. Agrawal et al. Privacy-preserving data mining. In
SIGMOD 2000.
[Clifton et al. KDD’02] C. Clifton et al. Tools for privacy preserving distributed
data mining. In KDD 2002.
[Anciaux et al. SIGMOD’07] N. Anciaux et al. GhostDB: querying visible and
hidden data without leaks. In SIGMOD 2007.
[Chaudhuri et al. CIDR’11] S. Chaudhuri et al. Database access control &
privacy: is there a common ground? In CIDR 2011.
[Ristenpart et al. CCS’09] T. Ristenpart et al. Hey, you, get off of my cloud:
exploring information leakage in third-party compute clouds. In CCS 2009.
[Somorovsky et al. CCSW’11] J. Somorovsky et al. All your clouds are belong to
us: security analysis of cloud management interfaces. In CCSW 2011.
[Hacigümüs et al. ICDE’02] H. Hacigümüs et al. Providing database as a
service. In ICDE 2002.
[Song et al. S&P’00] D. Song et al. Practical techniques for searches on
encrypted data. In S&P 2000.
[Hacigümüs et al. SIGMOD’02] H. Hacigümüs et al. Executing SQL over
encrypted data in the database-service-provider model. In SIGMOD 2002.
[Hore et al. VLDB’04] B. Hore et al. A privacy-preserving index for range
queries. In VLDB 2004.
[Agrawal et al. SIGMOD’04] R. Agrawal et al. Order preserving encryption for
numeric data. In SIGMOD 2004.
[Popa et al. SOSP’11] R. A. Popa et al. CryptDB: protecting confidentiality with encrypted
query processing. In SOSP 2011.
[Damiani et al. CCS’03] E. Damiani et al. Balancing confidentiality and efficiency in
untrusted relational DBMSs. In CCS 2003.
[Wang et al. SDM’11] S. Wang et al. A comprehensive framework for secure query
processing on relational data in the cloud. In SDM 2011.
[Aggarwal et al. CIDR’05] G. Aggarwal et al. Two can keep a secret: a distributed
architecture for secure database services. In CIDR 2005.
[Emekci et al. ICDE’06] F. Emekci et al. Privacy preserving query processing using third
parties. In ICDE 2006.
[Agrawal et al. SRDS’88] D. Agrawal et al. Quorum consensus algorithms for secure and
reliable data. In SRDS 1988.
[Bajaj et al. SIGMOD’11] S. Bajaj et al. TrustedDB: a trusted hardware based database with
privacy and data confidentiality. In SIGMOD 2011.
[Song et al. IEEE’12] D. Song et al. Cloud data protection for the masses. In IEEE
Computer, 45(1), 2012.
[Chor et al. JACM’98] B. Chor et al. Private information retrieval. In J. ACM, 45(6), 1998.
[Kushilevitz et al. FOCS’97] E. Kushilevitz et al. Replication is not needed:
single database, computationally private information retrieval. In FOCS
1997.
[Sion et al. NDSS’07] R. Sion et al. On the computational practicality of
private information retrieval. In NDSS 2007.
[Olumofin et al. FC’11] F. G. Olumofin et al. Revisiting the computational
practicality of private information retrieval. In FC 2011.
[Williams et al. NDSS’08] P. Williams et al. Usable private information
retrieval. In NDSS 2008.
[Wang et al. DBSEC’10] S. Wang et al. Generalizing PIR for practical
private retrieval of public data. In DBSec 2010.
[Wang et al. DAPD’13] S. Wang et al. Towards practical private processing
of database queries over public data. In DAPD 2013.
[Vimercati et al. ICDCS’11] S. D. C. Vimercati et al. Efficient and private
access to outsourced data. In ICDCS 2011.