Uncircumventable Privacy Policies
Arvind Narayanan
Vitaly Shmatikov
The University of Texas at Austin
Outsourced Customer Support
“Answer our customers’ questions,
but do NOT download the entire list
of their social security numbers”
What Does NOT Work (1)
Approach: a “tamper-proof” access control
system blocks forbidden queries
Problem: DRM and tamper-proof systems
have a distressing track record
What Does NOT Work (2)
Approach: randomize database records
(cf. privacy-preserving data mining)
Problem: the user must be able to
answer questions about
specific records
NSA Phonebook
Name            Phone      Email
John Q. Spook   555-1212   spook@nsa.gov
Bob Ispy        987-6543   ispy@nsa.gov
Tom Carnivore   212-2121   tk@nsa.gov
Bill Sigint     GET-RUDE   sigint@nsa.gov
We want the database to behave like a lookup oracle,
i.e., like a function lookup: Names → Phones
Lookup(name) is easy to compute
Retrieving list of names or list of phones is infeasible
Retrieving phone if name is not known is infeasible
Why?
Usual notion of privacy: access
control using “credentials”
Our notion: retain control over
data after it has been released
Publish databases but prevent people
(e.g. spammers) from harvesting
information indiscriminately
Easy to do if a trusted entity
mediates every access to the data;
we want to achieve the same level of
security in a non-interactive setting
Big Picture
[Figure: data query attempts against the database]
Allowed queries are easy
Disallowed queries are infeasible
Use cryptography to implement
data-in-a-box – “virtual black box”
Computationally infeasible:
no trusted third parties,
no access control software,
no ad-hoc data scrambling …
Our Objectives
Not secrecy of individual records
We want to scramble the database so that
queries not permitted by the policy are
“impossible” to evaluate
• Note: permitted queries may reveal a lot about
individual records
• This depends on the policy!
Obfuscation: “Virtual Black Box”
Data-in-a-box, code-in-a-box
• Data D with query Q is the same as a program P s.t. P(Q) = Q(D)
• Think of data as simply a special case of code
Study of putting code in a box: obfuscation
An obfuscated version of a program…
• Has the same output with high probability on all inputs (functionality)
• Runs roughly as fast as the original (efficiency)
• Reveals no more about the original program code than
does a black box implementing the function
(obfuscation)
– … assuming a computationally bounded adversary
Obfuscation: State of the Art
Ad-hoc obfuscation schemes tend to be broken
• No proofs of security, many successfully attacked
– Example: Boneh-Jacob-Felten attack on obfuscated DES
General-purpose obfuscation is impossible
• Barak et al. (CRYPTO 2001)
• No single obfuscator for all circuits
Special-purpose obfuscation
• Example: UNIX password hashes
• Obfuscation of “string equality”, a.k.a. “point function”
– fα(x) = { α == x ? 1 : 0 }
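The point-function case can be sketched in a few lines. This is an illustration in the spirit of salted password hashes, not a construction taken from the slides: SHA-256 stands in for an idealized hash, and the per-instance salt and the names `obfuscate_point` and `evaluate` are assumptions made for the example.

```python
import hashlib
import hmac
import os

def obfuscate_point(alpha: bytes):
    """Return (salt, digest); together they stand in for f_alpha."""
    salt = os.urandom(16)  # assumption: a random salt, as in password files
    return salt, hashlib.sha256(salt + alpha).digest()

def evaluate(obf, x: bytes) -> int:
    """Compute f_alpha(x) from the obfuscated form: 1 if x == alpha, else 0."""
    salt, digest = obf
    candidate = hashlib.sha256(salt + x).digest()
    return 1 if hmac.compare_digest(candidate, digest) else 0

obf = obfuscate_point(b"correct horse")
print(evaluate(obf, b"correct horse"))  # 1
print(evaluate(obf, b"wrong guess"))    # 0
```

Anyone holding `obf` can test a candidate x, but recovering α requires guessing it or inverting the hash, which is the behavior a black box for f_α would offer.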
Obfuscation Examples
Point function
• x → [ H(x) =? β, where β = H(α) ] → Yes/No
Decryption
• x → [ Dα(x) ] (we don’t know how)
• Should work for every α
• Obfuscated circuit should reveal nothing about α
Basic Approach: Simulatability
Define ideal functionality for obfuscated database
• Formalization of “privacy policy”
• Secure by definition!
• Describes permitted queries and/or access patterns
– What we want our database to look like (e.g., lookup function)
Define the obfuscation algorithm
Argue that no efficient adversary can tell the
difference between the obfuscated database and
a simulation in the ideal functionality
• Therefore, obfuscated database does not leak any
information beyond what’s given by ideal functionality
Simulatability
[Figure: the obfuscator maps the original database to the obfuscated database; a simulator with access only to the ideal functionality (e.g., lookup function) produces a fake obfuscated database]
Secure by definition!
Cannot leak anything
that’s not permitted by
ideal functionality
No probabilistic polynomial-time adversary should be able
to distinguish the simulation from the real obfuscated
database with more than negligible probability
Formal Definition (Lookup Only)
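$D$ is the database, i.e., a list of $(x, y)$ pairs
$I_D : X \to Y$ is the ideal lookup functionality
• $\forall x \in X$: if $(x, y_1), \ldots, (x, y_n) \in D$ then $I_D(x) = \{y_1, \ldots, y_n\}$, else $I_D(x) = \bot$
$G_D : X \to Y$ is the obfuscation of $D$ if
(1) Correct retrieval (allowed queries are feasible)
$\forall x \in X: \; \Pr[\, G_D(x) \neq I_D(x) \,] \le \mathrm{negl}(|D|)$
(2) Virtual black-box (disallowed queries infeasible)
$\forall$ PPT adversary $A$, $\exists$ PPT simulator $S$:
$\left| \Pr[A(G_D) = 1] - \Pr[S^{I_D}(1^{|D|}) = 1] \right| \le \mathrm{negl}(|D|)$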
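To make the ideal functionality concrete, it can be pictured as an oracle that answers lookups and offers no other operation. The sketch below is only a mental model: the name `make_ideal_lookup` is hypothetical, `None` plays the role of the "no match" answer, and a Python closure is of course not a security boundary.

```python
def make_ideal_lookup(db):
    """Build the lookup oracle from a list of (x, y) pairs."""
    index = {}
    for x, y in db:
        index.setdefault(x, set()).add(y)

    def ideal_lookup(x):
        # All y values stored under x, or None when x is absent.
        return frozenset(index[x]) if x in index else None

    return ideal_lookup

phonebook = [("Bob Ispy", "987-6543"), ("Bill Sigint", "GET-RUDE")]
I_D = make_ideal_lookup(phonebook)
print(I_D("Bob Ispy"))  # frozenset({'987-6543'})
print(I_D("Eve"))       # None
```

The simulator in condition (2) is given nothing more than query access of this kind, which is why anything it can produce is "allowed" by the policy.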
Discussion of the Definition
Indistinguishability from ideal functionality (IF) is
not always the same as intuitive “privacy”
• Some forms of access are permitted by IF
• The goal is not to hide individual data records, but to
control how they can be accessed
E.g., obfuscated phonebook is indistinguishable
from the lookup function: Names → Phones
• It’s hard to find the phone if you don’t know the name
• Does not say that it’s hard to find the name for which
there is a phone in the database
Is this the right definition? Depends on application!
Construction (Lookup Only)
ith row of the original database: (xi, yi)
ith row of the obfuscated database: (r1, hash(r1,xi), r2, hash(r2,xi) ⊕ yi)
To learn xi from the obfuscated database,
need to invert the hash function
Easy simulatability proof in random oracle model
Access time is now linear in |D|
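A minimal sketch of this lookup-only construction, under stated assumptions: SHAKE-256 stands in for the random oracle, r1 and r2 are fresh 16-byte random salts, and values are padded to a fixed width. The function names and the padding scheme are illustrative choices, not details from the slides.

```python
import hashlib
import os

def H(salt: bytes, x: str, length: int) -> bytes:
    """Hash (salt, x) to `length` bytes; SHAKE-256 stands in for a random oracle."""
    return hashlib.shake_256(salt + x.encode()).digest(length)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def obfuscate(db, width=32):
    """Map each (x, y) row to (r1, hash(r1, x), r2, hash(r2, x) XOR y)."""
    rows = []
    for x, y in db:
        assert len(y.encode()) <= width
        r1, r2 = os.urandom(16), os.urandom(16)
        padded = y.encode().ljust(width, b"\0")  # fixed width hides len(y)
        rows.append((r1, H(r1, x, 32), r2, xor(H(r2, x, width), padded)))
    return rows

def lookup(rows, x):
    """Scan every row (linear in |D|) and decrypt those whose tag matches x."""
    out = []
    for r1, tag, r2, ct in rows:
        if H(r1, x, 32) == tag:
            out.append(xor(H(r2, x, len(ct)), ct).rstrip(b"\0").decode())
    return out

rows = obfuscate([("John Q. Spook", "555-1212"), ("Bob Ispy", "987-6543")])
print(lookup(rows, "Bob Ispy"))  # ['987-6543']
print(lookup(rows, "Nobody"))    # []
```

Because every row has its own salts, a lookup has to recompute the hash for each row, which is where the linear access time comes from.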
Group Privacy
Extracting one record is easy
• Legitimate account access
• Response to a customer request
Harvesting many records is hard
• Harvesting of emails for spam
• Theft of financial information
• Unauthorized transaction monitoring
Inverse of the census problem
(allows access to individual records
but hides some global property)
Applications
Electronic directories: prevent malicious users
from harvesting information from the directory
Outsourced customer support: support clerk can
easily look up a record in response to a customer
request, but cannot steal data wholesale
Multi-institution drug trials: share encrypted test
subject records, reveal some of them later
• Revelation condition is not known in advance
– “Open records of all subjects with this group of symptoms”
• To prevent dictionary attacks, queries based on partial
information should take a long time to evaluate
Exponential Slowdown
Legitimate questions vs. mass harvesting
• Intuition: legitimate users know what they are looking
for and can describe it precisely
– “Give me the email of John Q. Public, born 1969”
• Abusers want all information indiscriminately
– “Give me the emails of all males under 50”
Idea: if N records satisfy user’s query, force user
to guess N bits to compute the answer
• Answer encrypted, user learns all but N bits of the key
What queries can be obfuscated in this way?
Simple Example
Name    YOB    Email
Smith   1949   smith28@hotmail.com
Brown   1952   root@getrude.com
Smith   1972   smith13@yahoo.com
Jones   1949   joebob@fishing.com
SELECT EMAIL WHERE NAME=“Smith”
SELECT EMAIL WHERE YOB=1949
• User can’t learn email without guessing 2 bits
SELECT EMAIL WHERE NAME=“Smith” AND YOB=1949
• User can’t learn email without guessing 1 bit
Obfuscation of a Small Database
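Each row has a random 4-bit key (bits 1 to 4); numbers in parentheses are the key bits a field reveals
• H(r1,“Smith”): helps user verify that he found the right row
• H(r2,“Smith”) ⊕ (2 4): hidden key bits depend on other database entries
• H(1 2 3 4) ⊕ “smith28@hotmail.com”: email encrypted under the full key
Row (Smith, 1949), salts r1 r2 r3 r4
H(r1,“Smith”)   H(r2,“Smith”) ⊕ (2 4)
H(r3,“1949”)    H(r4,“1949”) ⊕ (2 3)
H(1 2 3 4) ⊕ “smith28@hotmail.com”
Row (Brown, 1952), salts q1 q2 q3 q4
H(q1,“Brown”)   H(q2,“Brown”) ⊕ (1 3 4)
H(q3,“1952”)    H(q4,“1952”) ⊕ (1 3 4)
H(1 2 3 4) ⊕ “root@getrude.com”
Row (Smith, 1972), salts p1 p2 p3 p4
H(p1,“Smith”)   H(p2,“Smith”) ⊕ (2 4)
H(p3,“1972”)    H(p4,“1972”) ⊕ (1 2 4)
H(1 2 3 4) ⊕ “smith13@yahoo.com”
Row (Jones, 1949), salts s1 s2 s3 s4
H(s1,“Jones”)   H(s2,“Jones”) ⊕ (1 2 3)
H(s3,“1949”)    H(s4,“1949”) ⊕ (2 3)
H(1 2 3 4) ⊕ “joebob@fishing.com”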
Can obfuscate any logical circuit of equalities and not-equalities
on individual field values
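The small-database scheme can be sketched as follows. This is a toy reading of the slide, not the authors' implementation: SHAKE-256 stands in for H, each row's key has one bit per table row, a field value hides bit i whenever row i shares that value, and a `check` tag (an assumption added here) lets the user confirm a guessed key. All function names are hypothetical.

```python
import hashlib
import itertools
import os

HIDDEN = 2  # marker for a key bit that a field value does not reveal

def H(salt: bytes, data: bytes, n: int) -> bytes:
    """SHAKE-256 of salt || data, n bytes; stands in for a random oracle."""
    return hashlib.shake_256(salt + data).digest(n)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def obfuscate(table, width=32):
    """table: rows whose last column is the payload; the rest are query fields."""
    n, out = len(table), []
    for row in table:
        *fields, payload = row
        assert len(payload.encode()) <= width
        key = bytes(b & 1 for b in os.urandom(n))  # one key bit per table row
        cells = []
        for col, value in enumerate(fields):
            v = str(value).encode()
            # Hide bit i whenever row i has the same value in this column.
            share = bytes(HIDDEN if table[i][col] == value else key[i] for i in range(n))
            r_tag, r_enc = os.urandom(16), os.urandom(16)
            cells.append((r_tag, H(r_tag, v, 32), r_enc, xor(H(r_enc, v, n), share)))
        salt = os.urandom(16)
        body = xor(H(salt, key, width), payload.encode().ljust(width, b"\0"))
        check = H(salt, b"check" + key, 32)  # assumption: confirms a key guess
        out.append((cells, salt, body, check))
    return out

def query(obf, known):
    """known: {column index: value}. Returns (payloads, key guesses tried)."""
    results, guesses = [], 0
    for cells, salt, body, check in obf:
        n = len(cells[0][3])
        bits, matched = [HIDDEN] * n, True
        for col, value in known.items():
            r_tag, tag, r_enc, enc = cells[col]
            v = str(value).encode()
            if H(r_tag, v, 32) != tag:
                matched = False
                break
            share = xor(H(r_enc, v, n), enc)
            bits = [s if s != HIDDEN else b for s, b in zip(share, bits)]
        if not matched:
            continue
        unknown = [i for i, b in enumerate(bits) if b == HIDDEN]
        for guess in itertools.product((0, 1), repeat=len(unknown)):
            guesses += 1
            for i, g in zip(unknown, guess):
                bits[i] = g
            key = bytes(bits)
            if H(salt, b"check" + key, 32) == check:
                results.append(xor(H(salt, key, len(body)), body).rstrip(b"\0").decode())
                break
    return results, guesses

table = [
    ("Smith", 1949, "smith28@hotmail.com"),
    ("Brown", 1952, "root@getrude.com"),
    ("Smith", 1972, "smith13@yahoo.com"),
    ("Jones", 1949, "joebob@fishing.com"),
]
obf = obfuscate(table)
print(query(obf, {0: "Smith", 1: 1949})[0])  # ['smith28@hotmail.com']
```

With both fields known, only one key bit is missing, so at most two guesses are needed; with the name alone, each matching row needs up to four. A 4-bit key offers no real protection; the point is only how the number of guessed bits grows with the number of matching records.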
More Practical Construction
Space inefficiency is due to N random bits for
each row
Goal: generate N bits from a small random seed
so that any subset can be selectively revealed.
Attempt 1: sqrt(N) “blocks” of pseudorandom
sequences
• If a whole block is to be revealed, simply output the
seed, else output the selected bits from that block
• Same worst case, better average case complexity
Better construction: Merkle tree
[Tree: kroot at the root; children k0, k1; grandchildren k00, k01, k10, k11]
k0 = hash(kroot||0)
k1 = hash(kroot||1)
kleft = hash(kparent||0)
kright = hash(kparent||1)
Each key reveals the subtree
rooted at that node, but nothing more
Each leaf is a “block of random bits”
When there are O(1) hidden bits,
the space complexity is O(k log n)
Open questions: worst case? provable?
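The tree of keys can be sketched as below. SHA-256 and the encoding of "||0" and "||1" as a single appended byte are assumptions, and `cover` is a hypothetical helper that picks which node keys to publish so that every leaf except the hidden ones can be derived.

```python
import hashlib
import os

def child(key: bytes, bit: int) -> bytes:
    """k_left = hash(k_parent || 0), k_right = hash(k_parent || 1)."""
    return hashlib.sha256(key + bytes([bit])).digest()

def derive(key: bytes, path: str) -> bytes:
    """Walk down from `key` along a string of '0'/'1' branch choices."""
    for b in path:
        key = child(key, int(b))
    return key

def leaves_under(key: bytes, prefix: str, depth: int) -> dict:
    """All leaf keys in the subtree rooted at the node named `prefix`."""
    if len(prefix) == depth:
        return {prefix: key}
    out = {}
    for b in "01":
        out.update(leaves_under(child(key, int(b)), prefix + b, depth))
    return out

def cover(hidden: set, depth: int, prefix: str = "") -> list:
    """Node names whose subtrees contain every leaf except those in `hidden`."""
    if not any(h.startswith(prefix) for h in hidden):
        return [prefix]  # nothing hidden below: reveal this whole subtree
    if len(prefix) == depth:
        return []        # this leaf is hidden
    return cover(hidden, depth, prefix + "0") + cover(hidden, depth, prefix + "1")

root = os.urandom(32)
nodes = cover({"010"}, depth=3)
print(nodes)  # ['00', '011', '1']
revealed = {name: derive(root, name) for name in nodes}
```

Hiding one leaf out of eight takes three published keys, one per level of the tree, which matches the logarithmic cost quoted above for a constant number of hidden bits.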
Summary
Obfuscation is an interesting notion of privacy
• Orthogonal to commonly used definitions
What are the interesting ideal functionalities?
• Lookup, exponential slowdown… what else?
Provably secure constructions for a large class of
access patterns
Practical implementation still a challenge
More details in our CCS 2005 paper
• … and several forthcoming papers