					An Information-theoretic Approach to
Network Measurement and Monitoring
Yong Liu, Don Towsley, Tao Ye, Jean Bolot
Outline
motivation
background
flow-based network model
full packet trace compression
 marginal/joint
 coarser granularity
 NetFlow and SNMP
 future work
Motivation
 network monitoring: sensing a network
 traffic engineering, anomaly detection, …
 single point vs. distributed
 different granularities
 full traffic trace: packet headers
 flow-level record: timing, volume
 summary statistics: byte/packet counts
 challenges
 growing scales: high-speed links, large topologies
 constrained resources: processing, storage, transmission
 30G headers/hour at UMass gateway
 solutions
 sampling: temporal/spatial
 compression: marginal/distributed
Questions
 how much can we compress monitoring traces?
 how much information is captured by different monitoring granularities?
 packet trace/NetFlow/SNMP
 how much joint information is there across multiple monitors?
 joint compression
 trace aggregation
 monitor placement
Our Contribution
 flow-based network models
 explore temporal/spatial correlation in network traces
 projection to different granularities
 information-theoretic framework
 entropy: bound/guideline on trace compression
 quantitative approach for more general problems
 validation against measurements from an operational network
Entropy & Compression
 Shannon entropy of a discrete r.v.
 compression of a length-M sequence of i.i.d. symbols by coding
 coding: map each symbol to a binary codeword
 expected code length per symbol
 information-theoretic bound on the compression ratio (see the formulas below)
 Shannon/Huffman coding
 assigns short codewords to frequent outcomes
 achieves the H(X) bound
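For reference, these are the standard quantities the bullets refer to; the notation (pmf p(x), codeword length function \ell, and b raw bits per symbol) is generic shorthand rather than the slide's own:

H(X) = -\sum_{x} p(x)\,\log_2 p(x)                        % Shannon entropy of a discrete r.v. X

E[\ell(c(X))] = \sum_{x} p(x)\,\ell(c(x)) \;\ge\; H(X)    % expected length of a prefix-free code c

\frac{\text{compressed size}}{\text{original size}} \;\ge\; \frac{M\,H(X)}{M\,b} \;=\; \frac{H(X)}{b}   % compression-ratio bound for M i.i.d. symbols stored in b raw bits each

E[\ell(c_{\text{Huffman}}(X))] \;<\; H(X) + 1             % Huffman coding comes within one bit of the entropy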
Entropy & Correlation
 joint entropy
 entropy rate of a stochastic process
exploits temporal correlation
Lempel-Ziv Coding (LZ77, gzip, winzip)
asymptotically achieves the bound for stationary processes
 joint entropy rate of correlated processes
exploits spatial correlation
Slepian-Wolf Coding (distributed compression)
encode each process individually, yet achieve the joint entropy rate in the limit (definitions restated below)
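Restating the standard definitions behind this slide (textbook forms; X and Y here are generic correlated sources):

H(X,Y) = -\sum_{x,y} p(x,y)\,\log_2 p(x,y)                          % joint entropy

H(\mathcal{X}) = \lim_{n\to\infty} \tfrac{1}{n}\, H(X_1,\dots,X_n)  % entropy rate of a stationary process; the temporal-correlation bound LZ coding approaches

R_X \ge H(X \mid Y),\quad R_Y \ge H(Y \mid X),\quad R_X + R_Y \ge H(X,Y)   % Slepian-Wolf region: separate encoders can together reach the joint entropy rate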
Network Trace Compression
 naïve way: treat the trace as a byte stream, compress with generic tools
gzip compresses UMass traces by a factor of 2
 network traces are highly structured data
multiple fields per packet
• diversity in information richness
• correlation among fields
multiple packets per flow
• packets within a flow share information
• temporal correlation
multiple monitors traversed by a flow
• most fields unchanged within the network
• spatial correlation
 network models
 explore correlation structure
 quantify information content of network traces (a small sketch of this idea follows below)
 serve as lower bounds/guidelines for compression algorithms
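As a toy illustration of quantifying per-field information content, here is a minimal Python sketch; the field names, widths, and packets are made up for illustration and are not the authors' data or code. It estimates the empirical entropy of each header field and compares it with the field's raw width in bits.

import math
from collections import Counter

def empirical_entropy(values):
    # empirical Shannon entropy (bits per symbol) of a sequence of field values
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# toy trace: one dict per packet (illustrative values only)
packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.9", "proto": 6,  "ipid": 101, "length": 1500},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.9", "proto": 6,  "ipid": 102, "length": 1500},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.9", "proto": 17, "ipid": 7,   "length": 60},
]
raw_bits = {"src_ip": 32, "dst_ip": 32, "proto": 8, "ipid": 16, "length": 16}

for field, width in raw_bits.items():
    h = empirical_entropy([p[field] for p in packets])
    print(f"{field:8s} raw {width:2d} bits   entropy {h:.2f} bits   ratio {h / width:.2f}")

Fields whose entropy is far below their raw width (addresses, protocol, and other fields shared within a flow) are exactly where a structure-aware compressor can beat gzip's byte-stream view.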
Packet Header Trace
[Figure: per-packet trace record, bit positions 0-31: timing (timestamp in seconds and sub-seconds) followed by the IP header (version, header length, ToS, total length, IPID, flags, fragment offset, TTL, protocol, header checksum, source IP address, destination IP address) and the TCP header (source port, destination port, data sequence number, acknowledgment number, header length, TCP flags, window size, checksum, urgent pointer)]
Header Field Entropy
[Figure: the same header layout annotated by information content, with fields grouped into "time" (the timestamps) and "flow id" (source/destination addresses and ports)]
Single Point Packet Trace
[Figure: a single-point packet trace as a timeline of (timestamp, flow ID) pairs, e.g. (T0, F0), (T1, F1), ..., with packets of the same flow recurring over time]
 packet inter-arrival time and # bits per packet characterize the raw trace
 temporal correlation introduced by flows
 packets from the same flow are closely spaced in time
 they share header information
 flow-based trace: group the packets of each flow together
 flow record: flow ID, flow size, flow arrival time, and the packet inter-arrivals within the flow (a sketch follows below)
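A minimal sketch of such a flow record as a data structure; the field names and types are assumptions for illustration, not the paper's on-disk format.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FlowRecord:
    # one flow in a flow-based trace (illustrative layout)
    flow_id: Tuple[str, str, int, int, int]  # src IP, dst IP, src port, dst port, protocol
    size: int                                # number of packets in the flow
    arrival_time: float                      # timestamp of the first packet (seconds)
    pkt_interarrivals: List[float]           # gaps between successive packets of the flow

# e.g. a 3-packet flow arriving at t = 12.5 s
rec = FlowRecord(("10.0.0.1", "10.0.0.9", 4321, 80, 6), 3, 12.5, [0.002, 0.015])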
Network Models
flow-based model
 flow arrivals follow a Poisson process with a given rate
 flows are classified into independent flow classes according to routing (the set of routers traversed)
 flow i is described by:
• flow inter-arrival time
• flow ID
• flow length
• packet inter-arrival times within the flow
 the packet arrival process is the stochastic process built from these flow-level quantities (notation sketched below)
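One way to write the model down; the symbols λ, A_i, F_i, N_i, D_{i,k} are my shorthand for the quantities in the bullets above, not necessarily the slide's notation:

\text{flow arrivals: Poisson with rate } \lambda,\ \text{split into independent classes by route}

A_i \ \text{(flow inter-arrival time)},\quad
F_i \ \text{(flow ID)},\quad
N_i \ \text{(flow length in packets)},\quad
\{D_{i,k}\}_{k=1}^{N_i - 1} \ \text{(packet inter-arrivals within flow } i\text{)}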
Entropy in Flow Record
 # bits per flow
 # bits per second
 marginal compression ratio
 determined by flow length (pkts.) and variability in pkt. inter-arrival (a hedged sketch of these quantities follows below)
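A hedged sketch of how these quantities fit together under the flow model, using the shorthand from the previous slide plus E[N] for the mean flow length; the slide's exact expressions may differ:

H_{\text{flow}} \;\approx\; H(F) + H(A) + H(N) + E[N]\,H(D)          % # bits per flow record: flow ID, flow timing, flow length, and the within-flow packet inter-arrivals

R \;=\; \lambda\, H_{\text{flow}}                                     % # bits per second at the monitor

\rho_{\text{marginal}} \;=\; \frac{R}{\text{raw trace bit rate}}      % marginal compression ratio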
Single Point Compression: Results
[Figure: measurement topology, a router with four monitored links: C1-in, C2-in, BB1-out, BB2-out]

Trace     H (total)    Model ratio    Compression algorithm ratio
C1-in     706.3772     0.2002         0.6425
BB1-out   736.1722     0.2139         0.6574
BB2-out   689.9066     0.2186         0.6657

 the compression-ratio lower bound computed from the entropy model is much lower than what the real compression algorithm achieves
 where the real compression algorithm differs:
 it records IPID, packet size, and TCP/UDP fields
 it keeps a fixed packet buffer per flow => long flows are split into many flow records
Distributed Network Monitoring
 a single flow is recorded by multiple monitors
 spatial correlation: traces collected at distributed monitors are correlated
 marginal node view: #bits/sec to represent the flows seen by one node; a bound on single-point compression
 network system view: #bits/sec to represent the flows crossing the network; a bound on joint compression
 joint compression ratio: quantifies the gain of joint compression
Baseline Joint Entropy Model
 “perfect” network
 fixed routes/constant link delay/no packet loss
 flow classes based on routes
 flows arrive with rate:
 # of monitors traversed:
 #bits per flow record:
 info. rate at node v:
 network view info. rate:
 joint compression ratio:
 dependence on the # of monitors traversed (a hedged sketch follows below)
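A hedged sketch of how these pieces combine; the notation is mine (class i has flow arrival rate λ_i, per-flow-record entropy H_i, and a route crossing h_i monitors), so the slide's exact expressions may differ:

R_v \;=\; \sum_{i:\, v \in \text{route}(i)} \lambda_i\, H_i           % info. rate at node v: classes whose route passes through v

R_{\text{net}} \;=\; \sum_{i} \lambda_i\, H_i                          % network view: each flow counted once

\rho_{\text{joint}} \;=\; \frac{R_{\text{net}}}{\sum_v R_v}
            \;=\; \frac{\sum_i \lambda_i H_i}{\sum_i h_i\, \lambda_i H_i}   % each class is counted h_i times in the denominator, hence the dependence on the number of monitors traversed

In particular, if every flow crossed the same number h of monitors this ratio would be 1/h, which would be consistent with the 0.5 reported for the four-trace set on the next slide if each flow is seen by two of the four monitored links.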
Joint Compression: Results
[Figure: measurement topology, a router with four monitored links: C1-in, C2-in, BB1-out, BB2-out]

Set of Traces                        Joint Compression Ratio
{C1-in, BB1-out, C2-in, BB2-out}     0.5
{C1-in, BB1-out}                     0.8649
{C1-in, BB2-out}                     0.8702
{C2-in, BB1-out}                     0.7125
{C2-in, BB2-out}                     0.6679
Coarser Granularity Models
 NetFlow model
 similar to flow model:
 joint compression result similar to full trace
 SNMP model
 any link's SNMP rate process is the sum of the rate processes of all flow classes passing through that link
 traffic rates of flow classes are modeled as independent Gaussian processes
 entropy can be calculated from the covariance of these processes
 information loss due to summation
 small joint information between monitors
 difficult to recover rates of flow classes from SNMP data
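A hedged sketch of why the Gaussian assumption makes the entropy computable from covariances, and why summation loses information (standard differential-entropy facts; the paper's exact treatment may differ):

h(\mathbf{X}) \;=\; \tfrac{1}{2}\,\log_2\!\big((2\pi e)^n \det\Sigma\big)                             % differential entropy of a Gaussian vector with covariance \Sigma

h\Big(\sum_{i=1}^{k} X_i\Big) \;=\; \tfrac{1}{2}\,\log_2\!\Big(2\pi e \sum_{i=1}^{k}\sigma_i^2\Big)   % an SNMP link count: the sum of k independent class rates is again Gaussian, described by a single variance

Summation is a many-to-one map, so the per-class rates cannot be reconstructed from the link counts alone; this is the information loss the last bullets refer to.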
Joint Compression Ratio of Different Granularities
[Figure: measurement topology, a router with four monitored links: C1-in, C2-in, BB1-out, BB2-out]

Set of Traces        SNMP      NetFlow    Packet Trace
{C1-in, BB1-out}     1.0021    0.8597     0.8649
{C1-in, BB2-out}     0.9997    0.8782     0.8702
Conclusion
 information-theoretic bound on the marginal compression ratio is roughly 20% (time + flow ID; even lower if other low-entropy fields are included)
 marginal compression ratio is high (not very compressible) for SNMP, lower for NetFlow, and lowest for the full packet trace
 joint coding is much more useful/necessary for full traces than for SNMP
 “More entropy for your buck”
Future Work
 network impairments
 how many more bits for delay/loss/route changes?
 model NetFlow with sampling
 distributed compression algorithms
 lossless vs. lossy compression
 entropy-based monitor placement
 maximize information under constraints
Thanks!