Do Humans make Good Observers – and can they Reliably Fuse Information?
Dr. Mark Bedworth
MV Concepts Ltd.
mark.bedworth@mv-concepts.com

What we will cover:
• The decision making process
• The information fusion context
• The reliability of the process
• Where the pitfalls lie
• How not to get caught out
• Suggestions for next steps

What we will not cover:
• Systems design and architectures
• Counter-piracy specifics
• Inferencing frameworks
• Tracking
• Multi-class problems
• Extensive mathematics
• In fact… most of the detail!

Our objectives:
• Understanding of the context of data fusion for decision making
• Quantitative grasp of a few key theories
• Appreciation of how to put the theory into practice
• Knowledge of where the gaps in theory remain

Warning
This presentation contains audience participation experiments

Decision Making
• To make an informed decision:
– Obtain data on the relevant factors
– Reason within the domain context
– Understand the possible outcomes
– Have a method of implementation

Boyd Cycle
• This is captured more formally as a fusion architecture:
– Observe: acquire data
– Orient: form perspective
– Decide: determine course of action
– Act: put into practice
• Also called OODA loop

OODA loop
[Figure: the OODA loop, with stages Observe, Orient, Decide, Act]

Adversarial OODA Loops
[Figure: two OODA loops (Observe, Orient, Decide, Act), with labels Own information, Physical world and Adversary information]

Winning the OODA Game
• To achieve dominance:
– Make better decisions
– In a more timely manner
– And implement more effectively

Dominance History
• Action dominance (-A)
– Longer range, more destructive, more accurate weapons
• Observation dominance (O-)
– Longer range, more robust, more accurate sensors
• Information dominance (-O-D-)
– More timely and relevant information with better support to the decision maker

Information Dominance
Part One: Orientation
“Having acquired relevant data; to undertake reasoning about the data within the domain context to form a perspective of the current situation;
so that an informed decision can subsequently be made”

A number of approaches
• Fusion of hard decisions
– Majority rule
– Weighted voting
– Maximum a posteriori fusion
– Behaviour knowledge space
• Fusion of soft decisions
– Probability fusion

Reasoning Frameworks
• Boolean
– Truth and falsehood
• Fuzzy (Zadeh)
– Vagueness
• Evidential (Dempster-Shafer)
– Belief and ignorance
• Probabilistic (Bayesian)
– Uncertainty

Probability theory
• 0 ≤ P(H) ≤ 1
• If P(H) = 1 then H is certain to occur
• P(H) + P(~H) = 1: either H or not-H is certain to occur (negation rule)
• P(G,H) = P(G|H) P(H) = P(H|G) P(G): the joint probability is the conditional probability multiplied by the prior (conjunction rule)

Bayes’ Theorem
$$P(H|X) = \frac{P(X|H)\,P(H)}{P(X)}$$
• P(H|X): posterior probability
• P(X|H): likelihood
• P(H): prior probability
• P(X): marginal likelihood

Perspective Calculation
• Usually the marginal likelihood is awkward to compute
– But it is not needed, since it is independent of the hypothesis
– Compute the products of the likelihoods and priors; then normalise over hypotheses

Human Fusion Experiment (1)
• A threat is present 5% of the time it is looked for
• Observers A and B both independently look for the threat
• Both report an absence of the threat, with posterior probabilities of 70% and 80% respectively
• What is the fused probability that the threat is absent?

Human Fusion Experiment (2)
• Threat absent ≡ the hypothesis (H)
• P(~H) = 0.05
• P(H) = 0.95
• P(H|X_A) = 0.70
• P(H|X_B) = 0.80
• P(H|X_A,X_B) = ?
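
The perspective calculation described earlier (multiply each likelihood by its prior, then normalise over hypotheses) can be sketched in a few lines of Python. This is only an illustration: the function name `posterior` and the likelihood values are assumptions invented for the example; only the priors 0.95 and 0.05 come from the slides.

```python
def posterior(priors, likelihoods):
    """Return P(H|X) for every hypothesis H.

    The marginal likelihood P(X) is never supplied: it is recovered
    as the sum of the likelihood-times-prior products.
    """
    unnormalised = {h: likelihoods[h] * priors[h] for h in priors}
    total = sum(unnormalised.values())  # this sum equals P(X)
    return {h: value / total for h, value in unnormalised.items()}


# Priors taken from the experiment: the threat is present 5% of the time.
priors = {"absent": 0.95, "present": 0.05}

# Hypothetical likelihoods P(X|H), invented purely for illustration.
likelihoods = {"absent": 0.2, "present": 0.9}

post = posterior(priors, likelihoods)
print(post)
```

Because the normalising sum is taken over all hypotheses, the result is the same as dividing by P(X) directly, which is why the marginal likelihood can be left out.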
Human Fusion Experiment (3)
[Figure: probability scale for H (no threat: H = 1.00), marking Report A, P(H|X_A) = 0.70; Report B, P(H|X_B) = 0.80; Prior, P(H) = 0.95]

Conditional Independence
• Assume the data to be conditionally independent given the class:
$$P(A,B|H) = P(A|H)\,P(B|H)$$
• Note that this does not necessarily imply:
$$P(A,B) = P(A)\,P(B)$$

[Figure: four plots of Sensor 2 measurement against Sensor 1 measurement; two panels labelled "Conditionally independent" and two labelled "Not conditionally independent"]

Fusion: Product Rule (1)
• We require: P(H|A,B)
• From Bayes’ theorem:
$$P(H|A,B) = \frac{P(A,B|H)\,P(H)}{P(A,B)}$$

Fusion: Product Rule (2)
• We assume conditional independence so may write:
$$P(H|A,B) = \frac{P(A|H)\,P(B|H)\,P(H)}{P(A,B)}$$

Fusion: Product Rule (3)
• Applying Bayes’ theorem again:
$$P(H|A,B) = \frac{P(H|A)\,P(A)}{P(H)} \cdot \frac{P(H|B)\,P(B)}{P(H)} \cdot \frac{P(H)}{P(A,B)}$$
• And collecting terms:
$$P(H|A,B) = \frac{P(H|A)\,P(H|B)}{P(H)} \cdot \frac{P(A)\,P(B)}{P(A,B)}$$

Fusion: Product Rule (4)
• We may drop the marginal likelihoods again and normalise:
$$P(H|A,B) \propto \frac{P(H|A)\,P(H|B)}{P(H)}$$
• P(H|A), P(H|B): posterior probabilities
• P(H): prior probability
• P(H|A,B): fused posterior probability

Multisource Fusion Rule
• The generalisation of this fusion rule to multiple sources:
$$P(H|X) \propto \frac{\prod_{i=1}^{N} P(H|x_i)}{P(H)^{N-1}}$$
• This is commutative

Commutativity of Fusion (1)
With the N sources split into groups of R and S (N = R + S):
$$P(H|X) \propto \frac{\prod_{i=1}^{N} P(H|x_i)}{P(H)^{N-1}} = \frac{\prod_{i=1}^{R} P(H|x_i)}{P(H)^{R-1}} \cdot \frac{\prod_{i=R+1}^{N} P(H|x_i)}{P(H)^{S-1}} \cdot \frac{1}{P(H)}$$

Commutativity of Fusion (2)
• The probability fusion rule commutes:
– It doesn’t matter what the architecture is
– It doesn’t matter if it is single stage or multistage

Experiment: Results
$$P(H|A,B) \propto \frac{P(H|A)\,P(H|B)}{P(H)} = \frac{0.70 \times 0.80}{0.95} = 0.59$$
$$P(\sim H|A,B) \propto \frac{P(\sim H|A)\,P(\sim H|B)}{P(\sim H)} = \frac{0.30 \times 0.20}{0.05} = 1.20$$
• Normalising gives:
– P(H|A,B) = 0.33
– P(~H|A,B) = 0.67

Human Fusion Experiment (3)
[Figure: probability scale for H (no threat: H = 1.00), marking Fusion A,B, P(H|X_A,X_B) = 0.33; Report A, P(H|X_A) = 0.70; Report B, P(H|X_B) = 0.80; Prior, P(H) = 0.95]

Why was that so hard?
• Most humans find it difficult to intuitively fuse uncertain information
– Not because they are innumerate
– But because they cannot comfortably balance the evidence (likelihood) with their predisposition (prior)

Prior Sensitivity (1)
• If the issue is with the priors, do they matter?
• Can we ignore the priors?
• Do we get the same final decision if we change the priors?

Prior Sensitivity (2)
• If P(H|A) = P(H|B)
• What value of P(H) makes P(H|A,B) = 0.5?
$$P(H) = \frac{P(H|A)^2}{P(H|A)^2 + \left(1 - P(H|A)\right)^2}$$

Prior Sensitivity (3)
[Figure: curve plotted against P(H|A) = P(H|B) on the horizontal axis; both axes run from 0 to 1]

Prior Sensitivity (4)
• For 0.2 < P(H|A) < 0.8 the prior has a significant effect
• Carefully define the domain over which the prior is evaluated
• Put effort into using a reasonable value

Sensitivity to Posterior Probability
• What about the posterior probabilities delivered to the fusion centre?
• Can we endure errors here?
• Which types of errors hurt most?

Probability Experiment (1)
• 10 estimation questions
• Write down a lower and an upper bound
• So that you are 90% sure the range covers the actual value
• All questions relate to the highest point in various countries (in metres)

Probability Experiment (2)
• Winner defined as:
– Person with most answers correct
– Tie-break decided by smallest sum of ranges (for all 10 questions)
• Pick a range big enough
• But not too big!

The questions:
1. Australia
2. Chile
3. Cuba
4. Egypt
5. Ethiopia
6. Finland
7. Hong Kong
8. India
9. Lithuania
10. Poland

The answers:
1. Australia (2228m)
2. Chile (6893m)
3. Cuba (1974m)
4. Egypt (2629m)
5. Ethiopia (4550m)
6. Finland (1324m)
7. Hong Kong (958m)
8. India (8586m)
9. Lithuania (294m)
10. Poland (2499m)

Overconfidence (1)
• Large trials show that most people get fewer than 40% correct
• Should be 90% correct!
• People are often overconfident (even when primed that they are being tested!)
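
The product fusion rule and the prior-sensitivity formula from the earlier slides can be checked numerically. The sketch below assumes a two-hypothesis problem (H and ~H), conditionally independent sources and a prior strictly between 0 and 1; the function names `fuse` and `break_even_prior` are illustrative and not from the talk.

```python
def fuse(posteriors, prior):
    """Fuse the posteriors P(H|x_i) of several sources into P(H|X).

    Applies the multisource product rule to H and to ~H, then
    normalises over the two hypotheses. Assumes 0 < prior < 1.
    """
    n = len(posteriors)
    score_h = 1.0
    score_not_h = 1.0
    for p in posteriors:
        score_h *= p
        score_not_h *= 1.0 - p
    score_h /= prior ** (n - 1)
    score_not_h /= (1.0 - prior) ** (n - 1)
    return score_h / (score_h + score_not_h)


def break_even_prior(p):
    """Prior P(H) for which two equal reports P(H|A) = P(H|B) = p fuse to 0.5."""
    return p * p / (p * p + (1.0 - p) ** 2)


# The audience experiment: reports of 0.70 and 0.80 with a prior of 0.95.
fused = fuse([0.70, 0.80], prior=0.95)
print(round(fused, 2))  # 0.33

# Single-stage and two-stage fusion agree, whatever the architecture.
single_stage = fuse([0.70, 0.80, 0.60], prior=0.95)
two_stage = fuse([fuse([0.70, 0.80], prior=0.95), 0.60], prior=0.95)
print(abs(single_stage - two_stage) < 1e-12)  # True
```

Both observers report "probably no threat", yet each report is weaker than the 0.95 prior, so the fused value falls well below either of them. That is the step most people miss when fusing by intuition.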
Overconfidence (2)
[Figure: declared probability plotted against actual probability, with regions labelled overconfident, underconfident and wrong]

Confidence Amplification(1)
[Figure: fused class probability against input class probability (both 0 to 1), with curves for 2, 3, 4 and 5 sensors]

Confidence Amplification(2)

Veto Effect
• If any local decision-maker outputs a probability of close to zero for a class then the fused probability is close to zero
– Even if all the other decision-makers output a high probability
– About 40% of the response surface for two sensors is either <0.1 or >0.9
– This rises to 50% for three sensors and nearly 60% for four

Moderation of probabilities
• If we suspect that the posterior probabilities are overconfident then we should moderate them
– By building it into automatic techniques
– By allowing for it if this is not possible

Gaussian Moderation
• For Gaussian classifiers the Bayesian correction is analytically tractable
• By integrating over the mean and variance rather than taking the maximum likelihood value

Student t-distribution(1)
• For Gaussian data this is:
$$P(x_i|D) = \int_{-\infty}^{\infty} d\mu \int_{0}^{\infty} d\sigma^2 \; P(x_i|\mu,\sigma^2)\, P(\mu,\sigma^2|D)$$
• Which is a “Student” t-distribution:
$$P(x_i|\hat{\mu},\hat{\sigma},N) = \frac{\Gamma\left(\frac{N}{2}\right)}{\hat{\sigma}\sqrt{(N-1)\pi}\;\Gamma\left(\frac{N-1}{2}\right)} \left[\frac{(x_i-\hat{\mu})^2}{(N-1)\,\hat{\sigma}^2} + 1\right]^{-N/2}$$

Student t-distribution(2)
[Figure: two panels showing likelihood of data against measurement value (-10 to 10)]

Student t-distribution(3)
[Figure: two panels showing probability of class 1 against measurement value (-10 to 10)]

Approximate Moderation(1)
• We can get a similar effect at the fusion centre using the posteriors
– Convert back to “likelihoods” by dividing by the prior
– Add a constant to everything
– Convert back to “posteriors” by multiplying by the prior
– Renormalise

Approximate Moderation(2)
• How much to add depends on the source of the posterior probabilities
– Correction factor for each source
– Learned from data

Other Issues
• Conditional independence not holding
• Information incest
• Missing data
• Communication errors
• Asynchronous information

Information Dominance
Part Two: Decision
“Having reasoned about the data to form a perspective of the current situation; to make an informed decision which optimises the desirability of the outcome”

Deciding what to do
“Decision theory is trivial, apart from the details”
• Select an action that maximises the expected utility of the outcome

Utility functions?
• A utility function describes how desirable each possible outcome is
– People are sometimes irrational
– Desirability cannot be captured by a single valued function
– Allais paradox

Utility Experiment(1)
1. Guaranteed €1 million
2. 89% chance of €1 million; 10% chance of €5 million; 1% chance of nothing

Utility Experiment(2)
1. 89% chance of nothing; 11% chance of €1 million
2. 90% chance of nothing; 10% chance of €5 million

Utility Experiment(3)
• If you prefer 1 to 2 on the first slide, you should prefer 1 to 2 on the second slide as well
• If not you are acting irrationally…

Decision Theory
• Assume we are able to construct a utility function (or at least get our superior to define one!)
• Enumerate the possible actions
– Use our fused probabilities to weight the utility of the possible outcomes
– Choose the action for which the expected utility of the outcome is greatest

Timing the decision
• What about timing?
• When should the decision be made?
– If we wait then maybe the (fused) probabilities will be more accurate
– Or the action will be more effective

Explore versus Exploit
• By waiting you can explore the situation
• By stopping you can exploit the situation
• Stopping rule
– Sequential analysis
– SPRT
– Bayesian optimal stopping

Experiment with timing
• I will show you 20 numbers
• They are drawn from the same (uniform) distribution
• Select the highest value
• But no going back
• A bit like ¡Allá tú!

Experiment with timing(1): 131
Experiment with timing(2): 16
Experiment with timing(3): 125
Experiment with timing(4): 189
Experiment with timing(5): 105
Experiment with timing(6): 172
Experiment with timing(7): 39
Experiment with timing(8): 94
Experiment with timing(9): 57
Experiment with timing(10): 133
Experiment with timing(11): 52
Experiment with timing(12): 69
Experiment with timing(13): 7
Experiment with timing(14): 242
Experiment with timing(15): 148
Experiment with timing(16): 163
Experiment with timing(17): 23
Experiment with timing(18): 139
Experiment with timing(19): 146
Experiment with timing(20): 211

The answer…
• How many people chose 242?
• Balance between collecting data on how big the numbers might be (exploration) and actually picking a big number (exploitation)

The 1/e Law(1)
• Consider a rule of the form:
– Observe the first M values and remember the best value (V)
– Observe the remaining N-M values and pick the first that exceeds V

The 1/e Law(2)
• It can be shown that the optimum value for M is N/e
• And that for this rule the probability of selecting the maximum is at least 1/e
• Even for huge values of N

Time Pressure (1)
• Individuals tend to make the decision too early
• Committees tend to leave the decision too late

Time Pressure (2)
• Lecturers tend to overrun their time slot!

Time Pressure (3)
• Apologies for skipping over so much of the detail
• Some of the other areas that warrant mention:
– Game theory
– Sensor management
– Graphical models
– Cognitive inertia
– Inattentional blindness

Please feel free to contact me
mark.bedworth@mv-concepts.com
www.mv-concepts.com
Or just come and introduce yourself…

Thank you!
Questions…
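
As a footnote to the 1/e Law slides, the stopping rule can be tried on the 20 numbers from the timing experiment and then checked by simulation. This is a minimal sketch under stated assumptions: the names `stop_rule` and `success_rate` are illustrative, M is taken as N/e rounded to the nearest integer, and if no later value beats V the rule is assumed to be forced to accept the last value.

```python
import math
import random


def stop_rule(values, m):
    """Observe the first m values, then pick the first later value that beats them."""
    best_seen = max(values[:m])
    for value in values[m:]:
        if value > best_seen:
            return value
    return values[-1]  # assumption: nothing better appeared, so take the last one


# The 20 numbers shown in the timing experiment.
talk_numbers = [131, 16, 125, 189, 105, 172, 39, 94, 57, 133,
                52, 69, 7, 242, 148, 163, 23, 139, 146, 211]
m = max(1, round(len(talk_numbers) / math.e))  # 7 when N = 20
choice = stop_rule(talk_numbers, m)
print(m, choice)  # 7 242


def success_rate(n, trials, seed=0):
    """Fraction of random uniform sequences in which the rule picks the maximum."""
    rng = random.Random(seed)
    m = max(1, round(n / math.e))
    wins = 0
    for _ in range(trials):
        values = [rng.random() for _ in range(n)]
        if stop_rule(values, m) == max(values):
            wins += 1
    return wins / trials


rate = success_rate(n=20, trials=20000)
print(rate)  # should land a little above 1/e, which is about 0.368
```

On the talk's sequence the best of the first seven values is 189, and the first later value to exceed it is 242, which is also the overall maximum.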