Evolving Multimodal Behavior
PhD Proposal
Jacob Schrum
11/4/09
Introduction
Challenge: Discover behavior automatically
 Simulations, video games, robotics
Why challenging?
 Noisy sensors
 Complex domains
 Continuous states/actions
 Multiple agents, teamwork
 Multiple objectives
 Multimodal behavior required (focus)
What is Multimodal Behavior?
Working definition:
 Agent exhibits distinct kinds of actions under different circumstances
Examples:
 Offensive & defensive modes in soccer
 Search for weapons or opponents in a video game
 Animal with foraging & fleeing modes
Very important for teams
 Roles correspond to modes
 Example domains will involve teamwork
Previous Approaches
Design Approaches
 Hand code in a structured manner
Value-Function Based Approaches
 Learn the utility of actions (RL)
Evolutionary Approaches
 Selectively search based on performance
Design
Subsumption Architecture (Brooks 1986)
 Hierarchical design
 Lower levels independent of higher levels
 Built incrementally
 Common in robotics
 Hand coded
Value-Function Based (1/2)
MAXQ (Dietterich 1998)
 Hand-designed hierarchy
 TD learning at multiple levels
 Reduce state space
 Taxi domain
   Still just a grid world
 Discrete states & actions
Value-Function Based (2/2)
Basis Behaviors (Matarić 1997)
 Low-level behaviors pre-defined
 Learn high-level control
 Discrete state space
   High-level features (conditions)
 Reward shaping necessary
 Applied to real robots
 Too much expert knowledge
Evolutionary (1/2)
Layered Evolution (Togelius 2004)
 Evolve components of subsumption architecture
 Applied to:
   EvoTanks (Thompson and Levine 2008)
   Unreal Tournament 2004 (van Hoorn et al. 2009)
 Must specify:
   Hierarchy
   Training tasks
 Similar to Layered Learning (Stone 2000)
Evolutionary (2/2)
Neuro-Evolving Robotic Operatives (Stanley et al. 2005)
 ML game
 Train robot army
 Many objectives
   Weighted sum: z-scores method (sketched below)
   User changes weights during training
 Dynamic objective management
   Leads to multimodal behavior
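As a concrete sketch of the z-scores method, the following hypothetical Python combines each individual's raw objective scores into one fitness by weighting the per-objective z-scores; the weights play the role of the values a NERO user adjusts during training, and all names are assumptions, not NERO's actual code:

```python
import statistics

def z_score_fitness(scores, weights):
    """Weighted-sum fitness over z-scored objectives (a sketch).

    scores:  dict mapping objective name -> list of raw scores, one per individual
    weights: dict mapping objective name -> weight (user-adjustable in NERO)
    Returns one combined fitness value per individual.
    """
    n = len(next(iter(scores.values())))
    combined = [0.0] * n
    for obj, raw in scores.items():
        mu = statistics.mean(raw)
        sigma = statistics.pstdev(raw) or 1.0   # guard against zero deviation
        for i, s in enumerate(raw):
            combined[i] += weights[obj] * (s - mu) / sigma
    return combined
```

Because each objective is standardized before weighting, changing a weight immediately shifts selection pressure without rescaling raw score ranges.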
Multiple Objectives
Multimodal problems are typically multiobjective
 Modes associated with objectives
Traditional: weighted sum (Cohon 1978)
 Must tune the weights
 Only one solution
 Bad for non-convex trade-off surfaces
 Need better formalism
[Figure: trade-off surface; each point on the convex region corresponds to one set of specific weights, while points in the non-convex region cannot be captured by any weighted sum]
Greatest Mass Sarsa
(Sprague and Ballard 2003)
Multiple MDPs with shared action space
 Learn each component via the Sarsa(0) update rule:
   Q_i(s,a) ← Q_i(s,a) + α[r_i + γ Q_i(s′,a′) − Q_i(s,a)]
 Best value is the sum of component values:
   a* = argmax_a Σ_i Q_i(s,a)
 Used in sidewalk navigation task
 Like a weighted sum with equal weights (sketched below)
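A minimal tabular sketch of the idea (not Sprague and Ballard's implementation), assuming ε-greedy exploration; all names are hypothetical:

```python
import random
from collections import defaultdict

class GreatestMassSarsa:
    """One tabular Sarsa(0) learner per component MDP; actions are chosen
    by summing component Q-values ("greatest mass")."""

    def __init__(self, n_components, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = [defaultdict(float) for _ in range(n_components)]  # Q_i(s, a)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        # Greatest mass: pick the action maximizing the sum of component values.
        return max(self.actions,
                   key=lambda a: sum(q[(state, a)] for q in self.q))

    def update(self, state, action, rewards, next_state, next_action):
        # One Sarsa(0) update per component MDP, each with its own reward r_i.
        for q, r in zip(self.q, rewards):
            td = r + self.gamma * q[(next_state, next_action)] - q[(state, action)]
            q[(state, action)] += self.alpha * td
```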
Convex Hull Iteration
(Barrett and Narayanan 2008)
Changes MDP formalism: reward is a vector
 Find solutions for all possible weightings w where w_i ≥ 0 and Σ_i w_i = 1
 Maximize: w · Q(s,a)
Results in compact set of solutions
 Different trade-offs
Cannot capture non-convex surfaces
Discrete states/actions only
Need a way to capture non-convex surfaces!
Pareto-based Multiobjective Optimization
(Pareto 1890)
Imagine a game with two objectives:
 Damage Dealt
 Health Remaining
Vector v = (v_1, …, v_n) dominates u = (u_1, …, u_n) iff
 1. v_i ≥ u_i for all i, and
 2. v_i > u_i for at least one i
The points of a population that are not dominated are the best: the Pareto front (sketched below)
[Figure: trade-off between objectives; one front point has high health but did not deal much damage, another dealt lots of damage but lost lots of health]
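The dominance relation translates directly into code; a brief sketch with made-up score vectors for the two objectives above:

```python
def dominates(v, u):
    """True iff score vector v Pareto-dominates u: at least as good in
    every objective and strictly better in at least one."""
    return all(vi >= ui for vi, ui in zip(v, u)) and \
           any(vi > ui for vi, ui in zip(v, u))

def pareto_front(points):
    """Return the non-dominated points of a population."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (damage dealt, health remaining) scores:
scores = [(10, 90), (80, 20), (50, 50), (40, 40)]
print(pareto_front(scores))   # (40, 40) is dominated by (50, 50)
```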
Non-dominated Sorting Genetic Algorithm II
(Deb et al. 2000)
Population P with size N; evaluate P
Use mutation to get P′ of size N; evaluate P′
Calculate non-dominated fronts of {P ∪ P′}, size 2N
New population of size N from the highest fronts of {P ∪ P′} (sketched below)
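A simplified sketch of one generation of this (μ + λ) scheme; the crowding-distance tie-breaking NSGA-II applies within the cut-off front is omitted for brevity, and all names are hypothetical:

```python
def dominates(v, u):
    return all(a >= b for a, b in zip(v, u)) and any(a > b for a, b in zip(v, u))

def nondominated_fronts(scored):
    """scored: list of (score_vector, individual) pairs. Returns fronts
    F1, F2, ...: F1 is non-dominated, F2 is non-dominated once F1 is
    removed, and so on."""
    fronts, remaining = [], list(scored)
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q[0], p[0]) for q in remaining if q is not p)]
        fronts.append(front)
        remaining = [p for p in remaining if all(p is not f for f in front)]
    return fronts

def nsga2_generation(parents, mutate, evaluate, n):
    """One generation: mutate P into P', evaluate the union, and refill
    the population from the highest non-dominated fronts."""
    children = [mutate(p) for p in parents]     # P' of size N
    union = parents + children                  # {P u P'} of size 2N
    scored = [(evaluate(ind), ind) for ind in union]
    survivors = []
    for front in nondominated_fronts(scored):
        survivors.extend(ind for _, ind in front)
        if len(survivors) >= n:
            break
    return survivors[:n]
```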
Constructive Neuroevolution
Genetic Algorithms + Neural Networks
 Build structure incrementally
 Good at generating control policies
 Three basic mutations (no crossover used):
   Perturb Weight, Add Connection, Add Node (sketched below)
 Other structural mutations possible (more later)
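A sketch of the three basic mutations on a hypothetical genome encoding (a dict with a 'nodes' list of integer ids and a 'connections' map from (source, target) pairs to weights); treat this as an illustration, not the encoding actually used in this work:

```python
import random

def perturb_weight(genome, sigma=0.5):
    """Pick one existing connection and add Gaussian noise to its weight."""
    edge = random.choice(list(genome['connections']))
    genome['connections'][edge] += random.gauss(0.0, sigma)

def add_connection(genome):
    """Connect two nodes with a random weight (no-op if already connected)."""
    source = random.choice(genome['nodes'])
    target = random.choice(genome['nodes'])
    genome['connections'].setdefault((source, target), random.gauss(0.0, 1.0))

def add_node(genome):
    """Splice a new node into an existing connection: the old link is
    replaced by two links routed through the fresh node."""
    (source, target), weight = random.choice(list(genome['connections'].items()))
    del genome['connections'][(source, target)]
    new = max(genome['nodes']) + 1
    genome['nodes'].append(new)
    genome['connections'][(source, new)] = 1.0      # roughly preserve behavior
    genome['connections'][(new, target)] = weight
```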
Evolution of Teamwork
Heterogeneous
 Different roles
 Cooperation harder to evolve
 Team-level multimodal behavior
Homogeneous
 Shared policy
 Individuals know how teammates act
 Individuals fill roles as needed: multimodal
Completed Work
Benefits of Multiobjective Neuroevolution
 Pareto-based evolution leads to multimodal behavior
Targeting Unachieved Goals (TUG)
 Speed up evolution with objective management
Evolving Multiple Output Modes
 Allow networks to have multiple policies/modes
Need a domain to experiment in …
Battle Domain
Evolved monsters (yellow)
Scripted fighter (green)
 Approach nearest monster
 Swing bat repeatedly
Monsters can hurt fighter
Bat can hurt monsters
Multiple objectives
 Deal damage
 Avoid damage
 Stay alive
Can multimodal teamwork evolve?
Benefits of Multiobjective Neuroevolution
Research Questions:
 NSGA-II better than z-scores (weighted sum)?
 Homogeneous or heterogeneous teams better?
30 trials for each combination
Three evaluations per individual
 Average scores to overcome noisy evaluations
Incremental Evolution
Hard to evolve against scripted strategy
 Could easily fail to evolve interesting behavior
Incremental evolution against increasing speeds
 0%, 40%, 80%, 90%, 95%, 100%
Increase speed when all goals are met
End when goals met at 100%
Goals
Each objective has a goal:
 At least 50 damage to bot (1 kill)
 Less than 20 damage per monster on average (2 hits)
 Survive at least 540 time steps (90% of trial)
A goal is achieved when the average population score for that objective meets the goal value; increase the speed once average population performance is high enough on every goal (see the sketch below)
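A sketch of the goal check and speed schedule, assuming a 600-step trial (540 = 90%) and that the percentages refer to the scripted fighter's speed; all names are hypothetical:

```python
# Per-objective goals on average population scores (thresholds from the slide).
GOALS = {
    'damage_dealt': lambda avg: avg >= 50,    # at least one kill
    'damage_taken': lambda avg: avg < 20,     # at most two hits per monster
    'time_alive':   lambda avg: avg >= 540,   # 90% of a 600-step trial
}
SPEEDS = [0.0, 0.4, 0.8, 0.9, 0.95, 1.0]      # fraction of full fighter speed

def maybe_advance(avg_scores, speed_index):
    """Advance the speed schedule when every goal is met; evolution ends
    once the goals are met at 100% speed. Returns (new_index, done)."""
    if all(check(avg_scores[name]) for name, check in GOALS.items()):
        if speed_index + 1 < len(SPEEDS):
            return speed_index + 1, False
        return speed_index, True
    return speed_index, False
```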
Evolved Behaviors
Baiting + Side-Swiping
 Lure fighter
 Turns allow team to catch up
 Attacks on left side of fighter
Taking Turns
 Hit and run
 Next counter-clockwise monster rushes in
 Fighter hit on left side
Multimodal behaviors!
Multiobjective Conclusions
NSGA-II faster than z-scores
 NSGA-II more likely to generate multimodal behavior
Many runs did not finish or were slow
 Several “successful” runs did not have multimodal behavior
Targeting Unachieved Goals
Research Question:
 How to speed up evolution and make it more reliable?
When an objective’s goal is met, stop using it
 Restore the objective if scores drop below its goal
 Focuses on the most challenging objectives
 Combine NSGA-II with TUG (see the sketch below)
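A minimal sketch of that bookkeeping, assuming every objective is maximized against a numeric goal (the damage-avoidance goal above is actually an upper bound, so a real implementation would normalize directions); names are hypothetical:

```python
def update_active_objectives(avg_scores, goals, active):
    """TUG bookkeeping: drop an objective from selection once the
    population's average score meets its goal, and restore it if the
    average later falls back below the goal. Selection (e.g. NSGA-II)
    then operates on the active objectives only."""
    for name, goal in goals.items():
        if avg_scores[name] >= goal:
            active.discard(name)    # achieved: stop selecting for it
        else:
            active.add(name)        # unachieved or regressed: select for it
    return active
```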
Tough Objectives
Evolved Behaviors
Alternating Baiting
 Bait until another monster hits
 Then baiting monster attacks
 Fighter knocked back and forth
Synchronized Formation
 Move as a group
 Fighter chases one bait
 Other monster rushes in with side-swipe attacks
More multimodal behaviors!
TUG Conclusions
TUG results in huge speed-up
 No wasted effort on achieved goals
TUG runs finish more reliably
Heterogeneous runs have more multimodal behavior than homogeneous runs
Some runs still did not finish
Some “successful” runs still did not have multimodal behavior
Fight or Flight
Separate Fight and Flight trials
Fight = Battle Domain
Flight:
 Scripted prey (red) instead of fighter
 Has no bat; has to escape
 Monsters confine and damage it
 New objective: deal damage in Flight
Flight task requires teamwork
Combining Fight and Flight requires multimodal behavior
New-Mode Mutation
Encourage multimodal behavior
 New mode gets inputs from a preexisting mode
 Initially very similar to the parent mode
Maximum-preference node determines which mode controls the agent (sketched below)
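A sketch of maximum-preference arbitration, assuming each mode owns one preference neuron plus its policy neurons in the output layer; the indices and names are hypothetical:

```python
def choose_mode(outputs, modes):
    """Pick the mode whose preference neuron fires highest and return
    that mode's policy outputs.

    outputs: flat list of network output activations
    modes:   list of (preference_index, policy_indices) pairs, one per mode
    """
    pref_idx, policy_idxs = max(modes, key=lambda m: outputs[m[0]])
    return [outputs[i] for i in policy_idxs]

# Two modes, each with one preference neuron and two policy outputs:
outputs = [0.9, 0.2, 0.7, 0.1, 0.8, 0.3]
modes = [(0, (1, 2)), (3, (4, 5))]
print(choose_mode(outputs, modes))   # mode 0 wins: [0.2, 0.7]
```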
Evolving Multiple Output Modes
Research Question:
 How to evolve teams that do well in both tasks?
Compare 1Mode to ModeMutation
Three evaluations in Fight and three in Flight
 Same networks used for two different tasks
1Mode Behaviors
Aggressive + Corralling
 Aggressive in Fight task
 Take lots of damage
 Deal lots of damage
 Corralling in Flight task
Run/Rush + Crowding
 Run/Rush in Fight task
 Good timing on attack
 Kill fighter w/o taking too much damage
 Crowding in Flight task
 Get too close to prey
 Knock prey out and it escapes
Networks can’t handle both tasks!
ModeMutation Behaviors
Alternating Baiting + Corralling
 Alternating baiting in Fight task
 Corralling in Flight task
   Spread out to prevent escape
   Individuals rush in to attack
Hit into Crowd + Crowding
 Hitting into crowd in Fight task
   One attacker knocks fighter into others
 Crowding in Flight task
   Rush prey, ricochet back and forth
   Sometimes knocks prey free
Networks succeed at both tasks!
Mode Mutation Conclusions
ModeMutation slower than 1Mode
 ModeMutation better at producing multimodal behaviors
Harder task resulted in more failed runs
Many unused output modes created
 Slows down execution
 Bloats output layer
Proposed Work
Extensions
 1. Avoiding Stagnation by Promoting Diversity
 2. Extending Evolution of Multiple Output Modes
 3. Heterogeneous Teams Using Subpopulations
 4. Open-Ended Evolution + TUG
Evaluate in new tasks
Killer App: Unreal Tournament 2004
1. Avoiding Stagnation by Promoting Diversity
Behavioral diversity avoids stagnation
 Add a diversity objective (Mouret et al. 2009)
Behavior vector:
 Given a fixed set of input vectors, concatenate the network’s outputs
Diversity objective:
 Average distance from the other behavior vectors in the population (see the sketch below)
[Figure: example outputs such as (1.2, −2, …) and (−1, 2.2, …) concatenated into behavior vectors (0.5, 1, 0.2, 1.7, −2, …) and (1.5, −1, 0.6, 0.3, 2, …)]
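A sketch of both pieces, assuming networks are callables mapping an input vector to a list of outputs and that probe_inputs is the fixed set of sample inputs; names are hypothetical:

```python
def behavior_vector(network, probe_inputs):
    """Concatenate the network's outputs on a fixed set of probe inputs."""
    vec = []
    for x in probe_inputs:
        vec.extend(network(x))
    return vec

def diversity_scores(population, probe_inputs):
    """Diversity objective per individual: average Euclidean distance
    from every other behavior vector in the population."""
    vecs = [behavior_vector(net, probe_inputs) for net in population]
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return [sum(dist(v, u) for u in vecs if u is not v) / (len(vecs) - 1)
            for v in vecs]
```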
2. Extending Evolution of Multiple Output Modes
Encourage mode differences
 Random input sources for new modes
Probabilistic arbitration
 Bad modes less likely to persist
 Like softmax action selection (see the sketch after this list)
Restrict New-Mode Mutation
 New objective: punish unused modes, reward used modes
 Delete similar modes
   Based on behavior metric
 Limit number of modes: make best use of limited resources
   Dynamically increase the limit?
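A sketch of softmax-style probabilistic arbitration over the mode-preference activations; the temperature parameter is an assumption, not something the slide specifies:

```python
import math
import random

def softmax_mode(preferences, temperature=1.0):
    """Sample the controlling mode in proportion to exp(preference / T).
    Modes with low preference activations are rarely chosen, so bad modes
    seldom persist, yet every mode keeps a nonzero chance of being tried."""
    exps = [math.exp(p / temperature) for p in preferences]
    total = sum(exps)
    r, acc = random.random() * total, 0.0
    for i, e in enumerate(exps):
        acc += e
        if r <= acc:
            return i
    return len(exps) - 1   # numerical safety net
```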
3. Heterogeneous Teams Using Subpopulations
Each team member comes from a different subpopulation (Yong 2007)
Encourages division of labor across teammates
Different roles lead to multimodal team behavior
4. Open-Ended Evolution + TUG
Keep increasing goals
 Evolution always has something to strive towards
Preserves benefits of TUG
 Does not settle early
When to increase goals?
 When all goals are achieved
 As individual goals are achieved
New Tasks
More tasks require more modes
Investigate single-agent tasks
 Only teams so far
Investigate complementary objectives
 TUG only helps contradictory objectives?
 Complementary objectives are hard when combined with others
Tasks:
 1. Predator
   Opposite of Flight
   Partial observability
 2. Sink the Ball
   Very different from previous tasks
   Needs more distinct modes?
   Less mode sharing?
Unreal Tournament 2004
Commercial First-Person Shooter (FPS)
Challenging domain
 Continuous state and action spaces
 Multiobjective
 Partial information
 Multimodal behavior required
Programming API: Pogamut
Competitions:
 Botprize
 Deathmatch
Unreal Deathmatch
Packaged bots are hand-coded
Previous Botprize winners were hand-coded
Learning attempts:
 Simplified version of the game (van Hoorn et al. 2009)
 Limited to certain behaviors (Kadlec 2008)
Multimodal behavior in the full game: not done yet
Unreal Teams
Team Deathmatch
 Largely ignored?
Capture the Flag
 Teams protect own flag
 Bring enemy flag to base
 GP approach could not beat UT bots (Kadlec 2008)
Domination
 King of the hill
 Teams defend key locations
 RL approach learned group strategy of hand-coded bots (Smith et al. 2007)
Review
System for developing multimodal behavior
 Multiobjective Evolution
 Targeting Unachieved Goals
 New-Mode Mutation
 Behavioral Diversity
 Extending Mode Mutation
 Subpopulations
 Open-Ended Evolution
Final evaluation in Unreal Tournament 2004
Conclusion
Create system:
 Automatically discovers multimodal behavior
 No high-level hierarchy needed
 No low-level behaviors needed
 Works in continuous, noisy environments
 Discovers team behavior as well
Agents with an array of different useful behaviors
 Lead to better agents/behaviors in simulations, games, and robotics
Questions?
Auxiliary Slides
Design (2)
Behavior Trees (Isla 2005)
 Top-down approach
 Used first in Halo 2
 Other commercial games since
 “Brute Force Approach to Common Sense”
 Add a behavior for every situation
 Hand coded
Evolutionary (0)
Dangerous Foraging (Stanley et al. 2003)
 Don’t know if food is safe or poisonous
 Partial information
 Multimodal: eat food vs. avoid food
Adaptive Teams of Agents (Bryant and Miikkulainen 2003)
 Roman legions defend, barbarians plunder
 Homogeneous teams
 Multimodal: defend town vs. chase barbarian
Other MOEAs
PESA-II (Corne et al. 2001)
 External archive and internal population
 Region-based selection using squeeze factor
SPEA2 (Zitzler et al. 2001)
 External archive and internal population
 Fitness based on strength and density
[Figure: archive and population members annotated with strength values (0, 1) and squeeze factors (1, 2)]
1. Predator
Scripted agent is predator
 Chases nearest monster
Monsters run to avoid damage
 Opposite behavior from Flight
 When combined with Flight, the task is partially observable
Complementary objectives:
 Avoid damage
 Stay alive
2. Sink the Ball
Monsters push ball around
 No scripted agent
 Monster sensors tuned to ball
Move ball to goal to sink it
Very different from previous tasks
 More distinct behavioral modes
Complementary objectives:
 Minimize distance between ball and goal
 Minimize time to sink the ball
Heterogeneous z vs. NSGA-II
Homogeneous z vs. NSGA-II
The z-scores method was faster
Heterogeneous NSGA-II vs. TUG
Homogeneous NSGA-II vs. TUG
Heterogeneous 1Mode vs. ModeMutation
1Mode was faster
Homogeneous 1Mode vs. ModeMutation
1Mode was faster