EuroGP 2006
Geometric Crossover for
Biological Sequences
Alberto Moraglio, Riccardo Poli
& Rolv Seehuus
Contents
I. Geometric Crossover
II. Geometric Crossover for Sequences
III. Is Biological Recombination Geometric?
I. Geometric Crossover
Geometric Crossover
• Representation-independent generalization
of traditional crossover
• Informally: all offspring are between parents
• Search space: all offspring are on shortest
paths connecting parents
Geometric Crossover & Distance
• Search Space is a Metric Space: d(A,B)
=length of shortest paths between A and B
• Metric space: all offspring C are in the
segment between parents
• C ∈ [A,B]_d ⇔ d(A,C) + d(C,B) = d(A,B)
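The segment condition above can be checked mechanically for any distance function. The following is a minimal sketch, not part of the original slides; the names `in_segment` and `abs_diff` and the tolerance parameter are illustrative assumptions.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def in_segment(d: Callable[[T, T], float], a: T, b: T, c: T,
               tol: float = 1e-9) -> bool:
    """Return True if c lies in the metric segment [a, b] under distance d,
    i.e. d(a, c) + d(c, b) equals d(a, b) up to a small tolerance."""
    return abs(d(a, c) + d(c, b) - d(a, b)) <= tol


def abs_diff(x: int, y: int) -> int:
    """Toy metric (an assumption for this example): distance on the integer line."""
    return abs(x - y)


print(in_segment(abs_diff, 2, 7, 5))  # True: 5 is between 2 and 7
print(in_segment(abs_diff, 2, 7, 9))  # False: 9 is outside the segment
```

Under this definition, a crossover operator is geometric for a distance d when every offspring it can produce passes this check against its two parents.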
Example1: Traditional Crossover
• Traditional Crossover is Geometric
Crossover under Hamming Distance
Parent1: 011|101
Parent2: 010|111
Child:   011|111
HD(P1,C) + HD(C,P2) = HD(P1,P2)
1 + 1 = 2
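The example above can be reproduced in a few lines. This is an illustrative sketch: the function names are assumptions, and one-point crossover stands in for "traditional crossover".

```python
def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(x, y))


def one_point_crossover(p1: str, p2: str, cut: int) -> str:
    """Take the prefix of p1 up to the cut point and the suffix of p2 after it."""
    return p1[:cut] + p2[cut:]


p1, p2 = "011101", "010111"
child = one_point_crossover(p1, p2, cut=3)

print(child)  # 011111
print(hamming(p1, child), hamming(child, p2), hamming(p1, p2))  # 1 1 2
```

The child inherits every position from one parent or the other, so it can only differ from a parent where the two parents differ from each other; that is why the two distances add up exactly.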
Example2: Blending Crossover
• Blending Crossover for real vectors is
geometric under Euclidean Distance
[Figure: child C on the straight-line segment between parents P1 and P2]
ED(P1,C)+ED(C,P2)=ED(P1,P2)
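A small numeric check of the Euclidean case. This sketch assumes the blending crossover returns a convex combination of the parents with a single weight `alpha` in [0, 1]; variants that sample outside the segment would not satisfy the equality. The function name `blend` is illustrative.

```python
import math
from typing import Sequence, Tuple


def blend(p1: Sequence[float], p2: Sequence[float], alpha: float) -> Tuple[float, ...]:
    """Convex combination of two real vectors: (1 - alpha) * p1 + alpha * p2."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1] for the child to stay on the segment")
    return tuple((1 - alpha) * a + alpha * b for a, b in zip(p1, p2))


p1, p2 = (0.0, 0.0), (3.0, 4.0)
child = blend(p1, p2, alpha=0.25)

lhs = math.dist(p1, child) + math.dist(child, p2)
rhs = math.dist(p1, p2)
print(child, math.isclose(lhs, rhs))
```

Floating-point arithmetic makes an exact equality test unreliable here, which is why the comparison uses `math.isclose`.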
Many Recombinations are Geometric
• Traditional Crossover for multary strings
• Box and Discrete recombinations for real
vectors
• PMX, Cycle and Order Crossovers for
permutations
• Homologous Crossover for GP trees
• Ask me for more examples over a coffee!
Being a geometric crossover is
important because…
• We know how the search space is going to
be searched by geometric crossover for
any representation: convex search
• We know a rule-of-thumb on what type of
landscapes geometric crossover will
perform well: “smooth” landscape
• This is just the beginning of a general theory;
in the future we will know more!
II. Geometric Crossover for
Sequences
Sequences & Edit Distance
• Sequence: variable-length string of characters
from an alphabet A
• Edit distance: minimum number of edit
operations – insertion, deletion, substitution – to
transform one sequence into the other
• A = {a,c,t,g}, seq1 = agcacaca, seq2 = acacacta
• Seq1 = agcacaca → acacaca → acacacta = Seq2
• ED(Seq1,Seq2)=2 (g deleted, t inserted)
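The edit distance defined above can be computed with the standard dynamic-programming recurrence, using unit cost for each insertion, deletion and substitution. The following is a generic textbook sketch, not code from the paper; the function name is illustrative.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    # prev[j] holds the distance between the current prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string is i deletions
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("agcacaca", "acacacta"))  # 2
```

Only two rows of the table are kept at a time, so memory use is linear in the length of the second sequence.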
Sequence Alignment (on contents)
• Alignment: insert gaps (-) into both sequences
so that they have the same length
Seq1’= agcacac-a
Seq2’= a-cacacta
• Alignment Score: number of mismatches = 2
• Optimal alignment: minimal score alignment
(Best Inexact Alignment on Contents)
• The score of the optimal alignment of two
sequences equals their edit distance:
ED(Seq1,Seq2)=Score(A)=2
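An optimal alignment can be recovered by tracing back through the edit-distance table. This is a minimal sketch with unit costs; the function name `align` and the tie-breaking order are assumptions, so when several optimal alignments exist the one returned may place gaps differently from the slide.

```python
from typing import Tuple


def align(s: str, t: str) -> Tuple[str, str, int]:
    """Return (s_aligned, t_aligned, score) for one optimal unit-cost alignment."""
    n, m = len(s), len(t)
    # cost[i][j] = edit distance between s[:i] and t[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1,
                             cost[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    # Trace back from the bottom-right corner, emitting one column per step.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            a.append(s[i - 1]); b.append("-"); i -= 1   # gap in t
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1   # gap in s
    return "".join(reversed(a)), "".join(reversed(b)), cost[n][m]


s_al, t_al, score = align("agcacaca", "acacacta")
mismatches = sum(x != y for x, y in zip(s_al, t_al))
print(s_al, t_al, score, mismatches)
```

Every column of the traceback costs 0 when the two symbols match and 1 otherwise, so the number of mismatching columns equals the score, which is the edit distance.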
Homologous Crossover
1. Optimally align the two parent sequences
2. Randomly generate a crossover mask as long
as the alignment
3. Recombine the aligned sequences as in traditional crossover
4. Remove the dashes from the offspring
Mask  = 111111000
Seq1’ = agcacac-a
Seq2’ = a-cacacta
SeqC’ = a-cacac-a
SeqC  = acacaca
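Steps 3 and 4 can be sketched directly on the slide's data, taking the aligned parents and the mask as given. The function names are illustrative, and the mask convention used here ('1' copies from the second aligned parent, '0' from the first) is an assumption inferred from the example, since the slide does not state it.

```python
def recombine_aligned(al1: str, al2: str, mask: str) -> str:
    """Mask-based recombination of two aligned (gapped) sequences.

    Assumed convention, inferred from the slide's example:
    mask bit '1' copies the column from al2, '0' copies it from al1.
    """
    if not (len(al1) == len(al2) == len(mask)):
        raise ValueError("aligned parents and mask must have the same length")
    return "".join(b if m == "1" else a for a, b, m in zip(al1, al2, mask))


def strip_gaps(aligned: str) -> str:
    """Remove the gap symbols to obtain the actual offspring sequence."""
    return aligned.replace("-", "")


child_aligned = recombine_aligned("agcacac-a", "a-cacacta", "111111000")
child = strip_gaps(child_aligned)
print(child_aligned, child)  # a-cacac-a acacaca
```

Because gaps are treated as ordinary symbols during recombination and removed afterwards, the offspring can be shorter or longer than either parent.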
Theorem: Geometricity of HC
• Homologous Crossover is geometric crossover
under edit distance
Seq1 = agcacaca, SeqC = acacaca, Seq2 = acacacta
ED(Seq1,SeqC) + ED(SeqC,Seq2) = ED(Seq1,Seq2)
1 + 1 = 2
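The theorem can be spot-checked empirically by running the whole operator with random masks and testing the segment equality for each offspring. This is an illustrative sketch with unit costs and a uniform random mask; it is a sanity check rather than a proof, and all names are assumptions.

```python
import random
from typing import Tuple


def align(s: str, t: str) -> Tuple[str, str, int]:
    """One optimal unit-cost alignment of s and t, plus its score."""
    n, m = len(s), len(t)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j] + 1, cost[i][j - 1] + 1,
                             cost[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b)), cost[n][m]


def edit_distance(s: str, t: str) -> int:
    return align(s, t)[2]


def homologous_crossover(p1: str, p2: str, rng: random.Random) -> str:
    """Align, pick each column from either parent at random, then drop gaps."""
    a1, a2, _ = align(p1, p2)
    columns = [x if rng.random() < 0.5 else y for x, y in zip(a1, a2)]
    return "".join(c for c in columns if c != "-")


rng = random.Random(0)
p1, p2 = "agcacaca", "acacacta"
for _ in range(200):
    child = homologous_crossover(p1, p2, rng)
    assert edit_distance(p1, child) + edit_distance(child, p2) == edit_distance(p1, p2)
print("all sampled offspring lie in the segment between the parents")
```

The intuition is that the parents' alignment, restricted to the columns the child took from the other parent, already gives alignments of the child with each parent whose scores sum to the parents' distance; the triangle inequality then forces equality.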
More theory on HC in the paper
• Extension to weighted edit distances
• Extension to block ins/del edit distances
• Peculiarity of metric segments in edit
distance spaces
• Bounds on offspring size in terms of the
parents' sizes
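As an illustration of the first extension, the unit-cost recurrence generalizes to per-move weights. The sketch below is not taken from the paper: the cost function `example_sub_cost` and its values are hypothetical, chosen only to show one substitution being preferred over another. For the result to be a distance, the weights should at least be symmetric and respect the triangle inequality.

```python
from typing import Callable


def weighted_edit_distance(s: str, t: str,
                           sub_cost: Callable[[str, str], float],
                           indel_cost: float = 1.0) -> float:
    """Edit distance with a substitution cost function and a fixed ins/del cost."""
    prev = [j * indel_cost for j in range(len(t) + 1)]
    for i, cs in enumerate(s, start=1):
        curr = [i * indel_cost]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + indel_cost,             # delete cs
                            curr[j - 1] + indel_cost,         # insert ct
                            prev[j - 1] + sub_cost(cs, ct)))  # match or substitute
        prev = curr
    return prev[-1]


def unit_sub_cost(a: str, b: str) -> float:
    """Reduces the weighted distance to the ordinary edit distance."""
    return 0.0 if a == b else 1.0


def example_sub_cost(a: str, b: str) -> float:
    """Hypothetical weights: substitutions within {a, g} or within {c, t} cost 0.5,
    all other substitutions cost 1.0."""
    if a == b:
        return 0.0
    same_class = (a in "ag") == (b in "ag")
    return 0.5 if same_class else 1.0


print(weighted_edit_distance("agcacaca", "acacacta", unit_sub_cost))
print(weighted_edit_distance("a", "g", example_sub_cost))
```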
III. Is Biological
Recombination Geometric?
Recombination at a molecular level
• DNA strands align on their contents, not
positionally
• DNA strands are flexible: they can be stretched or folded to
align better with each other
• DNA strands do not need to be aligned at the
extremities
• Some base pairings are preferred to others
• DNA strands can form loops
• Crossover points happen to be where DNA
strands align better
• Not all details worked out yet!
Homologous Crossover as
a Model of Biological Recombination
Homologous Crossover | Biological Recombination
• Alignment on contents @ minimum distance | Alignment on contents @ minimum free energy
• Ins/del move | Frame-shift (one base gap)
• Replacement move | Base mismatch
• Weighted move | Preferred matching can be specified (a-t preferred to a-g)
• Block ins/del move | Preference for loops, folds, bigger gaps can be specified
• Transpositions/reversals | Subsequence transposition/reversal
Many possible variants of edit distance that fit many
real requirements of biological recombination
“Minimum Free Energy” & Edit Distance
DNA strands align optimally according to edit
distance because:
(i) The alignment of two DNA strands
(macromolecules) obeys chemistry: it is the state
at “minimum free energy”
(ii) The weights of the edit moves can be
interpreted as repulsion forces at the single-base
level
(iii) The best alignment on edit distance is the best
trade-off for which the global effect of repulsion
forces is minimized: the “minimum free energy”
alignment
Is Biological Recombination
Geometric? Yes?!
So what?
Bridging Natural and Artificial Evolution
• Bridging Natural and Artificial Evolution
into a common theoretical framework
• Change in perspective: this allows us to study
real biological evolution as a
computational process
• In the paper: we use geometric arguments
to claim that biological evolution performs
efficient adaptation!
Summary
• Geometric crossover
– Geometric crossover: offspring between parents
– Many recombinations are geometric
– Some general theory for geometric crossover
• Homologous crossover
– Homologous crossover for sequences: alignment on contents before
recombination
– Homologous crossover is geometric under edit distance
• Biological Recombination
– Homologous crossover models biological recombination at the DNA
level, so biological recombination is geometric too
– Geometric theory applies to biological recombination, bridging
biological & artificial evolution
Questions?