Evolving L-Systems to
Capture Protein Structure
Native Conformations
Gabi Escuela (1), Gabriela Ochoa (2) and Natalio Krasnogor (3)
(1, 2) Department of Computer Science, Universidad Simon Bolivar, Caracas, Venezuela
gabiescuela@netuno.net.ve, gabro@ldc.usb.ve
(3) School of Computer Science and I.T., University of Nottingham
Natalio.Krasnogor@nottingham.ac.uk
Content
Proteins
Protein Structure Prediction (PSP)
The HP model
EA approaches to PSP: current encoding
L-Systems
Why a grammatical encoding?
Methods and Results
Discussion and Future Work
Figure: 3D structure of myoglobin, showing coloured alpha helices.
Proteins
• Linear chains of ~30-400 units from 20 different amino acids
• Fold into a unique functional structure: native state or tertiary structure
• Show repeated substructures: alpha helices and beta sheets
1A8M 3-D Structure
Protein Structure Prediction (PSP)
Goal: Determining the 3D
structure of proteins from their
amino acid sequences
Strategy: find an amino acid
chain's state of minimum
energy
Solution will have practical
consequences in medicine,
drug development and
agriculture
The 2D HP Model
The hydrophobic effect is the main force governing folding
Two amino acid types: hydrophobic (H) and polar or hydrophilic (P)
A protein is a string q ∈ {H, P}+; each letter of q has to be placed on a vertex of a given lattice L (at each point: turn 90º left or right, or continue ahead)
Scoring function: adds -1 for each “contact” between two Hs that are adjacent in the lattice but not consecutive in q
Example (square lattice): HPHPPHHPHPPHPHHPPHPH
9 H-H bonds, Score = -9
Objective: Find the
organization (embedding) of
q in L of minimum score
(maximum contacts)
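The scoring rule can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names `embed` and `hp_score`, the axis convention (y grows upwards) and the start point (0, 0) are assumptions. The move string in the usage line is the absolute encoding that the slides give for the example sequence.

```python
from typing import Dict, List, Tuple

# Unit steps on the square lattice; the axis convention is an assumption.
STEPS: Dict[str, Tuple[int, int]] = {
    "U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0),
}


def embed(moves: str) -> List[Tuple[int, int]]:
    """Turn a string of absolute moves into lattice coordinates from (0, 0)."""
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = STEPS[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def hp_score(q: str, moves: str) -> int:
    """Add -1 for each pair of Hs adjacent in the lattice but not consecutive in q."""
    coords = embed(moves)
    if len(coords) != len(q) or len(set(coords)) != len(coords):
        raise ValueError("embedding must be self-avoiding and as long as q")
    index_at = {c: i for i, c in enumerate(coords)}
    score = 0
    for i, (x, y) in enumerate(coords):
        if q[i] != "H":
            continue
        # Look only to the right and upwards so each lattice edge is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and q[j] == "H" and abs(i - j) > 1:
                score -= 1
    return score


print(hp_score("HPHPPHHPHPPHPHHPPHPH", "RDDLULDLDLUURULURRD"))
```

The usage line scores the example fold from the slides, so its result can be compared against the reported Score = -9.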
EA approaches to PSP: Current
(Direct) Encoding
EAs and other stochastic methods: global optimization
of a suitable energy function
Encoding: Cartesian Coordinates, Distance
Geometries, Internal Coordinates
Absolute: structure encoded as a string of absolute moves.
For example, in the 2D square lattice:
s ∈ {Up, Down, Left, Right}+
Relative: each move is interpreted in terms of the previous one
s ∈ {Forward, TurnLeft, TurnRight}+
Protein: HPHPPHHPHPPHPHHPPHPH (L = 20)
Absolute Encoding: RDDLULDLDLUURULURRD (L = 19; first position is fixed)
Relative Encoding: RFRRLLRLRRFRLLRRFR (L = 18; first and second positions are fixed)
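The link between the two encodings can be made concrete with a small conversion routine. The sketch below is illustrative only: the helper name `absolute_to_relative` and the clockwise ordering trick are assumptions, not something stated in the slides. Each relative symbol is obtained by comparing one absolute move with the next.

```python
# Compass headings in clockwise order: a step of +1 is a right turn,
# a step of -1 (3 modulo 4) is a left turn.
CLOCKWISE = "URDL"


def absolute_to_relative(moves: str) -> str:
    """Convert absolute moves (U, D, L, R) into relative moves (F, L, R)."""
    out = []
    for prev, cur in zip(moves, moves[1:]):
        delta = (CLOCKWISE.index(cur) - CLOCKWISE.index(prev)) % 4
        if delta == 0:
            out.append("F")
        elif delta == 1:
            out.append("R")
        elif delta == 3:
            out.append("L")
        else:
            raise ValueError("a 180-degree reversal is not a valid lattice move")
    return "".join(out)


print(absolute_to_relative("RDDLULDLDLUURULURRD"))
```

An absolute string of 19 moves yields 18 relative moves, which is why the relative encoding is one symbol shorter.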
L-Systems (Lindenmayer, 1968)
A model of morphogenesis,
based on formal grammars
Rewriting: Define complex
objects by replacing parts of a
simple object using a set of
productions.
Symbols: F, f, +, -, [, ]
Axiom (S): F
Production (replacement) rules:
r1: F → F+f
r2: f → F
Derivation:
start: F
step 1: F+f
step 2: F+f+F
step 3: F+f+F+F+f
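Parallel rewriting is short to express in code. The sketch below assumes that the slide's two productions are F → F+f and f → F, which is what its derivation sequence implies; the function name `rewrite` is hypothetical.

```python
from typing import Dict


def rewrite(axiom: str, rules: Dict[str, str], steps: int) -> str:
    """Apply all productions in parallel, `steps` times, starting from the axiom."""
    s = axiom
    for _ in range(steps):
        # Symbols without a production (here '+') are copied unchanged.
        s = "".join(rules.get(ch, ch) for ch in s)
    return s


rules = {"F": "F+f", "f": "F"}
for n in range(4):
    print(n, rewrite("F", rules, n))
```

Every symbol is replaced in the same pass, which is what distinguishes L-system rewriting from the sequential rewriting of ordinary formal grammars.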
Why a Grammatical Encoding?
Specifies how to construct the
phenotype
Can achieve greater scalability
through self-similar and hierarchical
structure
Proteins exhibit high degree of
regularity, and repeated motifs
Current encoding may not be
suitable for crossover and building
block transfer between individuals
3D L-System
Protein Structure
Method
Proof of principle: Can a folded protein be captured (encoded) by an L-system?
How to find that L-system: An EA is used to evolve an L-system that captures a folded protein (inverse problem)
Input: folded structure in relative coordinates
RFRRLLRLRRFRLLRRFR
EA
Output: an L-system that, once derived, will produce the target string
RFRRLLRLRRFRLLRRFR
Axiom = 01F
Rules = {0:RFR1, 1:2L2, 2:R0L}
Proposed Grammatical Encoding
D0L-system (deterministic and context free):
Alphabet: Σ = Σt ∪ Σnt
Σt = {F, L, R}: terminal symbols (relative coordinates)
Σnt = {0, 1, 2, ..., m-1}: non-terminal symbols (rewriting rules), m = max. number of rules
Axiom: α ∈ Σ*
Rewriting rules: i: wi, where i ∈ Σnt and wi ∈ Σ*
Example
axiom: R2
rules: 0:R03F; 1:R01L; 2:F310; 3:LRL3
Evolutionary Algorithm
Generational, with rank-based selection
Randomly generated initial population
Preset maximum number of rules
Axiom and rules: randomly generated strings of preset maximum length
Genetic operators
Uniform-like (homologous) recombination (rate = 1.0): complete production rules are interchanged
Per-symbol mutation in both axioms and rules (deletion (30%), insertion (10%), modification (60%))
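The two genetic operators might look roughly as follows. This is a hedged sketch, not the authors' implementation: the per-symbol mutation `rate`, the truncation to `max_len`, the 50/50 choice per rule, and the decision to let the child inherit one parent's axiom whole are all assumptions, since the slides only give the proportions of the three mutation types and state that complete rules are interchanged.

```python
import random
from typing import Dict, Tuple

Genotype = Tuple[str, Dict[int, str]]  # (axiom, rules); an illustrative layout


def mutate_string(s: str, alphabet: str, rate: float, max_len: int,
                  rng: random.Random) -> str:
    """Per-symbol mutation: deletion 30%, insertion 10%, modification 60%."""
    out = []
    for ch in s:
        if rng.random() >= rate:
            out.append(ch)  # symbol left untouched
            continue
        kind = rng.choices(["delete", "insert", "modify"], weights=[30, 10, 60])[0]
        if kind == "delete":
            continue
        if kind == "insert":
            out.append(ch)
            out.append(rng.choice(alphabet))
        else:
            out.append(rng.choice(alphabet))
    # Assumption: strings that grow past the preset maximum are truncated.
    return "".join(out)[:max_len]


def recombine(a: Genotype, b: Genotype, rng: random.Random) -> Genotype:
    """Uniform-like recombination: each complete rule comes from one parent."""
    axiom = a[0] if rng.random() < 0.5 else b[0]
    rules = {i: (a[1][i] if rng.random() < 0.5 else b[1][i]) for i in a[1]}
    return axiom, rules


rng = random.Random(0)
parent_a = ("31", {0: "3LL2", 1: "R0RL", 2: "RRF", 3: "RFR1"})
parent_b = ("R2", {0: "RLR", 1: "3F32L", 2: "1FR33", 3: "R102"})
child = recombine(parent_a, parent_b, rng)
print(child, mutate_string(child[0], "FLR0123", 0.1, 5, rng))
```

Exchanging whole rules keeps each production intact, so a rule that already encodes a useful motif can move between individuals as a unit.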
Derivation, and Fitness Function
Genotype: Axiom = 31, Rules = {0:3LL2; 1:R0RL; 2:RRF; 3:RFR1}
Derivation: from genotype (axiom and rules) to phenotype (folded structure)
Post-processing: pruning of nonterminal symbols
Fitness calculation: number of matches between the target string and the solution. Min. = 0, Max. = length of the desired folding.
Worked example:
axiom: 31
1st step: RFR1 R0RL (3 → RFR1, 1 → R0RL)
2nd step: RFR R0RL R 3LL2 RL (1 → R0RL, 0 → 3LL2)
3rd step: RFRR 3LL2 RL R RFR1 LL RRF RL (0 → 3LL2, 3 → RFR1, 2 → RRF)
post-processing: RFRRLLRLRRFRLLRRFR (phenotype)
fitness = 18
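The genotype-to-phenotype mapping and the fitness count can be sketched as below, using the genotype from this slide. The function names are hypothetical, and two points are assumptions: the number of derivation steps (3, taken from the worked example) and the handling of length mismatches. The slides do not say what happens when the pruned string is longer than the target, so this sketch simply compares position by position up to the shorter length.

```python
from typing import Dict

TERMINALS = "FLR"


def derive(axiom: str, rules: Dict[str, str], steps: int) -> str:
    """Rewrite every non-terminal in parallel, `steps` times."""
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s


def prune(s: str) -> str:
    """Post-processing: drop the non-terminal symbols that remain."""
    return "".join(ch for ch in s if ch in TERMINALS)


def fitness(phenotype: str, target: str) -> int:
    """Number of positions at which phenotype and target agree."""
    # zip stops at the shorter string, so surplus symbols are ignored (assumption).
    return sum(p == t for p, t in zip(phenotype, target))


rules = {"0": "3LL2", "1": "R0RL", "2": "RRF", "3": "RFR1"}
target = "RFRRLLRLRRFRLLRRFR"
phenotype = prune(derive("31", rules, 3))
print(phenotype, fitness(phenotype, target))
```

A fitness equal to the target length means the evolved L-system reproduces the folding exactly.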
Results (1)
Instance (HP sequence / target folding) | Length | Successes | One Solution
HPHPPHHPHPPHPHHPPHPH / RFRRLLRLRRFRLLRRFR | 18 | 5/50 (4 R) | A = 31, R = {0:3LL2, 1:R0RL, 2:RRF, 3:RFR1}
HHHPPHPHPHPPHPHPHPPH / RRFRFRLFRRFLRLRFRR | 18 | 3/50 (4 R) | A = R2, R = {0:RLR, 1:3F32L, 2:1FR33, 3:R102}
HHPPHPPHPPHPPHPPHPPHPPHH / RLLFLFFRRFLLFRRLRFFRRF | 22 | 0/50 (4 R), 1/50 (5 R) | A = 1R, R = {0:4LF3, 1:RL243, 2:00F3, 3:RRFL, 4:0R14F}
PPHPPHHPPPPHHPPPPHHPPPPHH / FFRRFFFLLFFFFRRFFFFLLFF | 23 | 1/50 (5 R) | A = 32, R = {0:20R2, 1:132F, 2:FF012, 3:0FLL}
Results (2)
Evolutionary progression towards the target structure
Discussion
The proposed EA discovered L-systems that capture a target folding under the HP model in 2D lattices
We are not solving the PSP yet, but we are proposing a novel and potentially useful generative encoding for evolutionary approaches to PSP
Future work
Incorporate problem knowledge about secondary structures (alpha helix, beta sheet, beta turn)
Explore longer chains and 3D lattices