Download Powerpoint slides - Lecture 3 - User Web Areas at the University of

Development in hardware – Why?
• Option: array of custom processing nodes
Step 1: analyze the application and extract the component tasks
Step 2: design the custom processors
Step 3: program the FPGA
Step 4: assign the tasks to the processors and set up the connection network
← Multi-cellular organization
← ???
← Growth (cellular division)
Development in hardware – Why?
• Step 2: as a function of the tasks, design one (or more) custom processors.
[Figure: custom processing nodes between IN and OUT, labeled ×, ×+, ÷≠, FFT, +, DCT]
Cellular differentiation
• Cells adapt their physical structure to fit the “application”
• Can circuits/processors do the same?
  - Physically? No
  - Logically? Yes, but…
• Can they do it easily (dare we say, automatically)?
Cellular differentiation
• Needed: adaptable cellular architecture
That is, a processor architecture that is
  - Customizable
  - Compact
  - Powerful
  - Easy to design and modify
  - Amenable to evolution and learning
Possible solution: MOVE architectures
The MOVE paradigm
• One single instruction: move
• Data displacements trigger operations
• Architecture centered on data rather than on operations (data-centric ≠ operation-centric)
• Regular structure: functional units + data network
• Scalable and modular architecture
Example: sum of two values
Conventional architecture:
add R1, R2, R3;
MOVE architecture:
move O(Fxxx), I1(Fsum)
move O(Fyyy), I2(Fsum)
move O(Fsum), I(Fzzz)
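The three moves above can be mimicked in a few lines of Python. This is only a toy sketch: the `Register` and `SumUnit` classes, the `run` helper, and the choice of `I2` as the port that triggers the addition are assumptions made for illustration, not details given in the slides.

```python
class Register:
    """Hypothetical storage unit: one input port, one output port."""
    def __init__(self, value=0):
        self.value = value

    def write(self, port, value):
        self.value = value           # the port name is ignored for a plain register

    def read(self):
        return self.value


class SumUnit:
    """Hypothetical adder FU: a write to I2 triggers the addition."""
    def __init__(self):
        self.i1 = 0
        self.out = 0

    def write(self, port, value):
        if port == "I1":
            self.i1 = value              # operand port: just latch the value
        elif port == "I2":
            self.out = self.i1 + value   # trigger port: the move starts the operation
        else:
            raise ValueError(f"unknown port {port}")

    def read(self):
        return self.out


def run(program, units):
    """Execute a list of (source unit, destination unit, destination port) moves."""
    for src, dst, port in program:
        units[dst].write(port, units[src].read())


units = {"Fxxx": Register(3), "Fyyy": Register(4),
         "Fsum": SumUnit(), "Fzzz": Register()}
program = [("Fxxx", "Fsum", "I1"),   # move O(Fxxx), I1(Fsum)
           ("Fyyy", "Fsum", "I2"),   # move O(Fyyy), I2(Fsum)
           ("Fsum", "Fzzz", "I")]    # move O(Fsum), I(Fzzz)
run(program, units)
result = units["Fzzz"].read()
```

Note that `run` knows nothing about addition: a new functional unit only has to offer `write` and `read`, which is the customization property the following slides rely on.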
Cellular differentiation
• Main features:
  - Conventional fetch/decode mechanism – compatible with bio-inspired mechanisms
  - No pipeline: computation carried out in specialized functional units (FU)
  - Communication carried out in specialized communication units (CU)
  - Only one instruction that MOVEs data to and from the CUs and FUs (dataflow architecture)
Cellular differentiation
• Main advantages:
  - Can be easily customized by introducing application-specific functional and communication units.
  - Perfectly fits the requirements of systolic arrays (arbitrarily complex communication patterns).
  - The introduction of custom components does not affect the assembler language, the code structure, the fetch and decode units, or the transport bus.
Example – Automatic Synthesis
• Phenotype Layer: application-specific (parallel) functions
• Mapping Layer: developmental algorithm
• Genotype Layer: genetic code
Example – Automatic Synthesis
• Phenotype Layer
• Mapping Layer
• Genotype Layer
Totipotent Cell
Example – Automatic Synthesis
Programmable Logic
Totipotent Cell
Example – Automatic Synthesis
Programmable Logic
Cellular Array
Implementation - The BioWall
Development in hardware – Why?
• Option: array of custom processing nodes
Step 1: analyze the application and extract the component tasks
Step 2: design the custom processors
Step 3: program the FPGA
Step 4: assign the tasks to the processors and set up the connection network
← Multi-cellular organization
← ???
← Cell specialization
← Growth (cellular division)
Cell design and specialization
• Phenotype Layer: application code (parallel)
Within a MOVE framework, the specialization (differentiation) of a cell corresponds to the selection of the functional and communication units that can most efficiently implement the desired application.
FU extraction
• Extracting the optimal FUs from the code is a complex problem!
FU extraction
• How about having a quick peek at biology?
• Idea: let us use evolution!!
• In fact, this approach is much closer to biology than simply evolving code: in nature, the hardware (the cell) and the software (the genome) have evolved together!
FU extraction
• First step: profiling the code (standard compilation technique)
FU extraction
• Second step: transform into tree (standard compilation technique)
• Third step: represent as 1-D genome
• Fourth step: run the GA (with some fancy optimizations)
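The second and third steps can be pictured with a small sketch. The slides do not say how the tree is encoded as a genome, so the prefix-order flattening below is purely an assumption, and `linearize` and `OPS` are hypothetical names. Python's standard `ast` module stands in for the compiler front end.

```python
import ast

# Operator symbols for the few node types this sketch supports.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def linearize(node):
    """Flatten an expression tree into a 1-D list in prefix (pre-order) form."""
    if isinstance(node, ast.Expression):
        return linearize(node.body)
    if isinstance(node, ast.BinOp):
        return [OPS[type(node.op)]] + linearize(node.left) + linearize(node.right)
    if isinstance(node, ast.Name):
        return [node.id]
    if isinstance(node, ast.Constant):
        return [repr(node.value)]
    raise ValueError(f"unsupported node: {type(node).__name__}")


tree = ast.parse("a * b + c", mode="eval")   # second step: code to tree
genome = linearize(tree)                     # third step: tree to 1-D sequence
```

A GA could then operate on such a sequence, for example by marking which contiguous groups of operators are moved into a hardware block; that encoding is again an assumption.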
Fitness evaluation
s = size of the new processor
t = execution time of the program on the new processor
α = execution time of the program on a minimal processor
β = hardware area to implement the minimal processor (which has, by definition, a fitness of 1)
hwLimit = maximum hardware allowed to implement the new processor
Note:
• Relative fitness function
• When out of allowed hardware range, logarithmic decrease
• The hardware investment has to be small enough to be retained
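The notes above fix only some properties of the fitness function: it is relative to the minimal processor (which scores 1), and it decreases logarithmically once the hardware exceeds the allowed range. A minimal sketch of one function with those properties follows. It is an assumption made for illustration, not the formula used in the lecture, and `relative_fitness` is a hypothetical name.

```python
import math


def relative_fitness(s, t, alpha, beta, hw_limit):
    """Hypothetical relative fitness; NOT the lecture's formula.

    s, t, alpha, beta and hw_limit follow the definitions in the slide.
    """
    if hw_limit < beta:
        raise ValueError("hw_limit must at least fit the minimal processor")
    speedup = alpha / t              # 1.0 for the minimal processor (t == alpha)
    if s <= hw_limit:
        return speedup               # inside the allowed hardware range
    # Outside the range: assumed logarithmic decrease with the overshoot.
    return speedup / (1.0 + math.log(s / hw_limit))


minimal = relative_fitness(s=100, t=10.0, alpha=10.0, beta=100, hw_limit=110)
faster = relative_fitness(s=108, t=5.0, alpha=10.0, beta=100, hw_limit=110)
too_big = relative_fitness(s=220, t=5.0, alpha=10.0, beta=100, hw_limit=110)
```

With these made-up numbers the minimal processor scores 1, a processor twice as fast within the limit scores 2, and the same speedup at twice the allowed area scores less than 2.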
Determining hardware size
• How can the size of the new FU be estimated (the β parameter of the fitness)?
• The idea:
  - Determine the size of each basic building block (+, -, AND, …)
  - What to do with assignments or loops?
  - Compute how many of these blocks are used for a new FU
• The characterization has to be done for every target platform.
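As a sketch of this bookkeeping, the size of a candidate FU can be estimated by counting its building blocks and summing their characterized areas. The `BLOCK_AREA` values below are invented placeholders (a real table would come from characterizing the target platform), and `estimate_fu_size` is a hypothetical helper. Assignments and loops are not modelled, matching the open question in the slide.

```python
from collections import Counter

# Hypothetical per-platform characterization: area of each basic block
# in arbitrary units. These numbers are placeholders, not measurements.
BLOCK_AREA = {"+": 32, "-": 32, "AND": 8, "*": 600}


def estimate_fu_size(ops, block_area=BLOCK_AREA):
    """Sum the characterized area of every basic block used by the new FU."""
    counts = Counter(ops)                      # how many of each block
    return sum(block_area[op] * n for op, n in counts.items())


size = estimate_fu_size(["+", "+", "AND"])     # two adders and one AND
```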
Determining hardware execution time
• Use the same idea as for size:
  - Compute the time needed for each elementary function
  - Take the targeted clock period as a basis
  - When the estimated time > clock period, add 1 to the total time → small jumps in the fitness landscape
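One way to read the rule above is that an FU costs one clock cycle, plus one more each time its accumulated delay crosses another clock period. The sketch below implements that reading with a ceiling division. The `BLOCK_DELAY_NS` values are invented placeholders and `estimate_fu_cycles` is a hypothetical helper, so both the numbers and the interpretation are assumptions.

```python
import math

# Hypothetical per-platform delays of each elementary function, in ns.
BLOCK_DELAY_NS = {"+": 4.0, "-": 4.0, "AND": 1.0, "*": 9.0}


def estimate_fu_cycles(chain, clock_period_ns, block_delay=BLOCK_DELAY_NS):
    """Cycles needed by a chain of elementary functions executed in sequence."""
    total = sum(block_delay[op] for op in chain)
    # At least one cycle; one extra cycle per additional clock period needed.
    return max(1, math.ceil(total / clock_period_ns))


one_cycle = estimate_fu_cycles(["+", "AND"], clock_period_ns=10.0)
two_cycles = estimate_fu_cycles(["*", "+"], clock_period_ns=10.0)
```

Because the result is an integer number of cycles, growing an FU leaves the estimated time unchanged until a clock boundary is crossed, which is where the small jumps in the fitness landscape come from.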
Pattern-matching optimization
• How to find reusable FUs?
  - The GA behaves a bit like random mutations → difficult to find reusability this way
  - To help the GA a bit: each time a new HW block is defined, search the whole tree and replace similar pieces of code with that block
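The search-and-replace step can be sketched on a toy expression tree. Here "similar" is assumed to mean "same operator structure, any operands", and trees are nested tuples of the form `(op, arg, ...)` with strings as leaves. Both the representation and the helper names are hypothetical.

```python
def shape(tree):
    """Operator structure of a tree, with every leaf replaced by a wildcard."""
    if isinstance(tree, str):
        return "_"
    op, *args = tree
    return (op, *[shape(a) for a in args])


def leaves(tree):
    """Operands of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for arg in tree[1:] for leaf in leaves(arg)]


def replace_similar(tree, pattern, block_name):
    """Replace every subtree shaped like `pattern` by a call to the HW block.

    Returns the rewritten tree and the number of replacements made.
    """
    if isinstance(tree, str):
        return tree, 0
    if shape(tree) == shape(pattern):
        return (block_name, *leaves(tree)), 1
    new_args, hits = [], 0
    for arg in tree[1:]:
        new_arg, n = replace_similar(arg, pattern, block_name)
        new_args.append(new_arg)
        hits += n
    return (tree[0], *new_args), hits


mac = ("+", ("*", "a", "b"), "c")              # newly defined HW block
code = ("-", ("+", ("*", "x", "y"), "z"), ("+", ("*", "p", "q"), "r"))
new_code, hits = replace_similar(code, mac, "MAC")
```

In this example both multiply-add subtrees are rewritten to use the single `MAC` block, so one hardware investment is reused twice.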
Non-optimal block pruning
• “Cleaning” phase made at each step
• Removes HW blocks that are non-optimal from the fitness point-of-view
• To see if a block is useful, compute the fitness with and without this block implemented in HW. If the software solution has a better fitness, the block is non-optimal and can be removed.
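The cleaning phase can be sketched as a loop that compares the fitness with and without each block. The `prune_blocks` helper, the `NET_GAIN` table, and the toy fitness below are assumptions for illustration. In the real flow the fitness would be the relative fitness described earlier.

```python
def prune_blocks(blocks, fitness):
    """Drop every HW block whose removal strictly improves the fitness."""
    kept = set(blocks)
    for block in sorted(blocks):               # fixed order for reproducibility
        without = kept - {block}               # same solution, block done in software
        if fitness(without) > fitness(kept):
            kept = without                     # block is non-optimal: remove it
    return kept


# Toy fitness: a made-up net contribution per block (speed gain minus area cost).
NET_GAIN = {"mac": 60, "cmp": 20, "shift": -5}


def toy_fitness(blocks):
    return 100 + sum(NET_GAIN[b] for b in blocks)


kept = prune_blocks({"mac", "cmp", "shift"}, toy_fitness)
```

Here `shift` costs more than it gains, so it is moved back to software while `mac` and `cmp` are retained.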
FU extraction - Interface
[Figure: interface screenshot; visible labels: DOMAIN, STANDARD]
FU extraction - Results
• Example (functions from FACT factorization algorithm):
  - Hardware increase (estimated): 10% (fixed)
  - Speedup (estimated): 2.27 (227%)
• Other results:
  - All were obtained in a few seconds