Physical Limits of Computing
Dr. Mike Frank
CIS 6930, Sec. #3753X, Spring 2002
Lecture #24: Adiabatic CMOS, cont.
Wed., Mar. 13

Administrivia & Overview
• Don't forget to keep up with homework!
  – We are 8 out of 14 weeks into the course.
    • You should have earned ~57 points by now.
• Course outline:
  – Parts I & II, Background, Fundamental Limits - done
  – Part III, Future of Semiconductor Technology - done
  – Part IV, Potential Future Computing Technologies - done
  – Part V, Classical Reversible Computing
    • Fundamentals of Adiabatic Processes & logic - last Wed. & Fri.
      (----------------------- Spring Break ------------------------)
    • Adiabatic electronics & CMOS logic families - Mon. & TODAY
    • Limits of adiabatics: Leakage and clock/power supplies - TODAY
    • RevComp theory I: Emulating Irreversible Machines - Fri. 3/15
    • RevComp theory II: Bounds on Space-Time Overheads - Mon. 3/18
    • (plus ~7 more lectures…)
  – Part VI, Quantum Computing
  – Part VII, Cosmological Limits, Wrap-Up

Adiabatic computing in CMOS
Monday: Adiabatic switching, split-level retractile & pipelined logic.
Today: 2-Level Adiabatic Logic, general adiabatic logic

Some Timing Terminology
For sequential adiabatic circuits:
• Tick: Time for a single ramp transition
  – adiabatic speed fraction f times the RC gate delay.
• Phase: Latency for a data value to propagate forward by 1 pipeline stage.
• Cycle: Minimum period for all timing information to return to its initial state.
• Diadic: Two retractile levels per gate
  – permits inverting or non-inverting logic.
• Monadic: only 1 level
• Dual rail: Two wires per logic value
  – permits universal logic with monadic gates

Some Figures of Demerit
• Some quantities we may wish to minimize:
  – Ticks/phase:
    • proportional to logic propagation latency
  – Ticks/cycle:
    • reciprocal to rate of data throughput
  – Transistor-ticks/cycle:
    • reciprocal to HW cost-efficiency
  – Number of required clock/power input signals:
    • supplying these may be a significant component of system cost
  – Number of distinct voltage levels required:
    • may affect reliability/power tradeoff

Some Interesting Questions
• About pipelined, sequential, fully-adiabatic CMOS logic:
  – Q: Does it require an intermediate voltage level?
    • A: No, you can get by with only 2 different levels.
  – Q: What is the minimum number of externally provided timing signals you can get away with?
    • A: 4 (12 if split levels are used)
  – Q: Can the order-N different timing signals needed for long retractile cascades be internally generated within an adiabatic circuit?
    • A: Yes, but not statically, unless order-N² hardware is used
      – where N is the number of stages per full sequential cycle
• We now demonstrate these answers.

Some Timing Examples
See next slide for some detailed timing diagrams.
• N-level retractile cascades:
  – 2N ticks/phase × 1 phase/cycle = 2N ticks/cycle
• 3-phase fully-static diadic SCRL:
  – 8 ticks/phase × 3 phases/cycle = 24 ticks/cycle
• 2-phase fully-static monadic SCRL:
  – 5 ticks/phase × 2 phases/cycle = 10 ticks/cycle
• 2-phase fully-static diadic SCRL:
  – 6 ticks/phase × 2 phases/cycle = 12 ticks/cycle
• 6 tick/cycle dynamic SCRL detailed previously:
  – 1 tick/phase × 6 phases/cycle = 6 ticks/cycle

Some SCRL timing diagrams

2LAL: 2-level Adiabatic Logic
• Dual-rail T-gate symbol:
• Basic buffer element:
  – cross-coupled T-gates
  [Figure: T-gate symbol and buffer element, with terminals labeled P, A, B, in, out]
• Only 4 different timing signals, 4 ticks per cycle:
  – timing signal i rises during tick i, falls during tick (i+2) mod 4
• 1 tick/phase × 4 phases/cycle = 4 ticks/cycle!
  – Optimizes latency & throughput per gate.
  [Figure: the four timing-signal waveforms plotted against tick # 0 to 3]

2LAL Cycle of Operation
[Figure: in and out signal states versus tick number (0 to 3), including the in=0, out=0 case]

Input-Barrier, Clocked-Bias Latching
(1) Input conditionally lowers barrier (logic w. series/parallel barriers)
(2) Clock applies bias force; conditional bit flip
(3) Input removed, raising barrier & locking in state-change
(4) Clock bias can retract.
2LAL is an example of this.
[Figure: states 0, 1, N, with transitions labeled "Input pulse" and "Pulse ends"]

Shift Register Structure
• 1-tick delay per logic stage:
  [Figure: chain of stages from in to out, driven by timing signals 1, 2, 3, 4]
• Logic pulse timing & propagation:
  [Figure: in pulse and timing signals 1, 2, 3, 4, ... versus time]

More complex logic functions
• Non-inverting Boolean functions:
  [Figure: gates on inputs A and B; output labels include AB]
• For inverting functions, must use quad-rail logic encoding:
  [Figure: rails A0 and A1 shown for A=0 and A=1]
• Zero-transistor "inverters."
  – To invert, just swap the rails!

Hardware Efficiency issues
• Hardware efficiency: How many logic operations per unit hardware per unit time?
• Hardware spacetime complexity: How much hardware for how much time per logic op?
• We're interested in minimizing: (# of transistors) × (# of ticks) / (gate cycle)
• SCRL inverter, w. return path:
  – (8 transistors) × (6 ticks) = 48 transistor-ticks
• Quad-rail 2LAL buffer stage:
  – (16 transistors) × (4 ticks) = 64 transistor-ticks

More SCRL vs. 2LAL
• SCRL reversible NAND, w. all inverters:
  – (23 transistors) × (6 ticks) = 138 T-ticks
• Quad-rail 2LAL AND:
  – (48 transistors) × (4 ticks) = 192 T-ticks
• Result of comparison: Although 2LAL minimizes # of rails and # of ticks/cycle, it does not minimize overall spacetime complexity.
  – The question of whether 6-tick SCRL really minimizes per-op spacetime complexity among pipelined fully-adiabatic CMOS logics is still open.
    • An opportunity for you to make a contribution!

Minimizing Power-Clock Signals
• How many external clock signals are required?
  – N-level-deep retractile cascade logic:
    • 2N waveforms × 1 phase = 2N signals
  – 6 tick/cycle, 6-phase dynamic SCRL:
    • 6 waveforms × 6 phases = 36 signals
  – 24 tick/cycle, 3-phase static SCRL:
    • 12 waveforms × 3 phases = 36 signals
  – 4 tick/cycle, 2LAL:
    • 1 waveform × 4 phases = 4 signals!
• It turns out that 12 signals are sufficient to implement any combination of 2-level or 3-level logics (including retractile) on-chip!

How to Do It
• Circular 2LAL shifter; pulse-gated clocks
  [Figure: circular 2LAL shifter with in/out and clock lines P0 to P3, with waveforms plotted against tick # 0 to 3]

12-rail system: pros & cons
• Pros:
  – Completely solves adiabatic timing design problem
  – Enables mixtures of retractile, SCRL, and other logic styles on 1 chip
  – Enables simple fully-adiabatic SRAM & DRAM
• Cons:
  – Timing signals are dynamic
  – Known fully-static alternatives use order-N² gates and signals for N-tick-long cycles
  – N can be large in a chip that includes deep retractile networks
  – Energy waste in driving the source/drain junction capacitances of all the T-gates even when timing pulse isn't present (SOI reduces these parasitics)

Fully-Adiabatic DRAM cell
• 6T, 6 lines/row, 1 line/column (in/out together)
• Read cycle:
  – Initially: lines neutral, out neutral, R off
  – R for desired row turns on
  – for desired row splits, driving out column
  – R turns off, out is read
  – merges, out is reset
• Write cycle:
  – First, do read cycle.
  – in is set to out
  – W turns on
  – in changed to new value...

Fully-Adiabatic SRAM
• 10-T, 10 lines/row, 1 line/column
• Operation similar to DRAM, except:
• Read-out: T2 off; N2 retracts; T3 on; N2 asserts; T2 on, T3 off
• Write: T2 off; N2 retracts; N1 retracts, copy of M presented on input; T1 on; in changes; T1 off, N1 asserts; N2 asserts; T2 on
  [Figure: SRAM cell with nodes N1, N2, M, gates T1, T2, T3, and in/out]
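
The figures of demerit quoted throughout this lecture (ticks/cycle, transistor-ticks, and external clock-signal counts) are all simple products, so they can be re-derived mechanically. The following is a minimal Python sketch, not part of the original slides: the function and table names are hypothetical, and the input numbers are copied from the slides. It also encodes the 2LAL clocking rule (timing signal i rises during tick i and falls during tick (i+2) mod 4), under the assumption that a signal sits high in the tick between its rise and fall and low in the remaining tick.

```python
# Illustrative sketch only. All names here are hypothetical helpers;
# the numeric inputs are the ones quoted on the lecture slides.

# (ticks/phase, phases/cycle) for each pipelined logic style
TIMING = {
    "3-phase fully-static diadic SCRL": (8, 3),
    "2-phase fully-static monadic SCRL": (5, 2),
    "2-phase fully-static diadic SCRL": (6, 2),
    "6 tick/cycle dynamic SCRL": (1, 6),
    "2LAL": (1, 4),
}

# (transistors, ticks per gate cycle) for each gate
SPACETIME = {
    "SCRL inverter, w. return path": (8, 6),
    "Quad-rail 2LAL buffer stage": (16, 4),
    "SCRL reversible NAND, w. all inverters": (23, 6),
    "Quad-rail 2LAL AND": (48, 4),
}

# (distinct waveforms, phases) for the external power-clock supply
CLOCKS = {
    "6-phase dynamic SCRL": (6, 6),
    "3-phase static SCRL": (12, 3),
    "2LAL": (1, 4),
}


def tabulate(table):
    """Multiply the two factors of every entry, e.g. ticks/phase x phases/cycle."""
    return {name: a * b for name, (a, b) in table.items()}


def clock_state(i, tick):
    """State of 2LAL timing signal i during a given tick of the 4-tick cycle.

    Signal i rises during tick i and falls during tick (i + 2) mod 4.
    Assumption: it is high during the tick in between and low otherwise.
    """
    return ("rise", "high", "fall", "low")[(tick - i) % 4]


if __name__ == "__main__":
    for title, table in (("ticks/cycle", TIMING),
                         ("transistor-ticks", SPACETIME),
                         ("clock signals", CLOCKS)):
        print(title)
        for name, value in tabulate(table).items():
            print(f"  {name}: {value}")
    print("2LAL timing signals by tick")
    for i in range(4):
        print(f"  signal {i}:", [clock_state(i, t) for t in range(4)])
```

Under this model, exactly one of the four signals is rising and exactly one is falling during any tick, which is consistent with the 1 tick/phase × 4 phases/cycle figure quoted for 2LAL.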