x86 ISA
Compiler
Baojian Hua
bjhua@ustc.edu.cn

Front End

source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR

Code Generation

- Before discussing code generation, we must understand what we are trying to generate
  - virtual machines
  - bare architecture
  - …
- This course uses x86
  - So you’d learn how to program at the x86 level
- There is an online manual covering every detail
  - relatively old, but enough for understanding Linux, Windows, gcc, …

x86

- Complex Instruction Set Computer (CISC)
- Instructions can operate on memory values
  - e.g., add [eax], ebx
- Complex, multi-cycle instructions
  - e.g., string-copy, call
- Many ways to do the same thing
  - e.g., add eax, 1 / inc eax / sub eax, -1
- Instructions are variable-length (1-15 bytes)
- Registers are not orthogonal

Capsule History

- 1978, 8086
  - First x86 microprocessor, 16-bit
- 1985, 80386
  - 32-bit, protected mode
- 1989, 80486
- 1993, Pentium
  - MMX (added with the 1997 Pentium MMX)
- 2000, Pentium 4
  - Deeply pipelined, high frequency
- 2006, Intel Core 2
  - Low power, multi-core

x86 ISA

- Instruction Set Architecture
  - another programming language (instruction set)
  - encoding, decoding
  - assemble, compile to …
- different implementations
  - say Intel vs AMD
- Basis for OS, compilers, etc.
  - hardware-software interface

x86 ISA

- What’s important here?
  - OS and library
    - Note: assembly programs are NOT portable
  - language syntax
    - another CFG, read the manual
  - assembler
    - directives etc.
    - think “compiler”, read the gas manual

OS and Library

- OS simplifies programming model
  - e.g., Linux and Windows effectively disable segmentation
  - the so-called “flat” model in the manual
  - so all segment-related details may be ignored when reading the manual
- OS provides protection
  - e.g., Linux and Windows run user programs in ring 3
  - so you cannot change the page table! etc.
OS and Library

- OS provides system calls
  - hide many crazy details
  - but may still be annoying
- Libraries
  - another level of indirection on top of OS system calls
  - In particular, we’d use the C library

Syntax

- Syntax = data + instructions

Data

- Immediate
  - 4, 3.14, “hello”
- Register
  - general-purpose: eax, ebx, …
  - segment: remember? we don’t care

Data

- Memory
  - different usage: global, stack, heap
  - but same behavior

Data

- Memory addressing mode
  - seg:[base+index*scale+disp]
  - any part can be null
  - complex! right?
- e.g., int a[5][10], to read a[3][2]

      mov eax, 120              # row offset in bytes: 3 rows * 10 ints * 4 bytes
      mov ebx, 2                # column index, scaled by 4 in the address below
      mov ecx, [eax+ebx*4+a]

- Problems with this strategy?

Instructions

- Manual covers all instructions in detail:
  - Data movement
  - Arithmetic
  - Control transfer
  - …
- Rather than explain all these bit-by-bit, I’ll give an example next

Assembler

- Assembler is more than just a compiler:
  - it customizes assembly syntax
  - it also offers the so-called directives
- Two main branches:
  - Intel syntax
    - assemblers on Windows: masm, nasm, …
    - the Intel manual!
  - AT&T syntax
    - Linux assembler: gas
    - the GCC output!
    - the good news is that recent versions of gas support Intel syntax!
This course uses as (the GNU assembler) with Intel syntax, so reading the Intel manual is relatively easy.

Example

    # Sum up an array of integers     # comments start with “#”; also supports C/C++ style
    # compiled by GCC:
    # $ gcc test.s
        .intel_syntax noprefix        # directive: telling that we prefer the Intel syntax
        .data                         # directive: assemble the following to data section
    a:                                # label: the current address
        .int 1, 2, 3, 4, 5, 6, 7, 8, 9, 10   # directive: store 10 integers starting from the address “a”
        .globl main                   # directive: global symbol
        .text                         # directive: assemble the following to text section
    main:                             # label: another address

Example, cont’

        push ebp
        mov ebp, esp                  # set up the frame pointer (destination first in Intel syntax)
        # convention: eax: the sum, ebx: index
        xor eax, eax                  # sum = 0
        mov ebx, eax                  # index = 0
    L_start:
        add eax, dword ptr [ebx*4+a]  # sum += a[index]
        inc ebx
        cmp ebx, 10
        jl L_start                    # loop while index < 10
        leave
        ret                           # return value (the sum) is in eax

Summary

- Assembly programming is fun and simple conceptually
  - but CISC architecture is …
  - and it requires combined knowledge of OS, architecture and compiler
- Read the online manual
  - Essential for code generation