SRE Basics

In this Section…

We briefly cover the following topics
  o Assembly code
  o Virtual machine/Java bytecode
  o Windows PE file format

Assembly Code

High Level Languages

First, high level languages…
Ancient high level languages
  o Basic --- little structure
  o FORTRAN --- limited structure
  o C --- “structured” language
C was designed to deal with complexity
  o OO languages take this one step further
Above languages considered primitive today

High Level Languages

Object oriented (OO) languages
  o “Object” groups code and data together
  o Considered best way to handle complexity (at least for now…)
Important OO ideas include
  o Encapsulation, inheritance, polymorphism

High Level Languages

Program must deal with code and data
Data
  o Variables, data structures, files, etc.
Code
  o Reverser must study control flow
  o Conditionals, switches, loops, etc.

High Level Languages

High level languages --- different users want different things
  o Goes back (at least) to C vs FORTRAN
Today, major tradeoff is between simplicity and flexibility
  o Simplicity --- easy to write short program to do exactly what you want (e.g., C)
  o Flexibility --- language has it all (e.g., Java)

High Level Languages

Some languages compiled into native code
  o exe is specific to the hardware
  o C, C++, FORTRAN, etc.
Other languages “compiled” into “code”, which is interpreted by a virtual machine
  o Java, C#
  o Often possible to make compiled version
For reverser, this distinction is far more important than OO or not

Intro to Assembly

At the lowest level, machine binary
Assembly code lives between binary and high level languages
When reversing native code, we must deal with assembly code
  o Why assembly code?
  o Why not “reverse” binary to, say, C?

Intro to Assembly

Reverser would like to deal with high level, but is stuck with low level
Ideally, want to create mental “link” from low level to high level
  o Easier for code written in C
  o Harder for OO code, such as C++
  o Why?

Intro to Assembly

Perhaps biggest difference at assembly level is dealing with data
  o High level languages hide lots and lots of details on data manipulations
  o For example, loading and storing
Also, low level instructions are primitive
  o Each instruction does not do very much

Intro to Assembly

Consider following simple C program

    int multiply(int x, int y)
    {
        int z;
        z = x * y;
        return z;
    }

Simple, but far higher level than assembly code

Intro to Assembly

In assembly code, the same multiply function becomes…
  1. Store state before entering function
  2. Allocate memory for z
  3. Load x and y into registers
  4. Multiply x by y and store result in register
  5. Copy result back to memory for z (optional)
  6. Restore state that was stored in 1.
  7. Return z

Intro to Assembly

Why are things so complicated at low level?
It’s all about efficiency!
Reading from and writing to memory are slow
Usually no single asm instruction to read memory, operate on it, and store result
  o But this is common in high level languages

Intro to Assembly

Registers --- “local” processor memory
  o So don’t have to read and write RAM
Stack --- “scratch paper” (in RAM)
  o Holds register values, local variables, function parameters and return values
  o E.g., storage for “z” in multiply example
Heap --- dynamic, variable-sized data
Data section --- e.g., string constants
Control flow --- high level “if” or “while” are much more complex at low level

Registers

Registers used in most instructions
Specifics here deal with “IA-32”
  o Intel Architecture, 32-bit
  o Used in “Wintel” machines
  o We use IA-32 notation
  o AT&T notation also exists
Eight 32-bit registers (next slide)
  o All 8 start with “E”
  o Also several system registers

Registers

EAX, EBX, EDX --- generic, used for int, Boolean, …, memory operations
ECX --- generic, used as counter
ESI/EDI --- generic, source/destination pointers when copying memory
  o SI == source index, DI == destination index
EBP --- generic, stack “base” pointer
  o Usually, stack position after return address
ESP --- stack pointer
  o Current stack frame is between ESP and EBP

Flags

EFLAGS --- special register
  o Status flags updated by various operations to “record” outcomes
  o System flags too, but we don’t care about them
Flags are basic tool for conditionals
For example, a TEST followed by a jump instruction
  o TEST sets various flags, jump determines action to take, based on those flags

Instruction Format

Most instructions consist of…
  o Opcode --- the “instruction”
  o One or two operands --- “parameter(s)”
Operands (parameters) are data
Operands come in 3 flavors
  o Register name --- for example, EAX
  o Immediate --- e.g., hard-coded constant
  o Memory address --- enclosed in [brackets]

Operand Examples

EAX
  o Read from (or write to) EAX register, depending on opcode
0x30004040
  o Immediate --- number is embedded in code
  o Usually a constant in high-level code
[0x4000349e]
  o This is a memory address
  o Could be a global variable in high level code

Basic Instructions

We cover a few common instructions
  o First we give general format
  o Later, we give a few simple examples
There are lots of assembly instructions
But, most assembly code uses only a few
  o About 14 assembly instructions account for more than 90% of all code

Opcode Counts

[Figure not available: typical opcode counts, “normal” code]

Opcode Counts

[Figure not available: opcode counts, typical virus code]

Instructions

We consider following operations
  o Moving data
  o Arithmetic
  o Comparisons
  o Conditional branches
  o Function calls

Moving Data

MOV is the most popular opcode
2 operands, destination and source:
  o MOV DestOperand, SourceOperand
Note the order
  o Destination first, source second

Arithmetic

Six integer arithmetic operations
  o ADD, SUB, MUL, DIV, IMUL, IDIV
Many variations based on operands
  o ADD Op1, Op2 ; add, store result in Op1
  o SUB Op1, Op2 ; sub Op2 from Op1 --> Op1
  o MUL Op ; mul Op by EAX ---> EDX:EAX
  o DIV Op ; div EDX:EAX by Op, quotient ---> EAX, remainder ---> EDX
  o IMUL, IDIV --- like MUL and DIV, but signed

Comparisons

CMP opcode has 2 operands
  o CMP Operand1, Operand2
Subtracts Operand2 from Operand1
Result “stored” in flag bits
  o If 0 then ZF flag is set
  o Other flags can be used to tell which is greater, depending on signed or unsigned

Conditional Branches

Conditional branches use “Jcc” family of instructions (je, jne, jz, jnz, etc.)
Format is
  o Jcc TargetAddress
If Jcc true, goto TargetAddress
  o Otherwise, what happens?
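
The CMP and Jcc slides can be made concrete with a small model. The sketch below is written in Python rather than assembly and is only an illustration: the function names (cmp_flags, next_address), the four conditions chosen, and the addresses used in the usage note are assumptions made here, not part of the slides. It subtracts the operands as 32-bit values, keeps only the flags, and then picks the next address. When the condition is false, execution simply falls through to the instruction after the jump.

```python
MASK32 = 0xFFFFFFFF


def _sign(value: int) -> int:
    """Top bit of a 32-bit value (1 means negative when read as signed)."""
    return (value >> 31) & 1


def cmp_flags(op1: int, op2: int) -> dict:
    """Model CMP op1, op2: compute op1 - op2, discard it, keep the flags."""
    a, b = op1 & MASK32, op2 & MASK32
    result = (a - b) & MASK32
    return {
        "ZF": int(result == 0),   # operands were equal
        "CF": int(a < b),         # unsigned borrow
        "SF": _sign(result),      # sign bit of the result
        # signed overflow: operand signs differ and the result sign flipped
        "OF": int(_sign(a) != _sign(b) and _sign(result) != _sign(a)),
    }


# A few members of the Jcc family, expressed as tests on the flags.
CONDITIONS = {
    "je": lambda f: f["ZF"] == 1,
    "jne": lambda f: f["ZF"] == 0,
    "jb": lambda f: f["CF"] == 1,         # unsigned "below"
    "jl": lambda f: f["SF"] != f["OF"],   # signed "less"
}
CONDITIONS["jz"] = CONDITIONS["je"]       # same instruction, two names
CONDITIONS["jnz"] = CONDITIONS["jne"]


def next_address(jcc: str, flags: dict, target: int, fall_through: int) -> int:
    """Return the jump target if the condition holds, else the next instruction."""
    return target if CONDITIONS[jcc](flags) else fall_through
```

For example, with made-up addresses, next_address("jnz", cmp_flags(5, 5), 0x2000, 0x1005) yields 0x1005, because equal operands set ZF and the jump is not taken. Comparing 0xFFFFFFFF with 1 satisfies "jl" (signed, -1 < 1) but not "jb" (unsigned), which is the signed/unsigned distinction the Comparisons slide mentions.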

Function Calls

Use CALL and RET
  o CALL FunctionAddress
  o ……
  o RET ; pops return address
RET can be told to increment ESP
  o Need to reset stack pointer
  o Why?

Examples

    cmp  ebx,0xf020
    jnz  10026509

What does this do?
Compares value in EBX with constant
Jumps to specified address if operands are not same
  o Note: JNE and JNZ are same instruction

Examples

    mov  edi,[ecx+0x5b0]
    mov  ebx,[ecx+0x5b4]
    imul edi,ebx

What does this do?
First, read the value at address ECX + 0x5b0 and put it in EDI (ECX itself is unchanged)
Next, read the value at address ECX + 0x5b4 and put it in EBX
  o Note that ECX points to some data structure
Finally, EDI = EDI * EBX
  o Note there are different forms of IMUL

Examples

    push eax
    push edi
    push ebx
    push esi
    push dword ptr [esp+0x24]
    call 0x10026eeb

What does this do?
PUSH four register values
PUSH something related to stack ptr
  o Probably, parameter or local variable
  o Would need to look at more code to decide
  o Note “dword ptr” is effectively a cast
CALL a function

Examples

    mov  eax, dword ptr [ebp - 0x20]
    shl  eax, 4
    mov  ecx, dword ptr [ebp - 0x24]
    cmp  dword ptr [eax+ecx+4], 0
    call 0x10026eeb

What does this do?
Maybe “data structure in an array”
In the CMP line
  o ECX --- gets base pointer
  o EAX --- current offset into the array
  o Add 4 to get specific member of structure

Examples

AT&T syntax

    pushl $14
    pushl $helloWorld
    pushl $1
    movl  $4, %eax
    pushl %eax
    int   $0x80
    addl  $16, %esp
    pushl $0
    movl  $1, %eax
    pushl %eax
    int   $0x80

Compilation

Converts high level representation of code to binary
Front end --- lexical analysis
  o Verify syntax, etc.
Intermediate representation
Optimization
  o Improve structure, eliminate redundancy, …

Compilation

Back end --- generates the actual code
  o Instruction selection
  o Register allocation
  o Instruction scheduling --- pipelining, parallelism
Back end process might make disassembly hard to read
  o Optimization too
Each compiler has its own quirks
  o Can you automatically determine compiler?

Virtual Machines & Bytecode

Virtual Machines

Some languages instead generate intermediate bytecode
Bytecode runs in a virtual machine
  o Virtual machine is a program that (historically) interprets bytecode
  o Translates bytecode for the hardware
Bytecode analogous to assembly code

Virtual Machines

Advantages?
  o Hardware independent
Disadvantages?
  o Slow
Today, usually just-in-time compilers instead of interpreters
  o Compile snippets of bytecode into native code as needed

Reversing Bytecode

Reversing bytecode is easy
  o Unless special precautions are taken
  o Even then, easier than native code
Bytecode usually contains lots of metadata
  o Possible to reconstruct highly accurate high level language
Bytecode can be obfuscated
  o In worst case, reverser must learn bytecode
  o But bytecode is easier than native code

Windows PE Files

Windows PE File Format

Designed to be standard executable file format for all versions of OS…
  o …on all supported processors
Only small changes since PE format was introduced
  o E.g., support for 64-bit Windows

Windows PE Files

Trivia
  o Q: What’s the difference between exe and dll?
  o A: Not much --- one bit differs in PE files
  o Q: What is size of smallest possible PE file?
  o A: 133 bytes
PE file on disk is a file
  o Once loaded into memory, it’s a module
  o File is mapped to module
  o Address where module begins is HMODULE
  o PE file may not all be mapped to module

Windows PE Files

WINNT.H is final word on what PE file looks like
Tools to examine PE files
  o Dumpbin (Visual Studio)
  o Depends
  o PE Browse Professional
      In spite of its name, it’s free
  o PEDUMP (by author of article)

PE File Sections

Each section is “chunk of code or data that logically belongs together”
  o For example, all import tables in one section
Code is in .text section
  o Code is code, but many types of data
Data examples
  o Program data (e.g., .rdata for read-only)
  o API import/export tables
  o Resources, relocation info, etc.
Can specify section names in C++ source

PE File Sections

When mapped, module starts on a page boundary
Linker can be told to merge sections
  o E.g., to merge .text and .rdata:
  o /MERGE:.rdata=.text
  o Some sections commonly merged
  o Some sections cannot be merged

Relative Virtual Addresses

Exe file specifies in-memory addresses
PE file specifies preferred load location
  o But DLL can actually load just about anywhere
So, PE specifies addresses in a way that is independent of where it loads
  o No hardcoded addresses in PE
  o Instead, Relative Virtual Addresses (RVAs)
  o RVA is an offset relative to where PE is loaded

Relative Virtual Addresses

To find actual memory location, add RVA to the actual load address
For example, suppose
  o Exe file is loaded at 0x400000
  o And RVA is 0x1000
  o Then code (.text) starts at 0x401000
In Windows terminology, actual address is known as Virtual Address (VA)

Data Directory

There are many data structures within exe
  o For efficiency, must be loaded quickly
  o E.g., imports, exports, resources, base relocations, etc.
DataDirectory
  o Array of 16 data structures
  o #define IMAGE_DIRECTORY_ENTRY_xxx defines array indexes (0 to 15)

Importing Functions

To use code or data from another DLL, must import it
When PE file loads, Windows loader locates imported functions/data
  o Usually automatic, when program first starts
  o Imported DLLs may import others
  o For example, any program created with Visual C++ imports KERNEL32.DLL…
  o …and KERNEL32.DLL imports from NTDLL.DLL

Importing Functions

Each PE has Import Address Table (IAT)
  o IAT contains arrays of function pointers
  o One array per imported DLL
Each imported API has spot in IAT
  o The only place where API address stored
  o So, all calls to API go thru one function ptr
  o E.g., CALL DWORD PTR [0x00405030]
  o But, by default it’s a little more complex…

PE File Structure

Next slides describe PE file structure
Note that all of these data structures defined in WINNT.H
Usually, 32-bit and 64-bit versions
For example,
  o IMAGE_NT_HEADERS32
  o IMAGE_NT_HEADERS64
  o Identical except for widened fields for 64-bit

MS-DOS Header

Every PE begins with small MS-DOS exe
  o Prints message saying Windows required
MS-DOS Header
  o IMAGE_DOS_HEADER
  o 2 “important” values
  o e_lfanew --- file offset of PE header
  o e_magic --- 0x5A4D, “MZ” in ASCII… Why MZ?
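
As a rough illustration of the two “important” values, the sketch below reads e_magic and e_lfanew from a byte buffer using Python’s struct module. It relies on the IMAGE_DOS_HEADER layout from WINNT.H, where e_magic is the first 16-bit field, e_lfanew is a 32-bit field at offset 0x3C, and the header is 0x40 bytes long. The function names and the synthetic test image are assumptions made for this example; a real tool would read the bytes from an actual file and do more validation.

```python
import struct

E_MAGIC_MZ = 0x5A4D        # "MZ" when the two bytes are read little-endian
E_LFANEW_OFFSET = 0x3C     # position of e_lfanew inside IMAGE_DOS_HEADER
DOS_HEADER_SIZE = 0x40


def read_dos_header(data: bytes) -> int:
    """Check e_magic and return e_lfanew (file offset of the PE header)."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("too short to hold an IMAGE_DOS_HEADER")
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != E_MAGIC_MZ:
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<i", data, E_LFANEW_OFFSET)
    return e_lfanew


def make_fake_image(pe_offset: int = 0x80) -> bytes:
    """Hypothetical helper: a zero-filled buffer with only the two fields set."""
    buf = bytearray(pe_offset + 4)
    struct.pack_into("<H", buf, 0, E_MAGIC_MZ)
    struct.pack_into("<i", buf, E_LFANEW_OFFSET, pe_offset)
    return bytes(buf)
```

Calling read_dos_header(make_fake_image(0x80)) returns 0x80, and the first two bytes of the buffer are b"MZ". A buffer that starts with anything else raises ValueError.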

IMAGE_NT_HEADERS Header

Primary location for PE specifics
Location in file given by e_lfanew
One version for 32-bit exes and another for 64-bit exes
  o Only minor differences between them
  o Magic field in the optional header specifies 32-bit (0x10B) or 64-bit (0x20B)

IMAGE_NT_HEADERS Header

Has 3 fields

    typedef struct _IMAGE_NT_HEADERS {
        DWORD Signature;
        IMAGE_FILE_HEADER FileHeader;
        IMAGE_OPTIONAL_HEADER32 OptionalHeader;
    } IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

In valid PE, Signature is 0x00004550
  o In ASCII, this is “PE\0\0”

IMAGE_NT_HEADERS Header

IMAGE_FILE_HEADER predates PE
  o Struct containing basic info about file
  o Most important info is size of “optional data” that follows (not really optional)

IMAGE_NT_HEADERS Header

IMAGE_OPTIONAL_HEADER
  o DataDirectory array (at end) is “address book” of important locations in exe
  o Each entry contains RVA and size of data

PE Sections

Recall, section is “chunk of code or data that logically belongs together”
For example
  o All data for exe’s import tables are in one section

Section Table

Section table contains array of IMAGE_SECTION_HEADER structs
An IMAGE_SECTION_HEADER has info about associated section
  o Location, length, and characteristics
  o Number of such headers given by field: IMAGE_NT_HEADERS.FileHeader.NumberOfSections

Alignment of Sections

Visual Studio 6.0
  o 4KB sections by default
Visual Studio .NET
  o 4KB by default, except small files, which use 0x200-byte alignment
  o Also, .NET spec requires 8KB in-memory alignment (for IA-64 compatibility)

PE Sections

So far, overview of PE file format
Now, look inside important sections…
  o …and some data structures within sections
Then we finish with look at PEDUMP
  o Recall there are other similar utilities

Section Names

.text --- The default code section.
.data --- The default read/write data section. Global variables typically go here.
.rdata --- The default read-only data section. String literals and C++/COM vtables are examples of items put into .rdata.
.idata --- The imports table. It has become common practice (explicitly, or via linker default behavior) to merge .idata into another section, typically .rdata. By default, the linker only merges the .idata section into another section when creating a release mode exe.
.edata --- The exports table. When creating an executable that exports APIs or data, the linker creates an .EXP file which contains an .edata section that's added into the final executable. Like the .idata section, the .edata section is often found merged into the .text or .rdata sections.
.rsrc --- The resources. This section is read-only. However, it should not be renamed and should not be merged into other sections.
.bss --- Uninitialized data. Rarely found in exes created with recent linkers. Instead, the VirtualSize of the exe's .data section is expanded to make room for uninitialized data.
.crt --- Data added for supporting the C++ runtime (CRT). A good example is the function pointers that are used to call the constructors and destructors of static C++ objects.
.tls --- Data for supporting thread local storage variables declared with __declspec(thread). This includes the initial value of the data, as well as additional variables needed by the runtime.
.reloc --- Base relocations in an exe. Base relocations are generally only needed for DLLs and not EXEs. In release mode, the linker doesn't emit base relocations for EXE files. Relocations can be removed when linking with the /FIXED switch.
.sdata --- "Short" read/write data that can be addressed relative to the global pointer. Used for IA-64 and other architectures that use a global pointer register. Regular-sized global variables on the IA-64 will go in this section.
.srdata --- "Short" read-only data that can be addressed relative to the global pointer. Used on the IA-64 and other architectures that use a global pointer register.
.pdata --- The exception table. Contains an array of IMAGE_RUNTIME_FUNCTION_ENTRY structs, which are CPU-specific. Pointed to by the IMAGE_DIRECTORY_ENTRY_EXCEPTION slot in the DataDirectory. Used for architectures with table-based exception handling, such as the IA-64. The only architecture that doesn't use table-based exception handling is the x86.
.didat --- Delayload import data. Found in exes built in nonrelease mode. In release mode, the delayload data is merged into another section.

Exports Section

Exe may export code or data
  o Makes it available to other exes
  o Refer to an exported thing as a symbol
At minimum, to export symbol, must specify its address in defined way
  o Keyword ORDINAL tells linker to use numbers, not names, for symbols
  o After all, names just a convenience for coders

IMAGE_EXPORT_DIRECTORY

Points to 3 arrays
  o And a table of ASCII strings containing symbol names
Only required array is Export Address Table (EAT)
  o Array of function pointers
  o Addresses of exported functions
  o Export ordinal is an index into this array

IMAGE_EXPORT_DIRECTORY

[Figure not available: IMAGE_EXPORT_DIRECTORY structure example]
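
Since the structure figure is not available, a toy model may help picture how the arrays relate. The Python sketch below is an assumption-laden simplification rather than the on-disk layout: the class name, the symbols Alpha/Beta/Gamma, the RVAs, and the load address are all hypothetical. It assumes the names array is sorted, that a parallel ordinal array holds zero-based indexes into the EAT, and that the exported ordinal is that index plus the ordinal base.

```python
from bisect import bisect_left


class ExportDirectoryModel:
    """Toy stand-in for the arrays an IMAGE_EXPORT_DIRECTORY points to."""

    def __init__(self, base, eat, names, name_ordinals):
        self.base = base                    # ordinal base
        self.eat = eat                      # Export Address Table: RVAs
        self.names = names                  # symbol names, sorted
        self.name_ordinals = name_ordinals  # parallel to names: index into eat

    def rva_by_name(self, name):
        """Binary-search the names, then go through the ordinal array to the EAT."""
        i = bisect_left(self.names, name)
        if i == len(self.names) or self.names[i] != name:
            raise KeyError(name)
        return self.eat[self.name_ordinals[i]]

    def rva_by_ordinal(self, ordinal):
        """An export ordinal indexes the EAT once the base is subtracted."""
        index = ordinal - self.base
        if not 0 <= index < len(self.eat):
            raise KeyError(ordinal)
        return self.eat[index]


# Hypothetical DLL with three exports and ordinal base 1.
exports = ExportDirectoryModel(
    base=1,
    eat=[0x1200, 0x1000, 0x1100],
    names=["Alpha", "Beta", "Gamma"],
    name_ordinals=[1, 2, 0],
)
load_address = 0x10000000  # made-up module load address
alpha_va = load_address + exports.rva_by_name("Alpha")
```

Here "Alpha" resolves through name_ordinals[0] == 1 to eat[1], so alpha_va is the load address plus 0x1000. Looking up ordinal 1 returns eat[0], which is the same entry that "Gamma" maps to.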

Example exports table:

    Name:            KERNEL32.dll
    Characteristics: 00000000
    TimeDateStamp:   3B7DDFD8 -> Fri Aug 17 23:24:08 2001
    Version:         0.00
    Ordinal base:    00000001
    # of functions:  000003A0
    # of Names:      000003A0

    Entry Pt  Ordn  Name
    00012ADA     1  ActivateActCtx
    000082C2     2  AddAtomA
    ••• remainder of exports omitted

Example

Suppose we call GetProcAddress on AddAtomA API
  o System locates KERNEL32’s IMAGE_EXPORT_DIRECTORY
  o Gets start address of Export Names Table (ENT)
  o It finds there are 0x3A0 entries in ENT
  o Does binary search for AddAtomA
  o Suppose AddAtomA is 2nd entry…
  o …loader reads 2nd value from export ordinal table

Example (Continued)

Call GetProcAddress on AddAtomA API
  o …
  o AddAtomA has export ordinal 2
  o Use this as index into EAT (taking into account base field value)
  o Finds AddAtomA has RVA of 0x82C2
  o Add 0x82C2 to load address of KERNEL32 to get actual address of AddAtomA

Export Forwarding

Can forward export to another DLL
  o That is, must find it at “forward” address
Example
  o KERNEL32 HeapAlloc function forwarded to RtlAllocHeap function exported by NTDLL
  o In EXPORTS section of KERNEL32, find

        EXPORTS
        …
        HeapAlloc = NTDLL.RtlAllocHeap

Imports Section

Importing is opposite of exporting
IMAGE_IMPORT_DESCRIPTOR
  o Points to 2 essentially identical arrays
  o Import Address Table & Import Name Table
IAT and INT
  o Contain ordinal, address, forwarding info
  o After binding, IAT rewritten, INT retains original (pre-binding) info
  o Binding discussed next…

Imports Section

Example
  o Importing APIs from USER32.DLL

[Figure not available: imports from USER32.DLL]

Binding

Binding means IAT overwritten with actual addresses
  o VAs overwrite RVAs
Why do this?
  o Increased efficiency
Loader checks whether binding valid

Delayload Data

Hybrid between implicit & explicit importing
Not an OS issue
  o A linker issue, at runtime
There is IAT and INT for the DLL
  o Identical to regular IAT and INT
  o But read by runtime library code instead of OS
Benefit? Calls then go directly to API…

Resources Section

For resources such as…
  o icons, bitmaps, dialogs, etc.
Most complicated section to navigate
Organized like a file system…

Base Relocations

Executable has many memory addresses
As mentioned, PE file specifies preferred memory address to load the module
  o ImageBase field in IMAGE_OPTIONAL_HEADER
If DLL loaded elsewhere, all addresses will be incorrect
  o Base relocations tell loader all locations that need to be modified
  o Note that this is extra work for the loader
What about EXE, which is not a DLL?

Base Relocation Example

Consider the following line of code

    00401020: 8B 0D 34 D4 40 00    mov ecx,dword ptr [0x0040D434]

Note that “8B 0D” specifies opcode
  o Also note the address 0x0040D434
Suppose preferred load is at 0x00400000
If it loads at that address, it runs as-is
Suppose instead it loads at 0x00500000
Then code above needs to change to

    8B 0D 34 D4 50 00    mov ecx,dword ptr [0x0050D434]

Base Relocation Example

If not loaded at preferred address, then loader computes delta
For example on previous slide…
  o delta = 0x00500000 - 0x00400000
  o So, delta is 0x00100000
Also, there would be base relocation specifying the address operand of the instruction at 0x00401020 (the 4 bytes starting at 0x00401022)
  o Loader modifies address located here by delta

Debug Directory

Contains debug info
Not required to run the program
  o But useful for development
Can be multiple forms of debug info
  o Most common is PDB file

.NET Header

.NET executables are PE files
However, code/data is minimal
Purpose of PE is simply to get .NET-specific info into memory
  o Metadata, intermediate language (IL)
  o MSCOREE.DLL is loaded at start of a .NET process
  o This DLL “takes charge” and uses metadata and IL from executable
  o So PE has stub to get MSCOREE.DLL going

TLS Initialization

Thread Local Storage (TLS)
  o .tls section for thread local variables
New threads initialized using .tls data
Presence of TLS data indicated by nonzero IMAGE_DIRECTORY_ENTRY_TLS in DataDirectory
  o Points to IMAGE_TLS_DIRECTORY struct
  o Contains virtual addresses, VAs (not RVAs)
  o The actual struct is in .rdata, not in .tls

Program Exception Data

x86 architecture uses frame-based exception handling
  o A fairly complex way to handle exceptions
IA-64 and others use table-based approach
  o Table containing info about every function that might be affected by exception unwinding
  o Table entry includes start and end addresses, how and where exception to be handled
  o When exception occurs, search thru table…

PEDUMP

Tools for analyzing PE files
  o Dumpbin (Visual Studio)
  o Depends
  o PE Browse Professional
      In spite of its name, it’s free
  o PEDUMP (by author of article)
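
Worked example: the base relocation slides earlier in this section can be checked with a few lines of Python. This is only a sketch of the patch step, using the instruction bytes and load addresses from those slides. The function name apply_relocation is made up here, and the way relocations are actually stored in the .reloc section is not modelled. The address bytes start 2 bytes into the instruction, right after the “8B 0D” opcode.

```python
import struct


def apply_relocation(code: bytearray, offset: int, delta: int) -> None:
    """Add delta to the 32-bit little-endian address stored at code[offset:offset+4]."""
    (address,) = struct.unpack_from("<I", code, offset)
    struct.pack_into("<I", code, offset, (address + delta) & 0xFFFFFFFF)


preferred_base = 0x00400000
actual_base = 0x00500000
delta = actual_base - preferred_base      # 0x00100000

# mov ecx,dword ptr [0x0040D434] as it appears in the file
instruction = bytearray.fromhex("8B 0D 34 D4 40 00")
apply_relocation(instruction, 2, delta)   # skip the 2 opcode bytes
```

After the call, the instruction bytes are 8B 0D 34 D4 50 00, matching the relocated form shown on the slide. If the module loads at its preferred address, delta is 0 and the bytes are left unchanged.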