SRE Basics

In this Section…
• We briefly cover the following topics
  o Assembly code
  o Virtual machine/Java bytecode
  o Windows PE file format

Assembly Code

High Level Languages
• First, high level languages…
• Ancient high level languages
  o Basic --- little structure
  o FORTRAN --- limited structure
  o C --- “structured” language
• C was designed to deal with complexity
  o OO languages take this one step further
• Above languages considered primitive today

High Level Languages
• Object oriented (OO) languages
  o “Object” groups code and data together
  o Considered best way to handle complexity (at least for now…)
• Important OO ideas include
  o Encapsulation, inheritance, polymorphism

High Level Languages
• Program must deal with code and data
• Data
  o Variables, data structures, files, etc.
• Code
  o Reverser must study control flow
  o Conditionals, switches, loops, etc.

High Level Languages
• High level languages --- different users want different things
  o Goes back (at least) to C vs FORTRAN
• Today, major tradeoff is between simplicity and flexibility
  o Simplicity --- easy to write short program to do exactly what you want (e.g., C)
  o Flexibility --- language has it all (e.g., Java)

High Level Languages
• Some languages compiled into native code
  o exe is specific to the hardware
  o C, C++, FORTRAN, etc.
• Other languages “compiled” into “code”, which is interpreted by a virtual machine
  o Java, C#
  o Often possible to make compiled version
• For reverser, this distinction is far more important than OO or not

Intro to Assembly
• At the lowest level, machine binary
• Assembly code lives between binary and high level languages
• When reversing native code, we must deal with assembly code
  o Why assembly code?
  o Why not “reverse” binary to, say, C?

Intro to Assembly
• Reverser would like to deal with high level, but is stuck with low level
• Ideally, want to create mental “link” from low level to high level
  o Easier for code written in C
  o Harder for OO code, such as C++
  o Why?

Intro to Assembly
• Perhaps biggest difference at assembly level is dealing with data
  o High level languages hide lots and lots of details on data manipulations
  o For example, loading and storing
• Also, low level instructions are primitive
  o Each instruction does not do very much

Intro to Assembly
• Consider following simple C program
int multiply(int x, int y)
{
    int z;
    z = x * y;
    return z;
}
• Simple, but far higher level than assembly code

Intro to Assembly
int multiply(int x, int y)
{
    int z;
    z = x * y;
    return z;
}
• In assembly code…
  1. Store state before entering function
  2. Allocate memory for z
  3. Load x and y into registers
  4. Multiply x by y and store result in register
  5. Copy result back to memory for z (optional)
  6. Restore state that was stored in 1.
  7. Return z

Intro to Assembly
• Why are things so complicated at low level?
• It’s all about efficiency!
• Reading memory and storing are slow
• No single asm instruction to read memory, operate on it, and store result
  o But this is common in high level languages

Intro to Assembly
• Registers --- “local” processor memory
  o So don’t have to read and write RAM
• Stack --- “scratch paper” (in RAM)
  o Holds register values, local variables, function parameters and return values
  o E.g., storage for “z” in multiply example
• Heap --- dynamic, variable-sized data
• Data section --- e.g., string constants
• Control flow --- high level “if” or “while” are much more complex at low level

Registers
• Registers used in most instructions
• Specifics here deal with “IA-32”
  o Intel Architecture, 32-bit
  o Used in “Wintel” machines
  o We use Intel (IA-32) notation
  o AT&T notation also exists
• Eight 32-bit registers (next slide)
  o All 8 start with “E”
  o Also several system registers

Registers
• EAX, EBX, EDX --- generic, used for int, Boolean, …, memory operations
• ECX --- generic, used as counter
• ESI/EDI --- generic, source/destination pointers when copying memory
  o SI == source index, DI == destination index
• EBP --- generic, stack “base” pointer
  o Usually, stack position after return address
• ESP --- stack pointer
  o Current stack frame is between ESP and EBP

Flags
• EFLAGS --- special register
  o Status flags updated by various operations to “record” outcomes
  o System flags too, but we don’t care about them
• Flags are basic tool for conditionals
• For example, a TEST followed by a jump instruction
  o TEST sets various flags, jump determines action to take, based on those flags

Instruction Format
• Most instructions consist of…
  o Opcode --- the “instruction”
  o One or two operands --- “parameter(s)”
• Operands (parameters) are data
• Operands come in 3 flavors
  o Register name --- for example, EAX
  o Immediate --- e.g., hard-coded constant
  o Memory address --- enclosed in [brackets]

Operand Examples
• EAX
  o Read from (or write to) EAX register, depending on opcode
• 0x30004040
  o Immediate --- number is embedded in code
  o Usually a constant in high-level code
• [0x4000349e]
  o This is a memory address
  o Could be a global variable in high level code

Basic Instructions
• We cover a few common instructions
  o First we give general format
  o Later, we give a few simple examples
• There are lots of assembly instructions
• But, most assembly code uses only a few
  o About 14 assembly instructions account for more than 90% of all code

Opcode Counts
• Typical opcode counts, “normal” code
[Figure: opcode counts chart; image not recoverable]

Opcode Counts
• Opcode counts, typical virus code
[Figure: opcode counts chart; image not recoverable]

Instructions
• We consider following operations
  o Moving data
  o Arithmetic
  o Comparisons
  o Conditional branches
  o Function calls

Moving Data
• MOV is the most popular opcode
• 2 operands, destination and source:
  o MOV DestOperand, SourceOperand
• Note the order
  o Destination first, source second

Arithmetic
• Six integer arithmetic operations
  o ADD, SUB, MUL, DIV, IMUL, IDIV
• Many variations based on operands
  o ADD Op1, Op2 ; add, store result in Op1
  o SUB Op1, Op2 ; sub Op2 from Op1 --> Op1
  o MUL Op ; mul Op by EAX ---> EDX:EAX
  o DIV Op ; div EDX:EAX by Op
             quotient ---> EAX, remainder ---> EDX
  o IMUL, IDIV --- like MUL and DIV, but signed

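A minimal C sketch of what MUL and DIV do with the EDX:EAX pair, assuming the unsigned 32-bit forms listed above. The Regs struct and the emu_mul/emu_div names are made up for illustration; they are not part of any real API.

```c
#include <stdint.h>

/* Hypothetical model of the two registers involved. */
typedef struct {
    uint32_t eax;
    uint32_t edx;
} Regs;

/* MUL Op: unsigned EAX * Op, 64-bit product split across EDX:EAX. */
void emu_mul(Regs *r, uint32_t op)
{
    uint64_t product = (uint64_t)r->eax * op;
    r->eax = (uint32_t)product;          /* low 32 bits  */
    r->edx = (uint32_t)(product >> 32);  /* high 32 bits */
}

/* DIV Op: unsigned EDX:EAX / Op; quotient -> EAX, remainder -> EDX.
 * Returns 0 on success, -1 where real hardware would raise a divide
 * error (Op == 0, or quotient does not fit in 32 bits). */
int emu_div(Regs *r, uint32_t op)
{
    uint64_t dividend = ((uint64_t)r->edx << 32) | r->eax;
    if (op == 0 || dividend / op > UINT32_MAX)
        return -1;
    r->eax = (uint32_t)(dividend / op);
    r->edx = (uint32_t)(dividend % op);
    return 0;
}
```

For example, with EAX = 0xFFFFFFFF, multiplying by 2 leaves EAX = 0xFFFFFFFE and EDX = 1; dividing that pair by 2 gives back EAX = 0xFFFFFFFF with remainder 0 in EDX.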
Comparisons
• CMP opcode has 2 operands
  o CMP Operand1, Operand2
• Subtracts Operand2 from Operand1
• Result “stored” in flag bits
  o If 0 then ZF flag is set
  o Other flags can be used to tell which is greater, depending on signed or unsigned

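A rough C sketch of how CMP could be modeled: subtract, throw the difference away, keep only the status flags. The Flags struct and function names are hypothetical. The signed/unsigned tests at the end follow the conditions used by the JL and JB jumps, which go beyond what the slide lists.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four status flags used here. */
typedef struct {
    bool zf;  /* zero: operands equal                 */
    bool cf;  /* carry/borrow: a < b as unsigned      */
    bool sf;  /* sign: top bit of the difference      */
    bool of;  /* overflow: signed subtraction wrapped */
} Flags;

Flags emu_cmp(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;  /* computed, then discarded by CMP */
    Flags f;
    f.zf = (diff == 0);
    f.cf = (a < b);
    f.sf = (diff >> 31) != 0;
    /* Overflow when the operands differ in sign and the result's sign
     * differs from the first operand's sign. */
    f.of = (((a ^ b) & (a ^ diff)) >> 31) != 0;
    return f;
}

/* Signed "a < b" (the JL condition): SF != OF. */
bool signed_less(Flags f)   { return f.sf != f.of; }

/* Unsigned "a < b" (the JB condition): CF set. */
bool unsigned_less(Flags f) { return f.cf; }
```

Comparing 1 with 0xFFFFFFFF shows why signedness matters: as unsigned values 1 is smaller, but as signed values 0xFFFFFFFF is -1, so 1 is greater.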
Conditional Branches
• Conditional branches use “Jcc” family of instructions (je, jne, jz, jnz, etc.)
• Format is
  o Jcc TargetAddress
• If Jcc true, goto TargetAddress
  o Otherwise, what happens?

Function Calls
• Use CALL and RET
  o CALL FunctionAddress
    ……
  o RET ; pops return address
• RET can be told to increment ESP
  o Need to reset stack pointer
  o Why?

Examples
cmp ebx,0xf020
jnz 10026509
• What does this do?
• Compares value in EBX with constant
• Jumps to specified address if operands are not same
  o Note: JNE and JNZ are same instruction

Examples
mov edi,[ecx+0x5b0]
mov ebx,[ecx+0x5b4]
imul edi,ebx
• What does this do?
• First, form address ECX + 0x5b0 (ECX itself is unchanged), get value at that memory and put in EDI
• Next, form address ECX + 0x5b4, get value at that memory and put in EBX
  o Note that ECX points to some data structure
• Finally, EDI = EDI * EBX
  o Note there are different forms of IMUL

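One way to picture the three instructions in C, assuming ECX points at a block of bytes with two 32-bit fields at offsets 0x5b0 and 0x5b4. The function names are invented for this sketch, and the real source code may have looked quite different.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value at base + offset, like "mov reg,[ecx+offset]". */
int32_t load32(const unsigned char *base, size_t offset)
{
    int32_t value;
    memcpy(&value, base + offset, sizeof value);
    return value;
}

/* Hypothetical C equivalent of the three instructions. */
int32_t field_product(const unsigned char *ecx)
{
    int32_t edi = load32(ecx, 0x5b0);  /* mov edi,[ecx+0x5b0] */
    int32_t ebx = load32(ecx, 0x5b4);  /* mov ebx,[ecx+0x5b4] */
    /* imul edi,ebx keeps only the low 32 bits of the product */
    return (int32_t)((uint32_t)edi * (uint32_t)ebx);
}
```

In high level source this would most likely be something like obj->a * obj->b, with the compiler turning the two field names into the fixed offsets.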
Examples
push eax
push edi
push ebx
push esi
push dword ptr [esp+0x24]
call 0x10026eeb
• What does this do?
• PUSH four register values
• PUSH something related to stack ptr
  o Probably, parameter or local variable
  o Would need to look at more code to decide
  o Note “dword ptr” is effectively a cast
• CALL a function

Examples
mov eax, dword ptr [ebp - 0x20]
shl eax, 4
mov ecx, dword ptr [ebp - 0x24]
cmp dword ptr [eax+ecx+4], 0
call 0x10026eeb
• What does this do?
• Maybe “data structure in an array”
• CMP line
  o ECX --- gets base pointer
  o EAX --- current offset into the array
  o Add 4 to get specific member of structure

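A C guess at the source behind the MOV/SHL/MOV/CMP lines, assuming an array of 16-byte structures. The Element layout is hypothetical: only the member at offset 4 is known from the disassembly, and the other fields are placeholders.

```c
#include <stdint.h>

/* Hypothetical 16-byte element. */
typedef struct {
    uint32_t unknown0;   /* offset 0  */
    uint32_t member4;    /* offset 4: the field compared with 0 */
    uint32_t unknown8;   /* offset 8  */
    uint32_t unknown12;  /* offset 12 */
} Element;

/* base  ~ value loaded into ECX from [ebp - 0x24]
 * index ~ value loaded into EAX from [ebp - 0x20]
 * "shl eax, 4" multiplies the index by 16 == sizeof(Element). */
int member_is_zero(const Element *base, uint32_t index)
{
    return base[index].member4 == 0;  /* cmp dword ptr [eax+ecx+4], 0 */
}
```

The shift by 4 is the giveaway: a power-of-two element size lets the compiler scale the index with SHL instead of a multiply.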
Examples
• AT&T syntax
pushl $14
pushl $helloWorld
pushl $1
movl $4, %eax
pushl %eax
int $0x80
addl $16, %esp
pushl $0
movl $1, %eax
pushl %eax
int $0x80

Compilation
• Converts high level representation of code to binary
• Front end --- lexical analysis
  o Verify syntax, etc.
• Intermediate representation
• Optimization
  o Improve structure, eliminate redundancy, …

Compilation
• Back end --- generates the actual code
  o Instruction selection
  o Register allocation
  o Instruction scheduling --- pipelining, parallelism
• Back end process might make disassembly hard to read
  o Optimization too
• Each compiler has its own quirks
  o Can you automatically determine compiler?

Virtual Machines & Bytecode

Virtual Machines
• Some languages instead generate intermediate bytecode
• Bytecode runs in a virtual machine
  o Virtual machine is a program that (historically) interprets bytecode
  o Translates bytecode for the hardware
• Bytecode analogous to assembly code

Virtual Machines
• Advantages?
  o Hardware independent
• Disadvantages?
  o Slow
• Today, usually just-in-time compilers instead of interpreters
  o Compile snippets of bytecode into native code as needed

Reversing Bytecode
• Reversing bytecode is easy
  o Unless special precautions are taken
  o Even then, easier than native code
• Bytecode usually contains lots of metadata
  o Possible to reconstruct highly accurate high level language
• Bytecode can be obfuscated
  o In worst case, reverser must learn bytecode
  o But bytecode is easier than native code

Windows PE Files

Windows PE File Format
• Designed to be standard executable file format for all versions of OS…
  o …on all supported processors
• Only small changes since PE format was introduced
  o E.g., support for 64-bit Windows

Windows PE Files
• Trivia
  o Q: What’s the difference between exe and dll?
  o A: Not much --- one bit differs in PE files
  o Q: What is size of smallest possible PE file?
  o A: 133 bytes
• PE file on disk is a file
  o Once loaded into memory, it’s a module
  o File is mapped to module
  o Address where module begins is HMODULE
  o PE file may not all be mapped to module

Windows PE Files
• WINNT.H is final word on what PE file looks like
• Tools to examine PE files
  o Dumpbin (Visual Studio)
  o Depends
  o PE Browse Professional
    • In spite of its name, it’s free
  o PEDUMP (by author of article)

PE File Sections
• Each section is “chunk of code or data that logically belongs together”
  o For example, all import tables in one section
• Code is in .text section
  o Code is code, but many types of data
• Data examples
  o Program data (e.g., .rdata for read-only)
  o API import/export tables
  o Resources, relocation info, etc.
• Can specify section names in C++ source

PE File Sections
• When mapped, module starts on a page boundary
• Linker can be told to merge sections
  o E.g., to merge .text and .rdata:
    /MERGE:.rdata=.text
  o Some sections commonly merged
  o Some sections cannot be merged

Relative Virtual Addresses
• Exe file specifies in-memory addresses
• PE file specifies preferred load location
  o But DLL can actually load just about anywhere
• So, PE specifies addresses in a way that is independent of where it loads
  o No hardcoded addresses in PE
  o Instead, Relative Virtual Addresses (RVAs)
  o RVA is an offset relative to where PE is loaded

Relative Virtual Addresses
• To find actual memory location, add RVA to the actual load address
• For example, suppose
  o Exe file is loaded at 0x400000
  o And RVA is 0x1000
  o Then code (.text) starts at 0x401000
• In Windows terminology, actual address is known as Virtual Address (VA)

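The RVA arithmetic as a small C sketch, assuming 32-bit addresses as in the slide example. The function names are made up for illustration.

```c
#include <stdint.h>

/* VA = actual load address + RVA. */
uint32_t rva_to_va(uint32_t load_address, uint32_t rva)
{
    return load_address + rva;
}

/* Inverse: recover the RVA from a VA inside the loaded module. */
uint32_t va_to_rva(uint32_t load_address, uint32_t va)
{
    return va - load_address;
}
```

With the slide's numbers, rva_to_va(0x400000, 0x1000) gives 0x401000.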
Data Directory
• There are many data structures within exe
  o For efficiency, must be loaded quickly
  o E.g., imports, exports, resources, base relocations, etc.
• DataDirectory
  o Array of 16 data structures
  o #define IMAGE_DIRECTORY_ENTRY_xxx defines array indexes (0 to 15)

Importing Functions
• To use code or data from another DLL, must import it
• When PE file loads, Windows loader locates imported functions/data
  o Usually automatic, when program first starts
  o Imported DLLs may import others
  o For example, any program created with Visual C++ imports KERNEL32.DLL…
  o …and KERNEL32.DLL imports from NTDLL.DLL

Importing Functions
• Each PE has Import Address Table (IAT)
  o IAT contains arrays of function pointers
  o One array per imported DLL
• Each imported API has spot in IAT
  o The only place where API address stored
  o So, all calls to API go thru one function ptr
  o E.g., CALL DWORD PTR [0x00405030]
  o But, by default it’s a little more complex…

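A toy C model of the "one function pointer per API" idea. Everything here (fake_api, the one-slot iat array, fake_loader_bind) is invented for the sketch; a real IAT is laid out and filled in by the linker and the Windows loader, not by program code.

```c
/* Hypothetical stand-in for an imported API. */
int fake_api(int x) { return x + 1; }

typedef int (*ApiFn)(int);

/* One-slot "IAT": the loader would fill this in at load time. */
ApiFn iat[1];

void fake_loader_bind(void) { iat[0] = fake_api; }

/* Every call site goes through the same slot, much like
 * CALL DWORD PTR [0x00405030]. */
int call_site_a(int x) { return iat[0](x); }
int call_site_b(int x) { return iat[0](x * 2); }
```

Because both call sites read the same slot, the loader only has to patch one location per API, no matter how many places call it.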
PE File Structure
• Next slides describe PE file structure
• Note that all of these data structures defined in WINNT.H
• Usually, 32-bit and 64-bit versions
• For example,
  o IMAGE_NT_HEADERS32
  o IMAGE_NT_HEADERS64
  o Identical except for widened fields for 64-bit

MS-DOS Header
• Every PE begins with small MS-DOS exe
  o Prints message saying Windows required
• MS-DOS Header
  o IMAGE_DOS_HEADER
  o 2 “important” values
  o e_lfanew --- file offset of PE header
  o e_magic --- 0x5A4D, “MZ” in ASCII… Why MZ?

IMAGE_NT_HEADERS Header
• Primary location for PE specifics
• Location in file given by e_lfanew
• One version for 32-bit exes and another for 64-bit exes
  o Only minor differences between them
  o Single bit specifies 32-bit or 64-bit

IMAGE_NT_HEADERS Header
• Has 3 fields
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
• In valid PE, Signature is 0x00004550
  o In ASCII, this is “PE” followed by two zero bytes

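A minimal C sketch of the two signature checks, working on a raw byte buffer rather than the WINNT.H structs. It assumes e_lfanew sits at offset 0x3C of the MS-DOS header (the last field of IMAGE_DOS_HEADER); the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers, so the sketch does not depend on host byte order. */
uint16_t read_u16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t read_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the file offset of IMAGE_NT_HEADERS, or -1 if the buffer
 * does not look like a PE file. */
long find_nt_headers(const unsigned char *file, size_t size)
{
    if (size < 0x40 || read_u16(file) != 0x5A4D)   /* e_magic: "MZ" */
        return -1;
    uint32_t e_lfanew = read_u32(file + 0x3C);
    if (e_lfanew > size || size - e_lfanew < 4)
        return -1;
    if (read_u32(file + e_lfanew) != 0x00004550)   /* "PE" + two zero bytes */
        return -1;
    return (long)e_lfanew;
}
```

A tool would typically run a check like this first and only then interpret FileHeader and OptionalHeader at the returned offset.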
IMAGE_NT_HEADERS Header
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
• IMAGE_FILE_HEADER predates PE
  o Struct containing basic info about file
  o Most important info is size of “optional data” that follows (not really optional)

IMAGE_NT_HEADERS Header
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
• IMAGE_OPTIONAL_HEADER
  o DataDirectory array (at end) is “address book” of important locations in exe
  o Each entry contains RVA and size of data

PE Sections
• Recall, section is “chunk of code or data that logically belongs together”
• For example
  o All data for exe’s import tables are in one section

Section Table
• Section table contains array of IMAGE_SECTION_HEADER structs
• An IMAGE_SECTION_HEADER has info about associated section
  o Location, length, and characteristics
  o Number of such headers given by field:
    IMAGE_NT_HEADERS.FileHeader.NumberOfSections

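One common use of the section table is turning an RVA into a file offset. The C sketch below assumes a trimmed-down, hypothetical SectionInfo struct; the field names mirror IMAGE_SECTION_HEADER, but the real struct in WINNT.H has more members (name, characteristics, and so on).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of IMAGE_SECTION_HEADER. */
typedef struct {
    uint32_t VirtualAddress;    /* RVA where the section is mapped    */
    uint32_t PointerToRawData;  /* file offset of the section's data  */
    uint32_t SizeOfRawData;     /* size of the section's data on disk */
} SectionInfo;

/* Translate an RVA into a file offset by finding the section that
 * contains it. Returns -1 if no section holds raw data for the RVA. */
long rva_to_file_offset(const SectionInfo *sections, size_t count,
                        uint32_t rva)
{
    for (size_t i = 0; i < count; i++) {
        const SectionInfo *s = &sections[i];
        if (rva >= s->VirtualAddress &&
            rva - s->VirtualAddress < s->SizeOfRawData)
            return (long)(s->PointerToRawData + (rva - s->VirtualAddress));
    }
    return -1;
}
```

Here count plays the role of NumberOfSections: the loop visits one header per section.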
Alignment of Sections
• Visual Studio 6.0
  o 4KB sections by default
• Visual Studio .NET
  o 4KB by default, except for small files, which use 0x200-byte alignment
  o Also, .NET spec requires 8KB in-memory alignment (for IA-64 compatibility)

PE Sections
• So far, overview of PE file format
• Now, look inside important sections…
  o …and some data structures within sections
• Then we finish with look at PEDUMP
  o Recall there are other similar utilities

Section Names
.text ---The default code section.
 .data --- The default read/write data
section. Global variables typically go here.
 .rdata --- The default read-only data
section. String literals and C++/COM
vtables are examples of items put into
.rdata.

SRE Basics
62
Section Names


.idata --- The imports table. It has become
common practice (explicitly, or via linker default
behavior) to merge .idata into another section,
typically .rdata. By default, the linker only merges
the .idata section into another section when
creating a release mode exe.
.edata --- The exports table. When creating an
executable that exports APIs or data, the linker
creates an .EXP file which contains an .edata
section that's added into the final executable. Like
the .idata section, the .edata section is often
found merged into the .text or .rdata sections.
SRE Basics
63
Section Names
• .rsrc --- The resources. This section is read-only. However, it should not be renamed and should not be merged into other sections.
• .bss --- Uninitialized data. Rarely found in exes created with recent linkers. Instead, the VirtualSize of the exe's .data section is expanded to make room for uninitialized data.
• .crt --- Data added for supporting the C++ runtime (CRT). A good example is the function pointers that are used to call the constructors and destructors of static C++ objects.

Section Names
• .tls --- Data for supporting thread local storage variables declared with __declspec(thread). This includes the initial value of the data, as well as additional variables needed by the runtime.
• .reloc --- Base relocations in an exe. Base relocations are generally only needed for DLLs and not EXEs. In release mode, the linker doesn't emit base relocations for EXE files. Relocations can be removed when linking with the /FIXED switch.
• .sdata --- "Short" read/write data that can be addressed relative to the global pointer. Used for IA-64 and other architectures that use a global pointer register. Regular-sized global variables on the IA-64 will go in this section.

Section Names
• .srdata --- "Short" read-only data that can be addressed relative to the global pointer. Used on the IA-64 and other architectures that use a global pointer register.
• .pdata --- The exception table. Contains an array of IMAGE_RUNTIME_FUNCTION_ENTRY structs, which are CPU-specific. Pointed to by the IMAGE_DIRECTORY_ENTRY_EXCEPTION slot in the DataDirectory. Used for architectures with table-based exception handling, such as the IA-64. The only architecture that doesn't use table-based exception handling is the x86.
• .didat --- Delayload import data. Found in exes built in nonrelease mode. In release mode, the delayload data is merged into another section.

Exports Section
• Exe may export code or data
  o Makes it available to other exes
  o Refer to an exported thing as a symbol
• At minimum, to export symbol, must specify its address in defined way
  o Keyword ORDINAL tells linker to use numbers, not names, for symbols
  o After all, names just a convenience for coders

IMAGE_EXPORT_DIRECTORY
• Points to 3 arrays
  o And a table of ASCII strings containing symbol names
• Only required array is Export Address Table (EAT)
  o Array of function pointers
  o Addresses of exported functions
  o Export ordinal is an index into this array

IMAGE_EXPORT_DIRECTORY
• Structure example
[Figure: IMAGE_EXPORT_DIRECTORY structure; image not recoverable]

Example
exports table:
  Name:            KERNEL32.dll
  Characteristics: 00000000
  TimeDateStamp:   3B7DDFD8 -> Fri Aug 17 23:24:08 2001
  Version:         0.00
  Ordinal base:    00000001
  # of functions:  000003A0
  # of Names:      000003A0

  Entry Pt  Ordn  Name
  00012ADA     1  ActivateActCtx
  000082C2     2  AddAtomA
  •••remainder of exports omitted

Example
• Suppose we call GetProcAddress on AddAtomA API
  o System locates KERNEL32’s IMAGE_EXPORT_DIRECTORY
  o Gets start address of Export Names Table (ENT)
  o It finds there are 0x3A0 entries in ENT
  o Does binary search for AddAtomA
  o Suppose AddAtomA is 2nd entry…
  o …loader reads 2nd value from export ordinal table

Example (Continued)
• Call GetProcAddress on AddAtomA API
  o … AddAtomA has export ordinal 2
  o Use this as index into EAT (taking into account base field value)
  o Finds AddAtomA has RVA of 0x82C2
  o Add 0x82C2 to load address of KERNEL32 to get actual address of AddAtomA

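The lookup steps above, sketched in C under simplifying assumptions: the ExportView struct is hypothetical and holds already-decoded C arrays, whereas a real IMAGE_EXPORT_DIRECTORY stores RVAs to those arrays. The sketch also assumes the ordinal table holds zero-based EAT indexes, so the export ordinal is the index plus the ordinal base.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, already-decoded view of an export directory. */
typedef struct {
    uint32_t         ordinal_base;   /* "Ordinal base" field            */
    size_t           name_count;     /* "# of Names"                    */
    const char     **names;          /* ENT, sorted by name             */
    const uint16_t  *name_ordinals;  /* parallel to names; EAT indexes  */
    const uint32_t  *eat;            /* EAT: function RVAs              */
} ExportView;

/* Returns the RVA of the named export, or 0 if it is not found. */
uint32_t lookup_export_rva(const ExportView *ex, const char *name)
{
    size_t lo = 0, hi = ex->name_count;
    while (lo < hi) {                        /* binary search of the ENT */
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, ex->names[mid]);
        if (cmp == 0)
            /* Same position in the ordinal table gives the EAT index;
             * export ordinal == that index + ordinal_base. */
            return ex->eat[ex->name_ordinals[mid]];
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}
```

The caller would then add the returned RVA to the load address of the DLL to get the actual address, as in the last step on the slide.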
Export Forwarding
• Can forward export to another DLL
  o That is, must find it at “forward” address
• Example
  o KERNEL32 HeapAlloc function forwarded to RtlAllocHeap function exported by NTDLL
  o In EXPORTS section of KERNEL32, find
    EXPORTS
    …
    HeapAlloc = NTDLL.RtlAllocHeap

Imports Section
• Importing is opposite of exporting
• IMAGE_IMPORT_DESCRIPTOR
  o Points to 2 essentially identical arrays
  o Import Address Table & Import Name Table
• IAT and INT
  o Contain ordinal, address, forwarding info
  o After binding, IAT rewritten, INT retains original (pre-binding) info
  o Binding discussed next…

Imports Section
• Example
  o Importing APIs from USER32.DLL
[Figure: imports from USER32.DLL; image not recoverable]

Binding
• Binding means IAT overwritten with actual addresses
  o VAs overwrite RVAs
• Why do this?
  o Increased efficiency
• Loader checks whether binding valid

Delayload Data
• Hybrid between implicit & explicit importing
• Not an OS issue
  o A linker issue, at runtime
• There is IAT and INT for the DLL
  o Identical to regular IAT and INT
  o But read by runtime library code instead of OS
• Benefit? Calls then go directly to API…

Resources Section
• For resources such as…
  o icons, bitmaps, dialogs, etc.
• Most complicated section to navigate
• Organized like a file system…

Base Relocations
• Executable has many memory addresses
• As mentioned, PE file specifies preferred memory address to load the module
  o ImageBase field in IMAGE_OPTIONAL_HEADER
• If DLL loaded elsewhere, all addresses will be incorrect
  o Base relocations tell loader all locations that need to be modified
  o Note that this is extra work for the loader
• What about EXE, which is not a DLL?

Base Relocation Example
• Consider the following line of code
00401020: 8B 0D 34 D4 40 00 mov ecx,dword ptr [0x0040D434]
• Note that “8B 0D” specifies opcode
  o Also note the address 0x0040D434
• Suppose preferred load is at 0x00400000
• If it loads at that address, it runs as-is
• Suppose instead it loads at 0x00500000
• Then code above needs to change to
8B 0D 34 D4 50 00 mov ecx,dword ptr [0x0050D434]

Base Relocation Example
• If not loaded at preferred address, then loader computes delta
• For example on previous slide…
  o delta = 0x00500000 - 0x00400000
  o So, delta is 0x00100000
• Also, there would be base relocation for the instruction at 0x00401020
  o More precisely, it points at the 4-byte address operand, which starts 2 bytes in (0x00401022)
  o Loader modifies address located here by delta

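A minimal C sketch of the patch the loader applies, using the bytes from the example. The apply_reloc32 name and the flat image buffer are assumptions for illustration; the real loader walks the .reloc data to find each location.

```c
#include <stddef.h>
#include <stdint.h>

/* Apply one 32-bit base relocation: add delta to the little-endian
 * DWORD stored at image[offset]. "image" is a hypothetical in-memory
 * copy of the module; offset is relative to its start. */
void apply_reloc32(unsigned char *image, size_t offset, uint32_t delta)
{
    unsigned char *p = image + offset;
    uint32_t value = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    value += delta;
    p[0] = (unsigned char)(value);
    p[1] = (unsigned char)(value >> 8);
    p[2] = (unsigned char)(value >> 16);
    p[3] = (unsigned char)(value >> 24);
}
```

Applied at offset 2 of the bytes 8B 0D 34 D4 40 00 with delta 0x00100000, this turns them into 8B 0D 34 D4 50 00, matching the relocated instruction shown earlier.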
Debug Directory
• Contains debug info
• Not required to run the program
  o But useful for development
• Can be multiple forms of debug info
  o Most common is PDB file

.NET Header
• .NET executables are PE files
• However, code/data is minimal
• Purpose of PE is simply to get .NET-specific info into memory
  o Metadata, intermediate language (IL)
  o MSCOREE.DLL at start of a .NET process
  o This dll “takes charge” and uses metadata and IL from executable
  o So PE has stub to get MSCOREE.DLL going

TLS Initialization
• Thread Local Storage (TLS)
  o .tls section for thread local variables
• New threads initialized using .tls data
• Presence of TLS data indicated by nonzero IMAGE_DIRECTORY_ENTRY_TLS in DataDirectory
  o Points to IMAGE_TLS_DIRECTORY struct
  o Contains virtual addresses, VAs (not RVAs)
  o The actual struct is in .rdata, not in .tls

Program Exception Data
• x86 architecture uses frame-based exception handling
  o A fairly complex way to handle exceptions
• IA-64 and others use table-based approach
  o Table containing info about every function that might be affected by exception unwinding
  o Table entry includes start and end addresses, how and where exception to be handled
  o When exception occurs, search thru table…

PEDUMP
• Tools for analyzing PE files
  o Dumpbin (Visual Studio)
  o Depends
  o PE Browse Professional
    • In spite of its name, it’s free
  o PEDUMP (by author of article)