Data Representation
January 9–14, 2013

Quick logistical notes

In-class exercises
Bring paper and pencil (or a laptop) to each lecture!
Goals:
• break up lectures, keep you engaged
• chance to work through problems in class
• ask questions!
First homework will be posted before Friday’s lecture!

Outline

• Internal vs. external representations
• Representing the natural numbers
• Binary number system
• Binary arithmetic
• Hexadecimal and base-N number systems
• Fixed-size integer representations
• Representing negative numbers
• Big endian vs. little endian

Internal vs. external representations

Internal representation
How the data is actually represented in the computer hardware

External representation
How we interpret or conceptualize the internal representation

Internal representations

Usually two states, which we interpret as 0 and 1

Volatile representations:
• Capacitor (DRAM)
  • charged or not
• Flip-flop circuit (SRAM)
  • one of two output signals is high

Non-volatile representations:
• Region of a magnetized surface (hard disks, tape)
  • positive or negative
• Floating gate transistor (flash)
  • change in voltage
  • one cell can represent more than two states!
  • e.g. one 16-level cell ≈ four flip-flops

Interacting with the internal representation

Architecture provides an interface
• can interact with the internal representation
• using the abstraction of the external representation

Advantages:
• Don’t have to think about internal representation
• Architecture can be implemented by different hardware

Organization of the internal representation

Usually can’t refer to individual bits
• Internal representation organized into groups
• Through the ISA, can read/write a group by an address

Addressable groups in MIPS
• byte = 8 bits
• word = 4 bytes = 32 bits
• (also halfword = 2 bytes = 16 bits)

External representations

Conceptually, view data as a sequence of 0s and 1s
The same data can be interpreted in different ways:

Example: 1111 0110
• ö (extended ASCII character)
• 246 (unsigned integer)
• −10 (signed 8-bit integer)

Decimal number system (base 10)

How it works (positional number system):
• 10 digits, used in sequence
• each position corresponds to a power of 10
• sum of each digit multiplied by position value

Example: 2037

  ...  10^5     10^4    10^3  10^2  10^1  10^0
  ...  100,000  10,000  1000  100   10    1
       (0)      (0)     2     0     3     7

2·1000 + 0·100 + 3·10 + 7·1 = 2000 + 0 + 30 + 7 = 2037

Binary number system (base 2)

Works the same way!
• 2 digits (bits, short for binary digits), used in sequence
• each position corresponds to a power of 2
• sum of each bit multiplied by position value

Example: 110101

  ...  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
  ...  128  64   32   16   8    4    2    1
       (0)  (0)  1    1    0    1    0    1

1·32 + 1·16 + 0·8 + 1·4 + 0·2 + 1·1 = 32 + 16 + 0 + 4 + 0 + 1 = 53

Converting from binary to decimal

Very easy:
• Since binary is just 0s and 1s, no need to multiply
• Just add up the position values of the 1 bits

Example: 1011 0010

  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
  128  64   32   16   8    4    2    1
  1    0    1    1    0    0    1    0

128 + 32 + 16 + 2 = 178

Converting from decimal to binary

Method 1: Subtracting powers of 2
For each position p from left to right
• If 2^p ≤ n, subtract 2^p from n and write 1
• Otherwise, write 0

Example: 157

  157 − 128 = 29    1 for 128’s position
   29 −  16 = 13    0 for 64, 0 for 32, 1 for 16
   13 −   8 =  5    1 for 8
    5 −   4 =  1    1 for 4
    1 −   1 =  0    0 for 2, 1 for 1

Result: 1001 1101

Converting from decimal to binary

Method 2: Successive division by 2
• Divide by 2 until you reach 0, keeping track of remainders
• Write the remainders, from last to first

Example: 157

  157 ÷ 2 = 78 R 1
   78 ÷ 2 = 39 R 0
   39 ÷ 2 = 19 R 1
   19 ÷ 2 =  9 R 1
    9 ÷ 2 =  4 R 1
    4 ÷ 2 =  2 R 0
    2 ÷ 2 =  1 R 0
    1 ÷ 2 =  0 R 1

Result: 1001 1101

In-class exercises

Convert from binary to decimal:
• 0010 1010
• 1001 0101

Convert from decimal to binary:
• 169, by subtracting powers of 2
• 84, by successive division by 2

Binary addition

Just like adding decimal numbers!

To add two binary numbers
• Pairwise add each bit, starting from the right
• 0 + 0 = 0 and 0 + 1 = 1
• On 1 + 1, carry a bit to the left

Example: 0110 + 0011

    11        (carries)
    0110
  + 0011
    1001

Example: 0011 + 0011

     11       (carries)
    0011
  + 0011
    0110

Binary multiplication

Same algorithm as decimal (only easier)

To multiply two binary numbers A and B
1. For each bit b in B:
   • Multiply b × A, aligning the result with b
     (since b is 0 or 1, each step yields 0 or A!)
2. Sum the results

Example: 1101 × 1101

  1.        1101
          × 1101
            1101
              0
          1101
         1101

  2.    11111          (carries)
            1101
            0000
          110100
       + 1101000
        --------
        10101001

Often easiest to add results two at a time

Special case: multiplying by a power of 2

Super easy, just like multiplying by powers of 10 in decimal

To multiply a binary number by 2^p
Add p 0s on the right

Examples
• 100 × 1101 = 110100
• 1010 × 1000 = 1010000

Hexadecimal number system (base 16)

Very useful for representing binary data concisely!
• 16 digits: 0–9, A, B, C, D, E, F
• each position corresponds to a power of 16
• usually prefixed with 0x

Each hex digit corresponds to 4 bits

  0 0000    4 0100    8 1000    C 1100
  1 0001    5 0101    9 1001    D 1101
  2 0010    6 0110    A 1010    E 1110
  3 0011    7 0111    B 1011    F 1111

One byte = 2 hex digits

Converting hexadecimal ⇔ binary

Each hex digit corresponds to 4 bits (see the table above)

Examples
• 0xA4F7 = 1010 0100 1111 0111
• 0x0B60 = 0000 1011 0110 0000

We will be doing this a lot this quarter. :)

Converting hexadecimal ⇔ decimal

Two strategies:
• Convert directly
• Convert hexadecimal ⇔ binary ⇔ decimal

Example: 0xB6A4 (direct conversion)

  ...  16^4    16^3   16^2  16^1  16^0
  ...  65,536  4,096  256   16    1
       (0)     B      6     A     4

11·4096 + 6·256 + 10·16 + 4·1 = 45056 + 1536 + 160 + 4 = 46,756

Representation in other bases

In general, we can represent numbers in any base

Some other significant bases:
• Base 8 (octal)
  • each octal digit is equivalent to three bits
    (000 = 0_8, 001 = 1_8, 010 = 2_8, ..., 111 = 7_8)
  • useful in old architectures with 12-, 24-, or 36-bit words
  • supported in C and many assembly languages
    (071 = 71_8 = 57_10)
• Base 64 (A–Z, a–z, 0–9, +, /)
  • each base-64 digit is equivalent to six bits
  • used in MIME to transmit binary data in plain ASCII text

In-class exercises

Add in binary:
• 100 1100 + 1110 1111

Multiply in binary:
• 1011 × 101

Add in hexadecimal:
• 0x28 + 0x4A

Arbitrary vs. fixed precision

So far, we have been assuming arbitrary precision
• to represent a bigger number, just add more bits/digits!

In practice, integers have a fixed size
• commonly 32 or 64 bits
• based on register size of the architecture

This is significant for two reasons:
• risk of overflow
• representation of negative numbers

Representing negative numbers

Must first specify the fixed size of the integer!
• With n bits, we can represent 2^n different values
• Idea: split space so half the values represent negatives

Sign and magnitude representation
• First bit represents the sign (0 positive, 1 negative)
• Rest of bits represent the magnitude, that is |x|

Suppose 4-bit integers
• Examples: −1 = 1001, −4 = 1100, −7 = 1111

This is exactly the representation you’re used to in decimal!

Problems with sign and magnitude representation

This turns out to not be a very good representation . . . why?
Issue 1: Multiple zeros
• Both 0000 and 1000 represent the same value
• This is strange and requires extra effort

Issue 2: Complicated arithmetic
Simple binary addition doesn’t work

    0 010          1 010          0 010
  + 0 011        + 1 011        + 1 011
    0 101 ✓        1 101 ✓        1 001 ✗

One’s complement representation

One’s complement
1. start with the fixed-size binary representation of |x|
2. invert every bit

Features:
• Binary addition is simple (wrap-around carry)
• Still two zeros (all 0s and all 1s)

Examples
          −2      −3      −5
  1.      0010    0011    0101
  2.      1101    1100    1010

One’s complement addition

Overflow carries “wrap around” (added on the right)

Example: −2 + −3 = −5

   11         (carries)
    1101
  + 1100
    1001
  +    1      (wrap-around carry)
    1010

Two’s complement representation

Two’s complement
To represent a negative number x:
1. start with the fixed-size binary representation of |x|
2. invert every bit
3. add 1 to the result

Suppose 4-bit integers:

Examples
          −1      −4      −7
  1.      0001    0100    0111
  2.      1110    1011    1000
  3.      1111    1100    1001

Features of two’s complement representation

Range of expressible values with n bits
• max: 2^(n−1) − 1 (0 followed by all 1s)
• min: −2^(n−1) (1 followed by all 0s)

Fixes the issues with sign and magnitude:
• Only one zero! (all 0s)
• Binary arithmetic “just works” (discard carry out)

Examples:

  2 + 3 = 5      −2 + −3 = −5      2 + −3 = −1

    0010            1110              0010
  + 0011          + 1101            + 1101
    0101 ✓          1011 ✓            1111 ✓

Sign extension

Change the size of an integer without changing its value
• if positive (left-most bit 0), pad left with 0s
• if negative (left-most bit 1), pad left with 1s

Works with both one’s and two’s complement representation

Example: Extending from 8 bits to 16 bits
• 1001 0110 ⇒ 1111 1111 1001 0110
• 0001 0011 ⇒ 0000 0000 0001 0011

Carry out vs. overflow

Carry out: carry after most significant bit ⇒ discard, no error
Overflow: result is out of representable range ⇒ error!

Carry out ≠ overflow!
Carry out is a normal part of signed integer addition

Will get a carry out when adding:
• two negative numbers
• a negative and a positive, result is positive

Just ignore it!

Two’s complement overflow detection

Overflow: result is out of representable range ⇒ error!

When adding . . .
• two numbers with different signs
  • overflow can never occur!
• two numbers with the same sign
  • overflow occurs if the sign of the result differs from theirs

Trade-offs between representations of negatives

In modern architectures, two’s complement is used
• Simple arithmetic operations
• Only one zero
• Hard to read

Unsigned vs. signed integers

Can interpret the same n-bit data as either unsigned or signed

Unsigned integer
• Interpret as a non-negative number
• Range: 0 to 2^n − 1

Signed integer
• Interpret as two’s complement
• Range: −2^(n−1) to 2^(n−1) − 1

Only different when the leftmost bit is a 1!

Big endian vs. little endian

Order of the addressable components in a larger data type
• Usually, the order of bytes within a word

Big endian
Bytes ordered from most significant (left) to least (right)
• Example: 256 as a 16-bit halfword: 0x0100

Little endian
Bytes ordered from least significant (left) to most (right)
• Example: 256 as a 16-bit halfword: 0x0001

Big-endian is what we’ve been assuming so far!

Endian conversion

Converting from big endian to little endian
1. Separate the data into addressable components (bytes)
2. Write the components (not the bits!) in reverse order

Examples
          0x12345678      0xE5AD5CCA
  1.      12 34 56 78     E5 AD 5C CA
  2.      0x78563412      0xCA5CADE5

Same algorithm for converting from little to big!

Which architectures use which endianness?
Little-endian:
• x86, Atmel
• MIPS (MARS simulator)

Big-endian:
• Motorola 6800 and 68k

Bi-endian (configurable to be big or little):
• ARM, SPARC, PowerPC
• MIPS (specification)

http://en.wikipedia.org/wiki/Endianness

In-class exercises

Assume 8-bit integers, addressable in 2-bit chunks
For each of the following numbers:
1. write in two’s complement binary form
2. convert to little endian

Numbers:
• −50
• −100
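
Checking your work in code

The by-hand procedures from these slides (successive division by 2, the three-step two's complement encoding, sign extension, same-sign overflow detection, and reversing addressable units for endian conversion) can be sketched in Python. This is a minimal illustration and not part of the original lecture: the function names (to_binary, twos_complement, sign_extend, add_fixed, swap_endian) are made up here, and values are kept as bit strings or hex-digit strings on purpose so that each step mirrors the paper method rather than relying on native integer behavior.

```python
def to_binary(n):
    """Convert a non-negative integer to binary by successive division by 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)
        remainders.append(str(r))
    # Remainders come out least-significant first, so read them last to first.
    return "".join(reversed(remainders))


def twos_complement(x, n_bits):
    """Encode x as an n_bits-wide two's complement bit string."""
    if not -(1 << (n_bits - 1)) <= x <= (1 << (n_bits - 1)) - 1:
        raise OverflowError(f"{x} does not fit in {n_bits} bits")
    if x >= 0:
        return to_binary(x).zfill(n_bits)
    magnitude = to_binary(-x).zfill(n_bits)                # step 1: |x|
    inverted = "".join("1" if b == "0" else "0" for b in magnitude)  # step 2
    return to_binary(int(inverted, 2) + 1).zfill(n_bits)   # step 3: add 1


def sign_extend(bits, new_width):
    """Pad on the left with copies of the sign (left-most) bit."""
    return bits[0] * (new_width - len(bits)) + bits


def add_fixed(a, b):
    """Add two equal-width two's complement bit strings.

    Returns (result bits, carry out, overflow).
    """
    n = len(a)
    total = int(a, 2) + int(b, 2)
    carry_out = total >= (1 << n)
    result = to_binary(total % (1 << n)).zfill(n)  # discard the carry out
    # Overflow only if both operands share a sign and the result's sign differs.
    overflow = a[0] == b[0] and result[0] != a[0]
    return result, carry_out, overflow


def swap_endian(hex_digits, digits_per_unit=2):
    """Reverse the order of addressable units (bytes by default), not the bits."""
    units = [hex_digits[i:i + digits_per_unit]
             for i in range(0, len(hex_digits), digits_per_unit)]
    return "".join(reversed(units))


print(to_binary(157))               # 10011101
print(twos_complement(-7, 4))       # 1001
print(sign_extend("10010110", 16))  # 1111111110010110
print(add_fixed("1110", "1101"))    # ('1011', True, False)
print(add_fixed("0111", "0001"))    # ('1000', False, True)
print(swap_endian("E5AD5CCA"))      # CA5CADE5
```

The two add_fixed calls show the difference between carry out and overflow: −2 + −3 produces a carry out but no overflow, while 7 + 1 in 4 bits produces no carry out but does overflow. The values printed are the worked examples from the slides, so the in-class exercises are left for you to do by hand first.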