Download Data Structures

Data Structures
Alan, Tam Siu Lung
Tam@SiuLung.com
96397999 99967891

Prerequisite
• Familiarity with Pascal/C/C++
• Asymptotic Complexity
• Techniques learnt
  – Recursion
  – Divide and Conquer
  – Exhaustion
  – Greedy
  – [Dynamic Programming exempted]
• Algorithms learnt
  – Bubble / Insertion / Selection / Shell / Merge / Quick / Bucket / Radix Sorting
  – Linear / Binary / Interpolation Searching

What does our Programming Language provide?
• Built-in Data Types
  – Character/String (length limit?)
  – Integral (signed/unsigned 8 [?], 16, 32, 64 [?] bit)
  – Floating Point (32, 64, 80 [?] bit)
  – Fixed Point [?]
  – Complex [?]
  – Pointer/Reference
  – Function Pointer/Reference

What does our Programming Language provide?
• Aggregate Data Types
  – Array [base-definable?]
    • Multiple values of the same type
    • Access by numeric index
  – Record/Struct/Class
    • Multiple values of different types
    • Function Aggregation + Inheritance + Polymorphism [?]
  – Unions [?]

What does our Programming Language provide?
• Built-in Language Constructs
  – Branching (If, Else)
  – Loops (For, While, Until)
  – Function/Procedure Calling
    • In C++'s view, operators (including assignment) are functions as well
    • a = b → int &operator=(int &a, const int &b)
    • a > b → bool operator>(const int &a, const int &b)
    • *a → int &operator*(int *a)
    • a[b] → string &operator[](string a[], int b)
  – Recursion
  – Even more in more sophisticated languages!
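The "operators are functions" view above can be tried out on a user-defined type. The signatures on the slide for int and string are conceptual: C++ does not let you write those functions yourself when every operand is a built-in type. The sketch below therefore assumes a hypothetical record Vec2 (not part of the slides) and an arbitrary ordering for operator>, chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical 2-D vector record, used only for illustration.
struct Vec2 {
    int x, y;

    // v[i] is a call to this member function (index 0 is x, anything else is y).
    int &operator[](int i) { return i == 0 ? x : y; }
};

// a + b on two Vec2 values is a call to this free function.
Vec2 operator+(const Vec2 &a, const Vec2 &b) {
    return Vec2{a.x + b.x, a.y + b.y};
}

// a > b compares by x first, then by y (an arbitrary choice for this sketch).
bool operator>(const Vec2 &a, const Vec2 &b) {
    return a.x != b.x ? a.x > b.x : a.y > b.y;
}
```

Writing a + b and writing operator+(a, b) call the same function, which is the sense in which an operator is just a function with unusual call syntax.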
For most of the remaining time
• We concentrate on
  – Pointer
  – Array
  – Record
  – and how they interact
• We will use a C++-like notation
  – array<int> means an array of integers
  – int* is shorthand for pointer<int>
  – Records are written as: struct<int, int, string>
  – Capitalized type names are "variables", which can be replaced by any type

Formal Definition: Pointer
• Concept:
  – pointer<Type> p; (Type *p) [p : ^Type in Pascal]
• Operations:
  – *p → Type &operator*(Type *p) [p^ in Pascal]
    • Returns the pointed-to value
    • Error if p is null/nil
  – &y → Type *operator&(Type &y) [@y in Pascal]
    • Returns the address of a value
  – p = x → Type *&operator=(Type *&p, Type *x)
    • Pointer assignment

Formal Definition: Pointer
• More Operators
  – p < q → bool operator<(Type *p, Type *q)
    • Returns whether p points to an earlier element than q (in the same array)
  – ++p → Type *&operator++(Type *&p) [inc(p) in Turbo Pascal]
    • Point to next element (in an array)
  – --p → Type *&operator--(Type *&p) [dec(p) in Turbo Pascal]
    • Point to previous element (in an array)
  – p + n → Type *operator+(Type *p, int n) [not in Turbo Pascal]
    • Point to nth next element (in an array)

Programming Syntax: Pointer
C++:
    int main() {
        int a[10];
        int *b = &a[1];     // point at the second element
        *b = 1;
        b = new int(2);     // allocate a new int holding 2
        delete b;
        b = 0;
    }
Pascal:
    var
      a : array[1..10] of integer;
      b : ^integer;
    begin
      b := @a[2];           { point at the second element }
      b^ := 1;
      new(b);
      b^ := 2;
      dispose(b);
      b := nil;
    end.

Array
• Concept
  – array<Type, Size : int>
  – array<Type, Lower : int, Upper : int>
• Operations
  – Type &operator[](Type a[], int index)
    • Requires 0 <= index < Size (for array<Type, Size>)
    • Requires Lower <= index <= Upper (for array<Type, Lower, Upper>)
• Analysis
  – a[x] is equivalent to *(a + x)
  – which is equivalent to *(Type *)((char *)a + x * sizeof(Type))
  – It is sometimes slower than necessary: every access recomputes the address, while stepping a pointer through the array does not
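The equivalence between a[x] and *(a + x) can be checked directly. The sketch below sums an array twice, once by index and once by stepping a pointer; the function names sum_by_index and sum_by_pointer are made up for this example. Whether the pointer version is actually faster depends on the compiler, and optimizing compilers often generate the same code for both.

```cpp
#include <cassert>
#include <cstddef>

// Sums an array using index notation; each a[i] means *(a + i).
int sum_by_index(const int *a, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += a[i];
    return total;
}

// Same sum, stepping a pointer from the first element to one past the last.
int sum_by_pointer(const int *a, std::size_t n) {
    int total = 0;
    for (const int *p = a, *end = a + n; p < end; ++p) total += *p;
    return total;
}
```

For an array int a[5], the expressions &a[3] and a + 3 give the same address, which is why the two loops visit the same elements.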
Example: Prime Finding
• primes[] stores all primes found
    primes[0] = 2;
    for each i
      for each v in primes[]
        if (v * v > i) then begin primes.add(i); break; end;
        if (i mod v = 0) then break;

Solution
C++:
    #include <iostream>
    using namespace std;
    int main() {
        int primes[100], *last = primes;     // last points one past the last prime stored
        cout << (*last++ = 2) << endl;
        for (int i = 3; i < 100; ++i) {
            int *j = primes;
            do {
                if (*j * *j > i) {           // no prime up to sqrt(i) divides i
                    cout << (*last++ = i) << endl;
                    break;
                }
                if (i % *j == 0) break;      // i is composite
            } while (++j < last);
        }
    }
Pascal:
    var
      primes : array[1..100] of integer;
      i : integer;
      last, j : ^integer;
    begin
      last := @primes;
      last^ := 2;
      inc(last);
      for i := 3 to 100 do
      begin
        j := @primes;
        repeat
          if j^ * j^ > i then
          begin
            last^ := i;
            inc(last);
            writeln(i);
            break;
          end;
          if i mod j^ = 0 then break;
          inc(j);
        until j >= last;
      end;
    end.

Record
• Like arrays
• Fields are identified by names instead of indices
• Each name is associated with a type
• Pair is a special record with 2 elements, Key and Value
  – Keys are unique (i.e. keys identify records)
  – Keys are comparable (i.e. sortable) [sometimes]
  – Since Value can itself be a record, any record with a unique portion can be represented as a pair

Programming Syntax: Record
C++:
    struct Point {
        double x, y;
    };
    struct Rect {
        Point tl, br;
        int color;
    };
    int main() {
        Rect rect;
        rect.color = 255;
        rect.tl.x = 0.0;
    }
Pascal:
    type
      Point = record
        x, y : real;
      end;
      Rect = record
        tl, br : Point;
        color : integer;
      end;
    var
      rect : Rect;
    begin
      rect.color := 255;
      rect.tl.x := 0.0;
      with rect do
      begin
        color := 255;
        tl.x := 0.0;
      end;
    end.

Linked List
• Combining Pointer and Record
• linkedlist<string>:
    type
      pNode = ^Node;
      Node = record
        value : string;
        next : pNode;
      end;
    var
      head : pNode;

Linked List
• Operations
  – void Add(linkedlist<Type> &p, Type &v)
    • Add an element to the Linked List
  – Node *Search(linkedlist<Type> &p, Type &v)
    • Returns null/nil if not found
  – void InsertAfter(Node *node, Type &v)
    • Insert an element after another
  – void Remove(Node *node)
• How to implement?
• C++: x->y is shorthand for (*x).y

Linked List Implementation
C++:
    Node *list;
    void Add(int v) {                 // insert at the front
        Node *old = list;
        list = new Node();
        list->next = old;
        list->value = v;
    }
    Node *Search(int v) {
        for (Node *p = list; p; p = p->next)
            if (p->value == v) return p;
        return 0;                     // not found
    }
    void InsertAfter(Node *n, int v) {
        Node *old = n->next;
        n->next = new Node();
        n->next->next = old;
        n->next->value = v;
    }
Pascal:
    var
      list : pNode;
    procedure Add(v : integer);
    var
      old : pNode;
    begin
      old := list;
      new(list);
      list^.next := old;
      list^.value := v;
    end;
    function Search(v : integer) : pNode;
    var
      n : pNode;
    begin
      n := list;
      while (n <> nil) and (n^.value <> v) do
        n := n^.next;
      Search := n;
    end;
    { InsertAfter is similar to Add }

Array Implementation
[Figures: a linked list stored in a 5-row table with columns K (index), N (index of the next node) and V (value), where "/" marks nil or an empty cell. The panels show the table after Add 2, Add 3, Remove 2 and Remove 3.]

Abstraction
• Both of the implementations feature the same complexity
  – O(1) Addition
  – O(n) Searching
  – O(1) Insertion
  – O(1) Removal (given the preceding node)
• Sometimes we don't care how it gets implemented
  – We only want a data structure which provides the operations we want.
• We define an Abstract Data Type (ADT) to mean any collection of Data Structures providing certain operations
  – Plane
  – Polynomial
  – Graph
• We don't even care how fast the operations in an ADT are, though practically we do

Dictionary (Map, Associative Array)
• Dictionary is an unordered container of key-value pairs
• map<Key, Value>
  – void Insert(map<Key, Value> &c, Key &key, Value &value)
  – int Size(map<Key, Value> &c)
  – Value &Search(map<Key, Value> &c, Key &key)
  – void Delete(map<Key, Value> &c, Key &key)

List ADT
• List ADT is an ordered container of key-value pairs
• list<Key, Value>
  – void Insert(list<Key, Value> &c, int pos, Type &value)
  – Type &Find-ith(list<Key, Value> &c, int pos)
  – void Delete-ith(list<Key, Value> &c, int pos)
  – int Size(list<Key, Value> &c)
  – Type &Search(list<Key, Value> &c, Key &key)
  – void Delete(list<Key, Value> &c, Key &key)
  – …
• A List can be implemented by array (Vector/Table), linked list (LinkedList), etc
• A List is also a Dictionary

Time Complexity (Average Case)
                   Add     Remove   Search
    Array          O(1)    O(n)     O(n)
    Sorted Array   O(n)    O(n)     O(lg n)
    Linked List    O(1)    O(n)     O(n)
• We seldom remove anyway
• None of these makes both Add and Search fast
• In general, it is difficult if we do not depend on features of the Key

Direct Addressing Implementation
    0: Ant   ...   5: Boy   ...   99: Car
• Use the Vector ADT
• The key is the location
• Efficient: O(1) for all operations
• Infeasible: if the key can range from 1 to 20000000000, if the key is not numeric ...

Hash Function
• Hash Function: h_m(k)
• Map all keys "by calculation" into an integer domain, e.g. 0 to m - 1
• E.g. CRC32 hashes strings into 32-bit integers (i.e. m = 2^32)
  – Alan: 1598313570
  – Max: 3452409927
  – Man: 943766770
  – On: 2246271074

Hash Table Implementation
• Use a Table<int, Value> ADT of size m
• Use h_m(Key) as the key
• All operations can be done like using Table
• Solved except
  – Collision: what to do if two different keys k have the same h(k)
  – How to find a suitable hash function
• If good hash functions are used, hash tables provide near O(1) insertion, searching and removal
  – But it is difficult to get it right
  – And it is not easy to code
  – C++: hash_map<Key, Value, hash_func> (a pre-standard extension; std::unordered_map since C++11)
• Read 2003 Advanced Notes on Hash Table if you are motivated enough

Binary Search Tree Implementation
• Sorted Array is fast for searching
  – But it is slow when inserting at the front
• Idea
  – Store two separate arrays, split around a value v
  – If value < v, insert to left array
  – If value >= v, insert to right array
• Now we have a Data Structure which is
  – Worst Case N / 2 + 1 insertion (N in the past)
  – lg(N) + 1 searching

Binary Search Tree Implementation
• Now we have a Data Structure which is
  – N / 2 + 1 insertion (N in the past)
  – lg(N) + 1 searching
• If we store "N / 2" elements in this DS
  – N / 4 + 1 insertion
  – lg(N) searching
• If both of left and right arrays use this DS [Recursion]
  – N / 4 + 2 insertion
  – lg(N) + 1 searching
• Continue this process lg(N) times
  – lg(N) + 2 insertion
  – lg(N) + 1 searching
  – What will it look like?

Binary Search Tree Implementation
C++:
    struct Node {
        Node *left, *right;
        int value;
    };
Pascal:
    type
      pNode = ^Node;
      Node = record
        left, right : pNode;
        value : integer;
      end;
[Figure: an example binary search tree holding the values 6, 3, 8, 1, 4, 7, 9 and 7.5.]

Introduction to Tree
• Node
• Root
• Leaf / Internal
• Parent / Children
• [Proper] Ancestors / Descendants
• Siblings

Binary Search Tree Implementation
• Operations
  – Searching
    • If target < current, go to left
    • If target > current, go to right
  – Insertion
    • Search
    • Insert it there
  – Removal
    • If it is a leaf, just remove it.
    • Otherwise, replace it with the smallest value larger than it in its right subtree. That node has no left child, so it is easy to remove. (With no right subtree, the left child simply takes its place.)
• Worst Case
  – If input is sorted, the tree will become …
  – What can we do?
  – C++: map<Key, Value, comparator>

Recess
Have a break!

Stack ADT
• Something your compiler has implemented for you.
    int pow(int x, int n) {
        if (n == 0) return 1;
        int v = pow(x, n / 2);
        if (n % 2 == 0) return v * v;
        return x * v * v;
    }
• pow(3, 5)→pow(3, 2)→pow(3, 1)→pow(3, 0)

Stack ADT
• But
  – It mandates what is put on the stack
  – It couples control flow with data flow
• So we will still implement our own stack
• Last-in-first-out
  – When do we need this behavior?
• Array?
  – Fast, but fixed size
  – C++: stack<Type>

Array Implementation of Stack
C++:
    int stack[100];
    int top = 0;                      // number of elements stored
    void push(int v) {
        stack[top++] = v;
    }
    int pop() {
        return stack[--top];
    }
Pascal:
    var
      stack : array[1..100] of integer;
      top : integer;                  { set to 0 before first use }
    procedure push(v : integer);
    begin
      inc(top);
      stack[top] := v;
    end;
    function pop : integer;
    begin
      pop := stack[top];
      dec(top);
    end;

Queue ADT
• First-in-first-out
  – When do we need this behavior?
  – Major use is Breadth First Search in Graph
• Array?
  – Fast, but fixed size
  – Circular?
  – C++: queue<Type>

Array Implementation of Queue
C++:
    int queue[100];
    int head = 0, tail = 0;           // elements live in queue[head .. tail-1]
    void enqueue(int v) {
        queue[tail++] = v;
    }
    int dequeue() {
        return queue[head++];
    }
Pascal:
    var
      queue : array[1..100] of integer;
      head, tail : integer;           { set both to 0 before first use }
    procedure enqueue(v : integer);
    begin
      inc(tail);
      queue[tail] := v;
    end;
    function dequeue : integer;
    begin
      inc(head);
      dequeue := queue[head];
    end;

Priority Queue ADT
• PriorityQueue<Priority, Value>
  – void Push(Priority &p, Value &v)
    • Add an element
  – Value &Top()
    • Returns the element with maximum priority
  – void Pop()
    • Remove the element with maximum priority
• Again both Array and Linked List can do it suboptimally. A maximum heap can finish Push and Pop in O(lg n) and Top in O(1).
• C++: priority_queue<Type, Container, comparator>

Heap
• In an array with N elements
  – We can obtain the maximum value of an array in O(1) time if every Add() updates this value.
  – But removing the maximum destroys all knowledge and requires N - 1 operations to recalculate.
• If we have 2 arrays of N / 2 elements
  – We only need N / 2 time, because only the array from which the maximum was extracted is recalculated.

[Figures: tree diagrams of the heap, first built up from the maxima of sub-arrays and then shown step by step while the maximum 8 is removed.]

Heap
• Left Complete Binary Tree
• Index: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
• [8, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1, 4]
• [4, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1]  8      (8 removed, last element 4 moved to the root)
• [7, 4, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1]  8      (4 swapped with its larger child 7)
• [7, 5, 6, 4, 4, 5, 5, 2, 3, 2, 3, 3, 1]  8      (4 swapped with its larger child 5)
• [1, 5, 6, 4, 4, 5, 5, 2, 3, 2, 3, 3]  7, 8      (7 removed, last element 1 moved to the root)
• [6, 5, 1, 4, 4, 5, 5, 2, 3, 2, 3, 3]  7, 8
• [6, 5, 5, 4, 4, 1, 5, 2, 3, 2, 3, 3]  7, 8
• [6, 5, 5, 4, 4, 3, 5, 2, 3, 2, 3, 1]  7, 8
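The array trace above can be reproduced with a small array-based maximum heap. This is a minimal sketch, not the slides' own implementation: the struct name MaxHeap and its members are assumptions, the array is 0-based (so element i has children 2i+1 and 2i+2, rather than 2i and 2i+1 as in the 1-based trace), and ties between children go to the left child. top() and pop() assume the heap is not empty.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Max-heap stored in a vector; element i has children 2i+1 and 2i+2 (0-based).
struct MaxHeap {
    std::vector<int> a;

    // Append v, then swap it upward while it is larger than its parent.
    void push(int v) {
        a.push_back(v);
        std::size_t i = a.size() - 1;
        while (i > 0 && a[(i - 1) / 2] < a[i]) {
            std::swap(a[(i - 1) / 2], a[i]);
            i = (i - 1) / 2;
        }
    }

    // The maximum is always at the root. Precondition: heap is not empty.
    int top() const { return a.front(); }

    // Move the last element to the root, then swap it downward with its
    // larger child until neither child is larger. Precondition: not empty.
    void pop() {
        a.front() = a.back();
        a.pop_back();
        std::size_t i = 0, n = a.size();
        for (;;) {
            std::size_t l = 2 * i + 1, r = l + 1, big = i;
            if (l < n && a[l] > a[big]) big = l;
            if (r < n && a[r] > a[big]) big = r;
            if (big == i) break;
            std::swap(a[i], a[big]);
            i = big;
        }
    }
};
```

Pushing the 14 values of the first array in order and then calling pop() twice leaves the vector in the same states as the end of each removal in the trace. Both push and pop walk a single root-to-leaf path, which is where the O(lg n) bound comes from.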
