Data Structures
Alan, Tam Siu Lung
Tam@SiuLung.com
96397999
99967891
Prerequisite
• Familiarity with Pascal/C/C++
• Asymptotic Complexity
• Techniques learnt
– Recursion
– Divide and Conquer
– Exhaustion
– Greedy
– [Dynamic Programming exempted]
• Algorithms learnt
– Bubble / Insertion / Selection / Shell / Merge / Quick /
Bucket / Radix Sorting
– Linear / Binary / Interpolation Searching
What our Programming Language provides?
• Built-in Data Types
– Character/String (length limit?)
– Integral (signed/unsigned 8 [?], 16, 32, 64 [?] bit)
– Floating Point (signed/unsigned 32, 64, 80 [?] bit)
– Fixed Point [?]
– Complex [?]
– Pointer/Reference
– Function Pointer/Reference
What our Programming Language provides?
• Aggregate Data Types
– Array [base-definable?]
• Multiple Values of same type
• Access by numeric index
– Record/Struct/Class
• Multiple Values of different types
• Function Aggregation + Inheritance + Polymorphism [?]
– Unions [?]
What our Programming Language provides?
• Built-in Language Constructs
– Branching (If, Else)
– Loops (For, While, Until)
– Function/Procedure Calling
• In C++’s view, statements and operators are
functions as well
• a = b int &operator=(int &a, const int &b)
• a > b bool operator>(const int &a, const int &b)
• *a int &operator*(int *a)
• a[b] string &operator[](string &a[], int b)
– Recursion
– Even more for more sophisticated languages!
For most of the remaining time
• We concentrate on
– Pointer
– Array
– Record
– and how they interact
• We will use a C++-like notation
– array<int> means an array of integers
– int* is shorthand for pointer<int>
– Records are written as: struct<int, int, string>
– Capitalized type names are “variables”, which means they can be replaced by any type
Formal Definition: Pointer
• Concept:
– pointer<Type> p; (Type *p) [^p in Pascal]
• Operations:
– *p Type &operator*(Type *p) [p^ in Pascal]
• Returns the pointed value
• Error if p is null/nil
– &y Type *operator&(Type &p) [@y in Pascal]
• Returns the address of a value
– p = x Type *operator=(Type *p, Type *x)
• Pointer assignment
Formal Definition: Pointer
• More Operators
– p < q bool operator<(Type *p, Type *q)
• Returns if pointer p is smaller
– ++p Type *operator++(Type *p) [inc(p) in Turbo Pascal]
• Point to next element (in an array)
– --p Type *operator--(Type *p) [dec(p) in Turbo Pascal]
• Point to previous element (in an array)
– p + n Type *operator+(Type *p, int n) [not in Turbo Pascal]
• Point to nth next element (in an array)
Programming Syntax: Pointer
int main() {
int a[10];
int *b = &a[1];
*b = 1;
b = new int(2);
delete b;
b = 0;
}
var
a : array[1..10] of integer;
b : ^integer;
begin
b := @a[2];
b^ := 1;
new(b);
b^ := 2;
dispose(b);
b := nil;
end.
Array
• Concept
– array<Type, Size : int>
– array<Type, Lower : int, Upper : int>
• Operations
– Type &operator[](Type a[], int index)
• Requires 0 <= index < Size
• Requires Lower <= index <= Upper
• Analysis
– a[x] is equivalent to *(a + x)
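– which is equivalent to *(Type *)((char *)a + x * sizeof(Type))
– It is sometimes slower than necessary: the address is recomputed on every access, whereas a pointer can simply step to the next element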
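The equivalence between indexing and pointer arithmetic can be checked directly. The following is a minimal sketch for int elements; the helper name element_at is hypothetical and exists only for this illustration.

```cpp
#include <cstddef>

// Returns a reference to a[x] using explicit byte arithmetic, to show the
// address calculation hidden behind an index expression.
// `element_at` is a hypothetical helper name, not part of the slides.
int &element_at(int *a, std::size_t x) {
    char *base = reinterpret_cast<char *>(a);   // address of a[0], in bytes
    return *reinterpret_cast<int *>(base + x * sizeof(int));
}
```

For any valid index x, &element_at(a, x), &a[x] and a + x should all be the same address.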
Example: Prime Finding
• primes[] stores all primes found
primes[0] = 2;
for each i
for each v in primes[]
if (v * v > i) then begin
primes.add(i);
break;
end;
if (i mod v = 0) then break;
Solution
#include <iostream>
using namespace std;
int main() {
int primes[100], *last = primes;
cout << (*last++ = 2) << endl;
for (int i = 3; i < 100; ++i) {
int *j = primes;
do {
if (*j * *j > i) {
cout << (*last++ = i) << endl;
break;
}
if (i % *j == 0) break;
} while (++j < last);
}
}
var
primes: array[1..100] of integer;
i : integer; last, j: ^integer;
begin
last := @primes;
last^ := 2; inc(last);
for i := 3 to 100 do begin
j := @primes;
repeat
if j^ * j^ > i then begin
last^ := i; inc(last);
writeln(i); break;
end;
if i mod j^ = 0 then break;
inc(j);
until j >= last;
end;
end.
Record
• Like Arrays
• Identified by names instead of index
• Each name is associated with a type
• Pair is a special record with 2 elements, Key and Value
– Keys are unique (i.e. keys identify records)
– Keys are comparable (i.e. sort-able) [sometimes]
– Since Value can itself be a record, all records with a unique portion can be represented as a pair
Programming Syntax: Record
struct Point {
double x, y;
};
struct Rect {
Point tl, br;
int color;
};
int main() {
Rect rect;
rect.color = 255;
rect.tl.x = 0.0;
}
type
Point = record x, y : real; end;
Rect = record
tl, br : Point;
color : integer;
end;
var
rect : Rect;
begin
rect.color := 255;
rect.tl.x := 0.0;
with rect do begin
color := 255;
tl.x := 0.0;
end;
end.
Linked List
• Combining Pointer and Record
• linkedlist<string>:
type
pNode = ^Node;
Node = record
value : string;
next : pNode;
end;
var
head: pNode;
Linked List
• Operations
– void Add(linkedlist<Type> p, Type &v)
• Add an element to the Linked List
– Node *Search(linkedlist<Type> p, Type &v)
• Returns null/nil if not found
– void InsertAfter(Node node, Type &v)
• Insert an element after another
– void Remove(Node node)
• How to implement?
• C++: x->y == (*x).y
Linked List Implementation
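Node *list;
// Add v at the front of the list.
void Add(int v) {
    Node *old = list;
    list = new Node();
    list->next = old;
    list->value = v;
}
// Return the first node holding v, or 0 if not found.
Node *Search(int v) {
    for (Node *p = list; p; p = p->next)
        if (p->value == v) return p;
    return 0;
}
// Insert v after node n and return the new node.
Node *InsertAfter(Node *n, int v) {
    Node *old = n->next;
    n->next = new Node();
    n->next->next = old;
    n->next->value = v;
    return n->next;
}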
var
  list: pNode;
procedure Add(v : integer);
var old : pNode;
begin
  old := list;
  new(list);
  list^.next := old;
  list^.value := v;
end;
function Search(v : integer) : pNode;
var n : pNode;
begin
  n := list;
  { relies on short-circuit evaluation of "and" }
  while (n <> nil) and (n^.value <> v)
    do n := n^.next;
  Search := n;
end;
{ InsertAfter is similar to Add }
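The slides list Remove among the operations but leave its implementation open. One possible sketch for a singly linked list removes the node that follows a given node, which keeps removal O(1) without a search for the predecessor. RemoveAfter is a hypothetical name and an assumed variation, not the Remove(Node) signature from the slides.

```cpp
struct Node {
    int value;
    Node *next;
};

// Inserts v after node n and returns the new node.
Node *InsertAfter(Node *n, int v) {
    n->next = new Node{v, n->next};
    return n->next;
}

// Unlinks and frees the node that follows n (hypothetical helper).
// Does nothing if n is the last node.
void RemoveAfter(Node *n) {
    Node *victim = n->next;
    if (!victim) return;
    n->next = victim->next;   // bypass the victim
    delete victim;
}
```

Removing the first node of the list needs separate handling, for example by moving the head pointer or by keeping a dummy node in front of the list.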
Array Implementation
[Figure: a linked list stored in arrays with five slots (1 to 5) and columns K, N, V, with “/” entries; shown initially, after Add 2, and after Add 3]
Array Implementation
[Figure: the same arrays with columns K, N, V, shown after Remove 2 and after Remove 3]
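A linked list can live entirely inside arrays by storing slot indices instead of pointers. The sketch below is one possible layout under stated assumptions: unused slots are chained into a free list, NIL stands in for the "/" entries, and the names ArrayList, free_head, CAPACITY and NIL are all hypothetical. It is not necessarily the exact layout drawn on the slides.

```cpp
const int CAPACITY = 5;   // assumed size, matching the five slots in the figure
const int NIL = -1;       // "no slot", playing the role of "/"

struct ArrayList {
    int value[CAPACITY];
    int next[CAPACITY];
    int head;        // first used slot, or NIL
    int free_head;   // first unused slot, or NIL

    ArrayList() : head(NIL), free_head(0) {
        for (int i = 0; i < CAPACITY; ++i)
            next[i] = (i + 1 < CAPACITY) ? i + 1 : NIL;   // chain all slots as free
    }

    // Adds v at the front in O(1); returns false when no slot is free.
    bool Add(int v) {
        if (free_head == NIL) return false;
        int slot = free_head;
        free_head = next[slot];
        value[slot] = v;
        next[slot] = head;
        head = slot;
        return true;
    }

    // Removes the first element equal to v; returns false if it is absent.
    bool Remove(int v) {
        int prev = NIL;
        for (int cur = head; cur != NIL; prev = cur, cur = next[cur]) {
            if (value[cur] != v) continue;
            if (prev == NIL) head = next[cur]; else next[prev] = next[cur];
            next[cur] = free_head;   // return the slot to the free chain
            free_head = cur;
            return true;
        }
        return false;
    }
};
```

Remove here searches by value, so it is O(n); only the unlinking step itself is O(1).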
Abstraction
• Both of the implementations feature the same complexity
– O(1) Addition
– O(n) Searching
– O(1) Insertion
– O(1) Removal
• Sometimes we don’t care how it gets implemented
– We only want a data structure which provides the operations we
want.
• We define Abstract Data Types (ADTs) to mean a collection of
Data Structures providing certain operations
– Plane
– Polynomial
– Graph
• We don’t even care how fast the operations in an ADT are,
though practically we do
Dictionary (Map, Associative Array)
• A Dictionary is an unordered container of key-value pairs
• map<Key, Value>
– void Insert(map<Key, Value> &c, Key &key, Value &value)
– int Size(map<Key, Value> &c)
– Value &Search(map<Key, Value> &c, Key &key)
– void Delete(map<Key, Value> &c, Key &key)
List ADT
• A List ADT is an ordered container of key-value pairs
• list<Key, Value>
– void Insert(list<Key, Value> &c, int pos, Type &value)
– Type &Find-ith(list<Key, Value> &c, int pos)
– void Delete-ith(list<Key, Value> &c, int pos)
– int Size(list<Key, Value> &c)
– Type &Search(list<Key, Value> &c, Key &key)
– void Delete(list<Key, Value> &c, Key &key)
– …
• A List can be implemented by an array (Vector/Table), a linked list (LinkedList), etc.
• A List is also a Dictionary
Time Complexity
Average Case | Add  | Remove | Search
Array        | O(1) | O(n)   | O(n)
Sorted Array | O(n) | O(n)   | O(lg n)
Linked List  | O(1) | O(n)   | O(n)
• We seldom remove anyway
• None of these three makes both Add and Search fast
• In general, this is difficult unless we exploit features of the Key
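The Sorted Array row can be illustrated with the standard library. This is a minimal sketch assuming integer keys; SortedAdd and SortedSearch are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Inserts key while keeping the vector sorted: O(lg n) to find the position,
// then O(n) to shift the elements that come after it.
void SortedAdd(std::vector<int> &a, int key) {
    a.insert(std::lower_bound(a.begin(), a.end(), key), key);
}

// Binary search on the sorted vector: O(lg n).
bool SortedSearch(const std::vector<int> &a, int key) {
    return std::binary_search(a.begin(), a.end(), key);
}
```

The shift inside insert is what makes Add O(n) even though the position is found quickly.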
Direct Addressing Implementation
0 Ant
5 Boy
99 Car
• Use the Vector ADT
• The key is the location
• Efficient: O(1) for all operations
• Infeasible: if the key can range from 1 to 20000000000, if the key is not numeric ...
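Direct addressing can be sketched as a table with one slot per possible key. The struct name DirectTable and the used flags are assumptions made for this illustration; string values are chosen to match the Ant / Boy / Car example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One slot per possible key: the key itself is the index.
struct DirectTable {
    std::vector<std::string> slot;   // slot[key] holds the value
    std::vector<bool> used;          // used[key] tells whether the slot is occupied

    explicit DirectTable(std::size_t range) : slot(range), used(range, false) {}

    void Insert(std::size_t key, const std::string &v) {
        slot[key] = v;
        used[key] = true;
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const std::string *Search(std::size_t key) const {
        return used[key] ? &slot[key] : nullptr;
    }

    void Delete(std::size_t key) { used[key] = false; }
};
```

Every operation is a single index calculation, but the memory cost grows with the key range rather than with the number of stored elements.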
Hash Function
• Hash Function: h_m(k)
• Map all keys “by calculation” into an integer domain, e.g. 0 to m - 1
• E.g. CRC32 hashes strings into 32-bit integers (i.e. m = 2^32)
– Alan: 1598313570
– Max: 3452409927
– Man: 943766770
– On: 2246271074
Hash Table Implementation
• Use a Table<int, Value> ADT of size m
• Use h_m(Key) as the key
• All operations can be done like using Table
• Solved except
– Collision: What to do if two different k have the same h(k)
– How to find a suitable hash function
• If good hash functions are used, hash tables provide near O(1) insertion, searching and removal
– But it is difficult to get it right
– And it is not easy to code
– C++: hash_map<Key, Value, hash_func> (standardized in C++11 as unordered_map)
• Read 2003 Advanced Notes on Hash Table if you are motivated enough
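One common answer to the collision question is separate chaining: each slot holds a small list of pairs. The sketch below is an illustration only, assuming string keys and int values; the class name ChainedHashTable and its simple polynomial hash are hypothetical choices, not the CRC32 function or the method from the notes referenced above.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Separate chaining: each of the m slots holds a list of (key, value) pairs,
// so keys that collide simply share a slot.
class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t m) : slots_(m) {}   // requires m > 0

    // Inserts the pair, or overwrites the value if the key already exists.
    void Insert(const std::string &key, int value) {
        for (auto &kv : slots_[Hash(key)])
            if (kv.first == key) { kv.second = value; return; }
        slots_[Hash(key)].emplace_back(key, value);
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    int *Search(const std::string &key) {
        for (auto &kv : slots_[Hash(key)])
            if (kv.first == key) return &kv.second;
        return nullptr;
    }

    void Delete(const std::string &key) {
        slots_[Hash(key)].remove_if(
            [&key](const std::pair<std::string, int> &kv) { return kv.first == key; });
    }

private:
    // Illustrative polynomial hash into 0 .. m - 1.
    std::size_t Hash(const std::string &key) const {
        std::size_t h = 0;
        for (unsigned char c : key) h = h * 31 + c;
        return h % slots_.size();
    }

    std::vector<std::list<std::pair<std::string, int>>> slots_;
};
```

With a well-spread hash and m proportional to the number of keys, the chains stay short and the operations stay close to O(1); with a poor hash everything lands in one chain and the table degrades to a linked list.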
Binary Search Tree Implementation
• Sorted Array is fast for searching
– But insertion at the front is slow
• Idea
– Store separate arrays
– If value < v, insert to left array
– If value >= v, insert to right array
• Now we have a Data Structure which is
– Worst Case N / 2 + 1 insertion (N in the past)
– lg(N) + 1 searching
Binary Search Tree Implementation
• Now we have a Data Structure which is
– N / 2 + 1 insertion (N in the past)
– lg(N) + 1 searching
• If we store “N / 2” elements in this DS
– N / 4 + 1 insertion
– lg(N) searching
• If both of left and right arrays use this DS [Recursion]
– N / 4 + 2 insertion
– lg(N) + 1 searching
• Continue this process lg(N) times
– lg(N) + 2 insertion
– lg(N) + 1 searching
– What will it look like?
Binary Search Tree Implementation
struct Node {
    Node *left, *right;
    int value;
};
[Figure: binary search tree holding the values 6, 3, 1, 8, 4, 7, 9 and 7.5]
type
  pNode = ^Node;
  Node = record
    left, right : pNode;
    value : integer;
  end;
Introduction to Tree
• Node
• Root
• Leaf / Internal
• Parent / Children
• [Proper] Ancestors / Descendants
• Siblings
Binary Search Tree Implementation
• Operations
– Searching
• If target < current, go to left
• If target > current, go to right
– Insertion
• Search
• Insert it there
– Removal
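• If it is a leaf, just remove it.
• If it has one child, put that child in its place.
• Otherwise, copy in the smallest value larger than it (the leftmost node of its right subtree), then remove that node, which has at most one child.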
• Worst Case
– If input is sorted, the tree will become …
– What can we do?
– C++: map<Key, Value, comparator>
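Searching and insertion as described above can be sketched as follows. This is a minimal unbalanced tree assuming int values; the name TreeNode is chosen here for the illustration.

```cpp
struct TreeNode {
    int value;
    TreeNode *left;
    TreeNode *right;
};

// Returns the node holding target, or nullptr if it is absent.
TreeNode *Search(TreeNode *root, int target) {
    while (root && root->value != target)
        root = (target < root->value) ? root->left : root->right;
    return root;
}

// Walks down as in Search and attaches a new leaf where the walk falls off
// the tree. Values equal to an existing node go to the right, following the
// "value >= v" rule used when the idea was introduced.
void Insert(TreeNode *&root, int value) {
    TreeNode **link = &root;
    while (*link)
        link = (value < (*link)->value) ? &(*link)->left : &(*link)->right;
    *link = new TreeNode{value, nullptr, nullptr};
}
```

Both operations take time proportional to the height of the tree, which is why sorted input is the worst case for this unbalanced version.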
Recess
Have a break!
Stack ADT
• Something your compiler has implemented
for you.
int pow(int x, int n) {
if (n == 0) return 1;
int v = pow(x, n / 2);
if (n % 2 == 0) return v * v;
return x * v * v;
}
• pow(3, 5)→pow(3, 2)→pow(3, 1)→pow(3, 0)
Stack ADT
• But
– It dictates what is put on the stack
– It couples control flow with data flow
• So we will still implement our own stack
• Last-in-first-out
– When do we need this behavior?
• Array?
– Fast, but fixed size
– C++: stack<Type>
Array Implementation of Stack
int stack[100];
int top = 0;
void push(int v) {
stack[top++] = v;
}
int pop() {
return stack[--top];
}
var
stack : array[1..100] of integer;
top : integer;
procedure push(v : integer);
begin
inc(top);
stack[top] := v;
end;
function pop : integer;
begin
pop := stack[top];
dec(top);
end;
Queue ADT
• First-in-first-out
– When do we need this behavior?
– Major use is Breadth First Search in Graph
• Array?
– Fast, but fixed size
– Circular?
– C++: queue<Type>
Array Implementation of Queue
int queue[100];
int head = 0, tail = 0;
void enqueue(int v) {
queue[tail++] = v;
}
int dequeue() {
return queue[head++];
}
var
queue : array[1..100] of integer;
head, tail : integer;
procedure enqueue(v : integer);
begin
  inc(tail);
  queue[tail] := v;
end;
function dequeue : integer;
begin
  inc(head);
  dequeue := queue[head];
end;
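The plain array queue never reuses the slots it has dequeued from, so it runs out of room after 100 enqueues in total. A circular variant wraps the indices around the end of the array. The sketch below is one way to do it; CircularQueue, the count field and the bool return values are assumptions of this illustration.

```cpp
const int CAPACITY = 100;

struct CircularQueue {
    int data[CAPACITY];
    int head = 0;    // index of the oldest element
    int count = 0;   // number of stored elements

    // Returns false instead of overwriting when the queue is full.
    bool enqueue(int v) {
        if (count == CAPACITY) return false;
        data[(head + count) % CAPACITY] = v;   // the tail position wraps around
        ++count;
        return true;
    }

    // Stores the oldest element in out; returns false when the queue is empty.
    bool dequeue(int &out) {
        if (count == 0) return false;
        out = data[head];
        head = (head + 1) % CAPACITY;
        --count;
        return true;
    }
};
```

Keeping a count, rather than comparing head and tail, avoids the ambiguity between a full queue and an empty one.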
Priority Queue ADT
• PriorityQueue<Priority, Value>
– void Push(Priority &p, Value& v)
• Add an element
– Value &Top()
• Returns the element with maximum priority
– void Pop()
• Remove the element with maximum priority
• Again both Array and Linked List can do it
suboptimally. A maximum heap can finish
Push and Pop in O(lg n) and Top in O(1).
• C++: priority_queue<Type, comparator>
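A short usage sketch of the standard priority_queue, assuming each element is a (priority, value) pair so that the default ordering compares priorities first. DrainByPriority is a hypothetical helper name.

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Pops every (priority, value) pair and returns the values,
// highest priority first.
std::vector<std::string> DrainByPriority(
        std::priority_queue<std::pair<int, std::string>> pq) {
    std::vector<std::string> order;
    while (!pq.empty()) {
        order.push_back(pq.top().second);   // Top: O(1)
        pq.pop();                           // Pop: O(lg n)
    }
    return order;
}
```

The queue is passed by value, so the caller's copy is left untouched.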
Heap
• In an array with N elements
– We can obtain maximum value of an array in O(1) time
if every Add() updates this value.
– But removal of it destroys all knowledge and requires
N – 1 operations to recalculate.
• If we have 2 arrays of N / 2 elements
– We only need N / 2 time because only the array with
maximum extracted is recalculated.
Array 1: 2 6 3 4 2 5 3 (maximum 6)
Array 2: 3 1 5 7 8 5 4 (maximum 8)
Heap
[Figure: the values are split into halves repeatedly, with the maximum of each group drawn above it: 8 at the top, then 7 and 6 above the two halves, and so on down to single values]
Heap
[Figures: a sequence of tree drawings of the maximum heap on the values 8, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1, 4, showing step by step how the heap is repaired after the maximum 8 is removed]
Heap
• Left Complete Binary Tree
• 1 2 3 4 5 6 7 8 9 10 11 12 13 14
• [8, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1, 4]
• [4, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1] 8
• [7, 4, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1] 8
• [7, 5, 6, 4, 4, 5, 5, 2, 3, 2, 3, 3, 1] 8
• [1, 5, 6, 4, 4, 5, 5, 2, 3, 2, 3, 3] 7, 8
• [6, 5, 1, 4, 4, 5, 5, 2, 3, 2, 3, 3] 7, 8
• [6, 5, 5, 4, 4, 1, 5, 2, 3, 2, 3, 3] 7, 8
• [6, 5, 5, 4, 4, 3, 5, 2, 3, 2, 3, 1] 7, 8
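The array trace above can be reproduced with a small maximum heap. The sketch stores the root at index 0, so the children of index i are 2i + 1 and 2i + 2, whereas the slide numbers positions from 1. The struct name MaxHeap is an assumption, and the removed maximum is simply discarded rather than kept at the end of the array.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Maximum heap stored as a left complete binary tree in a vector.
struct MaxHeap {
    std::vector<int> a;

    // Appends v, then exchanges it upwards while it beats its parent: O(lg n).
    void Push(int v) {
        a.push_back(v);
        std::size_t i = a.size() - 1;
        while (i > 0 && a[(i - 1) / 2] < a[i]) {
            std::swap(a[i], a[(i - 1) / 2]);
            i = (i - 1) / 2;
        }
    }

    // Requires a non-empty heap.
    int Top() const { return a[0]; }

    // Moves the last element to the root, then exchanges it downwards with
    // its larger child until heap order holds: O(lg n). Requires a non-empty heap.
    void Pop() {
        a[0] = a.back();
        a.pop_back();
        std::size_t i = 0;
        for (;;) {
            std::size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < a.size() && a[largest] < a[l]) largest = l;
            if (r < a.size() && a[largest] < a[r]) largest = r;
            if (largest == i) break;
            std::swap(a[i], a[largest]);
            i = largest;
        }
    }
};
```

Starting from the first array in the trace, two calls to Pop should pass through the same bracketed states as the slide.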