Download Vectorization - Computer Science

CS4619: Artificial Intelligence 2 Background Review Derek Bridge School of Computer Science and Information Technology University College Cork Initialization In [2]: %load_ext autoreload %autoreload 2 %matplotlib inline In [3]: import pandas as pd import numpy as np import matplotlib.pyplot as plt In [ ]: import numpy.linalg as npla Matrices A matrix is a rectangular array, in our case of real numbers. A A = [ 21 43 02 ] Dimension: A matrix with m rows and n columns is an m × n matrix What are m and n for A ? We refer to an element of a matrix either using subscripts or indexes Ai,j or AA[i,j] is the element in the ith row and jth column In general, we use bold capital letters, e.g. , for matrices, e.g. We will index from 1 However, we will sometimes use position 0 for 'technical' purposes And we must be aware that Python numpy arrays and matrices are 0indexed So what are , , and ? A2,1 A1,2 A0,0 A3,2 Vectors A vector is a matrix that has only one column, i.e. a m × 1 matrix m m A vector with rows is called a dimensional vector In general, we use bold lowercase letters for vectors, e.g.  2  x =  4  3 Sometimes this is called a column vector Then, by contrast, a row vector is a matrix that has only one row, i.e. a [2,4,3] 1 × n matrix, e.g. Unless stated otherwise, a vector should be assumed to be a column vector. We can refer to an element using a single subscript, again most of the time indexed from 1 So what is ? x1 Vectors and Matrices in Python Of the many ways of representing vectors and matrices in Python, we will use two: From the pandas library, for vectors we will use Series, which are a kind of onedimensional array, and for matrices we will use DataFrames, which are tabular data structures of rows and columns. As well as supporting fairly traditional indexing, the columns in a DataFrame can have names, which can also be used to extract data from the DataFrame. From the numpy library, we will use numpy arrays, which can be onedimensional, twodimensional, or have more dimensions. The machine learning library that we will use, scikitlearn, expects its data to arrive as numpy arrays. For the rest of this lecture, we'll exemplify numpy arrays. Using numpy arrays In [4]: # # # x Vectors We will use a numpy 1d array, which we can create from a list But, done this way, there is no way for us to distinguish between column- and row-vect = np.array([2, 4, 3]) # Matrices # We will use a numpy 2d array, which we can create from a list of lists A = np.array([[2, 4, 0], [1, 3, 2]]) Matrix Addition and Subtraction A B A = [ 21 43 02 ] B = [ 12 03 52 ] A + B = [ 33 46 54 ] Two matrices and can be added or subtracted if they have the same dimensions. Their sum is obtained by adding or subtracting corresponding elements, e.g. Scalar Multiplication Scalar Multiplication Matrices can be multiplied by real numbers (in this context, often called 'scalars'). The scalar product is obtained by multiplying each element by the scalar, e.g. A = [ 21 43 02 ] 2AA = [ 42 86 04 ] AA/2 = [ 0.51 1.52 01 ] Matrix Addition, Subtraction and Scalar Multiplication in Python numpy arrays enable operations like these using the normal addition, subtraction and multiplication operators and without writing for loops: In [5]: A = np.array([[2, 4, 0], [1, 3, 2]]) B = np.array([[1, 0, 5], [2, 3, 2]]) A + B Out[5]: array([[3, 4, 5], [3, 6, 4]]) Similarly, operations with scalars are also applied elementwise: In [6]: 2 * A Out[6]: array([[4, 8, 0], [2, 6, 4]]) Matrix Multiplication AB A B A We can compute , the result of multiplying matrices and , provided the number of columns of equals the number of rows of B A m×p B p×n C m×n Ci,j column of B and summing: C = AB If is a matrix and is a matrix, then we can compute will be a matrix is obtained by multiplying elements of the th row of by corresponding elements of the th E.g. i A p Ci,j = ∑ Ai,k Bk,j k=1  3 1 2  2 4 0 A = [ 1 3 2 ] B =  2 3 1  AB = [ 1411 1416 118 ]  1 3 3 j Since vectors are just onecolumn vectors, matrix multiplication can apply — provided the dimensions are OK, e.g.  2  2 4 0 A = [ 1 3 2 ] x =  3  y = [ 23 ] Ax = [ 1613 ] Ay Ay is undefined  1 Matrix Multiplication in Python numpy offers dot as a function or method for matrix multiplication: In [7]: A = np.array([[2, 4, 0], [1, 3, 2]]) B = np.array([[3, 1, 2], [2, 3, 1], [1, 3, 3]]) # Multiplication as a function # np.dot(A, B) # Multiplication as a method A.dot(B) Out[7]: array([[14, 14, 8], [11, 16, 11]]) Be warned that A * B on numpy arrays is not the same as matrix multiplication as defined above. Instead, it is performed elementwise on corresponding elements of two equalsized arrays. Transpose T m × n matrix A , written A , is the n × m matrix in which the first row of A becomes the first T column of A , the second row of A becomes the second column of B , and so on. ATi,j = Aj,i E.g.  2 1  2 4 0 T A = [ 1 3 2] A =  4 3  0 2 T As a special case, if x is a mdimensional column vector (m × 1), then x is a mdimensional row vector (1 × m), e.g.  2  x =  4  x T = [2,4,3]  3 The transpose of Transpose in Python numpy arrays offer easy ways to compute their transpose: either the transpose method or the T attribute: In [8]: A = np.array([[2, 4, 0], [1, 3, 2]]) # Transpose as a method # A.transpose() # Tranpose as an attribute A.T Out[8]: array([[2, 1], [4, 3], [0, 2]]) Identity Matrices n × n identity matrix, I n, contains zeros except for entries on the main diagonal (from top left to bottom The right). I n[i,i] = 1 for i = 1,…,n and I n[i,j] = 0 for i ≠ j, e.g.  1 0 0  I3 =  0 1 0   0 0 1 If A is an m × n matrix then, AI A In = I m A = A Inverses A n × n matrix, then its inverse, A−1 (if it has one) is also a n × n matrix such that AA A A−1 = I n .  1 0 2   −11 2 2  A =  2 −1 3  A−1 =  −4 0 1   4 1 8  6 −1 −1  Some n × n matrices do not have inverses, e.g.  1 1 1   1 1 1  1 1 1 If is a In these cases, provided the matrix is square, you can compute a pseudoinverse, which you can use for some of the same purposes instead Inverses in Python numpy.linalg offers function inv for computing inverses, but also function pinv for computing the MoorePenrose pseudoinverse: In [9]: A = np.array([[1, 0, 2], [2, -1, 3], [4, 1, 8]]) npla.inv(A) Out[9]: array([[ -1.10000000e+01, [ -4.00000000e+00, [ 6.00000000e+00, 2.00000000e+00, -2.96059473e-16, -1.00000000e+00, 2.00000000e+00], 1.00000000e+00], -1.00000000e+00]]) 2.00000000e+00, 1.41990266e-14, -1.00000000e+00, 2.00000000e+00], 1.00000000e+00], -1.00000000e+00]]) In [10]: npla.pinv(A) Out[10]: array([[ -1.10000000e+01, [ -4.00000000e+00, [ 6.00000000e+00, In [11]: B = np.ones((3,3)) npla.inv(B) # raises an exception -----------------------------------------------------------LinAlgError Traceback (most recent call last) <ipython-input-11-68bf09415942> in <module>() 1 B = np.ones((3,3)) 2 ----> 3 npla.inv(B) # raises an exception /usr/local/lib/python3.5/dist-packages/numpy/linalg/linalg.py in inv(a) 524 signature = 'D->D' if isComplexType(t) else 'd->d' 525 extobj = get_linalg_error_extobj(_raise_linalgerror_singul ar) --> 526 ainv = _umath_linalg.inv(a, signature=signature, extobj=ex tobj) 527 return wrap(ainv.astype(result_t, copy=False)) 528 /usr/local/lib/python3.5/dist-packages/numpy/linalg/linalg.py in _rais e_linalgerror_singular(err, flag) 88 89 def _raise_linalgerror_singular(err, flag): ---> 90 raise LinAlgError("Singular matrix") 91 92 def _raise_linalgerror_nonposdef(err, flag): LinAlgError: Singular matrix In [13]: npla.pinv(B) Out[13]: array([[ 0.11111111, [ 0.11111111, [ 0.11111111, 0.11111111, 0.11111111, 0.11111111, 0.11111111], 0.11111111], 0.11111111]]) Vectorization Algorithms that might otherwise need forloops can often be written much more succinctly by expressing them in terms of matrix operations. More than this, if your programming language has efficient implementations of these operations, the resulting programs can run much faster too. Using fast matrix operations in this way is known as vectorization. numpy's vectorized array operations, for example, are typically one or more orders of magnitude faster than their pure Python equivalents. Here are a few more examples. Evaluating a Linear Equation Suppose you have a linear equation, i.e. of this form: y = β 1x 1 + β 2x 2 + … + β nx n … lots of little multiplications, all added together You know that this can be written in this form: n y = ∑ β ix i i=1 If you had to write code to implement this, you might be thinking of a forloop. x n β n But if we assume that is an dimensional row vector and is an dimensional column vector, then we can instead write the equation in this form: y = xβ and you can implement this with the numpy library's matrix multiplication method. Let's make this more concrete. Suppose we have this linear equation: y = 3x 1 + 2x 2 + 4x 4 x = [5,1,3]  3  We take the coefficients to give us β =  2  and now we compute  4  3  y = [5,1,3]  2  = 29  4 and we want to evaluate it for Evaluating a Linear Equation in Python In [12]: beta = np.array([3, 2, 4]) x = np.array([5, 1, 3]) # Evaluate the linear equation - without a for-loop y = x.dot(beta) # Display y y Out[12]: 29 Evaluating a Linear Equation Multiple Times y = β 1x 1 + β 2x 2 + … + β nx n on a number of different row [3,3,3], and finally for [3,3,5]. Maybe you're thinking of a Suppose we want to evaluate the linear equation vectors, first for , then for , then for forloop again? [5,1,3] [2,6,2] X  5 1 3  X =  23 63 23   3 3 5  y = Xβ  5 1 3   3   29  y =  23 63 23  ×  2  =  2627   3 3 5   4   35  Instead, we put these different row vectors into a single matrix And we simply evaluate Evaluating a Linear Equation Multiple Times in Python In [34]: beta = np.array([3, 2, 4]) X = np.array([[5, 1, 3], [2, 6, 2], [3, 3, 3], [3, 3, 5]]) # Evaluate the linear equation multiple times - without a for-loop! y = X.dot(beta) # Display y y Out[34]: array([29, 26, 27, 35])

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Vectorization - Computer Science