Row and Column Vectors

A row vector from classical linear algebra, such as (1, -2, 10, 9, 5, -4), can be expressed in NumPy as:

import numpy as np

v1_row = np.array([1,-2,10,9,5,-4])[None,:]

Is this an authentic row vector? Yes, as can be shown by getting its shape:

v1_row.shape
(1, 6)

There you have it. This is a 1 X 6 array, or simply a row vector.
Similarly, to get a column vector we can say,

v2_col = np.array([6,-2,-8,4,2,0])[:,None]
v2_col.shape
(6, 1)

This is a 6 X 1 array: a column vector with 6 rows. So why do we need the [None,:] and [:,None] to get “authentic” row and column vectors? The question is easily answered by creating a vector without them:

v3 = np.array([1,-2,10,9,5,-4])
print('V3: ',v3.shape)
V3:  (6,)

v3 is neither a row vector nor a column vector in the classical sense; its shape of (6,) marks it as a one-dimensional array with no second axis at all. The vector v2_col, by contrast, is a genuine, honest column vector.
I can even use the transpose operator on v2_col to get v4_row:

v4_row = v2_col.T
print(type(v4_row))
v4_row.shape
<class 'numpy.ndarray'>
(1, 6)

So what happens when I try to do the same thing with our bogus v3 vector:

v5 = v3.T
print(type(v5))
print('V5: ',v5.shape)
<class 'numpy.ndarray'>
V5:  (6,)

Nothing. Both v5 and v4_row are official, card-carrying members of the ‘numpy.ndarray’ class, but they are not the same type of beast when it comes to transposing: the transpose of a one-dimensional array is a no-op, so v5 keeps its shape of (6,). What is the lesson? If you want row or column vectors, declare them as such as shown previously or, easier still, use the method below.

v6_row = np.random.rand(1,25)          # 1 row and 25 columns
v7_col = np.random.rand(25,1)          # 25 rows and 1 column
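For the record, reshape gives the same results without the slicing tricks; a quick sketch (the variable names here are mine, not from the listing above):

```python
import numpy as np

# reshape builds authentic 2-D row/column vectors directly;
# -1 tells NumPy to infer the remaining dimension
v8_row = np.arange(6).reshape(1, -1)   # shape (1, 6): one row, six columns
v9_col = np.arange(6).reshape(-1, 1)   # shape (6, 1): six rows, one column

# With two dimensions present, .T finally behaves as linear algebra expects
print(v8_row.T.shape)   # (6, 1)
```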

We can even multiply v7_col and v6_row, letting NumPy broadcasting do the work, to get a square 25 X 25 array (their outer product):

rnd_array = v7_col * v6_row
print('Shape: ',rnd_array.shape)
print('Size: ',rnd_array.size)
Shape:  (25, 25)
Size:  625
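What happens here is NumPy broadcasting: the (25, 1) column is stretched across 25 columns and the (1, 25) row is stretched down 25 rows, which is exactly the outer product of the two vectors. A small sketch with toy sizes, checking the broadcast product against the matrix product and np.outer:

```python
import numpy as np

col = np.array([1, 2, 3])[:, None]   # shape (3, 1)
row = np.array([10, 20])[None, :]    # shape (1, 2)

outer1 = col * row           # broadcasting stretches both: shape (3, 2)
outer2 = col @ row           # matrix product column-times-row: same values
outer3 = np.outer(col, row)  # NumPy's dedicated outer-product routine

print(outer1)
# [[10 20]
#  [20 40]
#  [30 60]]
```

All three give identical arrays; the broadcast form used above is simply the tersest spelling.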

Jupyter Notebook:  Row_and_Column_Vectors

 

Adding third party modules to Python

My Anaconda Windows installation did not have an installable package for “torchvision” or “opencv”.  Typically, out of sheer laziness, I would just use the Anaconda GUI, switch to my PyTorch environment and search for the missing pieces.  But they were completely missing.

So I had no choice but to launch the “conda” CLI that gets installed whenever you do a Windows Anaconda installation. There I switched to my PyTorch conda environment, which I had initially created as “Anaconda3”.  I could have called it “pyTorch”, but that would have been a rational choice, and where is the fun in that?

I switched to my PyTorch environment using the “activate Anaconda3” command.

Happily I typed at the prompt:

conda install torchvision

And…nothing.  There was no torchvision package for Windows.  In fact, all the torchvision packages available were for either Linux or macOS.  It was still not time for the gnashing of teeth, so as a last resort I tried:

pip install torchvision

And that worked. Fortunately, installing the most stable version of OpenCV wasn’t as traumatic:

conda install opencv

…worked without resorting to pip. While still at the conda prompt, type python, and at the Python prompt enter…

import torchvision

import cv2

to check that things are ok.

SUMMARY:

There are two places to look for Python modules to add to your Anaconda environment:

  • anaconda.org   -> use the conda command line to install
  • pypi.org       -> use pip to install

If you can’t find what you are looking for in either one, then it’s time for the gnashing of teeth.

NOTE:

As a “best practices” sort of thing, I am no longer using the Anaconda Navigator GUI for package installations.  Instead, I first look for the packages on anaconda.org and then install the latest version with the conda CLI. The versions listed in the Anaconda Navigator GUI may not be the latest, and it might not even warn you about that.

Data Flattening

So what is this “data flattening” business?  The short answer: our neural network expects each image as a single vector of n values, so for the vector dot products inside the network to make sense, each 28 X 28 image must be flattened into n = 28 X 28 = 784 pixels. Feeding the images one batch at a time then means the input is an array of dimension m X n, where in our examples m = 64 (the batch size) and n = 784 (pixels per image).
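As a sketch of the flattening step itself, here with a random stand-in batch instead of real MNIST images:

```python
import torch

# Stand-in for one DataLoader batch: 64 grayscale images of 28 x 28 pixels
images = torch.randn(64, 1, 28, 28)

# Flatten each image into a 784-element vector; -1 lets PyTorch
# infer 784 = 1 * 28 * 28 from the remaining dimensions
flat = images.view(images.shape[0], -1)

print(flat.shape)   # torch.Size([64, 784])
```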

What is a PyTorch DataLoader?

Before grabbing your data it helps to first understand it.  The classic MNIST digit data is composed of a large number of grayscale images measuring 28 X 28 pixels, along with their labels.  This is important to know because we need to figure out what transforms should be applied as we bring in the data.

The filters variable below is usually labeled “transform”, which doesn’t do it justice: we take raw data and filter it to suit our needs.  Notice that we grab and transform all in one shot by way of the dataset class.  Usually we would also grab a test set as well, like so:

testset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=False, transform=filters)

followed by:

testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)

These datasets are used to create the DataLoaders, which behave like Python generators: each iteration yields one batch of the data, in this case a batch of 64 images.
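Pulling a batch out generator-style is a one-liner with iter and next; a self-contained sketch using a fake stand-in dataset so it runs without downloading MNIST:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for MNIST: 256 random "images" with random digit labels
fake_images = torch.randn(256, 1, 28, 28)
fake_labels = torch.randint(0, 10, (256,))
dataset = TensorDataset(fake_images, fake_labels)

loader = DataLoader(dataset, batch_size=64, shuffle=True)

# Each next() call yields one batch of 64 images and their labels
images, labels = next(iter(loader))
print(images.shape)   # torch.Size([64, 1, 28, 28])
print(labels.shape)   # torch.Size([64])
```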

Once we have our two DataLoaders, one for training and the other for testing, we are ready for the rest.

The same process is applied to the Fashion-MNIST data.

Fixing Serious Anaconda Problems under Linux

Occasionally Anaconda either just stops working or launches applications so slowly that you could literally brew coffee in the time it takes to fire up Spyder.  That’s when radical steps need to be taken and Anaconda needs to be reinstalled.  Of course, you will have to reinstall PyTorch and torchvision, but that doesn’t take long.

Before you reinstall Anaconda, you must completely get rid of its files:

  1. The main folder usually called “anaconda3”.  This is visible.
  2. The two hidden folders .conda and .anaconda

While you’re at it, download the excellent Conda Cheat Sheet:

https://conda.io/docs/_downloads/conda-cheatsheet.pdf