Getting Started with NumPy for Data Science: A Beginner's Guide
A basic beginner's guide to understanding NumPy for Python.
Notes:
You should already be familiar with Python.
Install NumPy by typing ‘pip install numpy’ into your command prompt or terminal.
I’m using Jupyter Notebook. To use it, type ‘jupyter notebook’ in your command prompt or terminal.
NumPy Arrays
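The NumPy arrays in this guide are either one-dimensional vectors or two-dimensional matrices, but both are called arrays. (Arrays can have more dimensions, but these two cover most beginner use cases.)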
We can create a normal list by:
We can cast that list as an array by:
We will get an array back as a container for that list. This returns a one-dimensional array.
If we want a two-dimensional array (matrix) then:
The two sets of nested brackets indicate that it’s a two-dimensional array, and this one has three rows and three columns.
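As a minimal sketch of these three steps (the names my_list, ar, and mat and the values inside them are illustrative assumptions):

```python
import numpy as np

# A plain Python list (the name my_list is illustrative)
my_list = [1, 2, 3]

# Cast the list to a one-dimensional array
ar = np.array(my_list)

# A list of lists becomes a two-dimensional array: 3 rows, 3 columns
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(ar)         # [1 2 3]
print(mat.shape)  # (3, 3)
```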
Arange
Usually, we will use NumPy’s own built-in generation functions to create arrays because they are faster. The most common one is np.arange(), which takes a start and a stop value. So for example, if I were to start with 0 and stop at 10, it’ll look like this:
I will get an array back from 0 to 9, because the stop value itself is not included. If we wanted to go to 10, we’ll have to write it as np.arange(0,11).
If we want even numbers only, then we’ll pass 2 as a third argument (the step size), such as:
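A short sketch of the three np.arange() calls described above (the variable names are illustrative):

```python
import numpy as np

zero_to_nine = np.arange(0, 10)   # stop value 10 is excluded
zero_to_ten = np.arange(0, 11)    # stop at 11 to include 10
evens = np.arange(0, 11, 2)       # third argument is the step size

print(evens)  # [ 0  2  4  6  8 10]
```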
Zeros
If we want to generate arrays of all zeros we can use np.zeros(). We can pass in either a single number or a tuple:
A single number gives a one-dimensional array of that length. In a tuple, the first number represents the number of rows and the second number represents the number of columns. So if we want two rows and three columns, we would write it as np.zeros((2,3)).
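For instance, a minimal sketch of both forms (the length 3 for the one-dimensional case is an illustrative choice):

```python
import numpy as np

zeros_1d = np.zeros(3)       # single number: one-dimensional, length 3
zeros_2d = np.zeros((2, 3))  # tuple: 2 rows, 3 columns

print(zeros_1d.shape)  # (3,)
print(zeros_2d.shape)  # (2, 3)
```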
Ones
We can also create arrays of just ones by using np.ones():
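np.ones() takes the same kind of shape argument as np.zeros(). A small sketch, with shapes chosen only for illustration:

```python
import numpy as np

ones_1d = np.ones(4)       # one-dimensional, length 4
ones_2d = np.ones((3, 4))  # 3 rows, 4 columns

print(ones_2d.shape)  # (3, 4)
```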
Linspace
Another built-in function is np.linspace(), which returns evenly spaced numbers over a specified interval. Linspace takes a third argument: the number of points I want. So for example, I want to start at 0 and stop at 5 and I want to get 10 evenly spaced points between 0 and 5:
This will give me a one-dimensional vector of 10 evenly spaced points from 0 to 5. I can change the third number from 10 to 100 and it’ll return a larger one-dimensional array:
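A sketch of both calls (variable names are illustrative). Note that, unlike np.arange(), np.linspace() includes the stop value by default:

```python
import numpy as np

points = np.linspace(0, 5, 10)        # 10 points, both endpoints included
more_points = np.linspace(0, 5, 100)  # same interval, 100 points

print(points[0], points[-1])  # 0.0 5.0
print(len(more_points))       # 100
```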
Identity Matrix
Let’s now create an identity matrix. We can say np.eye():
This is useful when dealing with linear algebra problems. It’s a two-dimensional square matrix where everything is zero, but the diagonal is one.
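np.eye() takes the number of rows (and columns, since the matrix is square). A minimal sketch with an illustrative size of 4:

```python
import numpy as np

identity = np.eye(4)  # 4 by 4 matrix: ones on the diagonal, zeros elsewhere

print(identity.shape)  # (4, 4)
print(identity[0, 0], identity[0, 1])  # 1.0 0.0
```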
Random Numbers
NumPy has a lot of ways to create arrays of random numbers.
The first one is np.random.rand(), which creates an array of the shape you pass in and populates it with random samples from a uniform distribution over 0 to 1 (0 is included, 1 is not). So if I wanted a one-dimensional array of random numbers uniformly distributed from 0 to 1, I can pass a single integer:
If I want this to be two-dimensional, I can just pass it in as separate arguments. If I wanted a 5 by 5 matrix of random numbers I can pass it as:
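A sketch of both calls; the length 5 for the one-dimensional case is an illustrative choice, and the values differ on every run:

```python
import numpy as np

samples_1d = np.random.rand(5)     # five uniform samples in [0, 1)
samples_2d = np.random.rand(5, 5)  # 5 by 5 matrix; dimensions are separate arguments, not a tuple

print(samples_1d.shape)  # (5,)
print(samples_2d.shape)  # (5, 5)
```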
If I wanted to return a sample or many samples from the standard normal distribution, I would use np.random.randn(). This returns numbers from a normal distribution centered around 0. So I can pass in four and I’ll get four random numbers from a normal distribution:
If we wanted a two-dimensional array, we’ll pass the second dimension as a separate argument just like the last one:
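A sketch of both np.random.randn() calls; the 4 by 4 shape is an illustrative choice, and the values differ on every run:

```python
import numpy as np

normal_1d = np.random.randn(4)     # four samples from the standard normal distribution
normal_2d = np.random.randn(4, 4)  # 4 by 4 matrix of such samples

print(normal_1d.shape)  # (4,)
print(normal_2d.shape)  # (4, 4)
```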
Another one is np.random.randint(), which returns random integers from a low to a high number. The low number is included and the high number is not. So if I wanted one random integer I can pass it as:
If I wanted multiple random integers, I can pass in a third argument, so in this case, I want 10 random numbers:
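A sketch of both forms. The bounds 1 and 100 are assumptions chosen for illustration, and the values differ on every run:

```python
import numpy as np

# Bounds 1 and 100 are illustrative: 1 can be returned, 100 cannot
one_value = np.random.randint(1, 100)

# Third argument: how many integers to return
ten_values = np.random.randint(1, 100, 10)

print(ten_values.shape)  # (10,)
```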
Reshape
One useful method is reshape. It returns an array containing the same data in a new shape. So if I wanted to reshape my array, which has 25 elements, I can reshape it as a 5 by 5 array:
Keep in mind that you’ll get an error if you can’t fill up the matrix completely, so I can’t write it as ar.reshape(5,10) because that would require 50 elements. The reshape works only if the number of rows times the number of columns equals the number of elements in the array.
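A sketch of a working and a failing reshape, assuming ar holds the 25 values from np.arange(25):

```python
import numpy as np

ar = np.arange(25)       # assumed contents: 0 through 24
grid = ar.reshape(5, 5)  # 5 * 5 == 25, so this works

print(grid.shape)  # (5, 5)

# 5 * 10 == 50 does not match the 25 elements, so NumPy raises ValueError
try:
    ar.reshape(5, 10)
except ValueError as err:
    print("reshape failed:", err)
```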
Max
If we want to find the max value then we can use the max method and it’ll return the maximum value of that array:
Min
Similarly, we can use the min method to find the minimum number:
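A short sketch of both methods on a small array of ten integers (the values are illustrative):

```python
import numpy as np

ar = np.array([27, 13, 0, 32, 23, 44, 10, 4, 20, 38])

print(ar.max())  # 44
print(ar.min())  # 0
```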
Location
We can find the index location of the maximum number and minimum number. We can use argmax() to find the location of the maximum or argmin() to find the location of the minimum:
So in this case argmax() returns 5, and if we look at array([27, 13, 0, 32, 23, 44, 10, 4, 20, 38]), we can see the maximum, 44, is at index 5 (the sixth element, since indexing starts at 0).
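The same array can be used to check both methods:

```python
import numpy as np

ar = np.array([27, 13, 0, 32, 23, 44, 10, 4, 20, 38])

print(ar.argmax())  # 5, the index of the maximum (44)
print(ar.argmin())  # 2, the index of the minimum (0)
```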
Shape
If we wanted to know the shape of an array we can use .shape (an attribute, so there are no parentheses):
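For example, assuming a 25-element array before and after reshaping:

```python
import numpy as np

ar = np.arange(25)  # assumed contents: 0 through 24

print(ar.shape)                # (25,)
print(ar.reshape(5, 5).shape)  # (5, 5)
```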
Data type
If you want to know the data type, we type in .dtype and it’ll return the actual data type:
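A small sketch with illustrative values. The exact integer type (for example int64 or int32) depends on the platform, so only the float result is shown:

```python
import numpy as np

ints = np.arange(5)             # integer dtype; the exact width is platform dependent
floats = np.array([1.5, 2.5])   # Python floats become float64

print(ints.dtype)
print(floats.dtype)  # float64
```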
Indexing
If you want to get the value at a particular location, you can use brackets. So in this case, if I wanted the value at index 6 then I’ll pass in a 6 and it’ll return the value at index 6:
If I wanted to get the values in a range, we can use slice notation, meaning there is a start index and a stop index. The stop index itself is not included. So if I wanted to start at index 1 and stop before index 6, I can pass in:
If I wanted to return everything up to a certain index I can just put a colon followed by the index I want to stop at:
So in this case, I want every value up to, but not including, index 8.
We can also do the opposite:
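A sketch of the four cases above, assuming ar holds the values 0 through 10 so that each value equals its index:

```python
import numpy as np

ar = np.arange(0, 11)  # assumed contents: 0 through 10

print(ar[6])    # 6
print(ar[1:6])  # [1 2 3 4 5]
print(ar[:8])   # [0 1 2 3 4 5 6 7]
print(ar[8:])   # [ 8  9 10]
```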
Matrix indexing
We can index a matrix just like we did above. So for example, if I wanted the value in the first row (index 0) and the first column (index 0):
You first pass in the row that you want and then the column you want.
If you want to index the entire row then you can remove the second bracket:
Here are more examples:
We can also use commas instead of double brackets:
If you want a submatrix of the matrix, you can use a colon for slice notation:
So what it’s saying is we want to grab everything from row 2 and then grab from column 1 onwards.
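A sketch of these indexing forms on a hypothetical 3 by 3 matrix. The values and the particular slice shown (rows before index 2, columns from index 1 onwards) are assumptions for illustration:

```python
import numpy as np

# Hypothetical 3 by 3 matrix
mat = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

print(mat[0][0])  # 5, row 0 then column 0
print(mat[0])     # [ 5 10 15], the entire first row
print(mat[1, 2])  # 30, comma notation: row 1, column 2

# Slice: rows 0 and 1, columns 1 and 2
sub = mat[:2, 1:]
print(sub.shape)  # (2, 2)
```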
Boolean
We can combine the array with comparison operators to get a full boolean array. So in this case, I want everything greater than 6:
This returns False where the value is 6 or less and True where it’s greater than 6.
We can use that boolean to index from the original array:
It’ll return everything that is true.
But we can make it simple by just using brackets:
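A sketch of both the two-step and the one-step version, assuming ar holds the values 1 through 10:

```python
import numpy as np

ar = np.arange(1, 11)  # assumed contents: 1 through 10

mask = ar > 6          # boolean array, True where the value exceeds 6
selected = ar[mask]    # keep only the elements where mask is True
shortcut = ar[ar > 6]  # same result in one step

print(selected)  # [ 7  8  9 10]
```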
Operations
Arrays support the same arithmetic operators that Python itself uses.
So if I wanted to add arrays together, I would use the plus sign. If I wanted to subtract arrays, I would use the minus sign. If I wanted to multiply them together I would use the multiply sign. Each of these works element by element.
We can also combine arrays with scalars. So if I wanted to add 50 to every element, I can write it as ar + 50. If I wanted to multiply every element by 50 I can write ar * 50:
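A sketch of these operations, assuming ar holds the values 0 through 4:

```python
import numpy as np

ar = np.arange(0, 5)  # assumed contents: 0, 1, 2, 3, 4

added = ar + ar       # element by element: 0, 2, 4, 6, 8
subtracted = ar - ar  # all zeros
multiplied = ar * ar  # element by element: 0, 1, 4, 9, 16

plus_fifty = ar + 50   # 50, 51, 52, 53, 54
times_fifty = ar * 50  # 0, 50, 100, 150, 200
```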
Sometimes NumPy will give you a warning instead of an error. So if I wanted to divide ar by ar and ar contains a 0, it’ll give me a warning for the 0 (since 0 divided by 0 is undefined) and put nan in that position, but it’ll still give me the output for the rest of them.
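A sketch that makes the warning visible, assuming ar starts at 0. It uses the standard library warnings module to capture the warning instead of letting it print:

```python
import warnings

import numpy as np

ar = np.arange(0, 5)  # assumed contents: 0, 1, 2, 3, 4

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    result = ar / ar                 # 0 / 0 in the first position

print(caught[0].category.__name__)  # RuntimeWarning
print(np.isnan(result[0]))          # True: the 0 / 0 slot holds nan
print(result[1:])                   # the remaining positions are all 1.0
```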
There are also mathematical functions that you can use. So if I wanted to take the square root of every element in an array, I would use np.sqrt(). If I wanted the exponential, I can use np.exp(). We can even use trigonometric functions such as sine and cosine:
Here is a link to all the universal functions that you can use (there are a lot):
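A sketch of these universal functions, assuming ar holds the values 0 through 4. Each one is applied element by element and returns a new array:

```python
import numpy as np

ar = np.arange(0, 5)  # assumed contents: 0, 1, 2, 3, 4

roots = np.sqrt(ar)   # square root of each element
exps = np.exp(ar)     # e raised to each element
sines = np.sin(ar)    # sine, input in radians
cosines = np.cos(ar)  # cosine, input in radians

print(roots[4])  # 2.0
print(exps[0])   # 1.0
```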
https://numpy.org/doc/stable/reference/ufuncs.html