Introduction to `numpy`

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

a powerful N-dimensional array object
(sophisticated) broadcasting functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random generation capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container for general data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

Library documentation: http://numpy.org/

The base `numpy.array` object

Code

import numpy as np

# declare a vector using a list as the argument
v = np.array([1, 2.0, 3, 4])
v

array([1., 2., 3., 4.])

Code

list([1, 2.0, 3, 4])

[1, 2.0, 3, 4]

Code

type(v)

numpy.ndarray

Code

v.shape

(4,)

Code

v.ndim

Code

v.dtype is float

False

Code

v.dtype

dtype('float64')

Code

np.uint8 is int

False

Tip

Use copilot explain to understand the chunks:

The np.uint8 is a data type in NumPy, representing an unsigned 8-bit integer, which can store values from 0 to 255. The int type is the built-in integer type in Python, which can represent any integer value without a fixed size limit.

Code

np.array([2**120, 2**40], dtype=np.int64)

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Cell In[9], line 1
----> 1 np.array([2**120, 2**40], dtype=np.int64)

OverflowError: Python int too large to convert to C long

Code

np.uint16 is int

False

Code

np.uint32  is int

False

Code

w = np.array([1.3, 2, 3, 4], dtype=np.int64)
w

array([1, 2, 3, 4])

Code

w.dtype

dtype('int64')

Code

a = np.arange(100)

Code

type(a)

numpy.ndarray

Code

np.array(range(100))

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

Code

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

Code

a.dtype

dtype('int64')

Code

-3 * a ** 2

array([     0,     -3,    -12,    -27,    -48,    -75,   -108,   -147,
         -192,   -243,   -300,   -363,   -432,   -507,   -588,   -675,
         -768,   -867,   -972,  -1083,  -1200,  -1323,  -1452,  -1587,
        -1728,  -1875,  -2028,  -2187,  -2352,  -2523,  -2700,  -2883,
        -3072,  -3267,  -3468,  -3675,  -3888,  -4107,  -4332,  -4563,
        -4800,  -5043,  -5292,  -5547,  -5808,  -6075,  -6348,  -6627,
        -6912,  -7203,  -7500,  -7803,  -8112,  -8427,  -8748,  -9075,
        -9408,  -9747, -10092, -10443, -10800, -11163, -11532, -11907,
       -12288, -12675, -13068, -13467, -13872, -14283, -14700, -15123,
       -15552, -15987, -16428, -16875, -17328, -17787, -18252, -18723,
       -19200, -19683, -20172, -20667, -21168, -21675, -22188, -22707,
       -23232, -23763, -24300, -24843, -25392, -25947, -26508, -27075,
       -27648, -28227, -28812, -29403])

Code

a[42] = 13

Code

a[42] = 1025

Code

np.info(np.int16)

 int16()

Signed integer type, compatible with C ``short``.

:Character code: ``'h'``
:Canonical name: `numpy.short`
:Alias on this platform (Linux x86_64): `numpy.int16`: 16-bit signed integer (``-32_768`` to ``32_767``).


Methods:

  all  --  Scalar method identical to the corresponding array attribute.
  any  --  Scalar method identical to the corresponding array attribute.
  argmax  --  Scalar method identical to the corresponding array attribute.
  argmin  --  Scalar method identical to the corresponding array attribute.
  argsort  --  Scalar method identical to the corresponding array attribute.
  astype  --  Scalar method identical to the corresponding array attribute.
  bit_count  --  int16.bit_count() -> int
  byteswap  --  Scalar method identical to the corresponding array attribute.
  choose  --  Scalar method identical to the corresponding array attribute.
  clip  --  Scalar method identical to the corresponding array attribute.
  compress  --  Scalar method identical to the corresponding array attribute.
  conj  --  None
  conjugate  --  Scalar method identical to the corresponding array attribute.
  copy  --  Scalar method identical to the corresponding array attribute.
  cumprod  --  Scalar method identical to the corresponding array attribute.
  cumsum  --  Scalar method identical to the corresponding array attribute.
  diagonal  --  Scalar method identical to the corresponding array attribute.
  dump  --  Scalar method identical to the corresponding array attribute.
  dumps  --  Scalar method identical to the corresponding array attribute.
  fill  --  Scalar method identical to the corresponding array attribute.
  flatten  --  Scalar method identical to the corresponding array attribute.
  getfield  --  Scalar method identical to the corresponding array attribute.
  is_integer  --  integer.is_integer() -> bool
  item  --  Scalar method identical to the corresponding array attribute.
  max  --  Scalar method identical to the corresponding array attribute.
  mean  --  Scalar method identical to the corresponding array attribute.
  min  --  Scalar method identical to the corresponding array attribute.
  nonzero  --  Scalar method identical to the corresponding array attribute.
  prod  --  Scalar method identical to the corresponding array attribute.
  put  --  Scalar method identical to the corresponding array attribute.
  ravel  --  Scalar method identical to the corresponding array attribute.
  repeat  --  Scalar method identical to the corresponding array attribute.
  reshape  --  Scalar method identical to the corresponding array attribute.
  resize  --  Scalar method identical to the corresponding array attribute.
  round  --  Scalar method identical to the corresponding array attribute.
  searchsorted  --  Scalar method identical to the corresponding array attribute.
  setfield  --  Scalar method identical to the corresponding array attribute.
  setflags  --  Scalar method identical to the corresponding array attribute.
  sort  --  Scalar method identical to the corresponding array attribute.
  squeeze  --  Scalar method identical to the corresponding array attribute.
  std  --  Scalar method identical to the corresponding array attribute.
  sum  --  Scalar method identical to the corresponding array attribute.
  swapaxes  --  Scalar method identical to the corresponding array attribute.
  take  --  Scalar method identical to the corresponding array attribute.
  to_device  --  None
  tobytes  --  None
  tofile  --  Scalar method identical to the corresponding array attribute.
  tolist  --  Scalar method identical to the corresponding array attribute.
  tostring  --  Scalar method identical to the corresponding array attribute.
  trace  --  Scalar method identical to the corresponding array attribute.
  transpose  --  Scalar method identical to the corresponding array attribute.
  var  --  Scalar method identical to the corresponding array attribute.
  view  --  Scalar method identical to the corresponding array attribute.

Code

np.int16

numpy.int16

Code

dict(enumerate(a))

{0: np.int64(0),
 1: np.int64(1),
 2: np.int64(2),
 3: np.int64(3),
 4: np.int64(4),
 5: np.int64(5),
 6: np.int64(6),
 7: np.int64(7),
 8: np.int64(8),
 9: np.int64(9),
 10: np.int64(10),
 11: np.int64(11),
 12: np.int64(12),
 13: np.int64(13),
 14: np.int64(14),
 15: np.int64(15),
 16: np.int64(16),
 17: np.int64(17),
 18: np.int64(18),
 19: np.int64(19),
 20: np.int64(20),
 21: np.int64(21),
 22: np.int64(22),
 23: np.int64(23),
 24: np.int64(24),
 25: np.int64(25),
 26: np.int64(26),
 27: np.int64(27),
 28: np.int64(28),
 29: np.int64(29),
 30: np.int64(30),
 31: np.int64(31),
 32: np.int64(32),
 33: np.int64(33),
 34: np.int64(34),
 35: np.int64(35),
 36: np.int64(36),
 37: np.int64(37),
 38: np.int64(38),
 39: np.int64(39),
 40: np.int64(40),
 41: np.int64(41),
 42: np.int64(1025),
 43: np.int64(43),
 44: np.int64(44),
 45: np.int64(45),
 46: np.int64(46),
 47: np.int64(47),
 48: np.int64(48),
 49: np.int64(49),
 50: np.int64(50),
 51: np.int64(51),
 52: np.int64(52),
 53: np.int64(53),
 54: np.int64(54),
 55: np.int64(55),
 56: np.int64(56),
 57: np.int64(57),
 58: np.int64(58),
 59: np.int64(59),
 60: np.int64(60),
 61: np.int64(61),
 62: np.int64(62),
 63: np.int64(63),
 64: np.int64(64),
 65: np.int64(65),
 66: np.int64(66),
 67: np.int64(67),
 68: np.int64(68),
 69: np.int64(69),
 70: np.int64(70),
 71: np.int64(71),
 72: np.int64(72),
 73: np.int64(73),
 74: np.int64(74),
 75: np.int64(75),
 76: np.int64(76),
 77: np.int64(77),
 78: np.int64(78),
 79: np.int64(79),
 80: np.int64(80),
 81: np.int64(81),
 82: np.int64(82),
 83: np.int64(83),
 84: np.int64(84),
 85: np.int64(85),
 86: np.int64(86),
 87: np.int64(87),
 88: np.int64(88),
 89: np.int64(89),
 90: np.int64(90),
 91: np.int64(91),
 92: np.int64(92),
 93: np.int64(93),
 94: np.int64(94),
 95: np.int64(95),
 96: np.int64(96),
 97: np.int64(97),
 98: np.int64(98),
 99: np.int64(99)}

Code

a + 1

array([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,   11,
         12,   13,   14,   15,   16,   17,   18,   19,   20,   21,   22,
         23,   24,   25,   26,   27,   28,   29,   30,   31,   32,   33,
         34,   35,   36,   37,   38,   39,   40,   41,   42, 1026,   44,
         45,   46,   47,   48,   49,   50,   51,   52,   53,   54,   55,
         56,   57,   58,   59,   60,   61,   62,   63,   64,   65,   66,
         67,   68,   69,   70,   71,   72,   73,   74,   75,   76,   77,
         78,   79,   80,   81,   82,   83,   84,   85,   86,   87,   88,
         89,   90,   91,   92,   93,   94,   95,   96,   97,   98,   99,
        100])

Code

b = a + 1
b

array([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,   11,
         12,   13,   14,   15,   16,   17,   18,   19,   20,   21,   22,
         23,   24,   25,   26,   27,   28,   29,   30,   31,   32,   33,
         34,   35,   36,   37,   38,   39,   40,   41,   42, 1026,   44,
         45,   46,   47,   48,   49,   50,   51,   52,   53,   54,   55,
         56,   57,   58,   59,   60,   61,   62,   63,   64,   65,   66,
         67,   68,   69,   70,   71,   72,   73,   74,   75,   76,   77,
         78,   79,   80,   81,   82,   83,   84,   85,   86,   87,   88,
         89,   90,   91,   92,   93,   94,   95,   96,   97,   98,   99,
        100])

Code

a is b

False

Code

f = id(a)
a += 1
f, id(a)

(134005493570928, 134005493570928)

Code

array([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,   11,
         12,   13,   14,   15,   16,   17,   18,   19,   20,   21,   22,
         23,   24,   25,   26,   27,   28,   29,   30,   31,   32,   33,
         34,   35,   36,   37,   38,   39,   40,   41,   42, 1026,   44,
         45,   46,   47,   48,   49,   50,   51,   52,   53,   54,   55,
         56,   57,   58,   59,   60,   61,   62,   63,   64,   65,   66,
         67,   68,   69,   70,   71,   72,   73,   74,   75,   76,   77,
         78,   79,   80,   81,   82,   83,   84,   85,   86,   87,   88,
         89,   90,   91,   92,   93,   94,   95,   96,   97,   98,   99,
        100])

Code

array([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,   11,
         12,   13,   14,   15,   16,   17,   18,   19,   20,   21,   22,
         23,   24,   25,   26,   27,   28,   29,   30,   31,   32,   33,
         34,   35,   36,   37,   38,   39,   40,   41,   42, 1026,   44,
         45,   46,   47,   48,   49,   50,   51,   52,   53,   54,   55,
         56,   57,   58,   59,   60,   61,   62,   63,   64,   65,   66,
         67,   68,   69,   70,   71,   72,   73,   74,   75,   76,   77,
         78,   79,   80,   81,   82,   83,   84,   85,   86,   87,   88,
         89,   90,   91,   92,   93,   94,   95,   96,   97,   98,   99,
        100])

Warning

Beware of the dimensions: a 1D array is not the same as a 2D array with 1 column

Code

a1 = np.array([1, 2, 3])
print(a1, a1.shape, a1.ndim)

[1 2 3] (3,) 1

Code

a2 = np.array([1, 2, 3])
print(a2, a2.shape, a2.ndim)

[1 2 3] (3,) 1

Matrix multiplication

Code

a2.dot(a1) # inner product

np.int64(14)

Code

( 
    np.array([a2])
        .transpose() # column vector
        .dot(np.array([a1]))
) # column vector multiplied by row vector

array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])

Code

(
    np.array([a2])
    .transpose()#.shape
)

array([[1],
       [2],
       [3]])

Code

(
    a2.reshape(3,1)  # all explicit
      .dot(a1.reshape(1, 3))
)

array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])

Code

# Declare a 2D array using a nested list as the constructor argument
M = np.array([[1,2], 
              [3,4], 
              [3.14, -9.17]])
M

array([[ 1.  ,  2.  ],
       [ 3.  ,  4.  ],
       [ 3.14, -9.17]])

Code

M.shape, M.size

((3, 2), 6)

Code

M.ravel(), M.ndim, M.ravel().shape

(array([ 1.  ,  2.  ,  3.  ,  4.  ,  3.14, -9.17]), 2, (6,))

Code

# arguments: start, stop, step
x = (
     np.arange(12)
       .reshape(4, 3)
)
x

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

Code

y = np.arange(3).reshape(3,1)

y

array([[0],
       [1],
       [2]])

Code

x @ y, x.dot(y)

(array([[ 5],
        [14],
        [23],
        [32]]),
 array([[ 5],
        [14],
        [23],
        [32]]))

Note

Generating arrays

Code

np.linspace(0, 10, 51)  # meaning of the 3 positional parameters ?

array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ,  1.2,  1.4,  1.6,  1.8,  2. ,
        2.2,  2.4,  2.6,  2.8,  3. ,  3.2,  3.4,  3.6,  3.8,  4. ,  4.2,
        4.4,  4.6,  4.8,  5. ,  5.2,  5.4,  5.6,  5.8,  6. ,  6.2,  6.4,
        6.6,  6.8,  7. ,  7.2,  7.4,  7.6,  7.8,  8. ,  8.2,  8.4,  8.6,
        8.8,  9. ,  9.2,  9.4,  9.6,  9.8, 10. ])

Code

np.logspace(0, 10, 11, base=np.e), np.e**(np.arange(11))

(array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
        5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
        2.98095799e+03, 8.10308393e+03, 2.20264658e+04]),
 array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
        5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
        2.98095799e+03, 8.10308393e+03, 2.20264658e+04]))

Code

import matplotlib.pyplot as plt

# Random standard Gaussian numbers
fig = plt.figure(figsize=(8, 4))
wn = np.random.randn(1000)
bm = wn.cumsum()

plt.plot(bm, lw=3)

Code

np.diag(np.arange(10))

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 3, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 4, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 5, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 6, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 7, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 8, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 9]])

Code

zozo = np.zeros((10, 10), dtype=np.float32)
zozo

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)

Code

zozo.shape

(10, 10)

Code

print(M)

[[ 1.    2.  ]
 [ 3.    4.  ]
 [ 3.14 -9.17]]

Code

M[1, 1]

np.float64(4.0)

Code

# assign new value
M[0, 0] = 7
M[:, 0] = 42
M

array([[42.  ,  2.  ],
       [42.  ,  4.  ],
       [42.  , -9.17]])

Code

array([[42.  ,  2.  ],
       [42.  ,  4.  ],
       [42.  , -9.17]])

Code

# Warning: the next m is a **view** on M. 
# One again, no copies unless you ask for one!
m = M[0, :]
m

array([42.,  2.])

Code

m[:] = 3.14
M

array([[ 3.14,  3.14],
       [42.  ,  4.  ],
       [42.  , -9.17]])

Code

m[:] = 7
M

array([[ 7.  ,  7.  ],
       [42.  ,  4.  ],
       [42.  , -9.17]])

Slicing

Code

# slicing works just like with anything else (lists, etc.)
A = np.array([1, 2, 3, 4, 5])
print(A)
print(A[::-1])
print(A[::2])
print(A[:-1:2])

[1 2 3 4 5]
[5 4 3 2 1]
[1 3 5]
[1 3]

Code

[[n + m * 10 for n in range(5)] for m in range(5)]

[[0, 1, 2, 3, 4],
 [10, 11, 12, 13, 14],
 [20, 21, 22, 23, 24],
 [30, 31, 32, 33, 34],
 [40, 41, 42, 43, 44]]

Code

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
A

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

Code

print(A[1:4])

[[10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]]

Code

m = A[:, 1:4]

Code

m[1, 1] = 123

Code

array([[  0,   1,   2,   3,   4],
       [ 10,  11, 123,  13,  14],
       [ 20,  21,  22,  23,  24],
       [ 30,  31,  32,  33,  34],
       [ 40,  41,  42,  43,  44]])

Code

A[1]

array([ 10,  11, 123,  13,  14])

Code

A[:, 1]

array([ 1, 11, 21, 31, 41])

Code

A[:, ::-1]

array([[  4,   3,   2,   1,   0],
       [ 14,  13, 123,  11,  10],
       [ 24,  23,  22,  21,  20],
       [ 34,  33,  32,  31,  30],
       [ 44,  43,  42,  41,  40]])

Code

print(A)

[[  0   1   2   3   4]
 [ 10  11 123  13  14]
 [ 20  21  22  23  24]
 [ 30  31  32  33  34]
 [ 40  41  42  43  44]]

Code

row_indices = np.array([1, 2, 4])
print(A[row_indices])

[[ 10  11 123  13  14]
 [ 20  21  22  23  24]
 [ 40  41  42  43  44]]

Code

A[:, row_indices]

array([[  1,   2,   4],
       [ 11, 123,  14],
       [ 21,  22,  24],
       [ 31,  32,  34],
       [ 41,  42,  44]])

Another way is through masking with an array of bools

Code

# index masking
B = np.arange(5)
row_mask = np.array([True, False, True, False, False])
print(B)
print(B[row_mask])

[0 1 2 3 4]
[0 2]

Code

A, A[row_mask] , A[:,row_mask]

(array([[  0,   1,   2,   3,   4],
        [ 10,  11, 123,  13,  14],
        [ 20,  21,  22,  23,  24],
        [ 30,  31,  32,  33,  34],
        [ 40,  41,  42,  43,  44]]),
 array([[ 0,  1,  2,  3,  4],
        [20, 21, 22, 23, 24]]),
 array([[  0,   2],
        [ 10, 123],
        [ 20,  22],
        [ 30,  32],
        [ 40,  42]]))

Copies

Don’t forget that python does not make copies unless told to do so (same as with any mutable type)

If you are not careful enough, this typically leads to a lot of errors and to being fired !!

Code

y = x = np.arange(6)
x[2] = 123
y

array([  0,   1, 123,   3,   4,   5])

Code

x is y

True

Code

# A real copy
y = x.copy()
x is y

False

Code

# Or equivalently (but the one above is better...)
y = np.copy(x)

Code

x[0] = -12
print(x, y, x is y)

[-12   1 123   3   4   5] [  0   1 123   3   4   5] False

To put values of x in y (copy values into an existing array) use

Code

x = np.random.randn(10)
x, id(x)

(array([ 0.38046554,  0.74957713, -1.24302298,  0.63151793, -2.31400049,
        -0.16159741,  0.83592963,  2.16443975, -0.05600288, -0.70457148]),
 134005491205392)

Code

x.fill(2.78)   # in place. 
x, id(x)

(array([2.78, 2.78, 2.78, 2.78, 2.78, 2.78, 2.78, 2.78, 2.78, 2.78]),
 134005491205392)

Code

x[:] = 3.14  # x.fill(3.14)  can. be chained ...
x, id(x)

(array([3.14, 3.14, 3.14, 3.14, 3.14, 3.14, 3.14, 3.14, 3.14, 3.14]),
 134005491205392)

Code

x[:] = np.random.randn(x.shape[0])
x, id(x)

(array([ 0.62933532, -1.23653201,  0.13433891,  0.40760153,  1.05453221,
        -0.24900888,  1.15975529,  0.0703507 ,  0.64917964,  0.71881338]),
 134005491205392)

Code

y = np.empty(x.shape)  # how does empty() work ?
y, id(y)

(array([0.62933532, 1.23653201, 0.13433891, 0.40760153, 1.05453221,
        0.24900888, 1.15975529, 0.0703507 , 0.64917964, 0.71881338]),
 134005491205200)

Code

y = x
y, id(y), id(x), y is x

(array([ 0.62933532, -1.23653201,  0.13433891,  0.40760153,  1.05453221,
        -0.24900888,  1.15975529,  0.0703507 ,  0.64917964,  0.71881338]),
 134005491205392,
 134005491205392,
 True)

Final warning

In the next line you copy the values of x into an existing array y (of same size…)

Code

y = np.zeros(x.shape)
y[:] = x
y, y is x, np.all(y==x)

(array([ 0.62933532, -1.23653201,  0.13433891,  0.40760153,  1.05453221,
        -0.24900888,  1.15975529,  0.0703507 ,  0.64917964,  0.71881338]),
 False,
 np.True_)

While in the next line, you are aliasing, you are giving a new name y to the object named x (you should never, ever write something like this)

Code

y = x
y is x

True

Miscellanea

Non-numerical values

A numpy array can contain other things than numeric types

Code

arr = np.array(['Labore', 'neque', 'ipsum', 'ut', 'non', 'quiquia', 'dolore.'])
arr, arr.shape, arr.dtype

(array(['Labore', 'neque', 'ipsum', 'ut', 'non', 'quiquia', 'dolore.'],
       dtype='<U7'),
 (7,),
 dtype('<U7'))

Code

# arr.sum()

Code

"_".join(arr)

'Labore_neque_ipsum_ut_non_quiquia_dolore.'

Code

arr.dtype

dtype('<U7')

A matrix is no 2D array in `numpy`

So far, we have only used array or ndarray objects

The is another type: the matrix type

In words: don’t use it (IMhO) and stick with arrays

Code

# Matrix VS array objects in numpy
m1 = np.matrix(np.arange(3))
m2 = np.matrix(np.arange(3))
m1, m2

(matrix([[0, 1, 2]]), matrix([[0, 1, 2]]))

Code

m1.transpose() @ m2, m1.shape, m1.transpose() * m2

(matrix([[0, 0, 0],
         [0, 1, 2],
         [0, 2, 4]]),
 (1, 3),
 matrix([[0, 0, 0],
         [0, 1, 2],
         [0, 2, 4]]))

Code

a1 = np.arange(3)
a2 = np.arange(3)
a1, a2

(array([0, 1, 2]), array([0, 1, 2]))

Code

m1 * m2.T, m1.dot(m2.T)

(matrix([[5]]), matrix([[5]]))

Code

a1 * a2

array([0, 1, 4])

Code

a1.dot(a2)

np.int64(5)

Code

np.outer(a1, a2)

array([[0, 0, 0],
       [0, 1, 2],
       [0, 2, 4]])

Note

Visit https://numpy.org/doc/stable/reference/arrays.ndarray.html#arrays-ndarray

Sparse matrices

Code

from scipy.sparse import csc_matrix, csr_matrix, coo_matrix

Code

probs = np.full(fill_value=1/4, shape=(4,))
probs

array([0.25, 0.25, 0.25, 0.25])

Code

X = np.random.multinomial(n=2, pvals=probs, size=4)   # check you understand what is going on 
X

array([[0, 0, 0, 2],
       [0, 0, 1, 1],
       [1, 1, 0, 0],
       [1, 0, 0, 1]])

Code

probs

array([0.25, 0.25, 0.25, 0.25])

Code

X_coo = coo_matrix(X)  ## coordinate format

Code

print(X_coo)
X_coo

<COOrdinate sparse matrix of dtype 'int64'
    with 7 stored elements and shape (4, 4)>
  Coords    Values
  (0, 3)    2
  (1, 2)    1
  (1, 3)    1
  (2, 0)    1
  (2, 1)    1
  (3, 0)    1
  (3, 3)    1

<COOrdinate sparse matrix of dtype 'int64'
    with 7 stored elements and shape (4, 4)>

Code

X_coo.nnz    # number pf non-zero coordinates

Code

print(X, end='\n----\n')
print(X_coo.data, end='\n----\n')
print(X_coo.row, end='\n----\n')
print(X_coo.col, end='\n----\n')

[[0 0 0 2]
 [0 0 1 1]
 [1 1 0 0]
 [1 0 0 1]]
----
[2 1 1 1 1 1 1]
----
[0 1 1 2 2 3 3]
----
[3 2 3 0 1 0 3]
----

There is also

csr_matrix: sparse rows format
csc_matrix: sparse columns format

Sparse rows is often used for machine learning: sparse features vectors

But sparse column format useful as well (e.g. coordinate gradient descent)

Bored with decimals?

Code

X = np.random.randn(5, 5)
X

array([[ 0.83342913,  0.08040693, -1.07220735,  0.87379811,  0.59755824],
       [ 0.32755901,  0.10432441,  0.17615545,  0.07566302, -0.02653956],
       [ 0.19095091, -0.71199912,  0.20379812, -1.08511624, -0.50568562],
       [ 0.21171716, -0.74242228, -0.37705104,  0.91580081, -1.48349099],
       [-0.66164491,  1.00021721,  0.93630598,  0.22694503,  1.17406202]])

Code

# All number displayed by numpy (in the current kernel) are with 3 decimals max
np.set_printoptions(precision=3)
print(X)
np.set_printoptions(precision=8)

[[ 0.833  0.08  -1.072  0.874  0.598]
 [ 0.328  0.104  0.176  0.076 -0.027]
 [ 0.191 -0.712  0.204 -1.085 -0.506]
 [ 0.212 -0.742 -0.377  0.916 -1.483]
 [-0.662  1.     0.936  0.227  1.174]]

Not limited to 2D!

numpy arrays can have any number of dimension (hence the name ndarray)

Code

X = np.arange(18).reshape(3, 2, 3)
X

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]],

       [[12, 13, 14],
        [15, 16, 17]]])

Code

X.shape

(3, 2, 3)

Code

X.ndim

Visit https://numpy.org/doc/stable/reference/arrays.ndarray.html#arrays-ndarray

Aggregations and statistics

Code

A = np.arange(42).reshape(7, 6)
A

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35],
       [36, 37, 38, 39, 40, 41]])

Code

A.sum(), 42 * 41 //2

(np.int64(861), 861)

Code

A[:, 3].mean(), np.mean (3 + np.arange(0, 42, 6))

(np.float64(21.0), np.float64(21.0))

Code

A.mean(axis=0)

array([18., 19., 20., 21., 22., 23.])

Code

A.mean(axis=1)

array([ 2.5,  8.5, 14.5, 20.5, 26.5, 32.5, 38.5])

Code

A[:,3].std(), A[:,3].var()

(np.float64(12.0), np.float64(144.0))

Code

A[:,3].min(), A[:,3].max()

(np.int64(3), np.int64(39))

Code

A.cumsum(axis=0)

array([[  0,   1,   2,   3,   4,   5],
       [  6,   8,  10,  12,  14,  16],
       [ 18,  21,  24,  27,  30,  33],
       [ 36,  40,  44,  48,  52,  56],
       [ 60,  65,  70,  75,  80,  85],
       [ 90,  96, 102, 108, 114, 120],
       [126, 133, 140, 147, 154, 161]])

Code

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35],
       [36, 37, 38, 39, 40, 41]])

Code

# sum of diagonal
A.trace()

np.int64(105)

Linear Algebra

Code

A = np.arange(30).reshape(6, 5)
v1 = np.arange(0, 5)
v2 = np.arange(5, 10)

Code

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])

Code

v1, v2

(array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9]))

Code

v1 * v2

array([ 0,  6, 14, 24, 36])

Code

v1.dot(v2), np.sum(v1* v2)

(np.int64(80), np.int64(80))

Code

v1.reshape(5,1) @ v2.reshape(1,5)

array([[ 0,  0,  0,  0,  0],
       [ 5,  6,  7,  8,  9],
       [10, 12, 14, 16, 18],
       [15, 18, 21, 24, 27],
       [20, 24, 28, 32, 36]])

Inner products

Code

# Inner product between vectors
print(v1.dot(v2))

# You can use also (but first solution is better)
print(np.dot(v1, v2))

80
80

Code

A, v1

(array([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]),
 array([0, 1, 2, 3, 4]))

Code

A.shape, v1.shape

((6, 5), (5,))

Code

# Matrix-vector inner product
A.dot(v1)

array([ 30,  80, 130, 180, 230, 280])

Code

# Transpose
A.T

array([[ 0,  5, 10, 15, 20, 25],
       [ 1,  6, 11, 16, 21, 26],
       [ 2,  7, 12, 17, 22, 27],
       [ 3,  8, 13, 18, 23, 28],
       [ 4,  9, 14, 19, 24, 29]])

Code

print(v1)
# Inline operations (same for *=, /=, -=)
v1 += 2

[0 1 2 3 4]

Linear systems

Code

A = np.array([[42,2,3], [4,5,6], [7,8,9]])
b = np.array([1,2,3])
print(A, b, sep=2 * '\n')

[[42  2  3]
 [ 4  5  6]
 [ 7  8  9]]

[1 2 3]

Code

# solve a system of linear equations
x = np.linalg.solve(A, b)
x

array([2.18366847e-18, 2.31698718e-16, 3.33333333e-01])

Code

A.dot(x)

array([1., 2., 3.])

Eigenvalues and eigenvectors

Code

A = np.random.rand(3,3)
B = np.random.rand(3,3)

evals, evecs = np.linalg.eig(A)
evals

array([ 1.30118494,  0.68860835, -0.46662428])

Code

evecs

array([[-0.81651435, -0.96384507, -0.42136945],
       [-0.40428605,  0.13465887, -0.55860129],
       [-0.41213723,  0.22993406,  0.71443152]])

Singular value decomposition (SVD)

Decomposes any matrix \(A \in \mathbb R^{m \times n}\) as follows: \[ A = U \times S \times V^\top \] where - \(U\) and \(V\) are orthonormal matrices (meaning that \(U^\top \times U = I\) and \(V^\top \times V = I\)) - \(S\) is a diagonal matrix that contains the singular values in non-increasing order

Code

print(A)
U, S, V = np.linalg.svd(A)

[[0.88413509 0.01677612 0.80979055]
 [0.11948734 0.34283133 0.70337219]
 [0.03862284 0.94649455 0.29620259]]

Code

U.dot(np.diag(S)).dot(V)

array([[0.88413509, 0.01677612, 0.80979055],
       [0.11948734, 0.34283133, 0.70337219],
       [0.03862284, 0.94649455, 0.29620259]])

Code

A - U @ np.diag(S) @ V

array([[ 4.44089210e-16,  1.90819582e-16, -6.66133815e-16],
       [-2.77555756e-17,  2.77555756e-16,  1.11022302e-16],
       [-5.48172618e-16,  1.11022302e-16,  2.22044605e-16]])

Code

# U and V are indeed orthonormal
np.set_printoptions(precision=2)
print(U.T.dot(U), V.T.dot(V), sep=2 * '\n')
np.set_printoptions(precision=8)

[[ 1.00e+00 -1.27e-17  5.56e-17]
 [-1.27e-17  1.00e+00 -1.55e-17]
 [ 5.56e-17 -1.55e-17  1.00e+00]]

[[ 1.00e+00  5.45e-17  1.05e-16]
 [ 5.45e-17  1.00e+00 -6.47e-17]
 [ 1.05e-16 -6.47e-17  1.00e+00]]

Exercice: the racoon SVD

Load the racoon face picture using scipy.misc.face()
Visualize the picture
Write a function which reshapes the picture into a 2D array, and computes the best rank-r approximation of it (the prototype of the function is compute_approx(X, r)
Display the different approximations for r between 5 and 100

Code

!pip3 install pooch

Requirement already satisfied: pooch in /home/boucheron/sandbox/IFEBY310/.venv/lib/python3.12/site-packages (1.8.2)
Requirement already satisfied: platformdirs>=2.5.0 in /home/boucheron/sandbox/IFEBY310/.venv/lib/python3.12/site-packages (from pooch) (4.3.6)
Requirement already satisfied: packaging>=20.0 in /home/boucheron/sandbox/IFEBY310/.venv/lib/python3.12/site-packages (from pooch) (24.2)
Requirement already satisfied: requests>=2.19.0 in /home/boucheron/sandbox/IFEBY310/.venv/lib/python3.12/site-packages (from pooch) (2.32.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/boucheron/sandbox/IFEBY310/.venv/lib/python3.12/site-packages (from requests>=2.19.0->pooch) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /home/boucheron/sandbox/IFEBY310/.venv/lib/python3.12/site-packages (from requests>=2.19.0->pooch) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/boucheron/sandbox/IFEBY310/.venv/lib/python3.12/site-packages (from requests>=2.19.0->pooch) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/boucheron/sandbox/IFEBY310/.venv/lib/python3.12/site-packages (from requests>=2.19.0->pooch) (2024.12.14)

Code

import numpy as np
from scipy.datasets import face
import matplotlib.pyplot as plt
%matplotlib inline

X = face()

Code

type(X)

numpy.ndarray

Code

plt.imshow(X)
_ = plt.axis('off')

Code

n_rows, n_cols, n_channels = X.shape
X_reshaped = X.reshape(n_rows, n_cols * n_channels)
U, S, V = np.linalg.svd(X_reshaped, full_matrices=False)

Code

X_reshaped.shape

(768, 3072)

Code

X.shape

(768, 1024, 3)

Code

plt.plot(S**2)  ## a kind of screeplot
plt.yscale("log")

Code

def compute_approx(X: np.ndarray, r: int):
    """Computes the best rank-r approximation of X using SVD.
    We expect X to the 3D array corresponding to a color image, that we 
    reduce to a 2D one to apply SVD (no broadcasting).
    
    Parameters
    ----------
    X : `np.ndarray`, shape=(n_rows, n_cols, 3)
        The input 3D ndarray
    
    r : `int`
        The desired rank
        
    Return
    ------
    output : `np.ndarray`, shape=(n_rows, n_cols, 3)
        The best rank-r approximation of X
    """
    n_rows, n_cols, n_channels = X.shape
    # Reshape X to a 2D array
    X_reshape = X.reshape(n_rows, n_cols * n_channels)
    # Compute SVD
    U, S, V = np.linalg.svd(X_reshape, full_matrices=False)
    # Keep only the top r first singular values
    S[r:] = 0
    # Compute the approximation
    X_reshape_r = U.dot(np.diag(S)).dot(V)
    # Put it between 0 and 255 again and cast to integer type
    return X_reshape_r.clip(min=0, max=255).astype('int')\
        .reshape(n_rows, n_cols, n_channels)

Code

ranks = [100, 70, 50, 30, 10, 5]
n_ranks = len(ranks)
for i, r in enumerate(ranks):
    X_r = compute_approx(X, r)
    # plt.subplot(n_ranks, 1, i + 1)
    plt.figure(figsize=(5, 5))
    plt.imshow(X_r)
    _ = plt.axis('off')
    # plt.title(f'Rank {r} approximation of the racoon' % r, fontsize=16)
    plt.tight_layout()

Variations

In the code above, we recompute the SVD of X for every element in list rank.
In the next chunk, we compute the SVD once, and define a generator to generate the low rank approximations of matrix X. We take advantage of the fact that the SVD defines an orthonormal basis for the space of matrices. In this adapted orthonormal basis the optimal low rank approximations of \(X\) have a sparse expansion.

Code

def gen_rank_k_approx(X):
    """Generator for low rank 
    approximation of a matrix X using truncated SVD.

    Args:
        X (numpy.ndarray): a numerical matrix

    Yields:
        (int,numpy.ndarray): rank k and best rank-k approximation of X using truncated SVD(according to Eckart-Young theorem).
    """  
    U, S, V = np.linalg.svd(X, full_matrices=False)
    r = 0
    Y = np.zeros_like(X, dtype='float64')
    while (r<len(S)):
      Y = Y + S[r] * (U[:,r,np.newaxis] @ V[r,:, np.newaxis].T)
      r += 1
      yield r, Y

Code

g = gen_rank_k_approx(X_reshaped)

Code

for i in range(100):
    _, Xr = next(g)
    if i % 10 ==0:  
      plt.figure(figsize=(5, 5))
      plt.imshow(
          Xr
          .clip(min=0, max=255)
          .astype('int')
          .reshape(n_rows, n_cols, n_channels)
      )
      _ = plt.axis('off')
      plt.tight_layout()

Visit https://numpy.org/numpy-tutorials/content/tutorial-svd.html

The base numpy.array object

Matrix multiplication

Generating arrays

Slicing

Copies

Miscellanea

Non-numerical values

A matrix is no 2D array in numpy

Sparse matrices

Bored with decimals?

Not limited to 2D!

Aggregations and statistics

Linear Algebra

Inner products

Linear systems

Eigenvalues and eigenvectors

Singular value decomposition (SVD)

Exercice: the racoon SVD

The base `numpy.array` object

A matrix is no 2D array in `numpy`