Introduction to Python

This chapter introduces the Python language, covering only the bare minimum needed to get started with the data-science stack (a collection of libraries for data science). Python is a programming language, like C++, Java, Fortran, JavaScript, etc.

Specific features of Python

  • An interpreted (as opposed to compiled) language. Contrary to e.g. C++ or Fortran, one does not compile Python code before executing it.

  • Used as a scripting language, by running python script.py in a terminal

  • But it can also be used interactively: the Jupyter notebook, IPython, etc.

  • Free software released under an open-source license: Python can be used and distributed free of charge, even for building commercial software.

  • Multi-platform: Python is available for all major operating systems: Windows, Linux/Unix, macOS, most likely your mobile phone OS, etc.

  • A very readable language with clear non-verbose syntax

  • A language for which a large number of high-quality packages are available for various applications, including web frameworks and scientific computing

  • It has been one of the top languages for data science and machine learning for several years, because it is expressive and easy to deploy

  • An object-oriented language

See https://www.python.org/about/ for more information about distinguishing features of Python.

Python 2 or Python 3?
  • Simple answer: don’t use Python 2, use Python 3
  • Python 2 reached end of life on January 1, 2020 and is no longer maintained
  • You’ll end up hanged if you use Python 2
  • If Python 2 is mandatory at your workplace, find another job
Jupyter or Quarto notebooks?
  • Quarto documents are plain text, which makes them more Git-friendly than Jupyter notebooks

  • You can write them in a real text editor

  • Go for quarto

Hello world

  • In a Jupyter/Quarto notebook, you have an interactive interpreter.

  • You type commands in the cells and execute them

Code
print("Hi everybody!")
Hi everybody!

Basic types

Integers

Code
1 + 42
43
Code
type(1+1)
int

We can assign values to variables with =

Code
a = (3 + 5 ** 2) % 4
a
0

Remark

We don’t declare the type of a variable before assigning its value. In C, by contrast, one must write

int a = 4;

Something cool

  • Arbitrarily large integer arithmetic
Code
17 ** 542
8004153099680695240677662228684856314409365427758266999205063931175132640587226837141154215226851187899067565063096026317140186260836873939218139105634817684999348008544433671366043519135008200013865245747791955240844192282274023825424476387832943666754140847806277355805648624376507618604963106833797989037967001806494232055319953368448928268857747779203073913941756270620192860844700087001827697624308861431399538404552468712313829522630577767817531374612262253499813723569981496051353450351968993644643291035336065584116155321928452618573467361004489993801594806505273806498684433633838323916674207622468268867047187858269410016150838175127772100983052010703525089
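
As a small illustrative sketch (the variable names are ours), the following shows that Python integers are not limited to a fixed machine-word size, that integer arithmetic stays exact, and that floats, by contrast, have a finite range:

```python
# Python integers grow as needed: there is no fixed 64-bit limit.
big = 17 ** 542

# bit_length() returns the number of bits needed to write the value in binary.
n_bits = big.bit_length()
print(n_bits > 64)        # True

# Integer arithmetic is exact, so dividing back recovers the base.
quotient = big // 17 ** 541
print(quotient)           # 17

# Floats have a finite range: converting such a large integer fails.
overflowed = False
try:
    float(big)
except OverflowError as err:
    overflowed = True
    print("OverflowError:", err)
```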

Floats

A value written with a decimal point has the floating-point type float

Code
c = 2.
Code
type(c)
float
Code
c = 2
type(c)
int
Code
truc = 1 / 2
truc
0.5
Code
1 // 2 + 1 % 2
1
Code
type(truc)
float
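
The three division-related operators are easy to mix up. The sketch below (variable names are ours) contrasts true division, floor division and remainder, and shows why floats should be compared with a tolerance rather than with ==:

```python
import math

# True division always returns a float, even when the result is exact.
print(type(4 / 2))                      # <class 'float'>

# Floor division and remainder go together: a == (a // b) * b + a % b
a, b = 17, 5
q, r = divmod(a, b)
print(q, r)                             # 3 2

# Floats are binary approximations, so exact equality can be surprising.
exact = (0.1 + 0.2 == 0.3)
close = math.isclose(0.1 + 0.2, 0.3)
print(exact, close)                     # False True
```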

Boolean

Similarly, a comparison produces a value of boolean type (bool)

Code
test = 3 > 4
test
False
Code
type(test)
bool
Code
False == (not True)
True
Code
1.41 < 2.71 and 2.71 < 3.14
True
Code
# It's equivalent to
1.41 < 2.71 < 3.14
True

Type conversion (casting)

Code
a = 1
type(a)
int
Code
b = float(a)
type(b)
float
Code
str(b)
'1.0'
Code
bool(b)
# All non-zero numbers and non-empty objects are cast to True (more later)
True
Code
bool(1-1)
False
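
Conversions also work from strings, and not every conversion succeeds. A minimal sketch, with variable names of our own choosing:

```python
n = int("42")          # parse a string as an integer
t_pos = int(3.99)      # int() truncates toward zero ...
t_neg = int(-3.99)     # ... for negative numbers too
r = round(3.99)        # round() goes to the nearest integer
x = float("1e3")       # strings in scientific notation are accepted
print(n, t_pos, t_neg, r, x)   # 42 3 -3 4 1000.0

# A string that does not look like an integer is rejected.
failed = False
try:
    int("3.14")
except ValueError as err:
    failed = True
    print("ValueError:", err)
```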

Containers

Python provides many efficient types of containers or sequences, in which collections of objects can be stored.

The main ones are list, tuple, set and dict (but there are many others…)

Tuples

Code
tt = 'truc', 3.14, "truc"
tt
('truc', 3.14, 'truc')
Code
tt[0]
'truc'

You can’t change a tuple: we say that it is immutable

Code
try:
    tt[0] = 1
except TypeError:
    print("TypeError: 'tuple' object does not support item assignment")
TypeError: 'tuple' object does not support item assignment
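
Since a tuple cannot be modified in place, the usual workaround is to build a new one. The sketch below (names tt2, first, second, third are ours) also shows unpacking, which is the counterpart of the comma syntax used to create tuples:

```python
tt = 'truc', 3.14, 'truc'

# Build a new tuple instead of modifying the old one.
tt2 = (1,) + tt[1:]
print(tt2)        # (1, 3.14, 'truc')
print(tt)         # ('truc', 3.14, 'truc'), unchanged

# Unpacking assigns each element to a name.
first, second, third = tt
print(second)     # 3.14
```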

Three ways of doing the same thing

Code
# Method 1
tuple([1, 2, 3])
(1, 2, 3)
Code
# Method 2
1, 2, 3
(1, 2, 3)
Code
# Method 3
(1, 2, 3)
(1, 2, 3)

Simpler is better in Python, so Method 2 is usually the one you want.

Code
toto = 1, 2, 3
toto
(1, 2, 3)
  • This is serious!

The Zen of Python Easter egg

Code
import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Lists

A list is an ordered collection of objects. These objects may have different types. For example:

Code
colors = ['red', 'blue', 'green', 'black', 'white']
Code
colors[0]
'red'
Code
type(colors)
list

Indexing: accessing individual objects contained in the list by their position

Code
colors[2]
'green'
Code
colors[2] = 3.14
colors
['red', 'blue', 3.14, 'black', 'white']
Warning

For any sequence in Python, indexing starts at 0 (as in C), not at 1 (as in Fortran, R, or Matlab).

Counting from the end with negative indices:

Code
colors[-1]
'white'

The index must remain within the range of the list

Code
try:
    colors[10]
except IndexError:
    print(f"IndexError: index 10 is out of range, len(colors) == {len(colors)}")
Code
colors
['red', 'blue', 3.14, 'black', 'white']
Code
tt
('truc', 3.14, 'truc')
Code
colors.append(tt)
colors
['red', 'blue', 3.14, 'black', 'white', ('truc', 3.14, 'truc')]
Code
len(colors)
6
Code
len(tt)
3

Slicing: obtaining sublists of regularly-spaced elements

This works with any sequence type (list, str, tuple, etc.)

Code
colors
['red', 'blue', 3.14, 'black', 'white', ('truc', 3.14, 'truc')]
Code
list(reversed(colors))
[('truc', 3.14, 'truc'), 'white', 'black', 3.14, 'blue', 'red']
Code
colors[::-1]
[('truc', 3.14, 'truc'), 'white', 'black', 3.14, 'blue', 'red']
Slicing syntax:

colors[start:stop:stride]

start, stop and stride are optional; for a positive stride the default values are 0, len(sequence) and 1

Code
print(slice(4))
print(slice(1,5))
print(slice(None,13,3))
slice(None, 4, None)
slice(1, 5, None)
slice(None, 13, 3)
Code
sl = slice(1,5,2)
colors[sl]
['blue', 'black']
Code
colors
['red', 'blue', 3.14, 'black', 'white', ('truc', 3.14, 'truc')]
Code
colors[3:]
['black', 'white', ('truc', 3.14, 'truc')]
Code
colors[:3]
['red', 'blue', 3.14]
Code
colors[1::2]
['blue', 'black', ('truc', 3.14, 'truc')]
Code
colors[::-1]
[('truc', 3.14, 'truc'), 'white', 'black', 3.14, 'blue', 'red']
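
A few more slicing cases, on a list of letters of our own so that positions are easy to read. This is only a sketch of the rules above: negative indices, a negative stride, clipping of out-of-range bounds, and the fact that a slice is a new list:

```python
letters = list("abcdefgh")

print(letters[2:6])      # ['c', 'd', 'e', 'f']
print(letters[-3:])      # ['f', 'g', 'h']
print(letters[6:1:-2])   # ['g', 'e', 'c']

# Out-of-range bounds in a slice are clipped instead of raising IndexError.
print(letters[5:100])    # ['f', 'g', 'h']

# A slice is a new list: changing it leaves the original untouched.
head = letters[:3]
head[0] = "Z"
print(letters[0])        # a
```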

Strings

Different string syntaxes (single, double or triple quotes):

Code
s = 'tintin'
type(s)
str
Code
s
'tintin'
Code
s = """         Bonjour,
Je m'appelle Stephane.
Je vous souhaite une bonne journée.
Salut.       
"""
s
"         Bonjour,\nJe m'appelle Stephane.\nJe vous souhaite une bonne journée.\nSalut.       \n"
Code
s.strip()
"Bonjour,\nJe m'appelle Stephane.\nJe vous souhaite une bonne journée.\nSalut."
Code
print(s.strip())
Bonjour,
Je m'appelle Stephane.
Je vous souhaite une bonne journée.
Salut.
Code
len(s)
91
Code
# Casting to a list
list(s.strip()[:15])
['B', 'o', 'n', 'j', 'o', 'u', 'r', ',', '\n', 'J', 'e', ' ', 'm', "'", 'a']
Code
# Arithmetics
print('Bonjour' * 2)
print('Hello' + ' all')
BonjourBonjour
Hello all
Code
sss = 'A'
sss += 'bc'
sss += 'dE'
sss.lower()
'abcde'
Code
ss = s.strip()
print(ss[:10] + ss[24:28])
Bonjour,
Jepha
Code
s.strip()
"Bonjour,\nJe m'appelle Stephane.\nJe vous souhaite une bonne journée.\nSalut."
Code
s.strip().split('\n')
['Bonjour,',
 "Je m'appelle Stephane.",
 'Je vous souhaite une bonne journée.',
 'Salut.']
Code
s[::3]
'   BjrJmpl ea.eo ui eoeon.at  \n'
Code
s[3:10]
'      B'
Code
" ".join(['Il', 'fait', 'super', 'beau', "aujourd'hui"])
"Il fait super beau aujourd'hui"

Chaining method calls is the basis of pipeline building.

Code
( 
    " ".join(['Il', 'fait', 'super', 'beau', "aujourd'hui"])
       .title()
       .replace(' ', '')
       .replace("'","")
)
'IlFaitSuperBeauAujourdHui'

Important

A string is immutable!

Code
s = 'I am an immutable guy'
Code
try:  
    s[2] = 's'
except TypeError:
    print(f"Strings are immutable! s is still '{s}'")
Strings are immutable! s is still 'I am an immutable guy'
Code
id(s)
134288353205360
Code
print(s + ', for sure')
id(s), id(s + ' for sure')
I am an immutable guy, for sure
(134288353205360, 134288353731856)

Extra stuff with strings

Code
'square of 2 is ' + str(2 ** 2)
'square of 2 is 4'
Code
'square of 2 is %d' % 2 ** 2
'square of 2 is 4'
Code
'square of 2 is {}'.format(2 ** 2)
'square of 2 is 4'
Code
'square of 2 is {square}'.format(square=2 ** 2)
'square of 2 is 4'
Code
# And since Python 3.6 you can use an `f-string`
number = 2
square = number ** 2

f'square of {number} is {square}'
'square of 2 is 4'
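
f-strings also accept a format specification after a colon. The values below are made up for illustration; the format codes themselves are standard:

```python
value = 3.14159265
count = 1234567

two_decimals = f"{value:.2f}"      # fixed number of decimals
thousands = f"{count:,}"           # thousands separator
padded = f"{value:10.3f}|"         # right-aligned in a field of width 10
percent = f"{0.256:.1%}"           # percentage
debug = f"{count=}"                # name and value (Python 3.8+)

print(two_decimals)   # 3.14
print(thousands)      # 1,234,567
print(padded)         #      3.142|
print(percent)        # 25.6%
print(debug)          # count=1234567
```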

The in keyword

You can use the in keyword with any container, whenever it makes sense

Code
print(s)
print('Salut' in s)
I am an immutable guy
False
Code
print(tt)
print('truc' in tt)
('truc', 3.14, 'truc')
True
Code
print(colors)
print('truc' in colors)
['red', 'blue', 3.14, 'black', 'white', ('truc', 3.14, 'truc')]
False
Code
('truc', 3.14, 'truc') in colors
True
Warning

Strings are not bytes. Have a look at Chapter 4, “Unicode Text versus Bytes”, in Fluent Python.

Brain-teasing

Explain this weird behaviour:

Code
5 in [1, 2, 3, 4] == False
False
Code
[1, 2, 3, 4] == False
False
Code
5 not in [1, 2, 3, 4]
True
Code
(5 in [1, 2, 3, 4]) == False
True
Code
# ANSWER.
# This is a chained comparison. We have seen that 
1 < 2 < 3
# is equivalent to
(1 < 2) and (2 < 3)
# so that
5 in [1, 2, 3, 4] == False
# is equivalent to
(5 in [1, 2, 3, 4]) and ([1, 2, 3, 4] == False)
False
Code
(5 in [1, 2, 3, 4])
False
Code
([1, 2, 3, 4] == False)
False

Dictionaries

  • A dictionary is basically an efficient table that maps keys to values.
  • The MOST important container in Python.
  • Many things are actually a dict under the hood in Python
Code
tel = {'emmanuelle': 5752, 'sebastian': 5578}
print(tel)
print(type(tel))
{'emmanuelle': 5752, 'sebastian': 5578}
<class 'dict'>
Code
tel['emmanuelle'], tel['sebastian']
(5752, 5578)
Code
tel['francis'] = '5919'
tel
{'emmanuelle': 5752, 'sebastian': 5578, 'francis': '5919'}
Code
len(tel)
3

Important remarks

  • Keys can be of different types
  • A key must be hashable, which in practice means of immutable type
Code
tel[7162453] = [1, 3, 2]
tel[3.14] = 'bidule'
tel[('jaouad', 2)] = 1234
tel
{'emmanuelle': 5752,
 'sebastian': 5578,
 'francis': '5919',
 7162453: [1, 3, 2],
 3.14: 'bidule',
 ('jaouad', 2): 1234}
Code
try:
    sorted(tel)
except TypeError:
    print("TypeError: '<' not supported between instances of 'int' and 'str'")    
TypeError: '<' not supported between instances of 'int' and 'str'
Code
# A list is mutable and not hashable
try:
    tel[['jaouad']] = '5678'
except TypeError:
    print("TypeError: unhashable type: 'list'")
TypeError: unhashable type: 'list'
Code
try:
    tel[2]
except KeyError:
    print("KeyError: 2")
KeyError: 2
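
When a key may be missing, catching KeyError is not the only option. A minimal sketch on a separate dictionary (the name book and its entries are ours) using get, in and pop:

```python
book = {'emmanuelle': 5752, 'sebastian': 5578}

# get returns None, or a default of your choice, instead of raising KeyError.
print(book.get('sebastian'))          # 5578
print(book.get('rémi'))               # None
print(book.get('rémi', 'unknown'))    # unknown

# Test membership before inserting.
if 'rémi' not in book:
    book['rémi'] = 1111

# pop removes a key and returns its value.
removed = book.pop('emmanuelle')
print(removed, book)                  # 5752 {'sebastian': 5578, 'rémi': 1111}
```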
Code
tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}
print(tel.keys())
print(tel.values())
print(tel.items())
dict_keys(['emmanuelle', 'sebastian', 'jaouad'])
dict_values([5752, 5578, 1234])
dict_items([('emmanuelle', 5752), ('sebastian', 5578), ('jaouad', 1234)])
Code
list(tel.keys())[2]
'jaouad'
Code
tel.values().mapping
mappingproxy({'emmanuelle': 5752, 'sebastian': 5578, 'jaouad': 1234})
Code
type(tel.keys())
dict_keys
Code
'rémi' in tel
False
Code
list(tel)
['emmanuelle', 'sebastian', 'jaouad']
Code
'rémi' in tel.keys()
False

You can swap values like this

Code
print(tel)
tel['emmanuelle'], tel['sebastian'] = tel['sebastian'], tel['emmanuelle']
print(tel)
{'emmanuelle': 5752, 'sebastian': 5578, 'jaouad': 1234}
{'emmanuelle': 5578, 'sebastian': 5752, 'jaouad': 1234}
Code
# It works, since
a, b = 2.71, 3.14
a, b = b, a
a, b
(3.14, 2.71)

Exercise 1

Get the keys of tel sorted in decreasing order

Code
tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}

Exercise 2

Get the keys of tel sorted by increasing value

Code
tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}

Exercise 3

Obtain a sorted-by-key version of tel

Code
tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}

Sets

A set is an unordered container of unique elements

Code
ss = {1, 2, 2, 2, 3, 3, 'tintin', 'tintin', 'toto'}
ss
{1, 2, 3, 'tintin', 'toto'}
Code
s = 'truc truc bidule truc'
set(s)
{' ', 'b', 'c', 'd', 'e', 'i', 'l', 'r', 't', 'u'}
Code
set(list(s))
{' ', 'b', 'c', 'd', 'e', 'i', 'l', 'r', 't', 'u'}
Code
{1, 5, 2, 1, 1}.union({1, 2, 3})
{1, 2, 3, 5}
Code
set((1, 5, 3, 2))
{1, 2, 3, 5}
Code
set([1, 5, 2, 1, 1]).intersection(set([1, 2, 3]))
{1, 2}
Code
ss.add('tintin')
ss
{1, 2, 3, 'tintin', 'toto'}
Code
ss.difference(range(6))
{'tintin', 'toto'}
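
The set methods used above also exist as operators. In this sketch (the sets a and b and the list names are ours) the results are passed through sorted only to get a predictable display order, since sets themselves are unordered:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(sorted(a | b))    # union: [1, 2, 3, 4, 5]
print(sorted(a & b))    # intersection: [3, 4]
print(sorted(a - b))    # difference: [1, 2]
print(sorted(a ^ b))    # symmetric difference: [1, 2, 5]
print({3, 4} <= a)      # subset test: True

# Removing duplicates while keeping the order of first appearance:
# a dict remembers insertion order, a set does not.
names = ['toto', 'tintin', 'toto', 'truc', 'tintin']
unique = list(dict.fromkeys(names))
print(unique)           # ['toto', 'tintin', 'truc']
```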

You can combine all containers together

Code
dd = {
    'truc': [1, 2, 3], 
    5: (1, 4, 2),
    (1, 3): {'hello', 'world'}
}
dd
{'truc': [1, 2, 3], 5: (1, 4, 2), (1, 3): {'hello', 'world'}}

Assignment in Python is name binding

Everything is either mutable or immutable

Code
ss = {1, 2, 3}
sss = ss
sss, ss
({1, 2, 3}, {1, 2, 3})
Code
id(ss), id(sss)
(134286904486688, 134286904486688)
Code
sss.add("Truc")

Question. What is in ss ?

Code
ss, sss
({1, 2, 3, 'Truc'}, {1, 2, 3, 'Truc'})

ss and sss are names for the same object

Code
id(ss), id(sss)
(134286904486688, 134286904486688)
Code
ss is sss
True
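
The difference between is (same object) and == (same value) can be checked on a small example. The names x, y, z and value are ours:

```python
x = [1, 2, 3]
y = x            # a second name for the same object
z = [1, 2, 3]    # an equal but distinct object

print(x == z, x is z)   # True False
print(x == y, x is y)   # True True

# is is the recommended way to compare against the None singleton.
value = None
print(value is None)    # True
```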
Code
help('is')
Comparisons
***********

Unlike C, all comparison operations in Python have the same priority,
which is lower than that of any arithmetic, shifting or bitwise
operation.  Also unlike C, expressions like "a < b < c" have the
interpretation that is conventional in mathematics:

   comparison    ::= or_expr (comp_operator or_expr)*
   comp_operator ::= "<" | ">" | "==" | ">=" | "<=" | "!="
                     | "is" ["not"] | ["not"] "in"

Comparisons yield boolean values: "True" or "False". Custom *rich
comparison methods* may return non-boolean values. In this case Python
will call "bool()" on such value in boolean contexts.

Comparisons can be chained arbitrarily, e.g., "x < y <= z" is
equivalent to "x < y and y <= z", except that "y" is evaluated only
once (but in both cases "z" is not evaluated at all when "x < y" is
found to be false).

Formally, if *a*, *b*, *c*, …, *y*, *z* are expressions and *op1*,
*op2*, …, *opN* are comparison operators, then "a op1 b op2 c ... y
opN z" is equivalent to "a op1 b and b op2 c and ... y opN z", except
that each expression is evaluated at most once.

Note that "a op1 b op2 c" doesn’t imply any kind of comparison between
*a* and *c*, so that, e.g., "x < y > z" is perfectly legal (though
perhaps not pretty).


Value comparisons
=================

The operators "<", ">", "==", ">=", "<=", and "!=" compare the values
of two objects.  The objects do not need to have the same type.

Chapter Objects, values and types states that objects have a value (in
addition to type and identity).  The value of an object is a rather
abstract notion in Python: For example, there is no canonical access
method for an object’s value.  Also, there is no requirement that the
value of an object should be constructed in a particular way, e.g.
comprised of all its data attributes. Comparison operators implement a
particular notion of what the value of an object is.  One can think of
them as defining the value of an object indirectly, by means of their
comparison implementation.

Because all types are (direct or indirect) subtypes of "object", they
inherit the default comparison behavior from "object".  Types can
customize their comparison behavior by implementing *rich comparison
methods* like "__lt__()", described in Basic customization.

The default behavior for equality comparison ("==" and "!=") is based
on the identity of the objects.  Hence, equality comparison of
instances with the same identity results in equality, and equality
comparison of instances with different identities results in
inequality.  A motivation for this default behavior is the desire that
all objects should be reflexive (i.e. "x is y" implies "x == y").

A default order comparison ("<", ">", "<=", and ">=") is not provided;
an attempt raises "TypeError".  A motivation for this default behavior
is the lack of a similar invariant as for equality.

The behavior of the default equality comparison, that instances with
different identities are always unequal, may be in contrast to what
types will need that have a sensible definition of object value and
value-based equality.  Such types will need to customize their
comparison behavior, and in fact, a number of built-in types have done
that.

The following list describes the comparison behavior of the most
important built-in types.

* Numbers of built-in numeric types (Numeric Types — int, float,
  complex) and of the standard library types "fractions.Fraction" and
  "decimal.Decimal" can be compared within and across their types,
  with the restriction that complex numbers do not support order
  comparison.  Within the limits of the types involved, they compare
  mathematically (algorithmically) correct without loss of precision.

  The not-a-number values "float('NaN')" and "decimal.Decimal('NaN')"
  are special.  Any ordered comparison of a number to a not-a-number
  value is false. A counter-intuitive implication is that not-a-number
  values are not equal to themselves.  For example, if "x =
  float('NaN')", "3 < x", "x < 3" and "x == x" are all false, while "x
  != x" is true.  This behavior is compliant with IEEE 754.

* "None" and "NotImplemented" are singletons.  **PEP 8** advises that
  comparisons for singletons should always be done with "is" or "is
  not", never the equality operators.

* Binary sequences (instances of "bytes" or "bytearray") can be
  compared within and across their types.  They compare
  lexicographically using the numeric values of their elements.

* Strings (instances of "str") compare lexicographically using the
  numerical Unicode code points (the result of the built-in function
  "ord()") of their characters. [3]

  Strings and binary sequences cannot be directly compared.

* Sequences (instances of "tuple", "list", or "range") can be compared
  only within each of their types, with the restriction that ranges do
  not support order comparison.  Equality comparison across these
  types results in inequality, and ordering comparison across these
  types raises "TypeError".

  Sequences compare lexicographically using comparison of
  corresponding elements.  The built-in containers typically assume
  identical objects are equal to themselves.  That lets them bypass
  equality tests for identical objects to improve performance and to
  maintain their internal invariants.

  Lexicographical comparison between built-in collections works as
  follows:

  * For two collections to compare equal, they must be of the same
    type, have the same length, and each pair of corresponding
    elements must compare equal (for example, "[1,2] == (1,2)" is
    false because the type is not the same).

  * Collections that support order comparison are ordered the same as
    their first unequal elements (for example, "[1,2,x] <= [1,2,y]"
    has the same value as "x <= y").  If a corresponding element does
    not exist, the shorter collection is ordered first (for example,
    "[1,2] < [1,2,3]" is true).

* Mappings (instances of "dict") compare equal if and only if they
  have equal "(key, value)" pairs. Equality comparison of the keys and
  values enforces reflexivity.

  Order comparisons ("<", ">", "<=", and ">=") raise "TypeError".

* Sets (instances of "set" or "frozenset") can be compared within and
  across their types.

  They define order comparison operators to mean subset and superset
  tests.  Those relations do not define total orderings (for example,
  the two sets "{1,2}" and "{2,3}" are not equal, nor subsets of one
  another, nor supersets of one another).  Accordingly, sets are not
  appropriate arguments for functions which depend on total ordering
  (for example, "min()", "max()", and "sorted()" produce undefined
  results given a list of sets as inputs).

  Comparison of sets enforces reflexivity of its elements.

* Most other built-in types have no comparison methods implemented, so
  they inherit the default comparison behavior.

User-defined classes that customize their comparison behavior should
follow some consistency rules, if possible:

* Equality comparison should be reflexive. In other words, identical
  objects should compare equal:

     "x is y" implies "x == y"

* Comparison should be symmetric. In other words, the following
  expressions should have the same result:

     "x == y" and "y == x"

     "x != y" and "y != x"

     "x < y" and "y > x"

     "x <= y" and "y >= x"

* Comparison should be transitive. The following (non-exhaustive)
  examples illustrate that:

     "x > y and y > z" implies "x > z"

     "x < y and y <= z" implies "x < z"

* Inverse comparison should result in the boolean negation. In other
  words, the following expressions should have the same result:

     "x == y" and "not x != y"

     "x < y" and "not x >= y" (for total ordering)

     "x > y" and "not x <= y" (for total ordering)

  The last two expressions apply to totally ordered collections (e.g.
  to sequences, but not to sets or mappings). See also the
  "total_ordering()" decorator.

* The "hash()" result should be consistent with equality. Objects that
  are equal should either have the same hash value, or be marked as
  unhashable.

Python does not enforce these consistency rules. In fact, the
not-a-number values are an example for not following these rules.


Membership test operations
==========================

The operators "in" and "not in" test for membership.  "x in s"
evaluates to "True" if *x* is a member of *s*, and "False" otherwise.
"x not in s" returns the negation of "x in s".  All built-in sequences
and set types support this as well as dictionary, for which "in" tests
whether the dictionary has a given key. For container types such as
list, tuple, set, frozenset, dict, or collections.deque, the
expression "x in y" is equivalent to "any(x is e or x == e for e in
y)".

For the string and bytes types, "x in y" is "True" if and only if *x*
is a substring of *y*.  An equivalent test is "y.find(x) != -1".
Empty strings are always considered to be a substring of any other
string, so """ in "abc"" will return "True".

For user-defined classes which define the "__contains__()" method, "x
in y" returns "True" if "y.__contains__(x)" returns a true value, and
"False" otherwise.

For user-defined classes which do not define "__contains__()" but do
define "__iter__()", "x in y" is "True" if some value "z", for which
the expression "x is z or x == z" is true, is produced while iterating
over "y". If an exception is raised during the iteration, it is as if
"in" raised that exception.

Lastly, the old-style iteration protocol is tried: if a class defines
"__getitem__()", "x in y" is "True" if and only if there is a non-
negative integer index *i* such that "x is y[i] or x == y[i]", and no
lower integer index raises the "IndexError" exception.  (If any other
exception is raised, it is as if "in" raised that exception).

The operator "not in" is defined to have the inverse truth value of
"in".


Identity comparisons
====================

The operators "is" and "is not" test for an object’s identity: "x is
y" is true if and only if *x* and *y* are the same object.  An
Object’s identity is determined using the "id()" function.  "x is not
y" yields the inverse truth value. [4]

Related help topics: EXPRESSIONS, BASICMETHODS

About assignments

  • Python never copies an object on assignment
  • Unless you ask it to

When you code

x = [1, 2, 3]
y = x

you just

  • bind the variable name x to a list [1, 2, 3]
  • give another name y to the same object

Important remarks

  • Everything is an object in Python
  • Either immutable or mutable
Code
id(1), id(1+1), id(2)
(11753896, 11753928, 11753928)

A list is mutable

Code
x = [1, 2, 3]
print(id(x), x)
x[0] += 42; x.append(3.14)
print(id(x), x)
134288353648384 [1, 2, 3]
134288353648384 [43, 2, 3, 3.14]

A str is immutable

In order to “change” an immutable object, Python creates a new one

Code
s = 'to'
print(id(s), s)
s += 'to'
print(id(s), s)
134288676817440 to
134288353246496 toto

Once again, a list is mutable

Code
super_list = [3.14, (1, 2, 3), 'tintin']
other_list = super_list
id(other_list), id(super_list)
(134288353864832, 134288353864832)
  • other_list and super_list are the same list
  • If you change one, you change the other.
  • id returns the identity of an object. Two objects with the same identity are the same (not only the same type, but the same instance)
Code
other_list[1] = 'youps'
other_list, super_list
([3.14, 'youps', 'tintin'], [3.14, 'youps', 'tintin'])
Code
id(super_list), id(other_list)
(134288353864832, 134288353864832)

If you want a copy, you need to ask for one

Code
other_list = super_list.copy()
id(other_list), id(super_list)
(134288353471744, 134288353864832)
Code
other_list[1] = 'copy'
other_list, super_list
([3.14, 'copy', 'tintin'], [3.14, 'youps', 'tintin'])

Only other_list is modified.

But… what if you have a list of lists? (or any mutable object containing mutable objects)

Code
l1, l2 = [1, 2, 3], [4, 5, 6]
list_list = [l1, l2]
list_list
[[1, 2, 3], [4, 5, 6]]
Code
id(list_list), id(list_list[0]), id(l1), list_list[0] is l1
(134288353460288, 134288353211264, 134288353211264, True)

Let’s make a copy of list_list

Code
copy_list = list_list.copy()
copy_list.append('super')
list_list, copy_list
([[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6], 'super'])
Code
id(list_list[0]), id(copy_list[0])
(134288353211264, 134288353211264)

OK, only copy_list is modified, as expected

But now…

Code
copy_list[0][1] = 'oups'
copy_list, list_list
([[1, 'oups', 3], [4, 5, 6], 'super'], [[1, 'oups', 3], [4, 5, 6]])

Question. What happened ?!?

  • The list_list object is copied
  • But NOT what it contains!
  • By default, copy makes a shallow copy, not a deep copy
  • It does not build copies of the contained objects
  • If you want to copy an object and everything it contains, you need to use deepcopy.
Code
from copy import deepcopy

copy_list = deepcopy(list_list)
copy_list[0][1] = 'incredible !'
list_list, copy_list
([[1, 'oups', 3], [4, 5, 6]], [[1, 'incredible !', 3], [4, 5, 6]])
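
There are several equivalent ways to ask for a shallow copy of a list. The sketch below (names inner, outer, shallow_copies and deep are ours) checks that each of them creates a new outer list but shares the inner one, whereas deepcopy duplicates both:

```python
import copy

inner = [1, 2]
outer = [inner, 'a']

# Four ways to get a shallow copy of a list.
shallow_copies = [outer.copy(), list(outer), outer[:], copy.copy(outer)]

for c in shallow_copies:
    # New outer list, but the same inner list.
    print(c is outer, c[0] is inner)     # False True

# A deep copy duplicates the contained objects as well.
deep = copy.deepcopy(outer)
print(deep is outer, deep[0] is inner)   # False False
```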

Final remarks

Code
tt = ([1, 2, 3], [4, 5, 6])
print(id(tt), tt)
print(list(map(id, tt)))
134286902638208 ([1, 2, 3], [4, 5, 6])
[134288353646976, 134288353647872]
Code
tt[0][1] = '42'
print(id(tt), tt)
print(list(map(id, tt)))
134286902638208 ([1, '42', 3], [4, 5, 6])
[134288353646976, 134288353647872]
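
A tuple is immutable, but the objects it contains may not be, as the cell above shows. One consequence, sketched here with names of our own, is that a tuple containing lists cannot be used as a dictionary key, while a tuple of immutable values can:

```python
ok_key = (1, 2, 3)                   # contains only immutable values
bad_key = ([1, 2, 3], [4, 5, 6])     # contains mutable lists

d = {ok_key: 'fine'}
print(d[(1, 2, 3)])                  # fine

# Hashing a tuple hashes its elements, and lists are not hashable.
rejected = False
try:
    d[bad_key] = 'nope'
except TypeError as err:
    rejected = True
    print("TypeError:", err)
```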
Code
s = [1, 2, 3]
Code
s2 = s
Code
s2 is s
True
Code
id(s2), id(s)
(134286903736640, 134286903736640)

Control flow and other stuff…

Namely tests, loops, again booleans, etc.

Code
if 2 ** 2 == 5:
    print('Obvious')
else:
    print('YES')
print('toujours')
YES
toujours

Blocks are delimited by indentation!

Code
a = 3
if a > 0:
    if a == 1:
        print(1)
    elif a == 2:
        print(2)
elif a == 2:
    print(2)
elif a == 3:
    print(3)
else:
    print(a)

Any object can be interpreted as a boolean

For example, don’t do this to test if a list is empty

Code
l2 = ['hello', 'everybody']

if len(l2) > 0:
    print(l2[0])
hello

but this

Code
if l2:
    print(l2[0])
hello

Some poetry

  • An empty dict is False
  • An empty string is False
  • An empty list is False
  • An empty tuple is False
  • An empty set is False
  • 0 is False
  • 0.0 is False
  • None is False
  • etc…
  • nearly everything else is True
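
These rules can be checked directly with bool. The two lists below are our own selection of values, including a few that look empty or null but are not:

```python
# Empty containers, zeros, None and False are all falsy.
falsy = [{}, '', [], (), set(), 0, 0.0, None, False, range(0)]
any_true = any(bool(v) for v in falsy)
print(any_true)      # False

# Non-empty containers and non-zero numbers are truthy,
# even when their content looks "empty".
truthy = [[0], ' ', '0', -1, 0.1, {'': None}, (None,)]
all_true = all(bool(v) for v in truthy)
print(all_true)      # True
```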

While loops

Code
a = 10
b = 1
while b < a:
    b = b + 1
    print(b)
2
3
4
5
6
7
8
9
10

Compute the decimals of π using the Wallis formula

\[ \pi = 2 \prod_{i=1}^{\infty} \frac{4i^2}{4i^2 - 1} \]

Code
pi = 2
eps = 1e-10
dif = 2 * eps
i = 1
while dif > eps:
    pi, i, old_pi = pi * 4 * i ** 2 / (4 * i ** 2 - 1), i + 1, pi
    dif = pi - old_pi
Code
pi
3.1415837914138556
Code
from math import pi

pi
3.141592653589793
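
The same product can be written with a fixed number of factors instead of a stopping criterion. This is only a sketch: the function name wallis and the chosen numbers of terms are ours, and math.prod requires Python 3.8 or later. It suggests how slowly the product converges, the error shrinking roughly like 1/n:

```python
import math

def wallis(n_terms):
    """Partial Wallis product with n_terms factors (an approximation of pi)."""
    return 2 * math.prod(
        4 * i ** 2 / (4 * i ** 2 - 1) for i in range(1, n_terms + 1)
    )

for n in (10, 1000, 100000):
    approx = wallis(n)
    # The partial products approach pi from below.
    print(n, approx, math.pi - approx)
```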

for loop with range

  • Iteration with an index, with a list, with many things !
  • range takes the same parameters as slicing (start, stop, stride); only stop is required
Code
for i in range(10):
    print(i)
0
1
2
3
4
5
6
7
8
9
Code
for i in range(4):
    print(i + 1)
print('-')

for i in range(1, 5):
    print(i)
print('-')

for i in range(1, 10, 3):
    print(i)
1
2
3
4
-
1
2
3
4
-
1
4
7
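
A range is a lazy sequence: it can count backwards, and it supports len, indexing and in without building a list. A short sketch with values of our own:

```python
print(list(range(5)))            # [0, 1, 2, 3, 4]
print(list(range(10, 0, -3)))    # [10, 7, 4, 1]

r = range(0, 20, 5)
print(len(r), r[-1], 15 in r)    # 4 15 True
```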

Something for nerds. You can use else in a for loop: the else block runs only if the loop finishes without hitting break

Code
names = ['stephane', 'mokhtar', 'jaouad', 'simon', 'yiyang']

for name in names:
    if name.startswith('u'):
        print(name)
        break
else:
    print('Not found.')
Not found.
Code
names = ['stephane', 'mokhtar', 'jaouad', 'ulysse', 'simon', 'yiyang']

for name in names:
    if name.startswith('u'):
        print(name)
        break
else:
    print('Not found.')
ulysse

For loops over iterable objects

You can iterate using for over any container: list, tuple, dict, str, set among others…

Code
colors = ['red', 'blue', 'black', 'white']
peoples = ['stephane', 'jaouad', 'mokhtar', 'yiyang', 'rémi']
Code
# This is stupid
for i in range(len(colors)):
    print(colors[i])
    
# This is better
for color in colors:
    print(color)
red
blue
black
white
red
blue
black
white

To iterate over several sequences at the same time, use zip

Code
for color, people in zip(colors, peoples):
    print(color, people)
red stephane
blue jaouad
black mokhtar
white yiyang
Code
l = ["Bonjour", {'francis': 5214, 'stephane': 5123}, ('truc', 3)]
for e in l:
    print(e, len(e))
Bonjour 7
{'francis': 5214, 'stephane': 5123} 2
('truc', 3) 2

Loop over a str

Code
s = 'Bonjour'
for c in s:
    print(c)
B
o
n
j
o
u
r

Loop over a dict

Code
dd = {(1, 3): {'hello', 'world'}, 'truc': [1, 2, 3], 5: (1, 4, 2)}

# Default is to loop over keys
for key in dd:
    print(key)
(1, 3)
truc
5
Code
# Loop over values
for e in dd.values():
    print(e)
{'hello', 'world'}
[1, 2, 3]
(1, 4, 2)
Code
# Loop over items (key, value) pairs
for key, val in dd.items():
    print(key, val)
(1, 3) {'hello', 'world'}
truc [1, 2, 3]
5 (1, 4, 2)
Code
for t in dd.items():
    print(t)
((1, 3), {'hello', 'world'})
('truc', [1, 2, 3])
(5, (1, 4, 2))

Comprehensions

You can construct a list, dict, set and others using the comprehension syntax

list comprehension

Code
print(colors)
print(peoples)
['red', 'blue', 'black', 'white']
['stephane', 'jaouad', 'mokhtar', 'yiyang', 'rémi']
Code
l = []
for p, c in zip(peoples, colors):
    if len(c)<=4 :
        l.append(p)
print(l)
['stephane', 'jaouad']
Code
# The list of people with favorite color that has no more than 4 characters

[people for color, people in zip(colors, peoples) if len(color) <= 4]
['stephane', 'jaouad']

dict comprehension

Code
{people: color for color, people in zip(colors, peoples) if len(color) <= 4}
{'stephane': 'red', 'jaouad': 'blue'}
Code
# This builds a dict from two lists (keys and values)
{key: value for (key, value) in zip(peoples, colors)}
{'stephane': 'red', 'jaouad': 'blue', 'mokhtar': 'black', 'yiyang': 'white'}
Code
# But it's simpler (so better) to use
dict(zip(peoples, colors))
{'stephane': 'red', 'jaouad': 'blue', 'mokhtar': 'black', 'yiyang': 'white'}
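
The same syntax with curly braces and no key builds a set, and with parentheses (or directly inside a function call) it builds a generator expression that produces values one at a time. A sketch on a list of words of our own:

```python
words = ['red', 'blue', 'black', 'white', 'red']

# Set comprehension: duplicates disappear.
lengths = {len(w) for w in words}
print(sorted(lengths))     # [3, 4, 5]

# Generator expression: no intermediate list is built.
total = sum(len(w) for w in words)
print(total)               # 20
```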

Something very convenient is enumerate

Code
for i, color in enumerate(colors):
    print(i, color)
0 red
1 blue
2 black
3 white
Code
list(enumerate(colors))
[(0, 'red'), (1, 'blue'), (2, 'black'), (3, 'white')]
Code
dict(enumerate(s))
{0: 'B', 1: 'o', 2: 'n', 3: 'j', 4: 'o', 5: 'u', 6: 'r'}
Code
print(dict(enumerate(s)))
{0: 'B', 1: 'o', 2: 'n', 3: 'j', 4: 'o', 5: 'u', 6: 'r'}
Code
s = 'Hey everyone'
# A repeated character keeps the index of its last occurrence
{c: i for i, c in enumerate(s)}
{'H': 0, 'e': 11, 'y': 8, ' ': 3, 'v': 5, 'r': 7, 'o': 9, 'n': 10}

About functional programming

We can use lambda to define anonymous functions, and use them in the map and reduce functions

Code
square = lambda x: x ** 2
square(2)
4
Code
type(square)
function
Code
dir(square)
['__annotations__',
 '__builtins__',
 '__call__',
 '__class__',
 '__closure__',
 '__code__',
 '__defaults__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__get__',
 '__getattribute__',
 '__getstate__',
 '__globals__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__kwdefaults__',
 '__le__',
 '__lt__',
 '__module__',
 '__name__',
 '__ne__',
 '__new__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__type_params__']
Code
s = "a"
Code
try:
    square("a")
except TypeError:
    print("TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'")
TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
Code
sum2 = lambda a, b: a + b
print(sum2('Hello', ' world'))
print(sum2(1, 2))
Hello world
3

Lambdas are intended for short, one-line functions.

More complex functions use def (see below)
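
As a minimal sketch of how lambdas combine with map and reduce (the list numbers is toy data chosen for the illustration), keeping in mind that in Python 3 reduce must be imported from functools:

```python
from functools import reduce  # reduce is not a builtin in Python 3

numbers = [1, 2, 3, 4]

# map applies the function to each element; it is lazy, so wrap it in list()
doubled = list(map(lambda x: 2 * x, numbers))
print(doubled)  # [2, 4, 6, 8]

# reduce folds the sequence from the left: ((1 * 2) * 3) * 4
prod = reduce(lambda a, b: a * b, numbers)
print(prod)     # 24

# An optional third argument gives the initial value of the accumulator
total = reduce(lambda a, b: a + b, numbers, 100)
print(total)    # 110
```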

Exercise

Print the squares of the even numbers between 0 and 15

  1. Using a list comprehension as before
  2. Using map

Brain-teasing

What is the output of

Code
from functools import reduce

reduce(lambda a, b: a + b[0] * b[1], enumerate('abcde'), 'A')

Generators

Code
import sys
import matplotlib.pyplot as plt
%matplotlib inline
Code
plt.figure(figsize=(6, 6))
plt.plot([sys.getsizeof(list(range(i))) for i in range(10000)], lw=3)
plt.plot([sys.getsizeof(range(i)) for i in range(10000)], lw=3)
plt.xlabel('Number of elements (value of i)', fontsize=14)
plt.ylabel('Size (in bytes)', fontsize=14)
_ = plt.legend(['list(range(i))', 'range(i)'], fontsize=16)

Why generators?

The memory used by range(i) does not grow with i, while the memory used by list(range(i)) grows linearly

What is happening?

  • range(n) does not allocate a list of n elements!
  • It produces the required integers on the fly
  • We say that such an object behaves like a generator in Python
  • Many things in the Python standard library behave like this

Warning. Getting the real memory footprint of a Python object is difficult. Note that sys.getsizeof calls the __sizeof__ method of the object, which in general does not give the actual memory used: it does not account for the objects that a container refers to. But never mind here.

The following computation has almost no memory footprint, since the integers are produced one at a time:

Code
sum(range(10**8))
4999999950000000
Code
map(lambda x: x**2, range(10**7))
<map at 0x7a221b247bb0>

map does not return a list for the same reason: it returns a lazy iterator

Code
sum(map(lambda x: x**2, range(10**6)))
333332833333500000

Generator expression

Generator expressions are generators defined through comprehensions: just replace [] with () in the comprehension.

A generator can be iterated on only once

Code
range(10)
range(0, 10)
Code
carres = (i**2 for i in range(10))
Code
carres
<generator object <genexpr> at 0x7a2271963510>
Code
for c in carres:
    print(c)
0
1
4
9
16
25
36
49
64
81
Code
for i in range(4):
    for j in range(3):
        print(i, j)
0 0
0 1
0 2
1 0
1 1
1 2
2 0
2 1
2 2
3 0
3 1
3 2
Code
from itertools import product

for t in product(range(4), range(3)):
    print(t)
(0, 0)
(0, 1)
(0, 2)
(1, 0)
(1, 1)
(1, 2)
(2, 0)
(2, 1)
(2, 2)
(3, 0)
(3, 1)
(3, 2)
Code
from itertools import product

gene = (i + j for i, j in product(range(3), range(3)))
gene
<generator object <genexpr> at 0x7a2271963e00>
Code
print(list(gene))
print(list(gene))
[0, 1, 2, 1, 2, 3, 2, 3, 4]
[]
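
To connect generator expressions with the memory discussion, here is a small sketch comparing a list comprehension and the corresponding generator expression. The size n is arbitrary, and sys.getsizeof only measures the container itself.

```python
import sys

n = 100_000

squares_list = [i**2 for i in range(n)]   # all values are stored
squares_gen = (i**2 for i in range(n))    # values are produced on demand

# The list holds n references, the generator only its current state
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# Both give the same sum; the generator is exhausted afterwards
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)  # True
print(sum(squares_gen))                   # 0, nothing left to iterate on
```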

yield

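Something very powerful: a function that contains yield is a generator function. Calling it returns a generator, and its body runs lazily, pausing at each yield until the next value is requested.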

Code
def startswith(words, letter):
    for word in words:
        if word.startswith(letter):
            yield word
Code
words = [
    'Python', "is", 'awesome', 'in', 'particular', 'generators', 
    'are', 'really', 'cool'
]
Code
# Equivalent to the generator expression (word for word in words if word.startswith("a"))
list(startswith(words, letter='a'))
['awesome', 'are']
Code
a = 2
Code
float(a)
2.0

The generator can also be consumed with a for loop

Code
for word in startswith(words, letter='a'):
    print(word)
awesome
are
Code
it = startswith(words, letter='a')
Code
type(it)
generator
Code
next(it)
'awesome'
Code
next(it)
'are'
Code
try:
    next(it)
except StopIteration:
    print("StopIteration exception!")
StopIteration exception!
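
Because a generator computes its values only on demand, it can even describe an infinite sequence. A minimal sketch, where evens is a hypothetical helper written for this illustration and itertools.islice takes the first few values:

```python
from itertools import islice

def evens():
    """Yield 0, 2, 4, ... forever (hypothetical helper)."""
    n = 0
    while True:
        yield n   # execution pauses here until the next value is requested
        n += 2

# islice stops after 5 values, so the infinite loop is never a problem
first_five = list(islice(evens(), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```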

A glimpse at the collections module

(This is where the good stuff hides)

Code
texte = """             
Bonjour,
Python c'est super.
Python ca a l'air quand même un peu compliqué.
Mais bon, ca a l'air pratique.
Peut-être que je pourrais m'en servir pour faire des trucs super.
"""
texte
"             \nBonjour,\nPython c'est super.\nPython ca a l'air quand même un peu compliqué.\nMais bon, ca a l'air pratique.\nPeut-être que je pourrais m'en servir pour faire des trucs super.\n"
Code
print(texte)
             
Bonjour,
Python c'est super.
Python ca a l'air quand même un peu compliqué.
Mais bon, ca a l'air pratique.
Peut-être que je pourrais m'en servir pour faire des trucs super.
Code
# Some basic text preprocessing 
new_text = (
    texte
    .strip()
    .replace('\n', ' ')
    .replace(',', ' ')
    .replace('.', ' ')
    .replace("'", ' ')
)

print(new_text)
print('-' * 8)

words = new_text.split()
print(words)
Bonjour  Python c est super  Python ca a l air quand même un peu compliqué  Mais bon  ca a l air pratique  Peut-être que je pourrais m en servir pour faire des trucs super 
--------
['Bonjour', 'Python', 'c', 'est', 'super', 'Python', 'ca', 'a', 'l', 'air', 'quand', 'même', 'un', 'peu', 'compliqué', 'Mais', 'bon', 'ca', 'a', 'l', 'air', 'pratique', 'Peut-être', 'que', 'je', 'pourrais', 'm', 'en', 'servir', 'pour', 'faire', 'des', 'trucs', 'super']

Exercise

Count the number of occurrences of all the words in words.

Output must be a dictionary containing word: count

Code
print(words)
['Bonjour', 'Python', 'c', 'est', 'super', 'Python', 'ca', 'a', 'l', 'air', 'quand', 'même', 'un', 'peu', 'compliqué', 'Mais', 'bon', 'ca', 'a', 'l', 'air', 'pratique', 'Peut-être', 'que', 'je', 'pourrais', 'm', 'en', 'servir', 'pour', 'faire', 'des', 'trucs', 'super']

Exercise

Compute the number of occurrences AND the length of each word in words.

Output must be a dictionary containing word: (count, length)
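
Counting problems of this kind are what collections.Counter and collections.defaultdict are designed for. The sketch below uses a toy list (toy_words, not the exercise input) so that the exercises remain open.

```python
from collections import Counter, defaultdict

toy_words = ['spam', 'egg', 'spam', 'ham', 'spam', 'egg']  # toy data

# Counter is a dict subclass mapping each element to its number of occurrences
counts = Counter(toy_words)
print(counts['spam'])          # 3
print(counts.most_common(2))   # [('spam', 3), ('egg', 2)]

# defaultdict creates missing entries on first access, here with int() == 0
by_hand = defaultdict(int)
for w in toy_words:
    by_hand[w] += 1
print(dict(by_hand) == dict(counts))  # True
```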

I/O, reading and writing files

Next, put a text file miserables.txt in the folder containing this notebook. If it is not there, the next cell downloads it; if it is already there, the cell does nothing.

Code
import requests
import os

# The path containing your notebook
path_data = './'
# The name of the file
filename = 'miserables.txt'

if os.path.exists(os.path.join(path_data, filename)):
    print('The file %s already exists.' % os.path.join(path_data, filename))
else:
    url = 'https://stephanegaiffas.github.io/big_data_course/data/miserables.txt'
    r = requests.get(url)
    # Stop here on an HTTP error instead of saving an error page as the data file
    r.raise_for_status()
    with open(os.path.join(path_data, filename), 'wb') as f:
        f.write(r.content)
    print('Downloaded file %s.' % os.path.join(path_data, filename))
Downloaded file ./miserables.txt.
Code
ls -alh
total 668K
drwxrwxr-x 10 boucheron boucheron 4,0K avril  3 15:07 ./
drwxrwxr-x  6 boucheron boucheron 4,0K avril  3 14:55 ../
drwxr-xr-x  3 boucheron boucheron 4,0K avril  3 15:02 0c19d4a9-62d0-4073-9add-d08089e30b7a/
-rw-rw-r--  1 boucheron boucheron  68K avril  3 15:07 checking_parquet_citibike.html
-rw-rw-r--  1 boucheron boucheron 3,7K avril  3 14:55 checking_parquet_citibike.qmd
drwxr-xr-x  2 boucheron boucheron 4,0K avril  3 15:02 csr.parquet/
drwxrwxr-x  3 boucheron boucheron 4,0K avril  3 15:07 .jupyter_cache/
drwxrwxr-x  3 boucheron boucheron 4,0K avril  3 15:01 __MACOSX/
-rw-rw-r--  1 boucheron boucheron  128 avril  3 14:55 _metadata.yml
-rw-rw-r--  1 boucheron boucheron 9,0K avril  3 15:07 miserables.txt
drwxrwxr-x  3 boucheron boucheron 4,0K avril  3 15:02 notebook01_python_files/
-rw-rw-r--  1 boucheron boucheron  71K avril  3 14:55 notebook01_python.qmd
-rw-rw-r--  1 boucheron boucheron 164K avril  3 15:07 notebook01_python.quarto_ipynb
drwxrwxr-x  3 boucheron boucheron 4,0K avril  3 15:02 notebook02_numpy_files/
-rw-rw-r--  1 boucheron boucheron  29K avril  3 14:55 notebook02_numpy.qmd
-rw-rw-r--  1 boucheron boucheron  22K avril  3 14:55 notebook03_pandas.qmd
-rw-rw-r--  1 boucheron boucheron  19K avril  3 14:55 notebook04_pandas_spark.qmd
-rw-rw-r--  1 boucheron boucheron 9,3K avril  3 14:55 notebook05_sparkrdd.qmd
-rw-rw-r--  1 boucheron boucheron  23K avril  3 14:55 notebook06_sparksql.qmd
-rw-rw-r--  1 boucheron boucheron  29K avril  3 14:55 notebook07_json-format.qmd
drwxrwxr-x  3 boucheron boucheron 4,0K avril  3 15:02 notebook08_webdata-II_files/
-rw-rw-r--  1 boucheron boucheron  25K avril  3 14:55 notebook08_webdata-II.qmd
-rw-rw-r--  1 boucheron boucheron  30K avril  3 14:55 notebook08_webdata.qmd
-rw-rw-r--  1 boucheron boucheron  36K avril  3 15:07 notebook-0.html
-rw-rw-r--  1 boucheron boucheron  153 avril  3 14:55 notebook-0.qmd
-rw-rw-r--  1 boucheron boucheron  755 avril  3 14:55 notebook10_graphx.qmd
-rw-rw-r--  1 boucheron boucheron  19K avril  3 14:55 notebook11_dive.qmd
-rw-rw-r--  1 boucheron boucheron 7,1K avril  3 14:55 notebook14.qmd
-rw-rw-r--  1 boucheron boucheron 2,0K avril  3 14:55 notebookxx_pg_pandas_spark.qmd
drwxrwxr-x  2 boucheron boucheron 4,0K avril  3 15:01 webdata.parquet/
-rw-rw-r--  1 boucheron boucheron 4,7K avril  3 14:55 xcitibike_spark.qmd
-rw-rw-r--  1 boucheron boucheron  15K avril  3 14:55 xciti_pandas.qmd
Code
# !rm -f miserables.txt
Code
os.path.join(path_data, filename)
'./miserables.txt'

In jupyter and ipython you can run terminal command lines using !

Let’s count the number of lines and the number of words with the wc command-line tool (Linux or macOS only, don’t ask me how on Windows)

Code
# Lines count
!wc -l miserables.txt
79 miserables.txt
Code
# Word count
!wc -w miserables.txt
277 miserables.txt

Exercise

Count the number of occurrences of each word in the text file miserables.txt. We use an open context and the Counter from before.

Contexts

  • A context (more precisely, a context manager) in Python is something that we use with the with keyword.

  • Here it deals automatically with the opening and the closing of the file, even if an error occurs inside the block.

Note the for loop:

for line in f:
    ...

You loop directly over the lines of the open file from within the open context
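
A minimal sketch of this pattern, written against a small throwaway file so that it does not depend on miserables.txt (the file name toy.txt and its content are made up for the illustration):

```python
import os
import tempfile
from collections import Counter

# Write a tiny throwaway file
tmp_dir = tempfile.mkdtemp()
toy_path = os.path.join(tmp_dir, 'toy.txt')   # hypothetical file name
with open(toy_path, 'w', encoding='utf-8') as f:
    f.write("python is fun\npython is readable\n")

toy_counter = Counter()
with open(toy_path, encoding='utf-8') as f:   # the file is closed when the block exits
    for line in f:                            # lines are read one at a time
        toy_counter.update(line.split())

print(f.closed)               # True
print(toy_counter['python'])  # 2
```

Reading line by line keeps only one line in memory at a time, which matters for large files.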

About pickle

You can save your computation with pickle.

  • pickle is a way of saving almost anything with Python.
  • It serializes the object in a binary format, and is usually the simplest and fastest way to go.
Code
import pickle as pkl

# Let's save it
with open('miserable_word_counts.pkl', 'wb') as f:
    pkl.dump(counter, f)

# And read it again
with open('miserable_word_counts.pkl', 'rb') as f:
    counter = pkl.load(f)
Code
counter.most_common(10)
[('{', 15),
 ('}', 15),
 ('0', 8),
 ('img', 6),
 ('margin:', 6),
 ('font', 6),
 ('logo', 6),
 ('only', 6),
 ('screen', 6),
 ('and', 6)]

Defining functions

Use functions to organize and reuse code

Function definition

Function blocks must be indented, like other control-flow blocks.

Code
def test():
    return 'in test function'

test()
'in test function'

Return statement

Functions can optionally return values. By default, functions return None.

The syntax to define a function:

  • the def keyword;
  • followed by the function’s name;
  • then the arguments of the function, given between parentheses and followed by a colon;
  • the indented function body;
  • and, optionally, return object to return a value.
Code
None is None
True
Code
def f(x):
    return x + 10
f(20)
30

A function that returns several elements returns a tuple

Code
def f(x):
    return x + 1, x + 4

f(5)
(6, 9)
Code
type(f)
function
Code
f.truc = "bonjour"
Code
type(f(5))
tuple

Parameters

Mandatory parameters (positional arguments)

Code
def double_it(x):
    return x * 2

double_it(2)
4
Code
try:
    double_it()
except TypeError:
    print("TypeError: double_it() missing 1 required positional argument: 'x'")
TypeError: double_it() missing 1 required positional argument: 'x'

Optional parameters

Code
def double_it(x=2):
    return x * 2

double_it()
4
Code
double_it(3)
6
Code
def f(x, y=2, z=10):
    print(x, '+', y, '+', z, '=', x + y + z)
Code
f(5)
5 + 2 + 10 = 17
Code
f(5, -2)
5 + -2 + 10 = 13
Code
f(5, -2, 8)
5 + -2 + 8 = 11
Code
f(z=5, x=-2, y=8)
-2 + 8 + 5 = 11

Argument unpacking and keyword argument unpacking

You can do stuff like this, using unpacking * notation

Code
a, *b, c = 1, 2, 3, 4, 5
a, b, c
(1, [2, 3, 4], 5)

Back to the function f: you can unpack a tuple as positional arguments with *, and a dict as keyword arguments with **

Code
tt = (1, 2, 3)
f(*tt)
1 + 2 + 3 = 6
Code
dd = {'y': 10, 'z': -5}
Code
f(3, **dd)
3 + 10 + -5 = 8
Code
def g(x, z, y, t=1, u=2):
    print(x, '+', y, '+', z, '+', t, '+', 
          u, '=', x + y + z + t + u)
Code
tt = (1, -4, 2)
dd = {'t': 10, 'u': -5}
g(*tt, **dd)
1 + 2 + -4 + 10 + -5 = 4

The prototype of all functions in Python

Code
def f(*args, **kwargs):
    print('args=', args)
    print('kwargs=', kwargs)

f(1, 2, 'truc', lastname='gaiffas', firstname='stephane')
args= (1, 2, 'truc')
kwargs= {'lastname': 'gaiffas', 'firstname': 'stephane'}
  • In a function definition, *args collects the extra positional arguments in a tuple and **kwargs collects the extra keyword arguments in a dict
  • The names args and kwargs are a convention, not mandatory
  • (but you are fired if you name these arguments otherwise)
Code
# How to get fired
def f(*aaa, **bbb):
    print('args=', aaa)
    print('kwargs=', bbb)
f(1, 2, 'truc', lastname='gaiffas', firstname='stephane')    
args= (1, 2, 'truc')
kwargs= {'lastname': 'gaiffas', 'firstname': 'stephane'}

Remark. A function is a regular object… you can add attributes to it!

Code
f.truc = 4
Code
f(1, 3)
args= (1, 3)
kwargs= {}
Code
f(3, -2, y='truc')
args= (3, -2)
kwargs= {'y': 'truc'}
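
A common use of *args and **kwargs is to forward arguments unchanged to another function. The sketch below is only an illustration: timed and add are hypothetical helpers, not part of the notebook.

```python
import time

def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)   # forward everything unchanged
    elapsed = time.perf_counter() - start
    return result, elapsed

def add(x, y=2, z=10):
    return x + y + z

# 5 goes to x positionally, z is passed by keyword, y keeps its default
result, elapsed = timed(add, 5, z=1)
print(result)  # 8
```

The wrapper does not need to know the signature of the function it calls, which is why this pattern is widespread in decorators.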

Object-oriented programming (OOP)

Python supports object-oriented programming (OOP). The goals of OOP are:

  • to organize the code, and
  • to re-use code in similar contexts.

Here is a small example: we create a Student class, which is an object gathering several custom functions (called methods) and variables (called attributes).

Code
class Student(object):

    def __init__(self, name, birthyear, major='computer science'):
        self.name = name
        self.birthyear = birthyear
        self.major = major

    def __repr__(self):
        return "Student(name='{name}', birthyear={birthyear}, major='{major}')"\
                .format(name=self.name, birthyear=self.birthyear, major=self.major)

anna = Student('anna', 1987)
anna
Student(name='anna', birthyear=1987, major='computer science')

The __repr__ is what we call a ‘magic method’ in Python: it defines how an object is displayed as a string. There is a very large number of such magic methods. They are used to implement interfaces
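
To give an idea of how magic methods implement interfaces, here is a small sketch with a toy Vector2D class (introduced only for this illustration) that supports ==, + and len():

```python
class Vector2D:
    """Toy class illustrating a few magic methods."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):            # used by the interpreter and by repr()
        return f"Vector2D({self.x}, {self.y})"

    def __eq__(self, other):       # used by ==
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):      # used by +
        return Vector2D(self.x + other.x, self.y + other.y)

    def __len__(self):             # used by len()
        return 2

u, v = Vector2D(1, 2), Vector2D(3, 4)
print(u + v)                    # Vector2D(4, 6)
print(u + v == Vector2D(4, 6))  # True
print(len(u))                   # 2
```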

Exercise

Add an age method to the Student class that computes the age of the student.

  • You can (and should) use the datetime module.
  • Since we only know the birth year, let’s assume that the day of birth is January 1st.

Properties

We can make methods look like attributes using properties, as shown below

Code
from datetime import datetime

class Student(object):

    def __init__(self, name, birthyear, major='computer science'):
        self.name = name
        self.birthyear = birthyear
        self.major = major

    def __repr__(self):
        return "Student(name='{name}', birthyear={birthyear}, major='{major}')"\
                .format(name=self.name, birthyear=self.birthyear, major=self.major)

    @property
    def age(self):
        return datetime.now().year - self.birthyear
        
anna = Student('anna', 1987)
anna.age
38

Inheritance

A MasterStudent is a Student with a new extra mandatory internship attribute

Code
"%d" % 2
'2'
Code
x = 2

f"truc {x}"
'truc 2'
Code
class MasterStudent(Student):
    
    def __init__(self, name, birthyear, internship, major='computer science'):
        # Let the parent class initialize name, birthyear and major
        Student.__init__(self, name, birthyear, major)
        self.internship = internship

    def __repr__(self):
        return f"MasterStudent(name='{self.name}', internship='{self.internship}', birthyear={self.birthyear}, major='{self.major}')"
    
MasterStudent('djalil', 1996, 'pwc')
MasterStudent(name='djalil', internship='pwc', birthyear=1996, major='computer science')
Code
class MasterStudent(Student):
    
    def __init__(self, name, birthyear, internship, major='computer science'):
        # Equivalent to super().__init__(name, birthyear, major)
        Student.__init__(self, name, birthyear, major)
        self.internship = internship

    def __repr__(self):
        return "MasterStudent(name='{name}', internship='{internship}'" \
               ", birthyear={birthyear}, major='{major}')"\
                .format(name=self.name, internship=self.internship,
                        birthyear=self.birthyear, major=self.major)
    
djalil = MasterStudent('djalil', 1996, 'pwc')
Code
djalil.__dict__
{'name': 'djalil',
 'birthyear': 1996,
 'major': 'computer science',
 'internship': 'pwc'}
Code
djalil.birthyear
1996
Code
djalil.__dict__["birthyear"]
1996

Monkey patching

  • Classes in Python are objects, and their attributes are stored in dicts under the hood…
  • Therefore classes can be changed on the fly, and existing instances see the change
Code
class Monkey(object):
    
    def __init__(self, name):
        self.name = name

    def describe(self):
        print("Old monkey %s" % self.name)

def patch(self):
    print("New monkey %s" % self.name)

monkey = Monkey("Baloo")
monkey.describe()

Monkey.describe = patch
monkey.describe()
Old monkey Baloo
New monkey Baloo
Code
monkeys = [Monkey("Baloo"), Monkey("Super singe")]


monkey_name = monkey.name

for i in range(1000):    
    monkey_name

Data classes

Since Python 3.7, you can use a dataclass to write a class like Student that mainly stores data

It does a lot of work for you: it generates __init__, __repr__ and __eq__, among other things

Code
from dataclasses import dataclass
from datetime import datetime 

@dataclass
class Student(object):
    name: str
    birthyear: int
    major: str = 'computer science'

    @property
    def age(self):
        return datetime.now().year - self.birthyear
        
anna = Student(name="anna", birthyear=1987)
anna
Student(name='anna', birthyear=1987, major='computer science')
Code
print(anna.age)
38
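
Two details of dataclasses are worth a short sketch: the generated __eq__ compares instances field by field, and a mutable default value must be declared with field(default_factory=...). The Course class below is a toy example introduced only for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Course:
    """Toy dataclass."""
    title: str
    # A mutable default must go through default_factory:
    # writing `students: list = []` raises a ValueError at class creation
    students: list = field(default_factory=list)

c1 = Course("big data")
c2 = Course("big data")
c1.students.append("anna")

print(c1)        # Course(title='big data', students=['anna'])
print(c2)        # Course(title='big data', students=[])
print(c1 == c2)  # False: the generated __eq__ compares field by field
```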

Most common mistakes

  • Let us wrap this up with the most common mistakes with Python

First, the best way to learn and practice:

  • Start with the official tutorial https://docs.python.org/fr/3/tutorial/index.html

  • Look at https://python-3-for-scientists.readthedocs.io/en/latest/index.html

  • Continue with the documentation at https://docs.python.org/fr/3/index.html and work!

Using a mutable value as a default value

Code
def foo(bar=[]):
    bar.append('oops')
    return bar

print(foo())
print(foo())
print(foo())

print('-' * 8)
print(foo(['Ah ah']))
print(foo([]))
['oops']
['oops', 'oops']
['oops', 'oops', 'oops']
--------
['Ah ah', 'oops']
['oops']
Code
print(foo.__defaults__)
foo()
print(foo.__defaults__)
(['oops', 'oops', 'oops'],)
(['oops', 'oops', 'oops', 'oops'],)
  • The default value of a function argument is evaluated only once, when the function is defined
  • the default of bar (an empty list) is therefore created a single time, when the def statement of foo is executed
  • successive calls to foo() (with no bar argument specified) all use, and modify, the same list!

One should use instead

Code
def foo(bar=None):
    if bar is None:
        bar = []
    bar.append('oops')
    return bar

print(foo())
print(foo())
print(foo())
print(foo(['OK']))
['oops']
['oops']
['oops']
['OK', 'oops']
Code
print(foo.__defaults__)
foo()
print(foo.__defaults__)
(None,)
(None,)

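No problem with immutable types: bar += ('oops',) creates a new tuple and rebinds the local name bar, so the default value is left untouched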

Code
def foo(bar=()):
    bar += ('oops',)
    return bar

print(foo())
print(foo())
print(foo())
('oops',)
('oops',)
('oops',)
Code
print(foo.__defaults__)
((),)

Class attributes VS object attributes

Code
class A(object):
    x = 1

    def __init__(self):
        self.y = 2

class B(A):
    def __init__(self):
        super().__init__()

class C(A):
    def __init__(self):
        super().__init__()

a, b, c = A(), B(), C()
Code
print(a.x, b.x, c.x)
print(a.y, b.y, c.y)
1 1 1
2 2 2
Code
a.y = 3
print(a.y, b.y, c.y)
3 2 2
Code
a.x = 3  # Adds a new attribute named x in object a
print(a.x, b.x, c.x)
3 1 1
Code
A.x = 4 # Changes the class attribute x of class A
print(a.x, b.x, c.x)
3 4 4
  • Attribute x is not an attribute of b nor c
  • It is also not a class attribute of classes B and C
  • So, it is looked up in the base class A, which contains a class attribute x

Classes and objects contain a hidden dict to store their attributes. An attribute is first looked up in the object, then in its class, then in the base classes following the method resolution order (MRO)

Code
a.__dict__, b.__dict__, c.__dict__
({'y': 3, 'x': 3}, {'y': 2}, {'y': 2})
Code
A.__dict__, B.__dict__, C.__dict__
(mappingproxy({'__module__': '__main__',
               'x': 4,
               '__init__': <function __main__.A.__init__(self)>,
               '__dict__': <attribute '__dict__' of 'A' objects>,
               '__weakref__': <attribute '__weakref__' of 'A' objects>,
               '__doc__': None}),
 mappingproxy({'__module__': '__main__',
               '__init__': <function __main__.B.__init__(self)>,
               '__doc__': None}),
 mappingproxy({'__module__': '__main__',
               '__init__': <function __main__.C.__init__(self)>,
               '__doc__': None}))

This can lead to nasty errors when using class attributes: learn more about this
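
A typical instance of such a nasty error is a mutable class attribute that is unintentionally shared by all instances. A minimal sketch, with toy classes Basket and SafeBasket introduced only for this illustration:

```python
class Basket:
    """Toy class with a mutable class attribute."""
    items = []                 # class attribute: ONE list shared by all instances

    def add(self, item):
        self.items.append(item)   # mutates the shared list

b1, b2 = Basket(), Basket()
b1.add('apple')
print(b2.items)       # ['apple'], although nothing was added to b2
print(b1.__dict__)    # {}: the list lives in Basket.__dict__

class SafeBasket:
    def __init__(self):
        self.items = []        # instance attribute: one list per object

    def add(self, item):
        self.items.append(item)

s1, s2 = SafeBasket(), SafeBasket()
s1.add('apple')
print(s2.items)       # []
```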

Python scope rules

Code
try:
    ints += [4]
except NameError:
    print("NameError: name 'ints' is not defined")
NameError: name 'ints' is not defined
Code
ints = [1]

def foo1():
    ints.append(2)
    return ints

def foo2():
    ints += [2]
    return ints
Code
foo1()
[1, 2]
Code
try:    
    foo2()
except UnboundLocalError as inst:
    print(inst)
cannot access local variable 'ints' where it is not associated with a value

What the hell ?

  • An assignment to a variable inside a function makes that variable local to the function
  • and shadows any variable of the same name in an outer scope
ints += [2]

is an augmented assignment: for scoping purposes it behaves like

ints = ints + [2]

so ints is treated as local to foo2 and is read before it has been given a value, while

ints.append(2)

is not an assignment: it only looks up the global ints and mutates it in place
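
Two possible ways around this, sketched with hypothetical function names (foo_global and foo_arg are not part of the notebook): declare the name as global, or, usually cleaner, pass the list as an argument.

```python
ints = [1]

def foo_global():
    global ints        # declare that ints refers to the module-level name
    ints += [2]
    return ints

def foo_arg(values):
    # Take the list as an argument and return a new one
    return values + [2]

print(foo_global())    # [1, 2]
print(foo_arg([1]))    # [1, 2]
```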

Modify a list while iterating over it

Code
odd = lambda x: bool(x % 2)
numbers = list(range(10))

try:
    # range(len(numbers)) is computed once, but the list shrinks during the loop
    for i in range(len(numbers)):
        if odd(numbers[i]):
            del numbers[i]
except IndexError as inst:
    print(inst)
list index out of range

This is typically a case where one should use a list comprehension

Code
[number for number in numbers if not odd(number)]
[0, 2, 4, 6, 8]
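
If the list really has to be modified in place, two safe variants can be sketched as follows (the variable names are chosen for the illustration): assign the comprehension to the full slice, or delete while iterating over the indices backwards.

```python
numbers = list(range(10))

# Assigning to the full slice replaces the content of the same list object
alias = numbers
numbers[:] = [n for n in numbers if n % 2 == 0]
print(numbers)           # [0, 2, 4, 6, 8]
print(alias is numbers)  # True: the list was modified in place

# Deleting from the end never shifts the indices still to be visited
numbers2 = list(range(10))
for i in range(len(numbers2) - 1, -1, -1):
    if numbers2[i] % 2:
        del numbers2[i]
print(numbers2)          # [0, 2, 4, 6, 8]
```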

No docstrings

Take the time to write clean docstrings (look at the numpydoc style)

Code
def create_student(name, age, address, major='computer science'):
    """Add a student in the database
    
    Parameters
    ----------
    name: `str`
        Name of the student
    
    age: `int`
        Age of the student
    
    address: `str`
        Address of the student
    
    major: `str`, default='computer science'
        The major chosen by the student
    
    Returns
    -------
    output: `Student`
        A fresh student
    """
    pass
Code
create_student('Duduche', 28, 'Chalons')

Not using available methods and/or the simplest solution

Code
dd = {'stephane': 1234, 'gael': 4567, 'gontran': 891011}

# Bad
for key in dd.keys():
    print(key, dd[key])

print('-' * 8)

# Good
for key, value in dd.items():
    print(key, value)
stephane 1234
gael 4567
gontran 891011
--------
stephane 1234
gael 4567
gontran 891011
Code
colors = ['black', 'yellow', 'brown', 'red', 'pink']

# Bad
for i in range(len(colors)):
    print(i, colors[i])

print('-' * 8)

# Good
for i, color in enumerate(colors):
    print(i, color)
0 black
1 yellow
2 brown
3 red
4 pink
--------
0 black
1 yellow
2 brown
3 red
4 pink

Not using the standard library

It is almost always better than a hand-made solution

Code
list1 = [1, 2]
list2 = [3, 4]
list3 = [5, 6, 7]

for a in list1:
    for b in list2:
        for c in list3:
            print(a, b, c)
1 3 5
1 3 6
1 3 7
1 4 5
1 4 6
1 4 7
2 3 5
2 3 6
2 3 7
2 4 5
2 4 6
2 4 7
Code
from itertools import product

for a, b, c in product(list1, list2, list3):
    print(a, b, c)
1 3 5
1 3 6
1 3 7
1 4 5
1 4 6
1 4 7
2 3 5
2 3 6
2 3 7
2 4 5
2 4 6
2 4 7

That’s it for now !