Introduction to Python

This chapter introduces the Python language, covering only the bare minimum needed to get started with the data-science stack (a collection of libraries for data science). Python is a programming language, like C++, Java, Fortran, JavaScript, etc.

Specific features of Python

An interpreted (as opposed to compiled) language: contrary to e.g. C++ or Fortran, one does not compile Python code before executing it.
Used as a scripting language, by running python script.py in a terminal
It can also be used interactively: the Jupyter notebook, IPython, etc.
Free software released under an open-source license: Python can be used and distributed free of charge, even for building commercial software.
Multi-platform: Python is available for all major operating systems (Windows, Linux/Unix, macOS), most likely your mobile phone OS, etc.
A very readable language with a clear, non-verbose syntax
A language for which a large number of high-quality packages are available for various applications, including web frameworks and scientific computing
It has been one of the top languages for data science and machine learning for several years, because it is expressive and easy to deploy
An object-oriented language

See https://www.python.org/about/ for more information about distinguishing features of Python.

Python 2 or Python 3?

Simple answer: don’t use Python 2, use Python 3
Python 2 reached its end of life on January 1, 2020 and is no longer maintained
You are asking for trouble if you use Python 2
If Python 2 is mandatory at your workplace, find another job

Jupyter or Quarto notebooks?

Quarto is more git-friendly than Jupyter
Quarto documents are plain text, so you can enjoy a real text editor
Go for Quarto

Hello world

In a Jupyter/Quarto notebook, you have an interactive interpreter.
You type commands in the cells and execute them.

Code

print("Hi everybody!")

Hi everybody!

Basic types

Integers

Code

1 + 42

Code

type(1+1)

int

We can assign values to variables with =

Code

a = (3 + 5 ** 2) % 4
a

Remark

We don’t declare the type of a variable before assigning its value. In C, by contrast, one must write

int a = 4;

Something cool

Arbitrarily large integer arithmetic

Code

17 ** 542

8004153099680695240677662228684856314409365427758266999205063931175132640587226837141154215226851187899067565063096026317140186260836873939218139105634817684999348008544433671366043519135008200013865245747791955240844192282274023825424476387832943666754140847806277355805648624376507618604963106833797989037967001806494232055319953368448928268857747779203073913941756270620192860844700087001827697624308861431399538404552468712313829522630577767817531374612262253499813723569981496051353450351968993644643291035336065584116155321928452618573467361004489993801594806505273806498684433633838323916674207622468268867047187858269410016150838175127772100983052010703525089
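As a quick sanity check of the claim above, here is a minimal sketch showing that Python integers stay exact however large they get. The variable names are illustrative only.

```python
# Python ints have arbitrary precision: no overflow, no rounding.
big = 17 ** 542

# Dividing out one factor of 17 leaves exactly 17 ** 541.
quotient, remainder = divmod(big, 17)
print(remainder == 0 and quotient == 17 ** 541)

# The number of decimal digits is the length of the string form.
n_digits = len(str(big))
print(n_digits)
```

A fixed-width integer type, as found in C, would have overflowed long before reaching this size.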

Floats

Python has a floating-point type, float. A float is created when a literal contains a decimal point, and by true division / (even between integers)

Code

c = 2.

Code

type(c)

float

Code

c = 2
type(c)

int

Code

truc = 1 / 2
truc

0.5

Code

1 // 2 + 1 % 2

Code

type(truc)

float
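The cells above use three related operators: true division /, floor division // and modulo %. The following sketch, with illustrative variable names, shows how they fit together.

```python
# True division always returns a float, even when the result is whole.
whole = 4 / 2
print(whole, type(whole))

# Floor division and modulo go together: a == (a // b) * b + (a % b).
a, b = 17, 5
q, r = a // b, a % b
print(q, r, divmod(a, b))

# With a negative operand, // rounds towards minus infinity.
neg_q, neg_r = -7 // 2, -7 % 2
print(neg_q, neg_r)
```

The built-in divmod returns the quotient and the remainder in one call.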

Boolean

Similarly, a comparison creates a value of boolean type

Code

test = 3 > 4
test

False

Code

type(test)

bool

Code

False == (not True)

True

Code

1.41 < 2.71 and 2.71 < 3.14

True

Code

# It's equivalent to
1.41 < 2.71 < 3.14

True

Type conversion (casting)

Code

a = 1
type(a)

int

Code

b = float(a)
type(b)

float

Code

str(b)

'1.0'

Code

bool(b)
# All non-zero, non-empty objects are cast to boolean as True (more later)

True

Code

bool(1-1)

False
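Conversions also work from strings, and converting a float to an int discards the fractional part. A small sketch with illustrative names:

```python
# Strings holding numbers can be converted explicitly.
n = int("42")
x = float("3.14")

# int() truncates towards zero; round() rounds to the nearest integer.
truncated = int(3.9)
rounded = round(3.9)
print(n, x, truncated, rounded)

# A string that does not look like a number raises ValueError.
try:
    int("forty-two")
except ValueError as err:
    print(f"ValueError: {err}")
```

Note that the conversion never happens implicitly: adding "42" and 1 raises a TypeError.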

Containers

Python provides many efficient types of containers or sequences, in which collections of objects can be stored.

The main ones are list, tuple, set and dict (but there are many others…)

Tuples

Code

tt = 'truc', 3.14, "truc"
tt

('truc', 3.14, 'truc')

Code

tt[0]

'truc'

You can’t change a tuple: we say that it’s immutable

Code

try:
    tt[0] = 1
except TypeError:
    print("TypeError: 'tuple' object does not support item assignment")

TypeError: 'tuple' object does not support item assignment

Three ways of doing the same thing

Code

# Method 1
tuple([1, 2, 3])

(1, 2, 3)

Code

# Method 2
1, 2, 3

(1, 2, 3)

Code

# Method 3
(1, 2, 3)

(1, 2, 3)

Simpler is better in Python, so usually you want to use Method 2.

Code

toto = 1, 2, 3
toto

(1, 2, 3)

This is serious !

The Zen of Python Easter egg

Code

import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Lists

A list is an ordered collection of objects. These objects may have different types. For example:

Code

colors = ['red', 'blue', 'green', 'black', 'white']

Code

colors[0]

'red'

Code

type(colors)

list

Indexing: accessing individual objects contained in the list by their position

Code

colors[2]

'green'

Code

colors[2] = 3.14
colors

['red', 'blue', 3.14, 'black', 'white']

Warning

For any sequence in Python, indexing starts at 0 (as in C), not at 1 (as in Fortran, R, or Matlab).

Counting from the end with negative indices:

Code

colors[-1]

'white'

The index must remain within the range of the list

Code

try:
    colors[10]
except IndexError:
    print(f"IndexError: 10 >= {len(colors)} ==len(colors), index out of range ")

Code

colors

['red', 'blue', 3.14, 'black', 'white']

Code

tt

('truc', 3.14, 'truc')

Code

colors.append(tt)
colors

['red', 'blue', 3.14, 'black', 'white', ('truc', 3.14, 'truc')]

Code

len(colors)

Code

len(tt)

Slicing: obtaining sublists of regularly-spaced elements

This works with any sequence (list, str, tuple, etc.)

Code

colors

['red', 'blue', 3.14, 'black', 'white', ('truc', 3.14, 'truc')]

Code

list(reversed(colors))

[('truc', 3.14, 'truc'), 'white', 'black', 3.14, 'blue', 'red']

Code

colors[::-1]

[('truc', 3.14, 'truc'), 'white', 'black', 3.14, 'blue', 'red']

Slicing syntax:

colors[start:stop:stride]

start, stop and stride are optional, with default values 0, len(sequence) and 1 (for a positive stride)

Code

print(slice(4))
print(slice(1,5))
print(slice(None,13,3))

slice(None, 4, None)
slice(1, 5, None)
slice(None, 13, 3)

Code

sl = slice(1,5,2)
colors[sl]

['blue', 'black']

Code

colors

['red', 'blue', 3.14, 'black', 'white', ('truc', 3.14, 'truc')]

Code

colors[3:]

['black', 'white', ('truc', 3.14, 'truc')]

Code

colors[:3]

['red', 'blue', 3.14]

Code

colors[1::2]

['blue', 'black', ('truc', 3.14, 'truc')]

Code

colors[::-1]

[('truc', 3.14, 'truc'), 'white', 'black', 3.14, 'blue', 'red']
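One difference between slicing and indexing deserves a short illustration: an out-of-range index raises IndexError, whereas out-of-range slice bounds are silently clipped. The list letters below is a made-up example.

```python
letters = ['a', 'b', 'c', 'd', 'e']

# Omitted bounds fall back to the defaults 0, len(sequence), 1.
assert letters[::] == letters[0:len(letters):1]

# A slice always returns a new list, so letters[:] is a copy.
copy_of_letters = letters[:]

# Out-of-range bounds are clipped instead of raising IndexError.
tail = letters[3:100]
empty = letters[10:]
print(tail, empty)
```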

Strings

Different string syntaxes (single, double or triple quotes):

Code

s = 'tintin'
type(s)

str

Code

'tintin'

Code

s = """         Bonjour,
Je m'appelle Stephane.
Je vous souhaite une bonne journée.
Salut.       
"""
s

"         Bonjour,\nJe m'appelle Stephane.\nJe vous souhaite une bonne journée.\nSalut.       \n"

Code

s.strip()

"Bonjour,\nJe m'appelle Stephane.\nJe vous souhaite une bonne journée.\nSalut."

Code

print(s.strip())

Bonjour,
Je m'appelle Stephane.
Je vous souhaite une bonne journée.
Salut.

Code

len(s)

Code

# Casting to a list
list(s.strip()[:15])

['B', 'o', 'n', 'j', 'o', 'u', 'r', ',', '\n', 'J', 'e', ' ', 'm', "'", 'a']

Code

# Arithmetic
print('Bonjour' * 2)
print('Hello' + ' all')

BonjourBonjour
Hello all

Code

sss = 'A'
sss += 'bc'
sss += 'dE'
sss.lower()

'abcde'

Code

ss = s.strip()
print(ss[:10] + ss[24:28])

Bonjour,
Jepha

Code

s.strip()

"Bonjour,\nJe m'appelle Stephane.\nJe vous souhaite une bonne journée.\nSalut."

Code

s.strip().split('\n')

['Bonjour,',
 "Je m'appelle Stephane.",
 'Je vous souhaite une bonne journée.',
 'Salut.']

Code

s[::3]

'   BjrJmpl ea.eo ui eoeon.at  \n'

Code

s[3:10]

'      B'

Code

" ".join(['Il', 'fait', 'super', 'beau', "aujourd'hui"])

"Il fait super beau aujourd'hui"

Chaining method calls is the basis of pipeline building.

Code

( 
    " ".join(['Il', 'fait', 'super', 'beau', "aujourd'hui"])
       .title()
       .replace(' ', '')
       .replace("'","")
)

'IlFaitSuperBeauAujourdHui'

Important

A string is immutable !!

Code

s = 'I am an immutable guy'

Code

try:  
    s[2] = 's'
except TypeError:
    print(f"Strings are immutable! s is still '{s}'")

Strings are immutable! s is still 'I am an immutable guy'

Code

id(s)

134288353205360

Code

print(s + ', for sure')
id(s), id(s + ' for sure')

I am an immutable guy, for sure

(134288353205360, 134288353731856)

Extra stuff with strings

Code

'square of 2 is ' + str(2 ** 2)

'square of 2 is 4'

Code

'square of 2 is %d' % 2 ** 2

'square of 2 is 4'

Code

'square of 2 is {}'.format(2 ** 2)

'square of 2 is 4'

Code

'square of 2 is {square}'.format(square=2 ** 2)

'square of 2 is 4'

Code

# And since Python 3.6 you can use an `f-string`
number = 2
square = number ** 2

f'square of {number} is {square}'

'square of 2 is 4'
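An f-string can also carry a format specification after a colon, which is handy for controlling decimals and alignment. A minimal sketch, with illustrative values:

```python
value = 3.14159
count = 42

two_decimals = f'{value:.2f}'   # two digits after the decimal point
padded = f'{count:5d}'          # right-aligned in a field of width 5
zero_padded = f'{count:05d}'    # same width, padded with zeros
percent = f'{0.256:.1%}'        # rendered as a percentage

print(two_decimals, padded, zero_padded, percent)
```

The same specifications work with the format method shown earlier, for instance '{:.2f}'.format(value).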

The `in` keyword

You can use the in keyword with any container, whenever it makes sense

Code

print(s)
print('Salut' in s)

I am an immutable guy
False

Code

print(tt)
print('truc' in tt)

('truc', 3.14, 'truc')
True

Code

print(colors)
print('truc' in colors)

['red', 'blue', 3.14, 'black', 'white', ('truc', 3.14, 'truc')]
False

Code

('truc', 3.14, 'truc') in colors

True

Warning

Strings are not bytes. Have a look at chapter 4 Unicode Text versus Bytes in Fluent Python
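To make the warning concrete, here is a small sketch of the difference, assuming UTF-8 as the encoding. A str is a sequence of Unicode characters, while bytes is what gets written to disk or sent over a network.

```python
text = 'journ\u00e9e'   # the word 'journée', with an accented e

# Encoding turns a str into bytes.
data = text.encode('utf-8')
print(type(text), len(text))
print(type(data), len(data))

# Decoding with the same codec gives the original string back.
round_trip = data.decode('utf-8')
print(round_trip == text)
```

The accented character counts as one character in the str but takes two bytes in UTF-8, so the two lengths differ.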

Brain-teasing

Explain this weird behaviour:

Code

5 in [1, 2, 3, 4] == False

False

Code

[1, 2, 3, 4] == False

False

Code

5 not in [1, 2, 3, 4]

True

Code

(5 in [1, 2, 3, 4]) == False

True

Code

# ANSWER.
# This is a chained comparison. We have seen that 
1 < 2 < 3
# is equivalent to
(1 < 2) and (2 < 3)
# so that
5 in [1, 2, 3, 4] == False
# is equivalent to
(5 in [1, 2, 3, 4]) and ([1, 2, 3, 4] == False)

False

Code

(5 in [1, 2, 3, 4])

False

Code

([1, 2, 3, 4] == False)

False

Dictionaries

A dictionary is basically an efficient table that maps keys to values.
The MOST important container in Python.
Many things are actually a dict under the hood in Python

Code

tel = {'emmanuelle': 5752, 'sebastian': 5578}
print(tel)
print(type(tel))

{'emmanuelle': 5752, 'sebastian': 5578}
<class 'dict'>

Code

tel['emmanuelle'], tel['sebastian']

(5752, 5578)

Code

tel['francis'] = '5919'
tel

{'emmanuelle': 5752, 'sebastian': 5578, 'francis': '5919'}

Code

len(tel)

Important remarks

Keys can be of different types
A key must be hashable, which in practice means of immutable type

Code

tel[7162453] = [1, 3, 2]
tel[3.14] = 'bidule'
tel[('jaouad', 2)] = 1234
tel

{'emmanuelle': 5752,
 'sebastian': 5578,
 'francis': '5919',
 7162453: [1, 3, 2],
 3.14: 'bidule',
 ('jaouad', 2): 1234}

Code

try:
    sorted(tel)
except TypeError:
    print("TypeError: '<' not supported between instances of 'int' and 'str'")

TypeError: '<' not supported between instances of 'int' and 'str'

Code

# A list is mutable and not hashable
try:
    tel[['jaouad']] = '5678'
except TypeError:
    print("TypeError: unhashable type: 'list'")

TypeError: unhashable type: 'list'

Code

try:
    tel[2]
except KeyError:
    print("KeyError: 2")

KeyError: 2
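Catching KeyError is not the only way to deal with a missing key. The sketch below uses a hypothetical phone_book dictionary to show two common alternatives.

```python
phone_book = {'emmanuelle': 5752, 'sebastian': 5578}

# Test membership first...
if 'francis' in phone_book:
    print(phone_book['francis'])

# ...or use get(), which returns None (or a chosen default) for a missing key.
missing = phone_book.get('francis')
fallback = phone_book.get('francis', 0)
present = phone_book.get('sebastian', 0)
print(missing, fallback, present)
```

Neither call modifies the dictionary: get only reads.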

Code

tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}
print(tel.keys())
print(tel.values())
print(tel.items())

dict_keys(['emmanuelle', 'sebastian', 'jaouad'])
dict_values([5752, 5578, 1234])
dict_items([('emmanuelle', 5752), ('sebastian', 5578), ('jaouad', 1234)])

Code

list(tel.keys())[2]

'jaouad'

Code

tel.values().mapping

mappingproxy({'emmanuelle': 5752, 'sebastian': 5578, 'jaouad': 1234})

Code

type(tel.keys())

dict_keys

Code

'rémi' in tel

False

Code

list(tel)

['emmanuelle', 'sebastian', 'jaouad']

Code

'rémi' in tel.keys()

False

You can swap values like this

Code

print(tel)
tel['emmanuelle'], tel['sebastian'] = tel['sebastian'], tel['emmanuelle']
print(tel)

{'emmanuelle': 5752, 'sebastian': 5578, 'jaouad': 1234}
{'emmanuelle': 5578, 'sebastian': 5752, 'jaouad': 1234}

Code

# It works, since
a, b = 2.71, 3.14
a, b = b, a
a, b

(3.14, 2.71)

Exercise 1

Get the keys of tel sorted in decreasing order

Code

tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}

Exercise 2

Get keys of tel sorted by increasing values

Code

tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}

Exercise 3

Obtain a sorted-by-key version of tel

Code

tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}

Sets

A set is an unordered container, containing unique elements

Code

ss = {1, 2, 2, 2, 3, 3, 'tintin', 'tintin', 'toto'}
ss

{1, 2, 3, 'tintin', 'toto'}

Code

s = 'truc truc bidule truc'
set(s)

{' ', 'b', 'c', 'd', 'e', 'i', 'l', 'r', 't', 'u'}

Code

set(list(s))

{' ', 'b', 'c', 'd', 'e', 'i', 'l', 'r', 't', 'u'}

Code

{1, 5, 2, 1, 1}.union({1, 2, 3})

{1, 2, 3, 5}

Code

set((1, 5, 3, 2))

{1, 2, 3, 5}

Code

set([1, 5, 2, 1, 1]).intersection(set([1, 2, 3]))

{1, 2}

Code

ss.add('tintin')
ss

{1, 2, 3, 'tintin', 'toto'}

Code

ss.difference(range(6))

{'tintin', 'toto'}
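The set methods used above also exist as operators. A short sketch with two illustrative sets:

```python
left = {1, 5, 2}
right = {1, 2, 3}

union = left | right
intersection = left & right
difference = left - right
symmetric = left ^ right
print(union, intersection, difference, symmetric)

# Unlike the methods, the operators require both operands to be sets.
try:
    left | [1, 2, 3]
except TypeError as err:
    print(f"TypeError: {err}")
```

This is why ss.difference(range(6)) works with a range, while the operator form would need set(range(6)).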

You can combine all containers together

Code

dd = {
    'truc': [1, 2, 3], 
    5: (1, 4, 2),
    (1, 3): {'hello', 'world'}
}
dd

{'truc': [1, 2, 3], 5: (1, 4, 2), (1, 3): {'hello', 'world'}}

Assignment in `Python` is name binding

Everything is either mutable or immutable

Code

ss = {1, 2, 3}
sss = ss
sss, ss

({1, 2, 3}, {1, 2, 3})

Code

id(ss), id(sss)

(134286904486688, 134286904486688)

Code

sss.add("Truc")

Question. What is in ss ?

Code

ss, sss

({1, 2, 3, 'Truc'}, {1, 2, 3, 'Truc'})

ss and sss are names for the same object

Code

id(ss), id(sss)

(134286904486688, 134286904486688)

Code

ss is sss

True

Code

help('is')

Comparisons
***********

Unlike C, all comparison operations in Python have the same priority,
which is lower than that of any arithmetic, shifting or bitwise
operation.  Also unlike C, expressions like "a < b < c" have the
interpretation that is conventional in mathematics:

   comparison    ::= or_expr (comp_operator or_expr)*
   comp_operator ::= "<" | ">" | "==" | ">=" | "<=" | "!="
                     | "is" ["not"] | ["not"] "in"

Comparisons yield boolean values: "True" or "False". Custom *rich
comparison methods* may return non-boolean values. In this case Python
will call "bool()" on such value in boolean contexts.

Comparisons can be chained arbitrarily, e.g., "x < y <= z" is
equivalent to "x < y and y <= z", except that "y" is evaluated only
once (but in both cases "z" is not evaluated at all when "x < y" is
found to be false).

Formally, if *a*, *b*, *c*, …, *y*, *z* are expressions and *op1*,
*op2*, …, *opN* are comparison operators, then "a op1 b op2 c ... y
opN z" is equivalent to "a op1 b and b op2 c and ... y opN z", except
that each expression is evaluated at most once.

Note that "a op1 b op2 c" doesn’t imply any kind of comparison between
*a* and *c*, so that, e.g., "x < y > z" is perfectly legal (though
perhaps not pretty).


Value comparisons
=================

The operators "<", ">", "==", ">=", "<=", and "!=" compare the values
of two objects.  The objects do not need to have the same type.

Chapter Objects, values and types states that objects have a value (in
addition to type and identity).  The value of an object is a rather
abstract notion in Python: For example, there is no canonical access
method for an object’s value.  Also, there is no requirement that the
value of an object should be constructed in a particular way, e.g.
comprised of all its data attributes. Comparison operators implement a
particular notion of what the value of an object is.  One can think of
them as defining the value of an object indirectly, by means of their
comparison implementation.

Because all types are (direct or indirect) subtypes of "object", they
inherit the default comparison behavior from "object".  Types can
customize their comparison behavior by implementing *rich comparison
methods* like "__lt__()", described in Basic customization.

The default behavior for equality comparison ("==" and "!=") is based
on the identity of the objects.  Hence, equality comparison of
instances with the same identity results in equality, and equality
comparison of instances with different identities results in
inequality.  A motivation for this default behavior is the desire that
all objects should be reflexive (i.e. "x is y" implies "x == y").

A default order comparison ("<", ">", "<=", and ">=") is not provided;
an attempt raises "TypeError".  A motivation for this default behavior
is the lack of a similar invariant as for equality.

The behavior of the default equality comparison, that instances with
different identities are always unequal, may be in contrast to what
types will need that have a sensible definition of object value and
value-based equality.  Such types will need to customize their
comparison behavior, and in fact, a number of built-in types have done
that.

The following list describes the comparison behavior of the most
important built-in types.

* Numbers of built-in numeric types (Numeric Types — int, float,
  complex) and of the standard library types "fractions.Fraction" and
  "decimal.Decimal" can be compared within and across their types,
  with the restriction that complex numbers do not support order
  comparison.  Within the limits of the types involved, they compare
  mathematically (algorithmically) correct without loss of precision.

  The not-a-number values "float('NaN')" and "decimal.Decimal('NaN')"
  are special.  Any ordered comparison of a number to a not-a-number
  value is false. A counter-intuitive implication is that not-a-number
  values are not equal to themselves.  For example, if "x =
  float('NaN')", "3 < x", "x < 3" and "x == x" are all false, while "x
  != x" is true.  This behavior is compliant with IEEE 754.

* "None" and "NotImplemented" are singletons.  **PEP 8** advises that
  comparisons for singletons should always be done with "is" or "is
  not", never the equality operators.

* Binary sequences (instances of "bytes" or "bytearray") can be
  compared within and across their types.  They compare
  lexicographically using the numeric values of their elements.

* Strings (instances of "str") compare lexicographically using the
  numerical Unicode code points (the result of the built-in function
  "ord()") of their characters. [3]

  Strings and binary sequences cannot be directly compared.

* Sequences (instances of "tuple", "list", or "range") can be compared
  only within each of their types, with the restriction that ranges do
  not support order comparison.  Equality comparison across these
  types results in inequality, and ordering comparison across these
  types raises "TypeError".

  Sequences compare lexicographically using comparison of
  corresponding elements.  The built-in containers typically assume
  identical objects are equal to themselves.  That lets them bypass
  equality tests for identical objects to improve performance and to
  maintain their internal invariants.

  Lexicographical comparison between built-in collections works as
  follows:

  * For two collections to compare equal, they must be of the same
    type, have the same length, and each pair of corresponding
    elements must compare equal (for example, "[1,2] == (1,2)" is
    false because the type is not the same).

  * Collections that support order comparison are ordered the same as
    their first unequal elements (for example, "[1,2,x] <= [1,2,y]"
    has the same value as "x <= y").  If a corresponding element does
    not exist, the shorter collection is ordered first (for example,
    "[1,2] < [1,2,3]" is true).

* Mappings (instances of "dict") compare equal if and only if they
  have equal "(key, value)" pairs. Equality comparison of the keys and
  values enforces reflexivity.

  Order comparisons ("<", ">", "<=", and ">=") raise "TypeError".

* Sets (instances of "set" or "frozenset") can be compared within and
  across their types.

  They define order comparison operators to mean subset and superset
  tests.  Those relations do not define total orderings (for example,
  the two sets "{1,2}" and "{2,3}" are not equal, nor subsets of one
  another, nor supersets of one another).  Accordingly, sets are not
  appropriate arguments for functions which depend on total ordering
  (for example, "min()", "max()", and "sorted()" produce undefined
  results given a list of sets as inputs).

  Comparison of sets enforces reflexivity of its elements.

* Most other built-in types have no comparison methods implemented, so
  they inherit the default comparison behavior.

User-defined classes that customize their comparison behavior should
follow some consistency rules, if possible:

* Equality comparison should be reflexive. In other words, identical
  objects should compare equal:

     "x is y" implies "x == y"

* Comparison should be symmetric. In other words, the following
  expressions should have the same result:

     "x == y" and "y == x"

     "x != y" and "y != x"

     "x < y" and "y > x"

     "x <= y" and "y >= x"

* Comparison should be transitive. The following (non-exhaustive)
  examples illustrate that:

     "x > y and y > z" implies "x > z"

     "x < y and y <= z" implies "x < z"

* Inverse comparison should result in the boolean negation. In other
  words, the following expressions should have the same result:

     "x == y" and "not x != y"

     "x < y" and "not x >= y" (for total ordering)

     "x > y" and "not x <= y" (for total ordering)

  The last two expressions apply to totally ordered collections (e.g.
  to sequences, but not to sets or mappings). See also the
  "total_ordering()" decorator.

* The "hash()" result should be consistent with equality. Objects that
  are equal should either have the same hash value, or be marked as
  unhashable.

Python does not enforce these consistency rules. In fact, the
not-a-number values are an example for not following these rules.


Membership test operations
==========================

The operators "in" and "not in" test for membership.  "x in s"
evaluates to "True" if *x* is a member of *s*, and "False" otherwise.
"x not in s" returns the negation of "x in s".  All built-in sequences
and set types support this as well as dictionary, for which "in" tests
whether the dictionary has a given key. For container types such as
list, tuple, set, frozenset, dict, or collections.deque, the
expression "x in y" is equivalent to "any(x is e or x == e for e in
y)".

For the string and bytes types, "x in y" is "True" if and only if *x*
is a substring of *y*.  An equivalent test is "y.find(x) != -1".
Empty strings are always considered to be a substring of any other
string, so """ in "abc"" will return "True".

For user-defined classes which define the "__contains__()" method, "x
in y" returns "True" if "y.__contains__(x)" returns a true value, and
"False" otherwise.

For user-defined classes which do not define "__contains__()" but do
define "__iter__()", "x in y" is "True" if some value "z", for which
the expression "x is z or x == z" is true, is produced while iterating
over "y". If an exception is raised during the iteration, it is as if
"in" raised that exception.

Lastly, the old-style iteration protocol is tried: if a class defines
"__getitem__()", "x in y" is "True" if and only if there is a non-
negative integer index *i* such that "x is y[i] or x == y[i]", and no
lower integer index raises the "IndexError" exception.  (If any other
exception is raised, it is as if "in" raised that exception).

The operator "not in" is defined to have the inverse truth value of
"in".


Identity comparisons
====================

The operators "is" and "is not" test for an object’s identity: "x is
y" is true if and only if *x* and *y* are the same object.  An
Object’s identity is determined using the "id()" function.  "x is not
y" yields the inverse truth value. [4]

Related help topics: EXPRESSIONS, BASICMETHODS
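In short, == compares values while is compares identities. A minimal sketch with illustrative names:

```python
first = [1, 2, 3]
second = [1, 2, 3]
alias = first

# Equal values, but two distinct objects.
print(first == second, first is second)

# Two names bound to one and the same object.
print(first == alias, first is alias)

# None is a singleton, so test for it with "is".
result = None
print(result is None)
```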

About assignments

Python never copies an object
Unless you ask it to

When you code

x = [1, 2, 3]
y = x

you just
bind the variable name x to a list [1, 2, 3]
give another name y to the same object

Important remarks

Everything is an object in Python
Either immutable or mutable

Code

id(1), id(1+1), id(2)

(11753896, 11753928, 11753928)

A list is mutable

Code

x = [1, 2, 3]
print(id(x), x)
x[0] += 42; x.append(3.14)
print(id(x), x)

134288353648384 [1, 2, 3]
134288353648384 [43, 2, 3, 3.14]

A str is immutable

In order to “change” an immutable object, Python creates a new one

Code

s = 'to'
print(id(s), s)
s += 'to'
print(id(s), s)

134288676817440 to
134288353246496 toto

Once again, a list is mutable

Code

super_list = [3.14, (1, 2, 3), 'tintin']
other_list = super_list
id(other_list), id(super_list)

(134288353864832, 134288353864832)

other_list and super_list are the same list
If you change one, you change the other.
id returns the identity of an object. Two objects with the same identity are the same (not only the same type, but the same instance)

Code

other_list[1] = 'youps'
other_list, super_list

([3.14, 'youps', 'tintin'], [3.14, 'youps', 'tintin'])

Code

id(super_list), id(other_list)

(134288353864832, 134288353864832)

If you want a copy, you need to ask for one

Code

other_list = super_list.copy()
id(other_list), id(super_list)

(134288353471744, 134288353864832)

Code

other_list[1] = 'copy'
other_list, super_list

([3.14, 'copy', 'tintin'], [3.14, 'youps', 'tintin'])

Only other_list is modified.

But… what if you have a list of lists? (or, more generally, a mutable object containing mutable objects)

Code

l1, l2 = [1, 2, 3], [4, 5, 6]
list_list = [l1, l2]
list_list

[[1, 2, 3], [4, 5, 6]]

Code

id(list_list), id(list_list[0]), id(l1), list_list[0] is l1

(134288353460288, 134288353211264, 134288353211264, True)

Let’s make a copy of list_list

Code

copy_list = list_list.copy()
copy_list.append('super')
list_list, copy_list

([[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6], 'super'])

Code

id(list_list[0]), id(copy_list[0])

(134288353211264, 134288353211264)

OK, only copy_list is modified, as expected

But now…

Code

copy_list[0][1] = 'oups'
copy_list, list_list

([[1, 'oups', 3], [4, 5, 6], 'super'], [[1, 'oups', 3], [4, 5, 6]])

Question. What happened ?!?

The list_list object is copied
But NOT what it contains!
By default, copy makes a shallow copy, not a deep copy
It does not build copies of what is contained
If you want to copy an object and all that is contained in it, you need to use deepcopy.

Code

from copy import deepcopy

copy_list = deepcopy(list_list)
copy_list[0][1] = 'incredible !'
list_list, copy_list

([[1, 'oups', 3], [4, 5, 6]], [[1, 'incredible !', 3], [4, 5, 6]])

Final remarks

Code

tt = ([1, 2, 3], [4, 5, 6])
print(id(tt), tt)
print(list(map(id, tt)))

134286902638208 ([1, 2, 3], [4, 5, 6])
[134288353646976, 134288353647872]

Code

tt[0][1] = '42'
print(id(tt), tt)
print(list(map(id, tt)))

134286902638208 ([1, '42', 3], [4, 5, 6])
[134288353646976, 134288353647872]
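The two cells above show that a tuple's immutability is shallow: the tuple always holds the same objects, but those objects may themselves change. One practical consequence, sketched here with illustrative names, is that such a tuple cannot be used as a dictionary key.

```python
flat = (1, 2, 3)
nested = ([1, 2, 3], [4, 5, 6])

# A tuple of immutable items is hashable and can serve as a dict key.
lookup = {flat: 'ok'}

# A tuple that contains lists is not hashable.
try:
    lookup[nested] = 'oops'
except TypeError as err:
    print(f"TypeError: {err}")

print(lookup)
```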

Code

s = [1, 2, 3]

Code

s2 = s

Code

s2 is s

True

Code

id(s2), id(s)

(134286903736640, 134286903736640)

Control flow and other stuff…

Namely tests, loops, again booleans, etc.

Code

if 2 ** 2 == 5:
    print('Obvious')
else:
    print('YES')
print('toujours')

YES
toujours

Blocks are delimited by indentation!

Code

a = 3
if a > 0:
    if a == 1:
        print(1)
    elif a == 2:
        print(2)
elif a == 2:
    print(2)
elif a == 3:
    print(3)
else:
    print(a)

Anything can be understood as a boolean

For example, don’t do this to test if a list is empty

Code

l2 = ['hello', 'everybody']

if len(l2) > 0:
    print(l2[0])

hello

but this

Code

if l2:
    print(l2[0])

hello

Some poetry

An empty dict is False
An empty string is False
An empty list is False
An empty tuple is False
An empty set is False
0 is False
.0 is False
etc…
everything else is True
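The list above can be checked directly with bool. In this sketch the candidates list is made up for illustration; it also includes None, which is false as well, and a few values that look empty but are not.

```python
candidates = [{}, '', [], (), set(), 0, .0, None, [0], ' ', 'False', -1]

# The first eight are false; the last four are true.
truth_values = [bool(obj) for obj in candidates]

for obj, value in zip(candidates, truth_values):
    print(repr(obj), value)
```

A list holding a zero, a string holding a space and the string 'False' are all non-empty, hence true.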

While loops

Code

a = 10
b = 1
while b < a:
    b = b + 1
    print(b)

Compute the decimals of Pi using the Wallis formula

\[ \pi = 2 \prod_{i=1}^{\infty} \frac{4i^2}{4i^2 - 1} \]

Code

pi = 2
eps = 1e-10
dif = 2 * eps
i = 1
while dif > eps:
    pi, i, old_pi = pi * 4 * i ** 2 / (4 * i ** 2 - 1), i + 1, pi
    dif = pi - old_pi

Code

pi

3.1415837914138556

Code

from math import pi

pi

3.141592653589793
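The while loop above stops once two successive partial products differ by less than eps, which is not the same as being within eps of π. The following sketch records the actual error after a few chosen numbers of factors. The checkpoints 10, 1000 and 100000 are arbitrary choices, and the for loop it relies on is introduced in the next section.

```python
import math

approx = 2.0
errors = {}
for i in range(1, 100001):
    # Multiply by the i-th factor of the Wallis product.
    approx *= 4 * i ** 2 / (4 * i ** 2 - 1)
    if i in (10, 1000, 100000):
        errors[i] = math.pi - approx

for n_factors, error in errors.items():
    print(n_factors, error)
```

The error shrinks slowly, roughly in proportion to 1/i, which is why the result above only agrees with math.pi to about five decimal places.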

`for` loop with `range`

Iteration with an index, with a list, with many things !
range takes the same parameters as slicing (start, stop, stride); only stop is mandatory

Code

for i in range(10):
    print(i)

Code

for i in range(4):
    print(i + 1)
print('-')

for i in range(1, 5):
    print(i)
print('-')

for i in range(1, 10, 3):
    print(i)
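A range object is lazy, so converting it to a list is a convenient way to inspect it. This sketch, with an illustrative string letters, also checks that range and slicing select the same positions.

```python
# range is lazy: wrap it in list() to see its elements.
ascending = list(range(1, 10, 3))
print(ascending)

# A negative stride counts down; stop is still excluded.
descending = list(range(10, 0, -3))
print(descending)

# range and slicing agree on which indices they select.
letters = 'abcdefghij'
picked = [letters[i] for i in range(1, 10, 3)]
print(picked == list(letters[1:10:3]))
```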

Something for nerds. You can use else in a for loop: the else block runs only if the loop finishes without hitting break

Code

names = ['stephane', 'mokhtar', 'jaouad', 'simon', 'yiyang']

for name in names:
    if name.startswith('u'):
        print(name)
        break
else:
    print('Not found.')

Not found.

Code

names = ['stephane', 'mokhtar', 'jaouad', 'ulysse', 'simon', 'yiyang']

for name in names:
    if name.startswith('u'):
        print(name)
        break
else:
    print('Not found.')

ulysse
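A classic use of for with else is a search loop that needs a "nothing found" branch. The sketch below, which is only an illustration and not an efficient algorithm, collects the primes below 10 this way.

```python
primes = []
for n in range(2, 10):
    for d in range(2, n):
        if n % d == 0:
            break             # found a divisor: n is not prime
    else:
        primes.append(n)      # inner loop ended without break: n is prime

print(primes)
```

For n equal to 2 the inner range is empty, so the loop body never runs and the else block executes immediately.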

For loops over iterable objects

You can iterate using for over any container: list, tuple, dict, str, set among others…

Code

colors = ['red', 'blue', 'black', 'white']
peoples = ['stephane', 'jaouad', 'mokhtar', 'yiyang', 'rémi']

Code

# This is stupid
for i in range(len(colors)):
    print(colors[i])
    
# This is better
for color in colors:
    print(color)

red
blue
black
white
red
blue
black
white

To iterate over several sequences at the same time, use zip

Code

for color, people in zip(colors, peoples):
    print(color, people)

red stephane
blue jaouad
black mokhtar
white yiyang
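Notice that 'rémi' does not appear in the output: zip stops as soon as the shortest sequence is exhausted. If the extra items matter, itertools.zip_longest from the standard library pads the shorter input instead. The fill value 'no color' below is an arbitrary choice.

```python
from itertools import zip_longest

colors = ['red', 'blue', 'black', 'white']
peoples = ['stephane', 'jaouad', 'mokhtar', 'yiyang', 'rémi']

# zip stops at the shortest input: 'rémi' is silently dropped.
short_pairs = list(zip(colors, peoples))
print(len(short_pairs))

# zip_longest pads the shorter input with a fill value instead.
long_pairs = list(zip_longest(colors, peoples, fillvalue='no color'))
print(long_pairs[-1])
```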

Code

l = ["Bonjour", {'francis': 5214, 'stephane': 5123}, ('truc', 3)]
for e in l:
    print(e, len(e))

Bonjour 7
{'francis': 5214, 'stephane': 5123} 2
('truc', 3) 2

Loop over a str

Code

s = 'Bonjour'
for c in s:
    print(c)

B
o
n
j
o
u
r

Loop over a dict

Code

dd = {(1, 3): {'hello', 'world'}, 'truc': [1, 2, 3], 5: (1, 4, 2)}

# Default is to loop over keys
for key in dd:
    print(key)

(1, 3)
truc
5

Code

# Loop over values
for e in dd.values():
    print(e)

{'hello', 'world'}
[1, 2, 3]
(1, 4, 2)

Code

# Loop over items (key, value) pairs
for key, val in dd.items():
    print(key, val)

(1, 3) {'hello', 'world'}
truc [1, 2, 3]
5 (1, 4, 2)

Code

for t in dd.items():
    print(t)

((1, 3), {'hello', 'world'})
('truc', [1, 2, 3])
(5, (1, 4, 2))

Comprehensions

You can construct a list, dict, set and others using the comprehension syntax

list comprehension

Code

print(colors)
print(peoples)

['red', 'blue', 'black', 'white']
['stephane', 'jaouad', 'mokhtar', 'yiyang', 'rémi']

Code

l = []
for p, c in zip(peoples, colors):
    if len(c)<=4 :
        l.append(p)
print(l)

['stephane', 'jaouad']

Code

# The list of people with favorite color that has no more than 4 characters

[people for color, people in zip(colors, peoples) if len(color) <= 4]

['stephane', 'jaouad']

dict comprehension

Code

{people: color for color, people in zip(colors, peoples) if len(color) <= 4}

{'stephane': 'red', 'jaouad': 'blue'}

Code

# This builds a dict from two lists (for keys and values)
{key: value for (key, value) in zip(peoples, colors)}

{'stephane': 'red', 'jaouad': 'blue', 'mokhtar': 'black', 'yiyang': 'white'}

Code

# But it's simpler (so better) to use
dict(zip(peoples, colors))

{'stephane': 'red', 'jaouad': 'blue', 'mokhtar': 'black', 'yiyang': 'white'}
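
The comprehension syntax also works for sets: use curly braces without the key: value pair. A minimal sketch on the same colors list (restated here so the block runs on its own; the variable names lengths and initials are made up for illustration):

```python
colors = ['red', 'blue', 'black', 'white']

# Set comprehension: distinct lengths of the color names
lengths = {len(color) for color in colors}

# With a condition: distinct first letters of the 5-letter colors
initials = {color[0] for color in colors if len(color) == 5}

print(lengths)
print(initials)
```

Since a set has no duplicates, 'black' and 'white' contribute a single length.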

Something very convenient is enumerate: it yields (index, element) pairs while iterating

Code

for i, color in enumerate(colors):
    print(i, color)

0 red
1 blue
2 black
3 white

Code

list(enumerate(colors))

[(0, 'red'), (1, 'blue'), (2, 'black'), (3, 'white')]

Code

dict(enumerate(s))

{0: 'B', 1: 'o', 2: 'n', 3: 'j', 4: 'o', 5: 'u', 6: 'r'}

Code

print(dict(enumerate(s)))

{0: 'B', 1: 'o', 2: 'n', 3: 'j', 4: 'o', 5: 'u', 6: 'r'}

Code

s = 'Hey everyone'
{c: i for i, c in enumerate(s)}

{'H': 0, 'e': 11, 'y': 8, ' ': 3, 'v': 5, 'r': 7, 'o': 9, 'n': 10}

About functional programming

We can use lambda to define anonymous functions, and use them in the map and reduce functions

Code

square = lambda x: x ** 2
square(2)

Code

type(square)

function

Code

dir(square)

['__annotations__',
 '__builtins__',
 '__call__',
 '__class__',
 '__closure__',
 '__code__',
 '__defaults__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__get__',
 '__getattribute__',
 '__getstate__',
 '__globals__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__kwdefaults__',
 '__le__',
 '__lt__',
 '__module__',
 '__name__',
 '__ne__',
 '__new__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__type_params__']

Code

s = "a"

Code

try:
    square("a")
except TypeError:
    print("TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'")

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'

Code

sum2 = lambda a, b: a + b
print(sum2('Hello', ' world'))
print(sum2(1, 2))

Hello world
3

lambda is intended for short, one-line functions.

More complex functions should be defined with def (see below)
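
As a rough illustration of how lambda combines with map, filter and reduce, here is a small sketch. The list verbs and the other variable names are made up for this example; reduce has to be imported from functools in Python 3.

```python
from functools import reduce  # reduce is not a builtin in Python 3

verbs = ['map', 'filter', 'reduce']

# map applies a function to every element (lazily, hence the list call)
sizes = list(map(lambda w: len(w), verbs))

# filter keeps the elements for which the function returns a true value
long_verbs = list(filter(lambda w: len(w) > 3, verbs))

# reduce folds the sequence into a single value, from left to right
total = reduce(lambda acc, n: acc + n, sizes, 0)

print(sizes, long_verbs, total)
```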

Exercise

Print the squares of even numbers between 0 and 15

Using a list comprehension as before
Using map

Brain-teasing

What is the output of

Code

from functools import reduce

reduce(lambda a, b: a + b[0] * b[1], enumerate('abcde'), 'A')

Generators

Code

import sys
import matplotlib.pyplot as plt
%matplotlib inline

Code

plt.figure(figsize=(6, 6))
plt.plot([sys.getsizeof(list(range(i))) for i in range(10000)], lw=3)
plt.plot([sys.getsizeof(range(i)) for i in range(10000)], lw=3)
plt.xlabel('Number of elements (value of i)', fontsize=14)
plt.ylabel('Size (in bytes)', fontsize=14)
_ = plt.legend(['list(range(i))', 'range(i)'], fontsize=16)

Why generators?

The memory used by range(i) stays constant, while the memory used by list(range(i)) grows linearly with i

What is happening?

range(n) does not allocate a list of n elements!
It produces the required integers on the fly
We say that such an object behaves like a generator in Python
Many things in the Python standard library behave like this
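
A quick check of this behavior, as a sketch: strictly speaking range is a lazy sequence rather than a generator, so it also supports len, indexing and membership tests without building the integers. The names small and large are made up for illustration.

```python
import sys

small, large = range(10), range(10**6)

# The range object only stores start, stop and step: its size does not
# depend on the number of integers it represents
print(sys.getsizeof(small) == sys.getsizeof(large))

# It still supports len, indexing and membership tests without
# building the underlying list
print(len(large), large[-1], 999_999 in large)

# Materializing the integers into a list is what costs memory
print(sys.getsizeof(list(small)) < sys.getsizeof(list(large)))
```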

Warning. Getting the real memory footprint of a Python object is difficult. Note that sys.getsizeof calls the __sizeof__ method of the object, which in general does not give the actual memory used by that object (for a list, the elements themselves are not counted). But never mind here.

The following computation has almost no memory footprint, since the integers are produced one at a time:

Code

sum(range(10**8))

4999999950000000

Code

map(lambda x: x**2, range(10**7))

<map at 0x7a221b247bb0>

map does not return a list for the same reason

Code

sum(map(lambda x: x**2, range(10**6)))

333332833333500000

Generator expression

Generator expressions are generators defined through the comprehension syntax: just replace [] by () in the comprehension.

A generator can be iterated on only once

Code

range(10)

range(0, 10)

Code

carres = (i**2 for i in range(10))

Code

carres

<generator object <genexpr> at 0x7a2271963510>

Code

for c in carres:
    print(c)

Code

for i in range(4):
    for j in range(3):
        print(i, j)

Code

from itertools import product

for t in product(range(4), range(3)):
    print(t)

(0, 0)
(0, 1)
(0, 2)
(1, 0)
(1, 1)
(1, 2)
(2, 0)
(2, 1)
(2, 2)
(3, 0)
(3, 1)
(3, 2)

Code

from itertools import product

gene = (i + j for i, j in product(range(3), range(3)))
gene

<generator object <genexpr> at 0x7a2271963e00>

Code

print(list(gene))
print(list(gene))

[0, 1, 2, 1, 2, 3, 2, 3, 4]
[]

`yield`

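Something very powerful: a function containing yield is a generator function. Each yield hands one value to the caller and pauses the function until the next value is requested.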

Code

def startswith(words, letter):
    for word in words:
        if word.startswith(letter):
            yield word

Code

words = [
    'Python', "is", 'awesome', 'in', 'particular', 'generators', 
    'are', 'really', 'cool'
]

Code

list(word for word in words if word.startswith("a"))

['awesome', 'are']

Code

a = 2

Code

float(a)

2.0

The generator function startswith defined above can also be consumed with a for loop

Code

for word in startswith(words, letter='a'):
    print(word)

awesome
are

Code

it = startswith(words, letter='a')

Code

type(it)

generator

Code

next(it)

'awesome'

Code

next(it)

'are'

Code

try:
    next(it)
except StopIteration:
    print("StopIteration exception!")

StopIteration exception!
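
Because values are produced only on demand, a generator function may even describe an infinite sequence. A minimal sketch, assuming a hypothetical fibonacci generator (not part of the notebook) and itertools.islice to take only the first values:

```python
from itertools import islice

def fibonacci():
    """Yield the Fibonacci numbers one at a time, forever."""
    a, b = 0, 1
    while True:
        yield a          # hand a value to the caller and pause here
        a, b = b, a + b  # resumed when the next value is requested

# islice takes the first 10 values without exhausting the infinite loop
first_ten = list(islice(fibonacci(), 10))
print(first_ten)
```

Calling list(fibonacci()) directly would never terminate, which is why the example goes through islice.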

A glimpse at the `collections` module

(This is where the good stuff hides)

Code

texte = """             
Bonjour,
Python c'est super.
Python ca a l'air quand même un peu compliqué.
Mais bon, ca a l'air pratique.
Peut-être que je pourrais m'en servir pour faire des trucs super.
"""
texte

"             \nBonjour,\nPython c'est super.\nPython ca a l'air quand même un peu compliqué.\nMais bon, ca a l'air pratique.\nPeut-être que je pourrais m'en servir pour faire des trucs super.\n"

Code

print(texte)

             
Bonjour,
Python c'est super.
Python ca a l'air quand même un peu compliqué.
Mais bon, ca a l'air pratique.
Peut-être que je pourrais m'en servir pour faire des trucs super.

Code

# Some basic text preprocessing 
new_text = (
    texte
    .strip()
    .replace('\n', ' ')
    .replace(',', ' ')
    .replace('.', ' ')
    .replace("'", ' ')
)

print(new_text)
print('-' * 8)

words = new_text.split()
print(words)

Bonjour  Python c est super  Python ca a l air quand même un peu compliqué  Mais bon  ca a l air pratique  Peut-être que je pourrais m en servir pour faire des trucs super 
--------
['Bonjour', 'Python', 'c', 'est', 'super', 'Python', 'ca', 'a', 'l', 'air', 'quand', 'même', 'un', 'peu', 'compliqué', 'Mais', 'bon', 'ca', 'a', 'l', 'air', 'pratique', 'Peut-être', 'que', 'je', 'pourrais', 'm', 'en', 'servir', 'pour', 'faire', 'des', 'trucs', 'super']

Exercise

Count the number of occurrences of all the words in words.

Output must be a dictionary containing word: count

Code

print(words)

['Bonjour', 'Python', 'c', 'est', 'super', 'Python', 'ca', 'a', 'l', 'air', 'quand', 'même', 'un', 'peu', 'compliqué', 'Mais', 'bon', 'ca', 'a', 'l', 'air', 'pratique', 'Peut-être', 'que', 'je', 'pourrais', 'm', 'en', 'servir', 'pour', 'faire', 'des', 'trucs', 'super']

Exercise

Compute the number of occurrences AND the length of each word in words.

Output must be a dictionary containing word: (count, length)
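
The collections module of the standard library offers containers that help with this kind of task. Here is a minimal sketch of two of them, Counter and defaultdict, on a toy list of letters (the data and the variable names are made up for illustration, they are not taken from the notebook):

```python
from collections import Counter, defaultdict

letters = list('abracadabra')

# Counter maps each distinct element to its number of occurrences
letter_counts = Counter(letters)
print(letter_counts.most_common(2))

# defaultdict creates a missing entry on first access (here an empty list)
positions = defaultdict(list)
for index, letter in enumerate(letters):
    positions[letter].append(index)
print(positions['b'])
```

A Counter is a dict subclass, so it can be used wherever a word: count dictionary is expected.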

I/O, reading and writing files

Next, put a text file miserables.txt in the folder containing this notebook. If the file is not there, the next cell downloads it; if it is already there, the cell does nothing.

Code

import requests
import os

# The path containing your notebook
path_data = './'
# The name of the file
filename = 'miserables.txt'

if os.path.exists(os.path.join(path_data, filename)):
    print('The file %s already exists.' % os.path.join(path_data, filename))
else:
    url = 'https://stephanegaiffas.github.io/big_data_course/data/miserables.txt'
    r = requests.get(url)
    with open(os.path.join(path_data, filename), 'wb') as f:
        f.write(r.content)
    print('Downloaded file %s.' % os.path.join(path_data, filename))

Downloaded file ./miserables.txt.

Code

ls -alh

total 668K
drwxrwxr-x 10 boucheron boucheron 4,0K avril  3 15:07 ./
drwxrwxr-x  6 boucheron boucheron 4,0K avril  3 14:55 ../
drwxr-xr-x  3 boucheron boucheron 4,0K avril  3 15:02 0c19d4a9-62d0-4073-9add-d08089e30b7a/
-rw-rw-r--  1 boucheron boucheron  68K avril  3 15:07 checking_parquet_citibike.html
-rw-rw-r--  1 boucheron boucheron 3,7K avril  3 14:55 checking_parquet_citibike.qmd
drwxr-xr-x  2 boucheron boucheron 4,0K avril  3 15:02 csr.parquet/
drwxrwxr-x  3 boucheron boucheron 4,0K avril  3 15:07 .jupyter_cache/
drwxrwxr-x  3 boucheron boucheron 4,0K avril  3 15:01 __MACOSX/
-rw-rw-r--  1 boucheron boucheron  128 avril  3 14:55 _metadata.yml
-rw-rw-r--  1 boucheron boucheron 9,0K avril  3 15:07 miserables.txt
drwxrwxr-x  3 boucheron boucheron 4,0K avril  3 15:02 notebook01_python_files/
-rw-rw-r--  1 boucheron boucheron  71K avril  3 14:55 notebook01_python.qmd
-rw-rw-r--  1 boucheron boucheron 164K avril  3 15:07 notebook01_python.quarto_ipynb
drwxrwxr-x  3 boucheron boucheron 4,0K avril  3 15:02 notebook02_numpy_files/
-rw-rw-r--  1 boucheron boucheron  29K avril  3 14:55 notebook02_numpy.qmd
-rw-rw-r--  1 boucheron boucheron  22K avril  3 14:55 notebook03_pandas.qmd
-rw-rw-r--  1 boucheron boucheron  19K avril  3 14:55 notebook04_pandas_spark.qmd
-rw-rw-r--  1 boucheron boucheron 9,3K avril  3 14:55 notebook05_sparkrdd.qmd
-rw-rw-r--  1 boucheron boucheron  23K avril  3 14:55 notebook06_sparksql.qmd
-rw-rw-r--  1 boucheron boucheron  29K avril  3 14:55 notebook07_json-format.qmd
drwxrwxr-x  3 boucheron boucheron 4,0K avril  3 15:02 notebook08_webdata-II_files/
-rw-rw-r--  1 boucheron boucheron  25K avril  3 14:55 notebook08_webdata-II.qmd
-rw-rw-r--  1 boucheron boucheron  30K avril  3 14:55 notebook08_webdata.qmd
-rw-rw-r--  1 boucheron boucheron  36K avril  3 15:07 notebook-0.html
-rw-rw-r--  1 boucheron boucheron  153 avril  3 14:55 notebook-0.qmd
-rw-rw-r--  1 boucheron boucheron  755 avril  3 14:55 notebook10_graphx.qmd
-rw-rw-r--  1 boucheron boucheron  19K avril  3 14:55 notebook11_dive.qmd
-rw-rw-r--  1 boucheron boucheron 7,1K avril  3 14:55 notebook14.qmd
-rw-rw-r--  1 boucheron boucheron 2,0K avril  3 14:55 notebookxx_pg_pandas_spark.qmd
drwxrwxr-x  2 boucheron boucheron 4,0K avril  3 15:01 webdata.parquet/
-rw-rw-r--  1 boucheron boucheron 4,7K avril  3 14:55 xcitibike_spark.qmd
-rw-rw-r--  1 boucheron boucheron  15K avril  3 14:55 xciti_pandas.qmd

Code

# !rm -f miserables.txt

Code

os.path.join(path_data, filename)

'./miserables.txt'

In jupyter and ipython you can run terminal command lines using !

Let’s count the number of lines and the number of words with the wc command-line tool (Linux or Mac only, don’t ask me how on Windows)

Code

# Lines count
!wc -l miserables.txt

79 miserables.txt

Code

# Word count
!wc -w miserables.txt

277 miserables.txt

Exercise

Count the number of occurrences of each word in the text file miserables.txt. We use an open context and the Counter from before.

Contexts

A context (or context manager) in Python is an object that we use with the with keyword.
Here, it automatically takes care of opening and closing the file.

Note the for loop:

for line in f:
    ...

You loop directly over the lines of the open file from within the open context
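
A small self-contained sketch of this pattern, assuming a hypothetical scratch file example.txt written in a temporary directory, so that nothing is left on disk:

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')

    # The file is closed automatically when the with block ends
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first line\nsecond line\n')

    with open(path, encoding='utf-8') as f:
        lines = [line.strip() for line in f]   # loop over the lines

print(f.closed)
print(lines)
```

The file is closed even if an exception is raised inside the with block, which is the main reason to prefer this form over calling open and close by hand.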

About `pickle`

You can save your computation with pickle.

pickle is a way of saving almost anything with Python.
It serializes the object in a binary format, and is usually the simplest and fastest way to go.

Code

import pickle as pkl

# Let's save it
with open('miserable_word_counts.pkl', 'wb') as f:
    pkl.dump(counter, f)

# And read it again
with open('miserable_word_counts.pkl', 'rb') as f:
    counter = pkl.load(f)

Code

counter.most_common(10)

[('{', 15),
 ('}', 15),
 ('0', 8),
 ('img', 6),
 ('margin:', 6),
 ('font', 6),
 ('logo', 6),
 ('only', 6),
 ('screen', 6),
 ('and', 6)]
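
The same round trip can be sketched in memory with pickle.dumps and pickle.loads, which work on bytes instead of a file. The Counter below is a toy example, not the one computed from miserables.txt. Keep in mind that unpickling data from an untrusted source is unsafe, since it can execute arbitrary code.

```python
import pickle
from collections import Counter

word_counts = Counter(['to', 'be', 'or', 'not', 'to', 'be'])

# dumps/loads work on bytes in memory instead of a file
payload = pickle.dumps(word_counts)
restored = pickle.loads(payload)

print(type(payload).__name__)
# The restored object is equal to the original, but it is a new object
print(restored == word_counts, restored is word_counts)
```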

Defining functions

You should use functions to organize and reuse code

Function definition

Function blocks must be indented as other control-flow blocks.

Code

def test():
    return 'in test function'

test()

'in test function'

Return statement

Functions can optionally return values. By default, functions return None.

The syntax to define a function:

the def keyword,
followed by the function’s name;
the arguments of the function, given between parentheses and followed by a colon;
the function body;
and a return statement for optionally returning values.
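
To make the default return value concrete, here is a minimal sketch with two hypothetical functions, one without a return statement and one with it:

```python
def greet(name):
    # No return statement: the function implicitly returns None
    print('Hello', name)

def greeting(name):
    # Explicit return: the caller gets the string back
    return 'Hello ' + name

result = greet('world')
print(result is None)
print(greeting('world'))
```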

Code

None is None

True

Code

def f(x):
    return x + 10
f(20)

A function that returns several elements returns a tuple

Code

def f(x):
    return x + 1, x + 4

f(5)

(6, 9)

Code

type(f)

function

Code

f.truc = "bonjour"

Code

type(f(5))

tuple

Parameters

Mandatory parameters (positional arguments)

Code

def double_it(x):
    return x * 2

double_it(2)

Code

try:
    double_it()
except TypeError:
    print("TypeError: double_it() missing 1 required positional argument: 'x'")

TypeError: double_it() missing 1 required positional argument: 'x'

Optional parameters (arguments with a default value)

Code

def double_it(x=2):
    return x * 2

double_it()

Code

double_it(3)

Code

def f(x, y=2, z=10):
    print(x, '+', y, '+', z, '=', x + y + z)

Code

f(5)

5 + 2 + 10 = 17

Code

f(5, -2)

5 + -2 + 10 = 13

Code

f(5, -2, 8)

5 + -2 + 8 = 11

Code

f(z=5, x=-2, y=8)

-2 + 8 + 5 = 11

Argument unpacking and keyword argument unpacking

You can do stuff like this, using unpacking * notation

Code

a, *b, c = 1, 2, 3, 4, 5
a, b, c

(1, [2, 3, 4], 5)

Back to function f you can unpack a tuple as positional arguments

Code

tt = (1, 2, 3)
f(*tt)

1 + 2 + 3 = 6

Code

dd = {'y': 10, 'z': -5}

Code

f(3, **dd)

3 + 10 + -5 = 8

Code

def g(x, z, y, t=1, u=2):
    print(x, '+', y, '+', z, '+', t, '+', 
          u, '=', x + y + z + t + u)

Code

tt = (1, -4, 2)
dd = {'t': 10, 'u': -5}
g(*tt, **dd)

1 + 2 + -4 + 10 + -5 = 4

The prototype of all functions in `Python`

Code

def f(*args, **kwargs):
    print('args=', args)
    print('kwargs=', kwargs)

f(1, 2, 'truc', lastname='gaiffas', firstname='stephane')

args= (1, 2, 'truc')
kwargs= {'lastname': 'gaiffas', 'firstname': 'stephane'}

Uses * for argument unpacking and ** for keyword argument unpacking
The names args and kwargs are a convention, not mandatory
(but you are fired if you name these arguments otherwise)

Code

# How to get fired
def f(*aaa, **bbb):
    print('args=', aaa)
    print('kwargs=', bbb)
f(1, 2, 'truc', lastname='gaiffas', firstname='stephane')

args= (1, 2, 'truc')
kwargs= {'lastname': 'gaiffas', 'firstname': 'stephane'}
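
A typical use of this prototype is to forward arguments to another function without knowing its signature. A minimal sketch, where call_verbose and power are hypothetical helpers written for this example:

```python
def call_verbose(func, *args, **kwargs):
    """Print how func is called, then forward all the arguments to it."""
    print('calling', func.__name__, 'with', args, kwargs)
    return func(*args, **kwargs)   # unpack again when forwarding

def power(base, exponent=2):
    return base ** exponent

print(call_verbose(power, 3))
print(call_verbose(power, 2, exponent=10))
```

The same idea is what makes generic wrappers and decorators possible.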

Remark. A function is a regular object… you can add attributes to it!

Code

f.truc = 4

Code

f(1, 3)

args= (1, 3)
kwargs= {}

Code

f(3, -2, y='truc')

args= (3, -2)
kwargs= {'y': 'truc'}

Object-oriented programming (OOP)

Python supports object-oriented programming (OOP). The goals of OOP are:

to organize the code, and
to re-use code in similar contexts.

Here is a small example: we create a Student class, which is an object gathering several custom functions (called methods) and variables (called attributes).

Code

class Student(object):

    def __init__(self, name, birthyear, major='computer science'):
        self.name = name
        self.birthyear = birthyear
        self.major = major

    def __repr__(self):
        return "Student(name='{name}', birthyear={birthyear}, major='{major}')"\
                .format(name=self.name, birthyear=self.birthyear, major=self.major)

anna = Student('anna', 1987)
anna

Student(name='anna', birthyear=1987, major='computer science')

The __repr__ method is what we call a ‘magic method’ in Python: it defines how an object is displayed as a string. There is a very large number of such magic methods. They are used to implement interfaces.
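
As a sketch of how magic methods implement interfaces, the hypothetical Playlist class below (not part of the notebook) supports len, indexing and the in keyword simply by defining the corresponding methods:

```python
class Playlist:
    """Hypothetical container used only to illustrate magic methods."""

    def __init__(self, titles):
        self.titles = list(titles)

    def __len__(self):               # called by len(playlist)
        return len(self.titles)

    def __getitem__(self, index):    # called by playlist[index]
        return self.titles[index]

    def __contains__(self, title):   # called by `title in playlist`
        return title in self.titles

playlist = Playlist(['intro', 'verse', 'outro'])
print(len(playlist), playlist[0], 'verse' in playlist)
```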

Exercise

Add an age method to the Student class that computes the age of the student.

You can (and should) use the datetime module.
Since we only know the birth year, let’s assume that the birthday is January 1st.

Properties

We can make methods look like attributes using properties, as shown below

Code

from datetime import datetime

class Student(object):

    def __init__(self, name, birthyear, major='computer science'):
        self.name = name
        self.birthyear = birthyear
        self.major = major

    def __repr__(self):
        return "Student(name='{name}', birthyear={birthyear}, major='{major}')"\
                .format(name=self.name, birthyear=self.birthyear, major=self.major)

    @property
    def age(self):
        return datetime.now().year - self.birthyear
        
anna = Student('anna', 1987)
anna.age

Inheritance

A MasterStudent is a Student with a new extra mandatory internship attribute

Code

"%d" % 2

'2'

Code

x = 2

f"truc {x}"

'truc 2'

Code

class MasterStudent(Student):
    
    def __init__(self, name, birthyear, internship, major='computer science'):
        # Reuse the initializer of the parent class for the shared attributes
        Student.__init__(self, name, birthyear, major)
        self.internship = internship

    def __repr__(self):
        return f"MasterStudent(name='{self.name}', internship={self.internship}, birthyear={self.birthyear}, major={self.major})"
    
MasterStudent('djalil', 1996, 'pwc')

MasterStudent(name='djalil', internship=pwc, birthyear=1996, major=computer science)

Code

class MasterStudent(Student):
    
    def __init__(self, name, birthyear, internship, major='computer science'):
        # Reuse the initializer of the parent class for the shared attributes
        Student.__init__(self, name, birthyear, major)
        self.internship = internship

    def __repr__(self):
        return "MasterStudent(name='{name}', internship='{internship}'" \
               ", birthyear={birthyear}, major='{major}')"\
                .format(name=self.name, internship=self.internship,
                        birthyear=self.birthyear, major=self.major)
    
djalil = MasterStudent('djalil', 1996, 'pwc')

Code

djalil.__dict__

{'name': 'djalil',
 'birthyear': 1996,
 'major': 'computer science',
 'internship': 'pwc'}

Code

djalil.birthyear

Code

djalil.__dict__["birthyear"]

Monkey patching

Classes in Python are objects, and their attributes are stored in a dict under the hood…
Therefore classes can be changed on the fly

Code

class Monkey(object):
    
    def __init__(self, name):
        self.name = name

    def describe(self):
        print("Old monkey %s" % self.name)

def patch(self):
    print("New monkey %s" % self.name)

monkey = Monkey("Baloo")
monkey.describe()

Monkey.describe = patch
monkey.describe()

Old monkey Baloo
New monkey Baloo

Code

monkeys = [Monkey("Baloo"), Monkey("Super singe")]


monkey_name = monkey.name

for i in range(1000):    
    monkey_name

Data classes

Since Python 3.7, you can use a dataclass to define this kind of class

It does a lot of work for you (it generates the __repr__, among many other things)

Code

from dataclasses import dataclass
from datetime import datetime 

@dataclass
class Student(object):
    name: str
    birthyear: int
    major: str = 'computer science'

    @property
    def age(self):
        return datetime.now().year - self.birthyear
        
anna = Student(name="anna", birthyear=1987)
anna

Student(name='anna', birthyear=1987, major='computer science')

Code

print(anna.age)
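
Among the other things a dataclass generates is an __eq__ method that compares fields. For a field whose default is a mutable object, dataclasses.field with default_factory builds a fresh value for each instance (a plain list default is rejected with a ValueError). A minimal sketch with a hypothetical Course class:

```python
from dataclasses import dataclass, field

@dataclass
class Course:
    # Hypothetical class, used only for illustration
    title: str
    students: list = field(default_factory=list)  # a fresh list per instance

spark = Course('spark')
pandas_course = Course('pandas')
spark.students.append('anna')

print(spark == Course('spark', ['anna']))   # __eq__ compares the fields
print(pandas_course.students)               # not shared with spark
```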

Most common mistakes

Let us wrap this up with the most common mistakes in Python

First, the best way to learn and practice:

Start with the official tutorial https://docs.python.org/fr/3/tutorial/index.html
Look at https://python-3-for-scientists.readthedocs.io/en/latest/index.html
Continue with the documentation at https://docs.python.org/fr/3/index.html and work!

Using a mutable value as a default value

Code

def foo(bar=[]):
    bar.append('oops')
    return bar

print(foo())
print(foo())
print(foo())

print('-' * 8)
print(foo(['Ah ah']))
print(foo([]))

['oops']
['oops', 'oops']
['oops', 'oops', 'oops']
--------
['Ah ah', 'oops']
['oops']

Code

print(foo.__defaults__)
foo()
print(foo.__defaults__)

(['oops', 'oops', 'oops'],)
(['oops', 'oops', 'oops', 'oops'],)

The default value for a function argument is evaluated only once, when the function is defined
the bar argument is initialized to its default (i.e., an empty list) only when foo() is first defined
successive calls to foo() (with no bar argument specified) use the same list!

One should use instead

Code

def foo(bar=None):
    if bar is None:
        bar = []
    bar.append('oops')
    return bar

print(foo())
print(foo())
print(foo())
print(foo(['OK']))

['oops']
['oops']
['oops']
['OK', 'oops']

Code

print(foo.__defaults__)
foo()
print(foo.__defaults__)

(None,)
(None,)

No problem with immutable types

Code

def foo(bar=()):
    bar += ('oops',)
    return bar

print(foo())
print(foo())
print(foo())

('oops',)
('oops',)
('oops',)

Code

print(foo.__defaults__)

((),)

Class attributes VS object attributes

Code

class A(object):
    x = 1

    def __init__(self):
        self.y = 2

class B(A):
    def __init__(self):
        super().__init__()

class C(A):
    def __init__(self):
        super().__init__()

a, b, c = A(), B(), C()

Code

print(a.x, b.x, c.x)
print(a.y, b.y, c.y)

1 1 1
2 2 2

Code

a.y = 3
print(a.y, b.y, c.y)

3 2 2

Code

a.x = 3  # Adds a new attribute named x in object a
print(a.x, b.x, c.x)

3 1 1

Code

A.x = 4 # Changes the class attribute x of class A
print(a.x, b.x, c.x)

3 4 4

Attribute x is not an attribute of b nor c
It is also not a class attribute of classes B and C
So, it is looked up in the base class A, which contains a class attribute x

Classes and objects contain a hidden dict to store their attributes, and attributes are looked up following a method resolution order (MRO)

Code

a.__dict__, b.__dict__, c.__dict__

({'y': 3, 'x': 3}, {'y': 2}, {'y': 2})

Code

A.__dict__, B.__dict__, C.__dict__

(mappingproxy({'__module__': '__main__',
               'x': 4,
               '__init__': <function __main__.A.__init__(self)>,
               '__dict__': <attribute '__dict__' of 'A' objects>,
               '__weakref__': <attribute '__weakref__' of 'A' objects>,
               '__doc__': None}),
 mappingproxy({'__module__': '__main__',
               '__init__': <function __main__.B.__init__(self)>,
               '__doc__': None}),
 mappingproxy({'__module__': '__main__',
               '__init__': <function __main__.C.__init__(self)>,
               '__doc__': None}))

This can lead to nasty errors when using class attributes: learn more about this
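
One classic instance of such an error is a mutable class attribute, which is shared by all the instances. A minimal sketch with two hypothetical classes, Basket (buggy) and SafeBasket (fixed):

```python
class Basket:
    items = []              # class attribute: one list shared by all instances

    def add(self, item):
        self.items.append(item)   # mutates the shared list

class SafeBasket:
    def __init__(self):
        self.items = []     # instance attribute: one list per object

    def add(self, item):
        self.items.append(item)

b1, b2 = Basket(), Basket()
b1.add('apple')
print(b2.items)   # b2 sees the item added through b1

s1, s2 = SafeBasket(), SafeBasket()
s1.add('apple')
print(s2.items)   # s2 is unaffected
```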

Python scope rules

Code

try:
    ints += [4]
except NameError:
    print("NameError: name 'ints' is not defined")

NameError: name 'ints' is not defined

Code

ints = [1]

def foo1():
    ints.append(2)
    return ints

def foo2():
    ints += [2]
    return ints

Code

foo1()

[1, 2]

Code

try:    
    foo2()
except UnboundLocalError as inst:
    print(inst)

cannot access local variable 'ints' where it is not associated with a value

What the hell ?

An assignment to a variable in a scope assumes that the variable is local to that scope
and shadows any similarly named variable in any outer scope

ints += [2]

is an augmented assignment which, as far as scoping is concerned, behaves like

ints = ints + [2]

which is an assignment: ints must be defined in the local scope, but it is not, while

ints.append(2)

is not an assignment: it only looks up the name ints, which is found in the outer scope
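
When rebinding an outer variable is really intended, the scope has to be declared explicitly: global for a module-level name, nonlocal for a name in an enclosing function. Here is a sketch of the nonlocal case, where make_accumulator, broken and fixed are hypothetical names chosen for this example:

```python
def make_accumulator():
    ints = [1]

    def broken():
        ints += [2]        # assignment: ints is treated as local to broken
        return ints

    def fixed():
        nonlocal ints      # ints now refers to the enclosing scope's variable
        ints += [2]
        return ints

    return broken, fixed

broken, fixed = make_accumulator()

try:
    broken()
except UnboundLocalError as error:
    print('broken:', type(error).__name__)

print('fixed:', fixed())
```

In most cases, mutating the object (as foo1 does) or passing it as an argument is simpler than declaring global or nonlocal.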

Modify a `list` while iterating over it

Code

odd = lambda x: bool(x % 2)
numbers = list(range(10))

try:
    for i in range(len(numbers)):
        if odd(numbers[i]):
            del numbers[i]
except IndexError as inst:
    print(inst)

list index out of range

Typically an example where one should use a list comprehension

Code

[number for number in numbers if not odd(number)]

[0, 2, 4, 6, 8]
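
If the list really has to be modified in place, two common options are to iterate over a copy, or to assign the filtered content back through a slice. A minimal sketch (values and alias are names made up for this example):

```python
numbers = list(range(10))

# Iterate over a copy (numbers[:]) so that removals do not shift
# the elements that are still to be visited
for number in numbers[:]:
    if number % 2:
        numbers.remove(number)
print(numbers)

# Or keep the same list object and replace its content in one step
values = list(range(10))
alias = values
values[:] = [v for v in values if v % 2 == 0]
print(alias)
```

The slice assignment keeps the identity of the list, so other names bound to it (alias here) see the filtered content.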

No docstrings

Take the time to write clean docstrings (have a look at the numpydoc style)

Code

def create_student(name, age, address, major='computer science'):
    """Add a student in the database
    
    Parameters
    ----------
    name: `str`
        Name of the student
    
    age: `int`
        Age of the student
    
    address: `str`
        Address of the student
    
    major: `str`, default='computer science'
        The major chosen by the student
    
    Returns
    -------
    output: `Student`
        A fresh student
    """
    pass

Code

create_student('Duduche', 28, 'Chalons')
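
A docstring is not just a comment: it is stored on the function and displayed by help. A small sketch, using a hypothetical area function and inspect.getdoc to read the docstring back:

```python
import inspect

def area(width, height=1.0):
    """Return the area of a rectangle.

    Parameters
    ----------
    width : float
        Width of the rectangle
    height : float, default=1.0
        Height of the rectangle
    """
    return width * height

# The docstring is stored on the function and shown by help(area)
summary = inspect.getdoc(area).splitlines()[0]
print(summary)
```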

Not using available methods and/or the simplest solution

Code

dd = {'stephane': 1234, 'gael': 4567, 'gontran': 891011}

# Bad
for key in dd.keys():
    print(key, dd[key])

print('-' * 8)

# Good
for key, value in dd.items():
    print(key, value)

stephane 1234
gael 4567
gontran 891011
--------
stephane 1234
gael 4567
gontran 891011

Code

colors = ['black', 'yellow', 'brown', 'red', 'pink']

# Bad
for i in range(len(colors)):
    print(i, colors[i])

print('-' * 8)

# Good
for i, color in enumerate(colors):
    print(i, color)

0 black
1 yellow
2 brown
3 red
4 pink
--------
0 black
1 yellow
2 brown
3 red
4 pink

Not using the standard library

The standard library is almost always better than a hand-made solution

Code

list1 = [1, 2]
list2 = [3, 4]
list3 = [5, 6, 7]

for a in list1:
    for b in list2:
        for c in list3:
            print(a, b, c)

Code

from itertools import product

for a, b, c in product(list1, list2, list3):
    print(a, b, c)

That’s it for now !
