# Introduction to Python

This chapter introduces the `python` language, covering only the bare
minimum needed to get started with the data-science stack (a collection
of libraries for data science). Python is a **programming language**, as
are `C++`, `java`, `fortran`, `javascript`, etc.

## Specific features of Python

-   an **interpreted** (as opposed to *compiled*) language. Contrary to
    e.g. `C++` or `fortran`, one does not compile Python code before
    executing it.

-   Used as a scripting language, by running `python script.py` in a
    terminal

-   But it can also be used **interactively**: the jupyter notebook,
    IPython, etc.

-   A free software released under an **open-source** license: Python
    can be used and distributed free of charge, even for building
    commercial software.

-   **multi-platform**: Python is available for all major operating
    systems, Windows, Linux/Unix, MacOS X, most likely your mobile phone
    OS, etc.

-   A very readable language with clear non-verbose syntax

-   A language for which a **large number of high-quality** packages are
    available for various applications, including web frameworks and
    scientific computing

-   It has been one of the top **languages for data science** and
    **machine learning** for several years, because it is expressive
    and easy to deploy

-   An object-oriented language

See https://www.python.org/about/ for more information about
distinguishing features of Python.

> **Python 2 or Python 3?**
>
> -   Simple answer: *don’t use Python 2, use Python 3*
> -   Python 2 reached its *end of life* on January 1, 2020 and *is no
>     longer maintained*
> -   You will regret it if you use Python 2
> -   If Python 2 is mandatory at your workplace, find another job

> **Jupyter or Quarto notebooks?**
>
> -   `quarto` is more git friendly than `jupyter`
>
> -   Quarto documents are plain text: edit them with a real text editor
>
> -   Go for `quarto`

# Hello world

-   In a `jupyter`/`quarto` notebook, you have an interactive
    interpreter.

-   You type in the cells, execute commands

In [1]:
print("Hi everybody!")

Hi everybody!


# Basic types

## Integers

In [2]:
1 + 42

43

In [3]:
type(1+1)

int

We can assign values to variables with `=`

In [4]:
a = (3 + 5 ** 2) % 4
a

0

## Remark

We don’t declare the type of a variable before assigning its value. In
C, by contrast, one must write

``` c
int a = 4;
```
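
Since the type belongs to the object and not to the name, the same name can be bound to objects of different types over time. The short sketch below illustrates this; the names `first_type` and `second_type` are only there for the illustration.

```python
a = 4
first_type = type(a)    # the name a is bound to an int
print(first_type)

a = "four"
second_type = type(a)   # the same name is now bound to a str
print(second_type)
```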

## Something cool

-   **Arbitrarily large** integer arithmetic

In [5]:
17 ** 542

8004153099680695240677662228684856314409365427758266999205063931175132640587226837141154215226851187899067565063096026317140186260836873939218139105634817684999348008544433671366043519135008200013865245747791955240844192282274023825424476387832943666754140847806277355805648624376507618604963106833797989037967001806494232055319953368448928268857747779203073913941756270620192860844700087001827697624308861431399538404552468712313829522630577767817531374612262253499813723569981496051353450351968993644643291035336065584116155321928452618573467361004489993801594806505273806498684433633838323916674207622468268867047187858269410016150838175127772100983052010703525089

## Floats

A number written with a decimal point is a floating-point number
(`float`). Note that `/` always returns a `float`, while `//` is the
integer (floor) division.

In [6]:
c = 2.

In [7]:
type(c)

float

In [8]:
c = 2
type(c)

int

In [9]:
truc = 1 / 2
truc

0.5

In [10]:
1 // 2 + 1 % 2

1

In [11]:
type(truc)

float

## Boolean

Similarly, a comparison produces a boolean (`bool`)

In [12]:
test = 3 > 4
test

False

In [13]:
type(test)

bool

In [14]:
False == (not True)

True

In [15]:
1.41 < 2.71 and 2.71 < 3.14

True

In [16]:
# It's equivalent to
1.41 < 2.71 < 3.14

True

## Type conversion (casting)

In [17]:
a = 1
type(a)

int

In [18]:
b = float(a)
type(b)

float

In [19]:
str(b)

'1.0'

In [20]:
bool(b)
# All non-zero, non-empty objects are cast to boolean as True (more later)

True

In [21]:
bool(1-1)

False

# Containers

Python provides many efficient types of *containers* or *sequences*, in
which collections of objects can be stored.

The main ones are `list`, `tuple`, `set` and `dict` (but there are many
others…)

## Tuples

In [22]:
tt = 'truc', 3.14, "truc"
tt

('truc', 3.14, 'truc')

In [23]:
tt[0]

'truc'

You can’t change a tuple: we say that it’s *immutable*

In [24]:
try:
    tt[0] = 1
except TypeError:
    print("TypeError: 'tuple' object does not support item assignment")

TypeError: 'tuple' object does not support item assignment


Three ways of doing the same thing

In [25]:
# Method 1
tuple([1, 2, 3])

(1, 2, 3)

In [26]:
# Method 2
1, 2, 3

(1, 2, 3)

In [27]:
# Method 3
(1, 2, 3)

(1, 2, 3)

**Simpler is better in Python**, so usually you want to use Method 2.

In [28]:
toto = 1, 2, 3
toto

(1, 2, 3)

-   This is serious!

## The Zen of Python Easter egg

In [29]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## Lists

A list is an ordered collection of objects. These objects may have
different types. For example:

In [30]:
colors = ['red', 'blue', 'green', 'black', 'white']

In [31]:
colors[0]

'red'

In [32]:
type(colors)

list

*Indexing:* accessing individual objects contained in the list by their
position

In [33]:
colors[2]

'green'

In [34]:
colors[2] = 3.14
colors

['red', 'blue', 3.14, 'black', 'white']

> **Warning**
>
> For any *sequence* in Python, indexing *starts at 0* (as in C),
> not at 1 (as in Fortran, R, or Matlab).

Counting from the end with negative indices:

In [35]:
colors[-1]

'white'

The index must stay within the bounds of the list

In [36]:
try:
    colors[10]
except IndexError:
    print(f"IndexError: 10 >= {len(colors)} ==len(colors), index out of range ")

IndexError: 10 >= 5 ==len(colors), index out of range 


In [37]:
colors

['red', 'blue', 3.14, 'black', 'white']

In [38]:
tt

('truc', 3.14, 'truc')

In [39]:
colors.append(tt)
colors

['red', 'blue', 3.14, 'black', 'white', ('truc', 3.14, 'truc')]

In [40]:
len(colors)

6

In [41]:
len(tt)

3

## Slicing: obtaining sublists of regularly-spaced elements

This works with any sequence (`list`, `str`, `tuple`, etc.)

In [42]:
colors

['red', 'blue', 3.14, 'black', 'white', ('truc', 3.14, 'truc')]

In [43]:
list(reversed(colors))

[('truc', 3.14, 'truc'), 'white', 'black', 3.14, 'blue', 'red']

In [44]:
colors[::-1]

[('truc', 3.14, 'truc'), 'white', 'black', 3.14, 'blue', 'red']

> **Slicing syntax:**
>
> `colors[start:stop:stride]`
>
> `start`, `stop` and `stride` are optional; for a positive stride they
> default to `0`, `len(sequence)` and `1`


In [45]:
print(slice(4))
print(slice(1,5))
print(slice(None,13,3))

slice(None, 4, None)
slice(1, 5, None)
slice(None, 13, 3)


In [46]:
sl = slice(1,5,2)
colors[sl]

['blue', 'black']

In [47]:
colors

['red', 'blue', 3.14, 'black', 'white', ('truc', 3.14, 'truc')]

In [48]:
colors[3:]

['black', 'white', ('truc', 3.14, 'truc')]

In [49]:
colors[:3]

['red', 'blue', 3.14]

In [50]:
colors[1::2]

['blue', 'black', ('truc', 3.14, 'truc')]

In [51]:
colors[::-1]

[('truc', 3.14, 'truc'), 'white', 'black', 3.14, 'blue', 'red']
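
The slicing rules above can be summarized on a small example. This is only a sketch: the list `letters` and the other names are introduced here for illustration and are not part of the notebook.

```python
letters = list("abcdefgh")

# Omitted bounds default to the start and the end of the sequence.
first_three = letters[:3]             # ['a', 'b', 'c']
every_other = letters[::2]            # ['a', 'c', 'e', 'g']

# With a negative stride the defaults flip: the slice starts at the last
# element and runs backwards towards the first one.
backwards = letters[::-1]
last_two_reversed = letters[:-3:-1]   # ['h', 'g']

# Unlike indexing, slicing does not raise IndexError: bounds are clipped.
clipped = letters[5:100]              # ['f', 'g', 'h']

# Slicing works the same way on a str (or a tuple).
word_slice = "abcdefgh"[1:6:2]        # 'bdf'
```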

## Strings

Different string syntaxes (single, double or triple quotes):

In [52]:
s = 'tintin'
type(s)

str

In [53]:
s

'tintin'

In [54]:
s = """         Bonjour,
Je m'appelle Stephane.
Je vous souhaite une bonne journée.
Salut.       
"""
s

"         Bonjour,\nJe m'appelle Stephane.\nJe vous souhaite une bonne journée.\nSalut.       \n"

In [55]:
s.strip()

"Bonjour,\nJe m'appelle Stephane.\nJe vous souhaite une bonne journée.\nSalut."

In [56]:
print(s.strip())

Bonjour,
Je m'appelle Stephane.
Je vous souhaite une bonne journée.
Salut.


In [57]:
len(s)

91

In [58]:
# Casting to a list
list(s.strip()[:15])

['B', 'o', 'n', 'j', 'o', 'u', 'r', ',', '\n', 'J', 'e', ' ', 'm', "'", 'a']

In [59]:
# Arithmetics
print('Bonjour' * 2)
print('Hello' + ' all')

BonjourBonjour
Hello all


In [60]:
sss = 'A'
sss += 'bc'
sss += 'dE'
sss.lower()

'abcde'

In [61]:
ss = s.strip()
print(ss[:10] + ss[24:28])

Bonjour,
Jepha


In [62]:
s.strip()

"Bonjour,\nJe m'appelle Stephane.\nJe vous souhaite une bonne journée.\nSalut."

In [63]:
s.strip().split('\n')

['Bonjour,',
 "Je m'appelle Stephane.",
 'Je vous souhaite une bonne journée.',
 'Salut.']

In [64]:
s[::3]

'   BjrJmpl ea.eo ui eoeon.at  \n'

In [65]:
s[3:10]

'      B'

In [66]:
" ".join(['Il', 'fait', 'super', 'beau', "aujourd'hui"])

"Il fait super beau aujourd'hui"

Chaining method calls is the basis of pipeline building.

In [67]:
( 
    " ".join(['Il', 'fait', 'super', 'beau', "aujourd'hui"])
       .title()
       .replace(' ', '')
       .replace("'","")
)

'IlFaitSuperBeauAujourdHui'

### Important

A string is *immutable*!

In [68]:
s = 'I am an immutable guy'

In [69]:
try:  
    s[2] = 's'
except TypeError:
    print(f"Strings are immutable! s is still '{s}'")

Strings are immutable! s is still 'I am an immutable guy'


In [70]:
id(s)

136377566125744

In [71]:
print(s + ', for sure')
id(s), id(s + ' for sure')

I am an immutable guy, for sure


(136377566125744, 136377566246624)

### Extra stuff with strings

In [72]:
'square of 2 is ' + str(2 ** 2)

'square of 2 is 4'

In [73]:
'square of 2 is %d' % 2 ** 2

'square of 2 is 4'

In [74]:
'square of 2 is {}'.format(2 ** 2)

'square of 2 is 4'

In [75]:
'square of 2 is {square}'.format(square=2 ** 2)

'square of 2 is 4'

In [76]:
# And since Python 3.6 you can use an `f-string`
number = 2
square = number ** 2

f'square of {number} is {square}'

'square of 2 is 4'

### The `in` keyword

You can use the `in` keyword with any container, whenever it makes sense

In [77]:
print(s)
print('Salut' in s)

I am an immutable guy
False


In [78]:
print(tt)
print('truc' in tt)

('truc', 3.14, 'truc')
True


In [79]:
print(colors)
print('truc' in colors)

['red', 'blue', 3.14, 'black', 'white', ('truc', 3.14, 'truc')]
False


In [80]:
('truc', 3.14, 'truc') in colors

True

> **Warning**
>
> Strings are not bytes. Have a look at chapter 4 *Unicode Text versus
> Bytes* in [Fluent
> Python](https://www.oreilly.com/library/view/fluent-python-2nd/9781492056348/)
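
To make the warning above concrete, here is a minimal sketch of the difference between a `str` and its encoded `bytes`. The example word and the choice of UTF-8 are assumptions made for the illustration.

```python
text = "journ\u00e9e"   # the word 'journée', with the accented letter written as an escape

# Encoding turns a str (a sequence of Unicode code points) into bytes.
encoded = text.encode("utf-8")
print(type(encoded))
print(len(text), len(encoded))   # 7 characters, but 8 bytes: the accented letter takes two bytes

# Decoding with the same codec gives the original str back.
decoded = encoded.decode("utf-8")
print(decoded == text)
```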

### Brain-teasing

Explain this weird behaviour:

In [81]:
5 in [1, 2, 3, 4] == False

False

In [82]:
[1, 2, 3, 4] == False

False

In [83]:
5 not in [1, 2, 3, 4]

True

In [84]:
(5 in [1, 2, 3, 4]) == False

True

In [85]:
# ANSWER.
# This is a chained comparison. We have seen that 
1 < 2 < 3
# is equivalent to
(1 < 2) and (2 < 3)
# so that
5 in [1, 2, 3, 4] == False
# is equivalent to
(5 in [1, 2, 3, 4]) and ([1, 2, 3, 4] == False)

False

In [86]:
(5 in [1, 2, 3, 4])

False

In [87]:
([1, 2, 3, 4] == False)

False

## Dictionaries

-   A dictionary is basically an efficient table that **maps keys to
    values**.
-   The **MOST** important container in Python.
-   Many things are actually a `dict` under the hood in `Python`

In [88]:
tel = {'emmanuelle': 5752, 'sebastian': 5578}
print(tel)
print(type(tel))

{'emmanuelle': 5752, 'sebastian': 5578}
<class 'dict'>


In [89]:
tel['emmanuelle'], tel['sebastian']

(5752, 5578)

In [90]:
tel['francis'] = '5919'
tel

{'emmanuelle': 5752, 'sebastian': 5578, 'francis': '5919'}

In [91]:
len(tel)

3

### Important remarks

-   Keys can be of different types
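-   A key must be *hashable*: built-in **immutable** types (`int`,
    `float`, `str`, a `tuple` of hashable items) qualify, mutable
    containers such as `list` do not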

In [92]:
tel[7162453] = [1, 3, 2]
tel[3.14] = 'bidule'
tel[('jaouad', 2)] = 1234
tel

{'emmanuelle': 5752,
 'sebastian': 5578,
 'francis': '5919',
 7162453: [1, 3, 2],
 3.14: 'bidule',
 ('jaouad', 2): 1234}

In [93]:
try:
    sorted(tel)
except TypeError:
    print("TypeError: '<' not supported between instances of 'int' and 'str'")    

TypeError: '<' not supported between instances of 'int' and 'str'


In [94]:
# A list is mutable and not hashable
try:
    tel[['jaouad']] = '5678'
except TypeError:
    print("TypeError: unhashable type: 'list'")

TypeError: unhashable type: 'list'


In [95]:
try:
    tel[2]
except KeyError:
    print("KeyError: 2")

KeyError: 2


In [96]:
tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}
print(tel.keys())
print(tel.values())
print(tel.items())

dict_keys(['emmanuelle', 'sebastian', 'jaouad'])
dict_values([5752, 5578, 1234])
dict_items([('emmanuelle', 5752), ('sebastian', 5578), ('jaouad', 1234)])


In [97]:
list(tel.keys())[2]

'jaouad'

In [98]:
tel.values().mapping

mappingproxy({'emmanuelle': 5752, 'sebastian': 5578, 'jaouad': 1234})

In [99]:
type(tel.keys())

dict_keys

In [100]:
'rémi' in tel

False

In [101]:
list(tel)

['emmanuelle', 'sebastian', 'jaouad']

In [102]:
'rémi' in tel.keys()

False

You can swap values like this

In [103]:
print(tel)
tel['emmanuelle'], tel['sebastian'] = tel['sebastian'], tel['emmanuelle']
print(tel)

{'emmanuelle': 5752, 'sebastian': 5578, 'jaouad': 1234}
{'emmanuelle': 5578, 'sebastian': 5752, 'jaouad': 1234}


In [104]:
# It works, since
a, b = 2.71, 3.14
a, b = b, a
a, b

(3.14, 2.71)

### Exercise 1

Get the keys of `tel` sorted in decreasing order

In [105]:
tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}

### Exercise 2

Get the keys of `tel` sorted by increasing *values*

In [106]:
tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}

### Exercise 3

Obtain a sorted-by-key version of `tel`

In [107]:
tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}
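
One possible way to approach the three exercises above, given as a sketch rather than as the reference solution. It assumes that "decreasing order" in Exercise 1 means reverse alphabetical order of the keys; the result names are hypothetical.

```python
tel = {'emmanuelle': 5752, 'sebastian': 5578, 'jaouad': 1234}

# Exercise 1: iterating over a dict yields its keys, so sorted() sorts the keys
keys_desc = sorted(tel, reverse=True)

# Exercise 2: sort the keys, using the value attached to each key as sort key
keys_by_value = sorted(tel, key=tel.get)

# Exercise 3: build a new dict from the (key, value) pairs sorted by key
# (a dict preserves insertion order since Python 3.7)
tel_sorted = dict(sorted(tel.items()))

print(keys_desc)
print(keys_by_value)
print(tel_sorted)
```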

## Sets

A set is an unordered container, containing unique elements

In [108]:
ss = {1, 2, 2, 2, 3, 3, 'tintin', 'tintin', 'toto'}
ss

{1, 2, 3, 'tintin', 'toto'}

In [109]:
s = 'truc truc bidule truc'
set(s)

{' ', 'b', 'c', 'd', 'e', 'i', 'l', 'r', 't', 'u'}

In [110]:
set(list(s))

{' ', 'b', 'c', 'd', 'e', 'i', 'l', 'r', 't', 'u'}

In [111]:
{1, 5, 2, 1, 1}.union({1, 2, 3})

{1, 2, 3, 5}

In [112]:
set((1, 5, 3, 2))

{1, 2, 3, 5}

In [113]:
set([1, 5, 2, 1, 1]).intersection(set([1, 2, 3]))

{1, 2}

In [114]:
ss.add('tintin')
ss

{1, 2, 3, 'tintin', 'toto'}

In [115]:
ss.difference(range(6))

{'tintin', 'toto'}

You can combine all containers together

In [116]:
dd = {
    'truc': [1, 2, 3], 
    5: (1, 4, 2),
    (1, 3): {'hello', 'world'}
}
dd

{'truc': [1, 2, 3], 5: (1, 4, 2), (1, 3): {'hello', 'world'}}

# Assignment in `Python` is name binding

## Everything is either mutable or immutable

In [117]:
ss = {1, 2, 3}
sss = ss
sss, ss

({1, 2, 3}, {1, 2, 3})

In [118]:
id(ss), id(sss)

(136377567624128, 136377567624128)

In [119]:
sss.add("Truc")

**Question.** What is in `ss` ?

In [120]:
ss, sss

({1, 2, 3, 'Truc'}, {1, 2, 3, 'Truc'})

`ss` and `sss` are names for the same object

In [121]:
id(ss), id(sss)

(136377567624128, 136377567624128)

In [122]:
ss is sss

True

In [123]:
help('is')

Comparisons
***********

Unlike C, all comparison operations in Python have the same priority,
which is lower than that of any arithmetic, shifting or bitwise
operation.  Also unlike C, expressions like "a < b < c" have the
interpretation that is conventional in mathematics:

   comparison    ::= or_expr (comp_operator or_expr)*
   comp_operator ::= "<" | ">" | "==" | ">=" | "<=" | "!="
                     | "is" ["not"] | ["not"] "in"

Comparisons yield boolean values: "True" or "False". Custom *rich
comparison methods* may return non-boolean values. In this case Python
will call "bool()" on such value in boolean contexts.

Comparisons can be chained arbitrarily, e.g., "x < y <= z" is
equivalent to "x < y and y <= z", except that "y" is evaluated only
once (but in both cases "z" is not evaluated at all when "x < y" is
found to be false).

Formally, if *a*, *b*, *c*, …, *y*, *z* are expressions and *op1*,
*op2*, …, *opN* are comparison operators, then "a op1 b op2 c ... y
opN z" is eq

## About assignments

-   Python never copies an object on assignment
-   Unless you explicitly ask for a copy

When you code

``` python
x = [1, 2, 3]
y = x
```

you just

-   **bind** the variable name `x` to a list `[1, 2, 3]`
-   give another name `y` to the same object

**Important remarks**

-   **Everything** is an object in Python
-   Either **immutable** or **mutable**

In [124]:
id(1), id(1+1), id(2)

(11753896, 11753928, 11753928)

**A `list` is mutable**

In [125]:
x = [1, 2, 3]
print(id(x), x)
x[0] += 42; x.append(3.14)
print(id(x), x)

136377566846400 [1, 2, 3]
136377566846400 [43, 2, 3, 3.14]


**A `str` is immutable**

In order to “change” an **immutable** object, Python creates a new one

In [126]:
s = 'to'
print(id(s), s)
s += 'to'
print(id(s), s)

136377650748880 to
136377566925216 toto


**Once again, a `list` is mutable**

In [127]:
super_list = [3.14, (1, 2, 3), 'tintin']
other_list = super_list
id(other_list), id(super_list)

(136377566776128, 136377566776128)

-   `other_list` and `super_list` are the same list
-   If you change one, you change the other.
-   `id` returns the identity of an object. Two names with the same
    identity refer to the same object (not only the same type, but the
    same instance)

In [128]:
other_list[1] = 'youps'
other_list, super_list

([3.14, 'youps', 'tintin'], [3.14, 'youps', 'tintin'])

In [129]:
id(super_list), id(other_list)

(136377566776128, 136377566776128)

## If you want a copy, you need to ask for one

In [130]:
other_list = super_list.copy()
id(other_list), id(super_list)

(136377566906240, 136377566776128)

In [131]:
other_list[1] = 'copy'
other_list, super_list

([3.14, 'copy', 'tintin'], [3.14, 'youps', 'tintin'])

Only `other_list` is modified.

But… what if you have a `list` of `list` ? (or a mutable object
containing mutable objects)

In [132]:
l1, l2 = [1, 2, 3], [4, 5, 6]
list_list = [l1, l2]
list_list

[[1, 2, 3], [4, 5, 6]]

In [133]:
id(list_list), id(list_list[0]), id(l1), list_list[0] is l1

(136377566911424, 136377566906048, 136377566906048, True)

Let’s make a copy of `list_list`

In [134]:
copy_list = list_list.copy()
copy_list.append('super')
list_list, copy_list

([[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6], 'super'])

In [135]:
id(list_list[0]), id(copy_list[0])

(136377566906048, 136377566906048)

OK, only `copy_list` is modified, as expected

But now…

In [136]:
copy_list[0][1] = 'oups'
copy_list, list_list

([[1, 'oups', 3], [4, 5, 6], 'super'], [[1, 'oups', 3], [4, 5, 6]])

**Question.** What happened ?!?

-   The `list_list` object is copied
-   But NOT what it contains!
-   By default `copy` does a *shallow* copy, not a *deep* copy
-   It does not build copies of what is contained
-   If you want to copy an object and all that is contained in it, you
    need to use `deepcopy`.

In [137]:
from copy import deepcopy

copy_list = deepcopy(list_list)
copy_list[0][1] = 'incredible !'
list_list, copy_list

([[1, 'oups', 3], [4, 5, 6]], [[1, 'incredible !', 3], [4, 5, 6]])

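## Final remarks

A `tuple` is immutable, but this only freezes *which* objects it holds:
a mutable object stored in a tuple can still be changed in place, and
the `id` of the tuple and of its items stay the same.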

In [138]:
tt = ([1, 2, 3], [4, 5, 6])
print(id(tt), tt)
print(list(map(id, tt)))

136377567519872 ([1, 2, 3], [4, 5, 6])
[136377566989952, 136377566994688]


In [139]:
tt[0][1] = '42'
print(id(tt), tt)
print(list(map(id, tt)))

136377567519872 ([1, '42', 3], [4, 5, 6])
[136377566989952, 136377566994688]


In [140]:
s = [1, 2, 3]

In [141]:
s2 = s

In [142]:
s2 is s

True

In [143]:
id(s2), id(s)

(136377566905920, 136377566905920)

# Control flow and other stuff…

Namely tests, loops, again booleans, etc.

In [144]:
if 2 ** 2 == 5:
    print('Obvious')
else:
    print('YES')
print('toujours')

YES
toujours


## Blocks are delimited by indentation!

In [145]:
a = 3
if a > 0:
    if a == 1:
        print(1)
    elif a == 2:
        print(2)
elif a == 2:
    print(2)
elif a == 3:
    print(3)
else:
    print(a)

## Anything can be understood as a boolean

For example, don’t do this to test if a list is empty

In [146]:
l2 = ['hello', 'everybody']

if len(l2) > 0:
    print(l2[0])

hello


but this

In [147]:
if l2:
    print(l2[0])

hello


**Some poetry**

-   An empty `dict` is `False`
-   An empty `str` is `False`
-   An empty `list` is `False`
-   An empty `tuple` is `False`
-   An empty `set` is `False`
-   `0` is `False`
-   `0.0` is `False`
-   `None` is `False`
-   etc…
-   everything else is `True`
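
The list above can be checked directly with `bool`. A minimal sketch follows; the two lists `falsy` and `truthy` are names made up for the illustration.

```python
# Objects that are considered false: empty containers, zero and None.
falsy = [{}, "", [], (), set(), 0, 0.0, None]
print([bool(obj) for obj in falsy])

# Non-empty containers and non-zero numbers are considered true,
# even when what they contain is itself considered false.
truthy = [[0], " ", (None,), {0: 0}, -1, 0.1]
print([bool(obj) for obj in truthy])
```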

## While loops

In [148]:
a = 10
b = 1
while b < a:
    b = b + 1
    print(b)

2
3
4
5
6
7
8
9
10


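Compute an approximation of Pi using the Wallis formula

$$
\pi = 2 \prod_{i=1}^{\infty} \frac{4i^2}{4i^2 - 1}
$$

The loop multiplies the factors one by one and stops as soon as a new
factor changes the product by less than `eps`.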

In [149]:
pi = 2
eps = 1e-10
dif = 2 * eps
i = 1
while dif > eps:
    pi, i, old_pi = pi * 4 * i ** 2 / (4 * i ** 2 - 1), i + 1, pi
    dif = pi - old_pi

In [150]:
pi

3.1415837914138556

In [151]:
from math import pi

pi

3.141592653589793

## `for` loop with `range`

-   Iteration with an index, with a list, with many things!
-   `range(start, stop, step)` takes the same parameters as slicing
    `start:stop:stride`, but only `stop` is required

In [152]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


In [153]:
for i in range(4):
    print(i + 1)
print('-')

for i in range(1, 5):
    print(i)
print('-')

for i in range(1, 10, 3):
    print(i)

1
2
3
4
-
1
2
3
4
-
1
4
7


**Something for nerds**. You can use `else` in a `for` loop: the `else`
block runs only when the loop finishes without hitting `break`

In [154]:
names = ['stephane', 'mokhtar', 'jaouad', 'simon', 'yiyang']

for name in names:
    if name.startswith('u'):
        print(name)
        break
else:
    print('Not found.')

Not found.


In [155]:
names = ['stephane', 'mokhtar', 'jaouad', 'ulysse', 'simon', 'yiyang']

for name in names:
    if name.startswith('u'):
        print(name)
        break
else:
    print('Not found.')

ulysse


## For loops over iterable objects

You can iterate using `for` over any container: `list`, `tuple`, `dict`,
`str`, `set` among others…

In [156]:
colors = ['red', 'blue', 'black', 'white']
peoples = ['stephane', 'jaouad', 'mokhtar', 'yiyang', 'rémi']

In [157]:
# This is stupid
for i in range(len(colors)):
    print(colors[i])
    
# This is better
for color in colors:
    print(color)

red
blue
black
white
red
blue
black
white


To iterate over several sequences at the same time, use `zip`

In [158]:
for color, people in zip(colors, peoples):
    print(color, people)

red stephane
blue jaouad
black mokhtar
white yiyang
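
Note that `peoples` has five elements and `colors` only four: `zip` stops at the end of the shortest input. The sketch below makes this visible and shows `itertools.zip_longest` from the standard library as one alternative; using `None` as fill value is a choice made for the example.

```python
from itertools import zip_longest

colors = ['red', 'blue', 'black', 'white']
peoples = ['stephane', 'jaouad', 'mokhtar', 'yiyang', 'rémi']

# zip stops with the shortest input: the last person is silently dropped.
pairs = list(zip(colors, peoples))
print(len(pairs))

# zip_longest pads the shorter input with a fill value instead.
padded = list(zip_longest(colors, peoples, fillvalue=None))
print(padded[-1])
```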


In [159]:
l = ["Bonjour", {'francis': 5214, 'stephane': 5123}, ('truc', 3)]
for e in l:
    print(e, len(e))

Bonjour 7
{'francis': 5214, 'stephane': 5123} 2
('truc', 3) 2


**Loop over a `str`**

In [160]:
s = 'Bonjour'
for c in s:
    print(c)

B
o
n
j
o
u
r


**Loop over a `dict`**

In [161]:
dd = {(1, 3): {'hello', 'world'}, 'truc': [1, 2, 3], 5: (1, 4, 2)}

# Default is to loop over keys
for key in dd:
    print(key)

(1, 3)
truc
5


In [162]:
# Loop over values
for e in dd.values():
    print(e)

{'hello', 'world'}
[1, 2, 3]
(1, 4, 2)


In [163]:
# Loop over items (key, value) pairs
for key, val in dd.items():
    print(key, val)

(1, 3) {'hello', 'world'}
truc [1, 2, 3]
5 (1, 4, 2)


In [164]:
for t in dd.items():
    print(t)

((1, 3), {'hello', 'world'})
('truc', [1, 2, 3])
(5, (1, 4, 2))


## Comprehensions

You can construct a `list`, `dict`, `set` and others using the
**comprehension** syntax

**`list` comprehension**

In [165]:
print(colors)
print(peoples)

['red', 'blue', 'black', 'white']
['stephane', 'jaouad', 'mokhtar', 'yiyang', 'rémi']


In [166]:
l = []
for p, c in zip(peoples, colors):
    if len(c)<=4 :
        l.append(p)
print(l)

['stephane', 'jaouad']


In [167]:
# The people whose favorite color has at most 4 characters

[people for color, people in zip(colors, peoples) if len(color) <= 4]

['stephane', 'jaouad']

**`dict` comprehension**

In [168]:
{people: color for color, people in zip(colors, peoples) if len(color) <= 4}

{'stephane': 'red', 'jaouad': 'blue'}

In [169]:
# Builds a dict from two lists (one for keys, one for values)
{key: value for (key, value) in zip(peoples, colors)}

{'stephane': 'red', 'jaouad': 'blue', 'mokhtar': 'black', 'yiyang': 'white'}

In [170]:
# But it's simpler (so better) to use
dict(zip(peoples, colors))

{'stephane': 'red', 'jaouad': 'blue', 'mokhtar': 'black', 'yiyang': 'white'}
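
The comprehension syntax also works for a `set`, which the cells above do not illustrate. A minimal sketch, reusing the same `colors` list; the names `lengths` and `long_initials` are introduced only for this example.

```python
colors = ['red', 'blue', 'black', 'white']

# set comprehension: curly braces, but no "key: value" pairs
lengths = {len(color) for color in colors}
print(lengths)

# a condition can be added, exactly as in list and dict comprehensions
long_initials = {color[0] for color in colors if len(color) > 3}
print(long_initials)
```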

Something very convenient is `enumerate`

In [171]:
for i, color in enumerate(colors):
    print(i, color)

0 red
1 blue
2 black
3 white


In [172]:
list(enumerate(colors))

[(0, 'red'), (1, 'blue'), (2, 'black'), (3, 'white')]

In [173]:
dict(enumerate(s))

{0: 'B', 1: 'o', 2: 'n', 3: 'j', 4: 'o', 5: 'u', 6: 'r'}

In [174]:
print(dict(enumerate(s)))

{0: 'B', 1: 'o', 2: 'n', 3: 'j', 4: 'o', 5: 'u', 6: 'r'}


In [175]:
s = 'Hey everyone'
{c: i for i, c in enumerate(s)}

{'H': 0, 'e': 11, 'y': 8, ' ': 3, 'v': 5, 'r': 7, 'o': 9, 'n': 10}

## About functional programming

We can use `lambda` to define **anonymous** functions, and use them in
the `map` and `reduce` functions

In [176]:
square = lambda x: x ** 2
square(2)

4

In [177]:
type(square)

function

In [178]:
dir(square)

['__annotations__',
 '__builtins__',
 '__call__',
 '__class__',
 '__closure__',
 '__code__',
 '__defaults__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__get__',
 '__getattribute__',
 '__getstate__',
 '__globals__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__kwdefaults__',
 '__le__',
 '__lt__',
 '__module__',
 '__name__',
 '__ne__',
 '__new__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__type_params__']

In [179]:
s = "a"

In [180]:
try:
    square("a")
except TypeError:
    print("TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'")

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'


In [181]:
sum2 = lambda a, b: a + b
print(sum2('Hello', ' world'))
print(sum2(1, 2))

Hello world
3


`lambda` is intended for short, one-line functions.

More complex functions use `def` (see below)

## Exercise

Print the squares of even numbers between 0 and 15

1.  Using a list comprehension as before
2.  Using `map`
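
One possible answer to the exercise above, given as a sketch rather than as the reference solution. It assumes that the bounds 0 and 15 are included, and it combines `map` with `filter` for the second variant.

```python
# 1. With a list comprehension
squares_comprehension = [i ** 2 for i in range(16) if i % 2 == 0]

# 2. With map, after keeping only the even numbers with filter
evens = filter(lambda i: i % 2 == 0, range(16))
squares_map = list(map(lambda i: i ** 2, evens))

print(squares_comprehension)
print(squares_map)
```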

## Brain-teasing

What is the output of

In [182]:
from functools import reduce

reduce(lambda a, b: a + b[0] * b[1], enumerate('abcde'), 'A')

'Abccdddeeee'

# Generators

In [None]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
plt.figure(figsize=(6, 6))
plt.plot([sys.getsizeof(list(range(i))) for i in range(10000)], lw=3)
plt.plot([sys.getsizeof(range(i)) for i in range(10000)], lw=3)
plt.xlabel('Number of elements (value of i)', fontsize=14)
plt.ylabel('Size (in bytes)', fontsize=14)
_ = plt.legend(['list(range(i))', 'range(i)'], fontsize=16)

## Why generators ?

The memory used by `range(i)` does not grow with `i`, while the memory
used by `list(range(i))` grows linearly

What is happening ?

-   `range(n)` does not allocate a list of `n` elements !
-   It **generates on the fly** the required integers
-   We say that such an object behaves like a **generator** in `Python`
-   Many things in the `Python` standard library behave like this

**Warning.** Getting the real memory footprint of a `Python` object is
difficult. Note that `sys.getsizeof` calls the `__sizeof__` method of
the object, which in general does not give the actual memory used by
it. But never mind here.
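
The same observation can be made without a plot. This is a minimal sketch that assumes the CPython interpreter, where a `range` object only stores its start, stop and step; the names `small` and `large` are made up for the example.

```python
import sys

small, large = range(10), range(10**6)

# The reported size of a range does not depend on how many integers it spans.
same_size = sys.getsizeof(small) == sys.getsizeof(large)
print(same_size)

# A list stores one reference per element, so it grows with its length.
list_grows = sys.getsizeof(list(small)) < sys.getsizeof(list(large))
print(list_grows)
```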

The following computation has almost no memory footprint:

In [None]:
sum(range(10**8))

In [None]:
map(lambda x: x**2, range(10**7))

`map` does not return a `list` for the same reason: it returns a lazy
iterator

In [None]:
sum(map(lambda x: x**2, range(10**6)))

## Generator expression

Generator expressions are generators defined through comprehensions:
just replace `[]` by `()` in the comprehension.

A generator can be iterated over only **once**

In [None]:
range(10)

In [None]:
carres = (i**2 for i in range(10))

In [None]:
carres

In [None]:
for c in carres:
    print(c)

In [None]:
for i in range(4):
    for j in range(3):
        print(i, j)

In [None]:
from itertools import product

for t in product(range(4), range(3)):
    print(t)

In [None]:
from itertools import product

gene = (i + j for i, j in product(range(3), range(3)))
gene

In [None]:
print(list(gene))
print(list(gene))

## `yield`

Something very powerful

In [None]:
def startswith(words, letter):
    for word in words:
        if word.startswith(letter):
            yield word

In [None]:
words = [
    'Python', "is", 'awesome', 'in', 'particular', 'generators', 
    'are', 'really', 'cool'
]

In [None]:
list(word for word in words if word.startswith("a"))

In [None]:
a = 2

In [None]:
float(a)

A generator function can also be consumed with a `for` loop

In [None]:
for word in startswith(words, letter='a'):
    print(word)

In [None]:
it = startswith(words, letter='a')

In [None]:
type(it)

In [None]:
next(it)

In [None]:
next(it)

In [None]:
try:
    next(it)
except StopIteration:
    print("StopIteration exception!")
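
Because values are produced only on demand, a generator function may even describe an infinite sequence. The sketch below is an illustration that is not part of the notebook: the function `fibonacci` is hypothetical, and `itertools.islice` is used to take only the first values.

```python
from itertools import islice

def fibonacci():
    """Yield the Fibonacci numbers one at a time, without ever stopping."""
    a, b = 0, 1
    while True:
        yield a          # execution pauses here until the next value is requested
        a, b = b, a + b

# islice consumes only the first 10 values, so the infinite loop is harmless.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```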

# A glimpse at the `collections` module

(This is where the good stuff hides)

In [None]:
texte = """             
Bonjour,
Python c'est super.
Python ca a l'air quand même un peu compliqué.
Mais bon, ca a l'air pratique.
Peut-être que je pourrais m'en servir pour faire des trucs super.
"""
texte

In [None]:
print(texte)

In [None]:
# Some basic text preprocessing 
new_text = (
    texte
    .strip()
    .replace('\n', ' ')
    .replace(',', ' ')
    .replace('.', ' ')
    .replace("'", ' ')
)

print(new_text)
print('-' * 8)

words = new_text.split()
print(words)

## Exercise

Count the number of occurrences of all the words in `words`.

Output must be a dictionary containing `word: count`

In [None]:
print(words)
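
One possible approach, given as a sketch rather than as the reference solution: first by hand with a plain `dict`, then with `Counter` from the `collections` module. The short `words` list below is a stand-in chosen for the example, not the list built from `texte`.

```python
from collections import Counter

# A small stand-in for the `words` list (hypothetical sample)
words = ['Python', 'c', 'est', 'super', 'Python', 'ca', 'a', 'l', 'air', 'super']

# By hand: get() returns 0 for a word that has not been seen yet
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# With Counter, a dict subclass designed for counting hashable objects
counter = Counter(words)

print(counts)
print(counter.most_common(2))
```

A `Counter` returns `0` for a missing key instead of raising `KeyError`, which is convenient when counting.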

## Exercise

Compute the number of occurrences AND the length of each word in `words`.

Output must be a dictionary containing `word: (count, length)`
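
A possible sketch for this exercise, combining `Counter` with a `dict` comprehension. As before, the `words` list is a small stand-in and `stats` is a hypothetical name.

```python
from collections import Counter

# A small stand-in for the `words` list (hypothetical sample)
words = ['Python', 'c', 'est', 'super', 'Python', 'ca', 'a', 'l', 'air', 'super']

# Map each distinct word to a (count, length) tuple
stats = {word: (count, len(word)) for word, count in Counter(words).items()}
print(stats)
```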

# I/O, reading and writing files

Next, put a text file `miserables.txt` in the folder containing this
notebook. If it is not there, the next cell downloads it; if it is
already there, nothing is done.

In [None]:
import requests
import os

# The path containing your notebook
path_data = './'
# The name of the file
filename = 'miserables.txt'

if os.path.exists(os.path.join(path_data, filename)):
    print('The file %s already exists.' % os.path.join(path_data, filename))
else:
    url = 'https://stephanegaiffas.github.io/big_data_course/data/miserables.txt'
    r = requests.get(url)
    with open(os.path.join(path_data, filename), 'wb') as f:
        f.write(r.content)
    print('Downloaded file %s.' % os.path.join(path_data, filename))

In [None]:
ls -alh

In [None]:
# !rm -f miserables.txt

In [None]:
os.path.join(path_data, filename)

In `jupyter` and `ipython` you can run terminal commands by prefixing
them with `!`

Let’s count the number of lines and the number of words with the `wc`
command-line tool (Linux or macOS only, don’t ask me how on Windows)

In [None]:
# Lines count
!wc -l miserables.txt

In [None]:
# Word count
!wc -w miserables.txt

## Exercise

Count the number of occurrences of each word in the text file
`miserables.txt`. Use an `open` *context* and a `Counter` from the
`collections` module, and store the result in a variable named
`counter`.

## Contexts

-   A *context* in Python is something that we use with the `with`
    keyword.

-   It automatically handles the opening and the closing of the file.

Note the for loop:

``` python
for line in f:
    ...
```

You loop directly over the lines of the open file from **within** the
`open` context
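
The pattern described above can be sketched as follows. To keep the example runnable anywhere, it writes a tiny temporary file instead of reading `miserables.txt`; the file name `sample.txt` and its content are assumptions made for the illustration.

```python
import os
import tempfile
from collections import Counter

# Hypothetical stand-in for miserables.txt, created in a temporary folder
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write("the cat\nthe dog\nthe end\n")

    counter = Counter()
    # The file is closed automatically when the with block is left,
    # even if an exception is raised inside it.
    with open(path, encoding='utf-8') as f:
        for line in f:                    # one line at a time, never the whole file
            counter.update(line.split())  # split on whitespace and count each word

print(f.closed)                # True: the context closed the file
print(counter.most_common(1))  # [('the', 3)]
```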

## About `pickle`

You can save your computation with `pickle`.

-   `pickle` is a way of saving **almost anything** with Python.
-   It serializes the object in a binary format, and is usually the
    simplest way to go.
-   Only load pickle files that you trust: unpickling can execute
    arbitrary code.

In [None]:
import pickle as pkl

# Save `counter`, the word counts computed in the exercise above
with open('miserable_word_counts.pkl', 'wb') as f:
    pkl.dump(counter, f)

# And read it again
with open('miserable_word_counts.pkl', 'rb') as f:
    counter = pkl.load(f)

In [None]:
counter.most_common(10)

# Defining functions

You **must** use functions to organize and reuse code.

## Function definition

Function blocks must be indented as other control-flow blocks.

In [None]:
def test():
    return 'in test function'

test()

## Return statement

Functions can *optionally* return values. By default, functions return
`None`.

The syntax to define a function:

-   the `def` keyword,
-   followed by the function’s **name**,
-   then the parameters of the function between parentheses, followed
    by a colon,
-   then the indented function body,
-   and, optionally, `return object` to return a value.
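
The default return value can be checked directly. In this small sketch (the function names `greet` and `add_ten` are hypothetical), the first function has no `return` statement and therefore returns `None`.

```python
def greet(name):
    """Print a greeting; there is no return statement."""
    print("Hello", name)


def add_ten(x):
    """Return x + 10."""
    return x + 10


result = greet("anna")    # prints: Hello anna
print(result is None)     # True
print(add_ten(20))        # 30
```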

In [None]:
None is None

In [None]:
def f(x):
    return x + 10
f(20)

A function that returns several elements returns a `tuple`

In [None]:
def f(x):
    return x + 1, x + 4

f(5)

In [None]:
type(f)

In [None]:
f.truc = "bonjour"

In [None]:
type(f(5))

## Parameters

Mandatory parameters (positional arguments)

In [None]:
def double_it(x):
    return x * 2

double_it(2)

In [None]:
try:
    double_it()
except TypeError:
    print("TypeError: double_it() missing 1 required positional argument: 'x'")

Optional parameters (keyword arguments with a default value)

In [None]:
def double_it(x=2):
    return x * 2

double_it()

In [None]:
double_it(3)

In [None]:
def f(x, y=2, z=10):
    print(x, '+', y, '+', z, '=', x + y + z)

In [None]:
f(5)

In [None]:
f(5, -2)

In [None]:
f(5, -2, 8)

In [None]:
f(z=5, x=-2, y=8)

## Argument unpacking and keyword argument unpacking

You can do things like this, using the `*` unpacking notation

In [None]:
a, *b, c = 1, 2, 3, 4, 5
a, b, c

Back to function `f`: you can unpack a `tuple` as positional arguments,
and a `dict` as keyword arguments

In [None]:
tt = (1, 2, 3)
f(*tt)

In [None]:
dd = {'y': 10, 'z': -5}

In [None]:
f(3, **dd)

In [None]:
def g(x, z, y, t=1, u=2):
    print(x, '+', y, '+', z, '+', t, '+', 
          u, '=', x + y + z + t + u)

In [None]:
tt = (1, -4, 2)
dd = {'t': 10, 'u': -5}
g(*tt, **dd)

## The prototype of all functions in `Python`

In [None]:
def f(*args, **kwargs):
    print('args=', args)
    print('kwargs=', kwargs)

f(1, 2, 'truc', lastname='gaiffas', firstname='stephane')

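-   In a function definition, `*args` collects the extra positional
    arguments in a `tuple` and `**kwargs` collects the extra keyword
    arguments in a `dict`; in a call, `*` and `**` do the opposite
    (**argument unpacking** and **keyword argument unpacking**)
-   The names `args` and `kwargs` are a convention, not mandatory
-   (but you are fired if you name these arguments otherwise)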
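
A typical use of this prototype is a wrapper that forwards whatever it receives to another function. The sketch below uses hypothetical names (`call_and_log`, `power`): the arguments are collected in the definition and unpacked again in the inner call.

```python
def call_and_log(func, *args, **kwargs):
    """Call func with the given arguments and report what was passed."""
    print("calling", func.__name__, "with args =", args, "and kwargs =", kwargs)
    return func(*args, **kwargs)   # unpack again when forwarding


def power(base, exponent=2):
    return base ** exponent


value = call_and_log(power, 3, exponent=3)
print(value)  # 27
```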

In [None]:
# How to get fired
def f(*aaa, **bbb):
    print('args=', aaa)
    print('kwargs=', bbb)
f(1, 2, 'truc', lastname='gaiffas', firstname='stephane')    

**Remark**. A function is a regular object… you can add attributes to
it!

In [None]:
f.truc = 4

In [None]:
f(1, 3)

In [None]:
f(3, -2, y='truc')

# Object-oriented programming (OOP)

Python supports object-oriented programming (OOP). The goals of OOP are:

-   to organize the code, and
-   to re-use code in similar contexts.

Here is a small example: we create a `Student` class, which is an object
gathering several custom functions (called *methods*) and variables
(called *attributes*).

In [None]:
class Student(object):

    def __init__(self, name, birthyear, major='computer science'):
        self.name = name
        self.birthyear = birthyear
        self.major = major

    def __repr__(self):
        return "Student(name='{name}', birthyear={birthyear}, major='{major}')"\
                .format(name=self.name, birthyear=self.birthyear, major=self.major)

anna = Student('anna', 1987)
anna

`__repr__` is what we call a ‘magic method’ in Python; it defines how
an object is displayed as a string. There is a very large number of
such magic methods. They are used to implement **interfaces**.
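
To illustrate how magic methods implement an interface, here is a sketch of a hypothetical `Playlist` class that behaves like a read-only sequence. The class and its content are made up for the example; only the method names `__len__`, `__getitem__` and `__contains__` are part of the language.

```python
class Playlist:
    """Hypothetical container class illustrating a few magic methods."""

    def __init__(self, titles):
        self.titles = list(titles)

    def __len__(self):              # used by len(playlist)
        return len(self.titles)

    def __getitem__(self, index):   # used by playlist[index] and by for loops
        return self.titles[index]

    def __contains__(self, title):  # used by `title in playlist`
        return title in self.titles


playlist = Playlist(["intro", "theme", "outro"])
print(len(playlist))          # 3
print(playlist[0])            # intro
print("theme" in playlist)    # True
print(list(playlist))         # ['intro', 'theme', 'outro']
```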

## Exercise

Add an `age` method to the `Student` class that computes the age of the
student.

-   You can (and should) use the `datetime` module.
-   Since we only know the birth year, let’s assume that the birthday
    is January 1st.
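
One possible solution, as a sketch. The optional `today` parameter is an addition that is not part of the exercise statement; it only makes the method easy to test with a fixed date. The class is cut down to what the method needs.

```python
from datetime import date


class Student:

    def __init__(self, name, birthyear, major='computer science'):
        self.name = name
        self.birthyear = birthyear
        self.major = major

    def age(self, today=None):
        """Age in full years, assuming a birthday on January 1st."""
        if today is None:
            today = date.today()
        birthday = date(self.birthyear, 1, 1)
        # Subtract one year if this year's birthday has not happened yet
        # (never the case for January 1st, but the formula stays general)
        not_yet = (today.month, today.day) < (birthday.month, birthday.day)
        return today.year - birthday.year - not_yet


anna = Student('anna', 1987)
print(anna.age(today=date(2024, 6, 15)))  # 37
```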

## Properties

We can make methods look like attributes using **properties**, as shown
below

In [None]:
from datetime import datetime

class Student(object):

    def __init__(self, name, birthyear, major='computer science'):
        self.name = name
        self.birthyear = birthyear
        self.major = major

    def __repr__(self):
        return "Student(name='{name}', birthyear={birthyear}, major='{major}')"\
                .format(name=self.name, birthyear=self.birthyear, major=self.major)

    @property
    def age(self):
        return datetime.now().year - self.birthyear
        
anna = Student('anna', 1987)
anna.age
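
A property defined with `@property` alone is read-only. As a sketch, the hypothetical `Circle` class below shows both cases: `radius` has a setter that validates the value, while `diameter` has none, so assigning to it raises an `AttributeError`.

```python
class Circle:
    """Hypothetical class with a validated attribute and a computed one."""

    def __init__(self, radius):
        self.radius = radius          # goes through the setter below

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value

    @property
    def diameter(self):               # read-only: no setter is defined
        return 2 * self._radius


circle = Circle(1.5)
circle.radius = 2
print(circle.diameter)  # 4

try:
    circle.diameter = 10
except AttributeError:
    print("diameter is read-only")
```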

## Inheritance

A `MasterStudent` is a `Student` with a new extra mandatory `internship`
attribute

In [None]:
"%d" % 2

In [None]:
x = 2

f"truc {x}"

In [None]:
class MasterStudent(Student):
    
    def __init__(self, name, birthyear, internship, major='computer science'):
        # Reuse the initialization of the parent class
        Student.__init__(self, name, birthyear, major)
        self.internship = internship

    def __repr__(self):
        return f"MasterStudent(name='{self.name}', internship='{self.internship}', birthyear={self.birthyear}, major='{self.major}')"
    
MasterStudent('djalil', 1996, 'pwc')

In [None]:
class MasterStudent(Student):
    
    def __init__(self, name, birthyear, internship, major='computer science'):
        # Reuse the initialization of the parent class
        Student.__init__(self, name, birthyear, major)
        self.internship = internship

    def __repr__(self):
        return "MasterStudent(name='{name}', internship='{internship}'" \
               ", birthyear={birthyear}, major='{major}')"\
                .format(name=self.name, internship=self.internship,
                        birthyear=self.birthyear, major=self.major)
    
djalil = MasterStudent('djalil', 1996, 'pwc')

In [None]:
djalil.__dict__

In [None]:
djalil.birthyear

In [None]:
djalil.__dict__["birthyear"]

## Monkey patching

-   Classes in `Python` are objects too, and their attributes are
    stored in a `dict`-like mapping (`__dict__`) under the hood…
-   Therefore classes can be changed on the fly, and the change also
    affects the instances that already exist

In [None]:
class Monkey(object):
    
    def __init__(self, name):
        self.name = name

    def describe(self):
        print("Old monkey %s" % self.name)

def patch(self):
    print("New monkey %s" % self.name)

monkey = Monkey("Baloo")
monkey.describe()

Monkey.describe = patch
monkey.describe()

In [None]:
monkeys = [Monkey("Baloo"), Monkey("Super singe")]


monkey_name = monkey.name

for i in range(1000):    
    monkey_name

## Data classes

Since `Python 3.7` you can use a **dataclass** to define this kind of
class.

It does a lot of work for you: it generates `__init__`, `__repr__` and
`__eq__`, among other things.

In [None]:
from dataclasses import dataclass
from datetime import datetime 

@dataclass
class Student(object):
    name: str
    birthyear: int
    major: str = 'computer science'

    @property
    def age(self):
        return datetime.now().year - self.birthyear
        
anna = Student(name="anna", birthyear=1987)
anna

In [None]:
print(anna.age)
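
Beyond `__repr__`, a dataclass also generates comparison methods and handles mutable defaults safely. The sketch below uses a hypothetical `Point` class; `order=True` and `field(default_factory=...)` are options of the standard `dataclasses` module.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Point:
    """Hypothetical dataclass; order=True also generates <, <=, > and >=."""
    x: int
    y: int = 0
    tags: list = field(default_factory=list)  # safe mutable default


p, q = Point(1, 2), Point(1, 2)
print(p)             # Point(x=1, y=2, tags=[])
print(p == q)        # True: __eq__ compares field by field
print(p < Point(2))  # True: compared as the tuples (1, 2, []) and (2, 0, [])
p.tags.append("a")
print(q.tags)        # []: each instance gets its own list
```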

# Most common mistakes

-   Let us wrap this up with the most common mistakes in `Python`

First, the best way to learn and practice:

-   Start with the official tutorial
    https://docs.python.org/fr/3/tutorial/index.html

-   Look at
    https://python-3-for-scientists.readthedocs.io/en/latest/index.html

-   Continue with the documentation at
    https://docs.python.org/fr/3/index.html and work!

## Using a mutable value as a default value

In [None]:
def foo(bar=[]):
    bar.append('oops')
    return bar

print(foo())
print(foo())
print(foo())

print('-' * 8)
print(foo(['Ah ah']))
print(foo([]))

In [None]:
print(foo.__defaults__)
foo()
print(foo.__defaults__)

-   The default value for a function argument is evaluated once, when
    the function is defined
-   the `bar` argument is bound to its default (i.e., an empty list)
    only once, when `foo` is defined, and not at each call
-   successive calls to `foo()` (with no `bar` argument specified) use
    the same list!

One should use `None` as the default instead

In [None]:
def foo(bar=None):
    if bar is None:
        bar = []
    bar.append('oops')
    return bar

print(foo())
print(foo())
print(foo())
print(foo(['OK']))

In [None]:
print(foo.__defaults__)
foo()
print(foo.__defaults__)

There is no such problem with immutable types: `bar += ('oops',)` builds
a new tuple and leaves the default value untouched

In [None]:
def foo(bar=()):
    bar += ('oops',)
    return bar

print(foo())
print(foo())
print(foo())

In [None]:
print(foo.__defaults__)

## Class attributes VS object attributes

In [None]:
class A(object):
    x = 1

    def __init__(self):
        self.y = 2

class B(A):
    def __init__(self):
        super().__init__()

class C(A):
    def __init__(self):
        super().__init__()

a, b, c = A(), B(), C()

In [None]:
print(a.x, b.x, c.x)
print(a.y, b.y, c.y)

In [None]:
a.y = 3
print(a.y, b.y, c.y)

In [None]:
a.x = 3  # Adds a new attribute named x in object a
print(a.x, b.x, c.x)

In [None]:
A.x = 4 # Changes the class attribute x of class A
print(a.x, b.x, c.x)

-   Attribute `x` is an **attribute** of neither `b` nor `c`
-   It is also not a **class attribute** of classes `B` and `C`
-   So, it is looked up in the base class `A`, which contains a
    **class attribute** `x`

Classes and objects contain a hidden `dict` that stores their
attributes. An attribute is looked up first in the object, then in its
class, then in the base classes, following the method resolution order
(MRO).
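
The lookup order can be inspected directly. This sketch uses hypothetical classes `Base` and `Child` (so as not to overwrite `A`, `B` and `C`) and shows where the attribute actually lives.

```python
class Base:
    x = 1


class Child(Base):
    pass


child = Child()

# Classes searched, in order, when an attribute is not found on the object
print([cls.__name__ for cls in Child.__mro__])  # ['Child', 'Base', 'object']

print('x' in child.__dict__)   # False: not stored on the instance
print('x' in Child.__dict__)   # False: not stored on class Child either
print('x' in Base.__dict__)    # True: found on the base class
print(child.x)                 # 1
```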

In [None]:
a.__dict__, b.__dict__, c.__dict__

In [None]:
A.__dict__, B.__dict__, C.__dict__

This can lead to **nasty** errors when using class attributes, in
particular mutable ones, which are shared by all instances: learn more
about this
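
A classic instance of such an error is a mutable class attribute. In this sketch (the classes `Basket` and `SafeBasket` are hypothetical), the list defined at class level is shared by every instance, whereas the list created in `__init__` is not.

```python
class Basket:
    """Hypothetical class with a mutable class attribute (a common trap)."""
    items = []                      # shared by ALL instances

    def add(self, item):
        self.items.append(item)     # mutates the shared list


class SafeBasket:
    def __init__(self):
        self.items = []             # one list per instance

    def add(self, item):
        self.items.append(item)


b1, b2 = Basket(), Basket()
b1.add("apple")
print(b2.items)   # ['apple']: b2 sees the item added through b1

s1, s2 = SafeBasket(), SafeBasket()
s1.add("apple")
print(s2.items)   # []
```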

## Python scope rules

In [None]:
try:
    ints += [4]
except NameError:
    print("NameError: name 'ints' is not defined")

In [None]:
ints = [1]

def foo1():
    ints.append(2)
    return ints

def foo2():
    ints += [2]
    return ints

In [None]:
foo1()

In [None]:
try:    
    foo2()
except UnboundLocalError as inst:
    print(inst)

### What the hell ?

-   An assignment to a variable in a scope assumes that the variable is
    local to that scope
-   and shadows any similarly named variable in any outer scope

``` python
ints += [2]
```

behaves, as far as scoping is concerned, like

``` python
ints = ints + [2]
```

which is an *assignment*: `ints` must be defined in the local scope, but
it is not, while

``` python
ints.append(2)
```

is not an *assignment*: it only calls a method on the list found in the
outer scope
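
One way to avoid the problem is to pass the list as an argument: a parameter is a local name that is already bound, so an augmented assignment on it is fine. The sketch below uses hypothetical names. Declaring `global ints` at the top of `foo2` would also remove the error, at the price of hidden shared state.

```python
def extend_in_place(seq):
    seq += [2]          # fine: seq is a local name already bound to the argument
    return seq


def extend_copy(seq):
    return seq + [3]    # builds a new list; the caller's list is untouched


values = [1]
print(extend_in_place(values))  # [1, 2]
print(values)                   # [1, 2]: the caller's list was modified in place
print(extend_copy(values))      # [1, 2, 3]
print(values)                   # [1, 2]
```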

## Modify a `list` while iterating over it

In [None]:
odd = lambda x: bool(x % 2)
numbers = list(range(10))

try:
    for i in range(len(numbers)):
        if odd(numbers[i]):
            del numbers[i]
except IndexError as inst:
    print(inst)

This is a typical example where one should use a list comprehension
instead

In [None]:
[number for number in numbers if not odd(number)]
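
When the list must really be modified in place (for instance because other variables refer to it), two common patterns are slice assignment and iterating over the indices in reverse. A sketch with hypothetical names:

```python
def is_odd(x):
    return x % 2 == 1


values = list(range(10))
alias = values                    # a second name for the same list

# Slice assignment replaces the content but keeps the same list object
values[:] = [v for v in values if not is_odd(v)]
print(alias)                      # [0, 2, 4, 6, 8]

# Deleting while walking the indices backwards never skips an element
others = list(range(10))
for i in reversed(range(len(others))):
    if is_odd(others[i]):
        del others[i]
print(others)                     # [0, 2, 4, 6, 8]
```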

## No docstrings

Take the time to write clean docstrings (look at the `numpydoc`
style)

In [None]:
def create_student(name, age, address, major='computer science'):
    """Add a student in the database
    
    Parameters
    ----------
    name: `str`
        Name of the student
    
    age: `int`
        Age of the student
    
    address: `str`
        Address of the student
    
    major: `str`, default='computer science'
        The major chosen by the student
    
    Returns
    -------
    output: `Student`
        A fresh student
    """
    pass

In [None]:
create_student('Duduche', 28, 'Chalons')
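
A docstring is not just a comment: it is stored on the function and used by tools such as `help()`. A small sketch, with a hypothetical function `triple_it`:

```python
import inspect


def triple_it(x):
    """Return three times the value of x.

    Parameters
    ----------
    x: `float`
        The value to triple
    """
    return 3 * x


# inspect.getdoc returns the docstring with its indentation cleaned up
summary = inspect.getdoc(triple_it).splitlines()[0]
print(summary)  # Return three times the value of x.
```

In a notebook, `help(triple_it)` or `triple_it?` displays the same text.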

## Not using available methods and/or the simplest solution

In [None]:
dd = {'stephane': 1234, 'gael': 4567, 'gontran': 891011}

# Bad
for key in dd.keys():
    print(key, dd[key])

print('-' * 8)

# Good
for key, value in dd.items():
    print(key, value)

In [None]:
colors = ['black', 'yellow', 'brown', 'red', 'pink']

# Bad
for i in range(len(colors)):
    print(i, colors[i])

print('-' * 8)

# Good
for i, color in enumerate(colors):
    print(i, color)

## Not using the standard library

The standard library is **almost always** better than a hand-made
solution

In [None]:
list1 = [1, 2]
list2 = [3, 4]
list3 = [5, 6, 7]

for a in list1:
    for b in list2:
        for c in list3:
            print(a, b, c)

In [None]:
from itertools import product

for a, b, c in product(list1, list2, list3):
    print(a, b, c)
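
A few more `itertools` one-liners, as a sketch, each replacing a hand-written loop. The input values are arbitrary.

```python
from itertools import combinations, product

# product gives exactly what the nested loops give, in the same order
nested = [(a, b) for a in [1, 2] for b in "xy"]
print(nested == list(product([1, 2], "xy")))  # True

# All 3-bit patterns, without writing three nested loops
print(len(list(product([0, 1], repeat=3))))   # 8

# Unordered pairs, each taken once
print(list(combinations("abc", 2)))           # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```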

# That’s it for now !