Week 4
Spark SQL
Important
- Monday 02 February 2026 Halle 166E 10h45-12h45
- Calendar
Lecture: slides
We went through the first part (up to Aggregation) of Spark high level APIs: SQL
We came back to the section on Pair RDDs
Notebooks
We shall spend most of the lectures on
We did not touch Aggregation
You should have gone through (on your own)
References
Logistics
pyspark
To work with the Jupyter notebooks, install Python 3 and the Jupyter-related modules: jupyter-cache, jupyter_client, jupyter_core, jupyterlab_widgets (installing these also pulls in their dependencies).
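Once the modules are installed, a quick check from Python can confirm that they are importable. The snippet below is a minimal sketch, not part of the course material: the helper name `missing_modules` is hypothetical, and it assumes that the package `jupyter-cache` is imported under the name `jupyter_cache`.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the modules listed above.
# Assumption: the `jupyter-cache` package is imported as `jupyter_cache`.
REQUIRED = ["jupyter_cache", "jupyter_client", "jupyter_core", "jupyterlab_widgets"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required Jupyter modules were found.")
```

Any module reported as missing can then be installed with `pip` (or your environment manager) before opening the notebooks.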
Download the Jupyter notebooks from the notebooks listing.
If you do not already have an ENT account, follow the instructions on Moodle to get one. You will need this account to connect to the PostgreSQL cluster.