Course IFEBY310: Big Data Technologies
This course, offered in the Master MIDS (M1) at Université Paris Cité, introduces a collection of software technologies dedicated to Big Data management. It is designed for students with a dual background in Mathematics and Computer Science.
During this course, you will learn to:
- Handle medium-sized data using the Python data stack: NumPy, SciPy, pandas
- Scale up and down with Dask
- Handle Big Data with Spark (PySpark)
- Manage and store data using dedicated formats: columnar (Parquet, ORC, Arrow) and row-oriented (Avro)
We will run Spark using Docker images. It is also a good idea to install Python and the Python data stack on your laptop.