IFEBY310: Big Data Technologies

The course IFEBY310: Big Data Technologies, part of the Master MIDS (M1) at Université Paris Cité, introduces a collection of software technologies dedicated to Big Data management. It is designed for students with a dual background in Mathematics and Computer Science.

During this course, you will learn to:

  • Handle medium-sized data using the Python data stack: NumPy, SciPy, pandas
  • Scale up and down with Dask
  • Handle Big Data with Spark (PySpark)
  • Manage and store data using dedicated formats: columnar (Parquet, ORC, Arrow) and row-based (Avro)

We will work on Spark using Docker images. It is a good idea to install Python and the Python data stack on your laptop.
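
To check whether the Python data stack is already available on your laptop, a short script along the following lines may help. It is only a sketch: the list of import names is an assumption based on the tools mentioned above, not an official list of course requirements, and the helper `check_packages` is a hypothetical name introduced for illustration.

```python
from importlib.util import find_spec

# Import names assumed from the tools listed in this description;
# adjust the list to whatever the course actually requires.
PACKAGES = ["numpy", "scipy", "pandas", "dask", "pyspark", "pyarrow"]


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    # find_spec looks the package up without importing it.
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(PACKAGES).items():
        print(f"{name:10s} {'installed' if found else 'missing'}")
```

Any package reported as missing can then be installed with your usual package manager (for instance pip or conda).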