IFEBY310 Syllabus
Organization
We will have one weekly lecture. Each lecture is organized around slides and notebooks. We will switch from blackboard to laptop and back. You are invited to bring your laptop to the lectures.
| | Day | Hour | Room | Start |
|---|---|---|---|---|
| Lecture | Friday | 15:45 - 17:46 | Sophie Germain 014 | 2025-01-17 |
We will not attempt to complete the notebooks during the sessions. You are expected to complete the notebooks on your own time. Solutions (at least partial ones) are available on the course website.
You can fork the course repository and post issues, comments, and corrections.
Objectives
During this course, you shall learn to:
- Handle medium-sized data using the Python data stack: NumPy/SciPy/Pandas
- Scale up and down with Dask
- Handle Big Data with Spark (PySpark)
- Manage and store data using dedicated formats: columnar (Parquet, ORC, Arrow) and row-oriented (Avro)
Communication
Course material: s-v-b.github.io/IFEBY310
Fork the repo and use GitHub issues to send feedback (no email, please).
Announcements are posted on Moodle.
Register on the Moodle portal to receive updates.
References
- Pandas Book
- Python Data Science Handbook
- Dask
- Spark
- Data pipelines
- Alice
- PostgreSQL documentation
- Next Generation Databases: NoSQL and Big Data, Guy Harrison
- Guy Harrison Blog
- Database Trends and Applications
- Upcoming book “Principles of Databases”, by Marcelo Arenas, Pablo Barcelo, Leonid Libkin, Wim Martens, and Andreas Pieris.
Evaluation
Two homeworks/projects
Grading
Tips
- Have a look at the slides before each lecture
- Don’t jump straight to the solutions
- Use online help (StackOverflow, ChatGPT, Copilot, …)
- Read error messages
Code of conduct
TL;DR: No cheating!
Save the dates!
- January 17: Course kick-off
- February 14: No session
- February 28: Winter Holidays
- March 7: Session 6
- April 4: Room 2017
- April 18: Easter Holidays
- April 25: Easter Holidays
- May 2: Session 12