Technologies Big Data Master MIDS/MFA/LOGOS
Python?
Built-in data structures (tuples, dict, list, set, etc.)
Interfaces with C/C++ and easily accelerated (cython, numba, pypy)
stackoverflow 2025 survey
Python popularity growth
Python for data science?
Besides these features, Python has a large data science ecosystem, presented in the sections that follow.
Python Data Science Stack: Maths / Science
numpy is all about multi-dimensional arrays and matrices, with submodules such as numpy.linalg (linear algebra) and numpy.random (random number generation).
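As a minimal sketch of how numpy.random and numpy.linalg fit together, the snippet below draws a small random linear system and solves it. The seed, the matrix size and the diagonal shift are arbitrary choices made for this illustration.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(seed=0)

# A random 3x3 matrix, shifted on the diagonal to keep it well conditioned,
# and a random right-hand side.
A = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)
b = rng.normal(size=3)

# Solve the linear system A x = b with numpy.linalg.
x = np.linalg.solve(A, b)

# Check the solution: A @ x should reproduce b up to floating-point error.
print(x.shape, np.allclose(A @ x, b))
```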
scipy extends numpy with extra modules:
scipy.sparse
Python Data Science Stack: Data processing
pandas builds upon numpy to provide a high-performance, easy-to-use DataFrame object, with high-level data processing.
Reads and writes many formats: csv, json, hdf5, feather, parquet, etc.
SQL semantics: select, filter, join, groupby, agg, where, etc.
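The SQL-like operations listed above can be sketched on two toy tables. The table contents and the column names (store, amount, city) are invented for illustration and do not come from the course material.

```python
import pandas as pd

# Hypothetical toy tables, invented for this illustration only.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "amount": [10.0, 20.0, 5.0, 15.0, 50.0],
})
stores = pd.DataFrame({
    "store": ["A", "B", "C"],
    "city": ["Paris", "Lyon", "Paris"],
})

# select + where: keep two columns and only the rows with amount >= 10
large = sales.loc[sales["amount"] >= 10.0, ["store", "amount"]]

# join: attach the city of each store
joined = large.merge(stores, on="store", how="inner")

# groupby + agg: total amount and number of sales per city
summary = (
    joined.groupby("city", as_index=False)
    .agg(total=("amount", "sum"), n_sales=("amount", "count"))
)
print(summary)
```

The chained style mirrors the order of clauses in a SQL query, which tends to make such pipelines easier to read than a sequence of in-place mutations.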
dask is roughly a distributed and parallel pandas: its DataFrame API mirrors much of the pandas API.
It is an alternative to spark that can be useful, and it is full Python (no JVM).
pyspark is the Python API to spark, a big data processing framework.
spark itself is written in Scala: pyspark can be slower (much slower if you are not careful).
SQLAlchemy
The universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
Python Data Science Stack: Data Visualization
matplotlib provides versatile 2D plotting capabilities
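A minimal 2D plotting sketch, assuming matplotlib and numpy are installed. The curves, labels and figure size are arbitrary choices. The figure is rendered to an in-memory PNG so the example also runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display required
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 200)

# One figure, one set of axes, two curves.
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Two curves on the same axes")
ax.legend()

# Render to an in-memory PNG; pass a file name instead to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
print(len(buffer.getvalue()), "bytes of PNG")
```

In a jupyter notebook the backend line can be dropped and the figure is displayed inline.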
JavaScript graphics library d3.js
Python interface, can be used in a jupyter notebook
Vega-Altair: Declarative Visualization in Python
Vega-Altair is a declarative visualization library for Python. Its simple, friendly and consistent API, built on top of the powerful Vega-Lite grammar, empowers you to spend less time writing code and more time exploring your data.
Python Data Science Stack: Dashboards
Python Data Science Stack: environments

Ways to use all these tools
Write a script script.py and run it with python directly from the command line: python script.py
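As a sketch of what such a script.py might contain, here is a small command-line program. The greet helper and its behaviour are hypothetical, chosen only to show the usual layout of a runnable script.

```python
# script.py (file name taken from the text above; the contents are illustrative)
import sys


def greet(names):
    """Return one greeting line per name."""
    return [f"hello, {name}" for name in names]


def main(argv):
    """Greet every command-line argument, or 'world' when none is given."""
    names = argv[1:] or ["world"]
    for line in greet(names):
        print(line)


# This guard runs main() only when the file is executed as a script,
# not when it is imported from another module.
if __name__ == "__main__":
    main(sys.argv)
```

Running python script.py Ada Grace from a terminal would print one greeting per argument.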
Use the ipython interactive shell

jupyter: a web application for creating and running documents called notebooks (with the .ipynb extension).
Each notebook has a kernel running a Python, R, Julia, … process.
An ipynb file is a JSON document. This leads to bad code diffs, a problem for git versioning.
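To make the versioning problem concrete, the sketch below builds a hand-written, simplified notebook structure with a single code cell and serializes it the way an .ipynb file is stored. The layout follows the general shape of the nbformat 4 schema, but it is an illustration and is not guaranteed to validate against every nbformat version.

```python
import json

# Simplified, hand-written notebook with one executed code cell.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [
                {"name": "stdout", "output_type": "stream", "text": ["2\n"]}
            ],
            "source": ["print(1 + 1)"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# What ends up on disk (and in git) is this JSON text, not just the code.
serialized = json.dumps(notebook, indent=1)

# Collect the actual source lines of the code cells.
code_lines = [
    line
    for cell in notebook["cells"]
    if cell["cell_type"] == "code"
    for line in cell["source"]
]
print(len(serialized.splitlines()), "lines of JSON for", len(code_lines), "line of code")
```

Outputs and execution counts are stored next to the source, so re-running a cell changes the file even when the code is unchanged, which is what makes diffs noisy.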
Quarto
Reticulate
Reticulate embeds a Python session within your R session, enabling seamless, high-performance interoperability. If you are an R developer that uses Python for some of your work or a member of data science team that uses both languages, reticulate can dramatically streamline your workflow!
Py2R
Python has several well-written packages for statistics and data science, but CRAN, R’s central repository, contains thousands of packages implementing sophisticated statistical algorithms that have been field-tested over many years. Thanks to the rpy2 package, Pythonistas can take advantage of the great work already done by the R community. rpy2 provides an interface that allows you to run R in Python processes. Users can move between languages and use the best of both programming languages.
Beyond the stack above, Python also has:
Many libraries for statistics, machine learning and deep learning
Acceleration tools: numba, cython, cupy
Python APIs for most databases and clouds
Processing and plotting tools for geospatial data
Image processing
Web development, web scraping
among many, many other things…
IFEBY030 – Technos Big Data – M1 MIDS/MFA/LOGOS – UParis Cité