Today, I read this super interesting blog post by Dan Saber. It does a really good job of providing an overview of the data visualization libraries that are commonly used in Python, and how they compare. Exactly the kind of thing I’m looking for. Sweet!
I’ll give a short-and-sweet overview of the libraries that Dan talks about: what each library is intended for, what it’s good at and where it should be avoided. All credit naturally goes to Dan.
matplotlib: first on the list, and Python’s most well-known and widely used data visualization library. It produces very ‘scientific’ plots, good for publications but not very pretty per se. Also, the code can get somewhat horrible (complex plots tend to mean complex code).
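To give a feel for what that looks like, here is a minimal sketch of my own (it is not taken from Dan’s post). The data and the product names are made up purely for illustration. Even a simple two-line chart needs a fair number of separate calls for labels, ticks and the legend.

```python
import matplotlib.pyplot as plt

# Made-up example data (an assumption, not from Dan's post)
years = [2013, 2014, 2015, 2016]
sales_a = [10, 14, 19, 25]
sales_b = [12, 13, 15, 16]

# One figure with a single set of axes
fig, ax = plt.subplots(figsize=(6, 4))

# Every series is drawn with its own explicit call
ax.plot(years, sales_a, marker="o", label="Product A")
ax.plot(years, sales_b, marker="s", label="Product B")

# Labels, title, ticks and legend are all set by hand
ax.set_xlabel("Year")
ax.set_ylabel("Sales")
ax.set_title("Sales per year")
ax.set_xticks(years)
ax.legend()

# fig.savefig("sales.png")  # uncomment to write the figure to disk
```

In a notebook the figure should show up on its own; in a script you would call plt.show() or fig.savefig(...) at the end.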
pandas: very nice data manipulation library. It has a lot of great features for data manipulation and analysis, but it’s not that amazing for visualizations. Its plotting code can sometimes get as complex as matplotlib’s. On the upside, this complexity also means that a lot can be customized.
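As a rough illustration (again my own sketch with made-up data and column names, not code from Dan’s post), this is what plotting straight from a DataFrame can look like. pandas draws its plots with matplotlib under the hood, so matplotlib needs to be installed, and you get a matplotlib Axes object back for further tweaking.

```python
import pandas as pd

# Made-up example data (an assumption, not from Dan's post)
df = pd.DataFrame({
    "year": [2013, 2014, 2015, 2016],
    "product_a": [10, 14, 19, 25],
    "product_b": [12, 13, 15, 16],
})

# The data manipulation side: derive a new column in one line
df["total"] = df["product_a"] + df["product_b"]

# The plotting side: DataFrame.plot returns a matplotlib Axes
ax = df.plot(
    x="year",
    y=["product_a", "product_b"],
    marker="o",
    title="Sales per year",
)

# Anything beyond the basics is done through the matplotlib Axes
ax.set_ylabel("Sales")
```

The simple case is short, but as soon as you want finer control you end up working with the matplotlib object anyway, which is where the complexity creeps back in.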
yhat’s ggplot: this package is based on ggplot2, which is a commonly used visualization package in R. Its philosophy is quite intriguing: the different variables that you want to include in the plot are linked to aesthetic mappings, such as x, y and color, after which you call geoms (e.g. geom_line) to actually display the data.
Altair: new kid on the block. Similar philosophy to ggplot regarding aesthetic mappings and geoms. What is also very nice in this library (and to a lesser extent in ggplot) is that many different plots can be created using only slightly adapted code. For example, in matplotlib a line chart and a histogram are very different things, whereas in Altair the code stays much the same between them. Somewhat limited in scope though (box plots are not possible).
In addition, in the comments section, the library bokeh seems to pop up frequently. I should check it out 🙂
Coding examples can be found in Dan’s blog. Thanks Dan!