We believe open source software is a cornerstone of modern IT, and we’re proud that this month we were able to contribute. Here, we describe how much work goes into even a simple bugfix.
We use many libraries every day, one of them being Pandas, the popular data analysis and manipulation library for Python. With over 26K stars on GitHub, you can tell that we’re not the only ones. Almost every Python programmer doing data-science-related work uses this library. It’s a huge deal.
This is why it makes us quite happy to contribute to Pandas’ development. One of our founders, Felix, discovered a bug in Pandas that had to do with the old programmer’s nemesis: daylight saving time. Resampling a time series (that’s a series of numbers, each with a date and time attached, e.g. sensor readings for a week) could crash Pandas if a daylight saving time change occurred within that period. Felix found the problem and submitted the solution; it took only five lines of code.
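To make the scenario concrete, here is a minimal sketch of resampling a week of hourly readings that spans a daylight saving time change. It is not the input that triggered the original crash, and it runs cleanly on current Pandas versions. The time zone, the dates, and the constant reading value are assumptions chosen purely for illustration.

```python
import pandas as pd

# Hypothetical data: one week of hourly sensor readings in a time zone
# that switches to summer time on 2019-03-31 (02:00 jumps to 03:00).
index = pd.date_range(
    "2019-03-25 00:00", "2019-03-31 23:00", freq="h", tz="Europe/Berlin"
)
readings = pd.Series(1.0, index=index)

# Resample the hourly readings into daily totals.
daily = readings.resample("D").sum()

# Because every reading is 1.0, each daily total equals the number of
# hours in that local day; the day of the clock change is one hour short.
print(daily)
```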
However, what makes an open source contribution such a big deal is not reflected in these five lines. These are the steps you, as a contributor, need to complete:
- Figure out if this behaviour is really a bug. You should be quite certain before you bother the project maintainers. So Felix needed to understand the daylight saving time problem very well.
- Figure out where the problem occurs in the code base. In a foreign code base, mind you, that you don’t know by heart. And Pandas is not small. The module in which the problem occurred alone has ~2K lines of code.
- Come up with a solution (these are the five lines). In this case Felix needed to understand that an extra equality check could get the job done. Oh, and the Pandas project has certain standards for how its code must be formatted (for readability reasons), so Felix needed to learn them and write accordingly.
- Test the new code. If it’s not tested, it might break under other circumstances with no easy way to tell. Felix wrote six test cases, which the Pandas project will run automatically with every build from now on. For this, he had to learn how Pandas engineers have decided to test their code.
- Document what you did. Felix created a GitHub issue, so he could discuss this with the project maintainers, and on his PR he documented the steps he took. Finally, he added his change to the release notes for the next Pandas release.
Open source software is serious business, both in terms of how many companies and organizations rely on it every day, and how much work is put into maintaining high quality.
We’re proud to be part of Pandas history now!