Experiences from the Data Umbrella Sprint

code

reflection

What I learned in my first sprint contributing to open source.

Author

sebastiandres

Published

July 25, 2021

For a long time I wanted to contribute “at some point” to an open source project. However, that ideal moment kept getting postponed until I had more experience and more time. That is why, when I read about the Data Umbrella Sprint in several Python and programming Telegram channels, I did not hesitate and signed up. It was the perfect opportunity to force myself to learn. There is nothing like a bit of social pressure to get a procrastinator’s gears moving.

To be honest, I was not familiar with Data Umbrella. It is an organization that supports underrepresented groups, whether by gender, race, age, sexual orientation or other factors, in the fields of Machine Learning, Data Science and Artificial Intelligence. The sprint they held on June 26 focused on Latin America, a region with low participation in these fields. Data Umbrella’s work with underrepresented groups is very valuable for tearing down the myths and entry barriers that may be holding back the arrival of new talent.

What I liked most about the Data Umbrella Sprint was the organization: they had a very precise checklist of the topics to review, with videos explaining each step. That made it easy to estimate how much time you needed to prepare or learn before the sprint. The use of Discord also helped a lot to give it an informal, community feel, and served to answer questions and get to know each other. Joining a new group is always hard, and for newcomers the challenge is even greater. Having a pre-sprint and a post-sprint helps consolidate the human and community aspect, solve the technical problems that always appear, and gain confidence.

During the sprint, organizing ourselves to do pair programming was also a great help. Leonardo Rocco and I worked on two issues:

* DOC Ensures that ARDRegression passes numpydoc validation #20381
* DOC ensures FastICA estimator pass the numpydoc validation #20405

I can proudly say that both pull requests have already been accepted!
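For readers who have not seen it, “numpydoc validation” refers to checking that a docstring follows the numpydoc layout, with sections such as Parameters and Returns underlined by dashes. The sketch below is only an illustration: the function `scale` and the helper `missing_sections` are hypothetical names I made up, not code from scikit-learn, and the helper is a toy check that is far less strict than the real numpydoc validator.

```python
def scale(values, factor=1.0):
    """Multiply each value by a constant factor.

    Parameters
    ----------
    values : list of float
        Numbers to scale.
    factor : float, default=1.0
        Multiplier applied to every element.

    Returns
    -------
    list of float
        The scaled numbers, in the original order.
    """
    return [v * factor for v in values]


def missing_sections(func, required=("Parameters", "Returns")):
    """Toy check (hypothetical helper): list required sections that are absent.

    A section header is taken to be any non-empty line that is immediately
    followed by a line made only of dashes.
    """
    doc = func.__doc__ or ""
    lines = [line.strip() for line in doc.splitlines()]
    found = {
        lines[i]
        for i in range(len(lines) - 1)
        if lines[i] and set(lines[i + 1]) == {"-"}
    }
    return [name for name in required if name not in found]


if __name__ == "__main__":
    # An empty list means both required sections were found.
    print(missing_sections(scale))
```

Most of the work in this kind of issue is similar in spirit: find which sections or descriptions the validator complains about, then fix the docstring until the check passes.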

Reflecting on my experience at the sprint, I realize that I went in expecting that I still had too much to learn before I could contribute.

At the sprint I learned that you do not need to be a super-programmer to contribute to open source. The reality is that, to begin with, there is no single way to contribute. There is an endless range of possible tasks, from the simplest to the most advanced, and a long learning path. That is why it is important to realize that it is not that you “don’t know” but that you “don’t know yet”, and that there is a community willing to support you in that learning process. We are all in a learning process. Getting involved in collaborative projects is precisely a way to accelerate learning, and, along the way, contribute to the libraries you use the most.

There is an excellent report of the sprint on Reshama’s blog. The distribution of participants by country is quite surprising. I imagined a more uniform distribution, but most of the participants are from Argentina and Brazil. Something has to be done about it!

In addition to community and individual contributions, another important element of open source is grants and funding from companies. Ask your boss for budget to fund the tools you use daily! In particular, this sprint was funded in part by a grant from Code for Science & Society. This is community and transparency in their purest form: all the details of the grant are available online, under grant number GBMF8449 at the Gordon and Betty Moore Foundation.