Keys to Successful Data Science Collaboration and Communication

Crissy Bruce
3 min read · Jan 19, 2021

You’re working on your first big data science project, and you have been assigned a partner to divide up the large load at hand. How exciting!

So you talk with your partner about tackling the work. You agree to use GitHub to host the repository for all of the project's files, and to do the work on branches.

Each of you creates a branch off main for your part of the work. When you review the main branch after the first round of project work, you see that none of the code from your branch was merged into it. Now you're frustrated that the work you spent so much time on was left out. You thought your partner would know to pull in your work when updating the main branch, but assumptions rarely lead to the desired result.

This is one of many reasons why you must communicate and collaborate openly when working on data science projects. Creating a strong plan for collaboration and communication when you start a project is key to a successful project from start to finish.
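One way to catch the situation described above before it causes frustration is to ask Git which branches have not yet been merged. The following is a minimal sketch, assuming the shared branch is named `main` and that the script is run inside the project's repository; the function names `parse_branch_list` and `unmerged_branches` are made up for this illustration.

```python
import subprocess


def parse_branch_list(git_output: str) -> list[str]:
    """Turn the text printed by `git branch` into a list of branch names."""
    branches = []
    for line in git_output.splitlines():
        # Git marks the checked-out branch with "*" and indents the rest.
        name = line.strip().lstrip("*+ ").strip()
        if name:
            branches.append(name)
    return branches


def unmerged_branches(main: str = "main") -> list[str]:
    """List local branches whose commits are not yet part of `main`.

    Assumes `git` is installed and the current directory is a repository.
    """
    result = subprocess.run(
        ["git", "branch", "--no-merged", main],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_branch_list(result.stdout)
```

Calling `unmerged_branches()` before each review meeting gives both partners a short list to talk through, instead of each assuming the other has already merged everything.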

Here are the steps to put your best foot forward on multi-person data science projects:

1. Enforce a common data science infrastructure. Data science infrastructure can vary widely from one person or organization to another, so it is important to agree on common components that make sharing and working together as easy and fast as possible. Here are some of the components to keep consistent:

  • Same Hardware
  • Common Tooling. Find out which tools each member of the group uses and decide which ones the team as a whole will use.
  • Common Environment Management (e.g. Conda environments or Docker containers)
  • Online Data Storage to house all documents related to the project so the team or other members of the company may access files when needed.
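As one illustration of keeping tooling and environments consistent, each team member could run a small check that compares the packages on their machine with the versions the team agreed on. This is only a sketch: the package names and version numbers in `AGREED` are placeholders, and the helper names are invented for this example. A Conda environment file or a Docker image would serve the same purpose more thoroughly.

```python
from importlib import metadata


def installed_versions(packages):
    """Look up the installed version of each package, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(agreed, installed):
    """Return {package: (agreed_version, installed_version)} where they differ."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in agreed.items()
        if installed.get(name) != wanted
    }


# Hypothetical versions the team agreed on; replace with your own list.
AGREED = {"pandas": "2.2.2", "scikit-learn": "1.5.0"}

if __name__ == "__main__":
    mismatches = find_mismatches(AGREED, installed_versions(AGREED))
    for name, (wanted, found) in mismatches.items():
        print(f"{name}: team uses {wanted}, this machine has {found}")
```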

2. Identify how updates to the main document will be completed, and by whom. Assuming that everyone already knows the process, instead of confirming it in advance, will likely result in frustration and resentment. This responsibility is similar to a project manager role, so assign it to a volunteer if someone on your team is willing to take it on.

3. Set up time to work together in advance so you can prepare for the items you’d like to discuss, like ideas for improving the project and sharing new code you discovered while working on your own. Create a shared calendar and add project meeting times so that all team members can access the information.

4. Create a working document to record everything about the project, including the decisions made in the previous steps. It should also cover the project details, such as the objective, the stakeholders, etc.

5. Document reproducible workflows so that other team members can run a workflow without needing the same level of detail as its originator. This, again, is similar to a project manager role.

6. Make work visible. This will reduce the feeling of isolation. GitHub is a good example, since you can see the steady flow of activity on the project(s).

7. Create an online knowledge center that can be used for sharing best practices. This will also help you avoid feeling like you’re working in a silo, and it can give team members a sense of contributing to the betterment of the team.

8. And lastly, be sure there is a final signoff on the work by the data science team along with the stakeholders who are involved in the project.
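To make step 5 more concrete, here is a minimal sketch of what a documented, reproducible workflow might look like. The function name `run_workflow`, its parameters, and the sampling task itself are invented for illustration. The point is that the steps are written down in the docstring and the random seed is fixed, so a teammate gets the same result without knowing how the code was developed.

```python
import random
import statistics


def run_workflow(values, seed=42, sample_size=5):
    """Draw a reproducible sample from `values` and summarise it.

    Steps:
      1. Create a private random generator from `seed`.
      2. Draw `sample_size` items without replacement.
      3. Return the seed, the sample, and the sample mean.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    sample = rng.sample(list(values), sample_size)
    return {"seed": seed, "sample": sample, "mean": statistics.mean(sample)}
```

Because the seed is an explicit argument and is returned with the result, two people who call `run_workflow(range(100))` on the same Python version can compare their outputs directly.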

Data science work is very time-consuming, so the last thing you need is ongoing nuisances and frustration caused by a lack of communication or by being unable to reproduce work that could save hundreds of hours. Following the steps above will help steer your team toward a healthy work environment, so that more time is spent on actual data science work and less on figuring out project logistics.

https://towardsdatascience.com/best-practices-for-collaborative-data-science-adbff75c7d97

https://channel-tools.co.uk/blog/collaboration-tools-for-remote-data-science-teams/
