Data Science Work Productivity Tips & Tricks

How to increase your productivity when working on Machine Learning and Data Science projects

Posted by Tobias on 28-08-2018

When working on data science projects, we are often short on time and never have unlimited computing resources. How can you organize your work to stay as productive as possible?

Here are several recommendations.

Train first on a subset much smaller than your whole data set. Once you see where you should go (in terms of features, loss function, metrics, and hyperparameter values), scale things up.
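As a minimal sketch of the subset idea, the snippet below draws a reproducible random sample from a data set held in a list. The helper name sample_subset, the 5% fraction, and the stand-in data are assumptions made for illustration; for imbalanced classes a stratified sample would probably be a better fit.

```python
import random


def sample_subset(rows, fraction=0.05, seed=0):
    """Return a reproducible random subset of `rows` (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed: the same subset on every run
    # Keep at least one row, but never ask for more rows than exist.
    k = min(len(rows), max(1, int(len(rows) * fraction)))
    return rng.sample(rows, k)


# Stand-in for a real data set: 10,000 row identifiers.
full_dataset = list(range(10_000))
small = sample_subset(full_dataset, fraction=0.05)
print(len(small), "rows selected for quick experiments")
```

Fixing the seed keeps results comparable between quick runs, because every experiment sees the same subset.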

Reuse knowledge gained from previous projects. The problems you work on will probably be similar to one another. Reusing the best values of hyperparameters or feature extractors from similar problems you (or somebody else) solved in the past will save you a lot of time.
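One simple way to make hyperparameters reusable is to store them in a small file next to each project. The sketch below assumes JSON files and two hypothetical helpers, save_hyperparameters and load_hyperparameters; the parameter names and values are placeholders, not recommendations.

```python
import json
import tempfile
from pathlib import Path


def save_hyperparameters(path, params):
    """Write a dict of hyperparameters to a JSON file."""
    Path(path).write_text(json.dumps(params, indent=2, sort_keys=True))


def load_hyperparameters(path, overrides=None):
    """Load saved hyperparameters and apply project-specific overrides."""
    params = json.loads(Path(path).read_text())
    params.update(overrides or {})
    return params


# Placeholder values standing in for a previous project's best settings.
previous_best = {"learning_rate": 0.001, "batch_size": 64, "epochs": 20}

with tempfile.TemporaryDirectory() as workdir:
    config_path = Path(workdir) / "previous_project.json"
    save_hyperparameters(config_path, previous_best)
    # Start the new project from the old settings, changing only what differs.
    starting_point = load_hyperparameters(config_path, {"batch_size": 128})

print(starting_point)
```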

Queue your experiments in such a way that they automatically keep your available GPUs running 24/7.
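A queue like this can be sketched with the standard library alone. The example below starts one worker thread per GPU; each worker pulls the next command from a shared queue and runs it with CUDA_VISIBLE_DEVICES set to its GPU. The function name run_queue, the GPU ids, and the placeholder commands are assumptions; a real setup would launch training scripts and would likely add logging and retries.

```python
import os
import queue
import subprocess
import sys
import threading


def run_queue(commands, gpu_ids):
    """Run every command once, one job at a time per GPU, until none remain."""
    jobs = queue.Queue()
    for cmd in commands:
        jobs.put(cmd)

    results = []
    lock = threading.Lock()

    def worker(gpu_id):
        while True:
            try:
                cmd = jobs.get_nowait()
            except queue.Empty:
                return  # nothing left to run on this GPU
            # Restrict the child process to the GPU owned by this worker.
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
            completed = subprocess.run(cmd, env=env)
            with lock:
                results.append((gpu_id, cmd, completed.returncode))

    threads = [threading.Thread(target=worker, args=(g,)) for g in gpu_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Placeholder jobs: each one only prints the GPU it was assigned.
show_gpu = "import os; print('GPU', os.environ['CUDA_VISIBLE_DEVICES'])"
commands = [[sys.executable, "-c", show_gpu] for _ in range(4)]
results = run_queue(commands, gpu_ids=[0, 1])
```

Because a worker takes a new job as soon as its previous one ends, no GPU sits idle while the queue still has work.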

Set up automated alerts that tell you when a specific experiment has finished or failed. If something goes wrong, you will find out right away instead of hours later.
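A minimal sketch of such an alert is a wrapper that reports both success and failure. Here run_with_alert and its notify parameter are hypothetical names; notify can be any callable that delivers a message (e-mail, chat webhook, SMS), and the example simply collects messages in a list.

```python
import traceback


def run_with_alert(name, experiment, notify=print):
    """Run `experiment()` and report the outcome through `notify`."""
    try:
        result = experiment()
    except Exception:
        # Send the traceback so the failure can be diagnosed immediately.
        notify(f"[{name}] FAILED\n{traceback.format_exc()}")
        raise
    notify(f"[{name}] finished: {result}")
    return result


# Stand-in notifier: collect messages instead of sending them anywhere.
messages = []
run_with_alert("baseline", lambda: "ok", notify=messages.append)
print(messages[0])
```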

Use Jupyter notebooks for quick prototyping. Rewrite your code into Python packages/classes once you are satisfied with the result. (Avoid writing and keeping all your code in Jupyter!)

Use a powerful text editor, such as Sublime Text or Atom, that offers autocomplete and regex-based search and replace.

Use a well-designed IDE for coding. For Python projects, for example, we prefer PyCharm to Eclipse+PyDev or a plain text editor.

Speed up iterations in your data science process. One way to do this is to use a tool such as Data Version Control (DVC).

Keep your experiment code in a version control system, such as git.

Run multiple experiments at once using a terminal multiplexer such as tmux.

Optimize your code early so you spend less time waiting for experiments to finish. In Python, a good way to find inefficiencies is to profile your code with the built-in cProfile module.
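As a rough illustration, the snippet below profiles a toy pipeline with cProfile and prints the functions with the highest cumulative time. The functions slow_feature and pipeline are made-up stand-ins for real experiment code.

```python
import cProfile
import io
import pstats


def slow_feature(n):
    """Deliberately naive loop standing in for an expensive feature."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def pipeline():
    return [slow_feature(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Sort by cumulative time and keep only the top entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

A whole script can also be profiled without code changes by running python -m cProfile -s cumulative followed by the script name.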

Have a list of things to do while your experiments are running: data collection, cleaning, and annotation; reading up on new data science topics; experimenting with a new algorithm or framework. All of these activities will contribute to the success of your future projects.
