Recommended IDE for Data Scientists and Machine Learning Engineers

An overview of the development environments for data scientists and machine learning engineers programming in R, Python, Scala, and Julia

Posted by Tobias on 02-10-2018

An integrated development environment, or IDE, is a tool that allows software developers to write, test, and debug their code more easily than in a general-purpose text editor. An IDE typically offers a text editor, automated code validation, syntax highlighting, completion, contextual suggestions, easy access to help, method and class specification, resource management, and debugging tools.

Because of their rich feature sets, IDEs are extremely useful for software development: they make a programmer's life more comfortable. This is no different for a data scientist. However, traditional IDEs (the ones you install on your computer) are no longer the only option: there are also newer tools, such as Jupyter notebooks running in the browser. You might therefore be wondering which development environment to use, especially if you are just starting out with data science.

In this post, we give several IDE suggestions for the four programming languages most frequently used by data scientists: R, Python, Scala, and Julia.

For every language, we start the list with the preferred choice of data scientists and machine learning engineers at semanti.ca.

Python

PyCharm

JetBrains has developed IDEs for multiple programming languages, and PyCharm, a cross-platform IDE for Python, is one of them.

PyCharm's smart code editor provides first-class support for Python. It features code completion, error detection, and on-the-fly code fixes. Smart search can be used to jump to any class, file or symbol, or even any IDE action or tool window. One click is sufficient to switch to the declaration, super method, test, usages, implementation, and more.

PyCharm has a huge collection of tools, including an integrated debugger and test runner, a Python profiler, a built-in terminal, integration with major version control systems (including Git, SVN, and Mercurial), remote development capabilities with remote interpreters, an integrated SSH terminal, and integration with Docker and Vagrant.

PyCharm integrates with Jupyter Notebook, has an interactive Python console, and supports Anaconda as well as multiple scientific packages including Matplotlib and NumPy.

As a nice touch, PyCharm has a dark theme, which for many of semanti.ca's data scientists and developers is a huge plus.

Spyder

Spyder is a powerful scientific environment designed primarily for scientists, engineers, and data analysts. It combines the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and beautiful visualization capabilities of a scientific package. Spyder's abilities can be extended further via plugins and its API.

Spyder has a multi-language editor with a function/class browser, code analysis tools, automatic code completion, horizontal/vertical splitting, and go-to-definition.

The profiler helps find and eliminate inefficiencies in the code, while the debugger lets you trace each step of the code's execution interactively.
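
To give a feel for what a profiler reports, here is a minimal sketch using cProfile from Python's standard library. It is a stand-in chosen for illustration, not a description of how Spyder's profiler works internally, and the function names sum_of_squares_via_list and profile_call are made up for this example.

```python
import cProfile
import io
import pstats


def sum_of_squares_via_list(n):
    # Builds an intermediate list before summing, so the profiler has
    # many append calls to count.
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(sum_of_squares_via_list, 100_000)
print(report)
```

The report lists each function with its call count and the time spent in it, which is the kind of table an IDE's profiler pane typically presents in a sortable view.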

A nice touch is that Spyder itself is written in Python.

R

RStudio

RStudio is the most feature-rich IDE for R. It is available in open source and commercial editions, both on the desktop (Windows, Mac, and Linux) and in a web browser connected to a Linux server running RStudio Server or RStudio Server Pro.

RStudio features syntax highlighting, code completion, and smart indentation. R code can be executed directly from the source editor. The developer can quickly jump to function definitions, read help and documentation, and easily manage multiple working directories using projects. The integrated data viewer displays tabular data, which, combined with step-by-step execution in debug mode, lets you examine how the data is updated in real time.

RStudio has integrated support for Git and Subversion and supports authoring HTML, PDF, Word Documents, and slideshows, as well as interactive graphics (with Shiny and ggvis).

StatET Plugin for Eclipse

Eclipse is one of the most popular IDEs for Java. It also allows installing plugins to support other programming languages. StatET is an Eclipse-based IDE for R. It offers a set of tools for R coding and package building, including a fully integrated R console, object browser, package manager, debugger, data viewer, and R help system. Multiple local and remote installations of R are supported. The optional add-ons for Sweave and Wikitext (Markdown, Textile) provide source editors and build tools for LaTeX/Wikitext documents with R chunks.

The code editor provides syntax highlighting, text folding of Roxygen comments, function definitions and other blocks, auto-correction of line indentation, and automatic indentation on typing and pasting.

A visual debugger allows simple management of breakpoints and conditional breakpoints. The debugger features a clearly presented call stack/traceback with direct access to variables of the selected frame, source code and instruction pointer (R Editor), as well as stepping through the source code.

StatET includes a data viewer (spreadsheet) for inspecting vectors, matrices, and data frames, with fast display of very big tables.

R Tools for Visual Studio

Visual Studio is the most widely used IDE for .NET languages and C++. R Tools for Visual Studio (RTVS) is a free, open-source extension for Visual Studio released under the MIT license.

With Visual Studio, data scientists can organize and manage related files in a convenient structure, and take advantage of useful templates for items such as R code, R documentation, R Markdown, SQL queries, and stored procedures. A package manager and SQL Server integration are also available.

RTVS can bind to local and remote workspaces, allowing developers to develop R code locally with smaller data sets, then easily run the code on more powerful cloud-based computers with much larger data sets.

Like any modern IDE, RTVS includes syntax coloring, code formatting, signature help, go-to-definition, find-all-references, and code snippets.

Developers can share data results via R Markdown documents, with integrated R code inside markdown code blocks.

RTVS provides a full REPL experience for R with the ability to easily run code in a source file in the interactive window.

Plotting is an important part of R. To make it easy, RTVS supports multiple, independent plot windows, each with its own history and the ability to move plots between windows. Plots can be saved to graphic or PDF files, or copied to the clipboard.

The variable explorer allows examining variables in the global or package-specific scopes, with the ability to view sortable tables and export to CSV.

R Kernel for Jupyter Notebook

Contrary to what many data scientists think, Jupyter doesn't limit you to working solely with Python: the notebook application is language agnostic, which means that it is also possible to work with other programming languages.
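
One way to see this language independence is to look at a notebook file itself: it is a JSON document whose metadata names the kernel to launch. The sketch below builds a minimal notebook in the nbformat 4 layout and reads the language back. The kernelspec values are illustrative assumptions ("ir" is the name IRkernel is expected to register by default), and kernel_language is a hypothetical helper written for this example.

```python
import json

# A minimal notebook document in the nbformat 4 layout. The kernelspec
# values are illustrative assumptions, not taken from a real file.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        "kernelspec": {"name": "ir", "display_name": "R", "language": "R"}
    },
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "source": "summary(cars)",
            "outputs": [],
            "execution_count": None,
        }
    ],
})


def kernel_language(raw):
    """Return the language recorded in a notebook's kernelspec metadata."""
    notebook = json.loads(raw)
    return notebook["metadata"]["kernelspec"]["language"]


print(kernel_language(notebook_json))  # prints "R"
```

The notebook application only stores and displays the cells; running them is delegated to whichever kernel the metadata points to.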

To start working with R in the notebook environment, install the IRkernel package in R and register it with Jupyter.

R-Brain

R-Brain provides an integrated cloud/on-premises data science platform for developing models with popular open source languages. R-Brain is powered by Jupyter and offers an IDE, a console, a notebook and markdown that are all integrated into one environment with full language support for R (as well as Python). It has integrated code-completion, debugging, packaging, and publishing capabilities.

R-Brain offers standard features of the classic Jupyter Notebook (interactive notebook, terminal, text editor, file browser, rich outputs, etc.) in a flexible user interface. It uses Docker container technology, so this solution can be deployed on-premises or in the cloud.

Data scientists can develop, package, share and publish analytics workspaces, data sets, and applications that use R, Python and SQL scripts. R-Brain also makes it easy to interactively navigate database schemas, view table content and export data.

Scala

Scala IDE for Eclipse

Scala IDE for Eclipse provides advanced editing and debugging support for the development of pure Scala and mixed Scala-Java applications, allowing references from Scala to Java and vice versa.

Like any modern IDE, it has code completion, semantic highlighting, and go-to-definition. It catches compilation errors as you type.

Scala Debugger allows stepping through closures and provides a Scala-aware display of debugging information.

Scala Wizards simplify the creation of a class, an object, a trait or a package object. Refactoring allows renaming identifiers, organizing imports, extracting some code as a new method, and more.

Additional features include code formatting, a smart indenter that indents your code as you type, marking occurrences of any identifier in a file, code folding, and full syntax highlighting support, including comments, control structures, and embedded XML.

Scala Plugin for IntelliJ IDEA

IntelliJ IDEA is another IDE from JetBrains, famous for its ergonomics and the intelligent coding assistance it provides for developers coding in Java, JavaScript, and other languages. The Scala plugin extends IntelliJ IDEA's toolset with support for Scala, SBT, Scala.js, Hocon, and Play Framework.

The following features are available: coding assistance (highlighting, completion, formatting, and refactorings), navigation, search, information about types and implicits, and integration with SBT and other build tools. The plugin also supports testing frameworks such as ScalaTest, Specs2, and uTest, and includes a Scala debugger, worksheets, and support for Ammonite scripts.

Jupyter Notebooks

Scala and Spark Scala kernels are fairly easy to install, and both can add Maven/SBT dependencies and JARs. Just as with Python and R, the cells in the notebook can be run individually, allowing data scientists to train a model once and use it many times.

The cells support Markdown (with LaTeX support), which can be rendered on its own, allowing data scientists to use their notebooks as reports that can be shared with clients or colleagues.

As with other languages, the downsides of using Jupyter Notebook are buggy or limited kernels and very limited or absent debugging. Data scientists also need to be careful about the order in which they run their cells: failing to do so can cause a lot of confusion.
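
The cell-ordering problem is easy to reproduce outside a notebook. The sketch below is written in Python for brevity and only simulates a notebook: each "cell" is a snippet executed in one shared namespace, which is how kernels keep state between cells. The cell names and the run_cells helper are hypothetical.

```python
# Each "cell" is a snippet that runs in one shared namespace,
# the way notebook cells share kernel state.
cells = {
    "load": "values = [1, 2, 3]",
    "scale": "values = [v * 10 for v in values]",
    "total": "total = sum(values)",
}


def run_cells(order):
    """Execute the named cells in the given order and return the final total."""
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return namespace["total"]


in_order = run_cells(["load", "scale", "total"])
scale_rerun = run_cells(["load", "scale", "scale", "total"])
print(in_order, scale_rerun)  # prints "60 600"
```

Running the "scale" cell a second time silently changes the result, although the notebook on screen looks exactly the same. Restarting the kernel and running all cells from top to bottom is the usual way to check that a notebook is reproducible.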

Julia

Juno

Juno builds on Julia's unique combination of ease-of-use and performance. With a completely live environment, Juno aims to take the frustration and guesswork out of programming and put the fun back in. A hybrid "canvas programming" style combines the exploratory power of a notebook with the productivity of an IDE.

Juno is built on Atom, a text editor developed by GitHub, which means it offers a powerful editor as well as an attractive UI.

Juno consists of both Julia and Atom packages in order to add Julia-specific enhancements, such as syntax highlighting, a plot pane, integration with Julia's debugger Gallium, a console for running code, and more.

It's very customizable and has features for power users like multiple cursors, fuzzy file finding, and Vim keybindings.

Jupyter Notebooks

IJulia is a Julia-language backend combined with the Jupyter interactive environment (also used by IPython). This combination allows you to interact with the Julia language using Jupyter's powerful graphical notebook, which combines code, formatted text, math, and multimedia in a single document.

Julia extension for Visual Studio Code

The Julia extension for Visual Studio Code provides syntax highlighting, snippets, LaTeX snippets, Julia-specific commands, an integrated Julia REPL, code completion, hover help, a linter, code navigation, and tasks for running tests, builds, and benchmarks and for building documentation.

