How to Start in Machine Learning and Data Science

A comprehensive guide to starting off in Machine Learning and Data Science on the right foot

Posted by Stacy on 04-08-2018

Today, when you are a beginner in Machine Learning and Data Science, it's easy to get lost. There's so much to know, so many articles, tutorials, how-tos, algorithms, problems, libraries, frameworks, programming languages.

Where should you start?

In this post, we share the best advice the people at semanti.ca could give to a beginner in Machine Learning and Data Science.

Online Courses

In the 2010s, the best way to learn about anything in computer science is through online courses. Most online courses are free to watch. However, you usually have to pay if you want to work on practical assignments and earn a certificate.

There are several online learning platforms. Here are the most popular ones:

  • Udemy has the largest selection of courses. Anyone can teach a course (not necessarily a university professor), so quality varies, which can make it difficult for a prospective student to pick the best course to take.

  • Udacity offers corporate-sponsored courses taught by industry experts. Most of the courses are focused on the tech industry.

  • Coursera is one of the oldest platforms, with courses taught by university professors. The variety and quality of courses are high. Coursera was founded by Stanford University professors, and many other top universities participate.

  • EdX is similar to Coursera but is led by Harvard and MIT.

As of 2018, the best courses in Machine Learning are as follows:

Data Science includes Machine Learning as a tool, so if you want to be a Data Scientist rather than a Machine Learning Engineer, here are your alternatives:

Deep Learning (aka Neural Networks) is a subfield of Machine Learning. The best online courses belong to the Deep Learning Specialization by Andrew Ng on Coursera.

If you find such a choice overwhelming, we recommend that you start with the Machine Learning course by Andrew Ng, then either go deeper with the Deep Learning Specialization or wider with the Machine Learning Specialization.

After that, take any data science course from the above list to see what you missed.

Books

The classic for beginners in Machine Learning is Data Mining: Practical Machine Learning Tools and Techniques. You can skip the second part of the book, which covers Weka. Hardly anyone uses Weka in 2018.

Once you are ready to dive deeper, you can read The Elements of Statistical Learning: Data Mining, Inference, and Prediction to increase your understanding of Machine Learning theory.

After that, you can learn neural network theory with the Deep Learning book (which can be legally downloaded online) and the practice with Deep Learning with Python (using Python and Keras).

If you are interested in Natural Language Processing, then these two books are a must: Foundations of Statistical Natural Language Processing for the theory and Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit for the practice (with Python and NLTK).

Programming Libraries

When you are ready to start programming your first data analysis solution, consider the following Python packages:

  • pandas, for data pre-processing, cleaning, and transformation;
  • scikit-learn, for a comprehensive collection of Machine Learning algorithms;
  • Keras, an easy-to-use and versatile Deep Learning library;
  • TensorFlow, a lower-level Deep Learning library that gives you more control over your neural network architectures;
  • Matplotlib, for your data visualizations;
  • Jupyter Notebook, to create and share documents that contain code, equations, visualizations, and descriptive text.
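
To give a feel for how the first two packages fit together, here is a minimal sketch of a first data analysis solution. It is only an illustration, not a recommended recipe: the choice of the built-in Iris dataset, the logistic regression model, the 25% test split, and the variable names are all our assumptions, and it requires pandas and scikit-learn 0.23 or newer to be installed.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset as a pandas DataFrame
# (assumption: Iris is used purely as a convenient toy example).
df: pd.DataFrame = load_iris(as_frame=True).frame

# A typical pandas pre-processing step: drop rows with missing values.
# Iris has none, so this is a no-op here, but real data rarely is this clean.
df = df.dropna()

# Separate the feature columns from the label column.
X = df.drop(columns="target")
y = df["target"]

# Hold out a quarter of the rows to estimate how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train a simple scikit-learn classifier and evaluate it on the held-out rows.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

You can paste this into a Jupyter Notebook cell and swap in your own data by replacing the first step with, for example, `pd.read_csv` on a file of your own.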

Stay Up to Date

Here's the list of news sources and blogs to stay updated on the state of Data Science and Machine Learning:

Now you are ready to start learning!

Also, read our tutorial on how to prepare for a machine learning interview.

