- Soledad Galli

# Best Resources to Learn Data Science

You have probably heard a lot about data science lately. And why not? Data scientist is one of the hottest jobs of the 21st century.

Do you want to be a data scientist? There are tons of courses available online that will help you get started. But which one should you choose?

In this blog post, we recommend some great courses and books that will help you become a data scientist. For our recommendations, we considered the cost of each course and the knowledge you gain from it, prioritising those that can be taken for free.

**What is Data Science?**

Data Science is a multidisciplinary field that uses algorithms, statistics and scientific methods to extract knowledge and insights from structured and unstructured data. It takes concepts and theories from multiple fields such as computer science, mathematics, machine learning, statistics and information science to solve complex problems.

*Data Scientist* is a term used for data professionals who are skilled in data science, i.e., in organising, analysing and drawing inferences from large amounts of data. These professionals can identify problems and relevant questions, collect and organise data from various sources to create solutions and then communicate them effectively to organisations.

**What are the skills required to become a data scientist?**

**Programming**: languages like R and Python

**Statistics**: forecasting, descriptive and inferential statistics

**Machine learning algorithms**: e.g., regression, decision trees, clustering

**Data visualisation**: tools like Tableau and Excel, as well as Python and R

**Database handling**: SQL, NoSQL, among others

**Software engineering**

**Data wrangling**: the process of cleaning, restructuring and enriching raw data into a usable format
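To make "data wrangling" a little more concrete, here is a minimal sketch in Python using only the standard library. The raw CSV text, the column names and the derived `price_per_unit` field are hypothetical, made up for illustration; in day-to-day work you would more likely reach for a library such as pandas.

```python
import csv
import io

# Hypothetical raw export: stray spaces, a missing value and numbers stored as text
RAW = """product, units , revenue
 apples ,10, 25.0
pears,,12.5
 plums , 4 ,10.0
"""

def wrangle(raw_text):
    """Clean, restructure and enrich raw CSV text into a list of dicts."""
    reader = csv.DictReader(io.StringIO(raw_text))
    # Clean: normalise header names by stripping whitespace
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    rows = []
    for record in reader:
        # Clean: strip stray whitespace from every value
        record = {key: value.strip() for key, value in record.items()}
        # Clean: drop rows with missing values
        if any(value == "" for value in record.values()):
            continue
        # Restructure: convert text fields to numeric types
        units = int(record["units"])
        revenue = float(record["revenue"])
        # Enrich: add a derived column
        rows.append({
            "product": record["product"],
            "units": units,
            "revenue": revenue,
            "price_per_unit": revenue / units,
        })
    return rows

print(wrangle(RAW))
```

Each step maps onto one part of the definition: stripping and dropping incomplete rows is cleaning, type conversion is restructuring, and the derived column is enrichment.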

Acquiring these skills will certainly take some time, but fortunately, there are many resources available online to get you started. How do you select the right course? How and where should you start? This is exactly what we discuss in this blog post.

We will cover the following courses and books in this blog, along with some additional resources.

Disclaimer: The opinions stated in this article are my own, and I do not receive financial compensation for any of the links in this blog. This blog does not contain affiliate links.

Prices are indicative and may vary depending on your location and each platform’s terms and conditions.

**Contents**

**1. Courses**

Coursera - **Data Science Specialization** by Johns Hopkins University

Coursera - **Data Science Professional Certificate** - IBM

**Path Data Scientist** - DataQuest

edX - **Professional Certificate in Data Science** - Harvard University

Codecademy - **Become a Data Scientist**

Udemy - **Complete Data Science Bootcamp**

Udemy - **Data Science A-Z**

**2. Books**

O’Reilly - **Data Science from Scratch**

O’Reilly - **Doing Data Science**

Roger D. Peng and Elizabeth Matsui - **The Art of Data Science**

**3. Additional Resources**

Harvard University - **Online Data Science Courses**

## 1. **Courses**

There are many online courses to help you gain the right skills for data science. You can find these courses on e-learning websites like Coursera, edX and Udemy. One advantage of online courses is that they can be highly interactive, which can help you understand and remember concepts more easily.

To be a successful data scientist, you need to acquire the skills and knowledge listed earlier in this post.

Now let’s dive in and learn about some excellent online courses out there.

**1.1 Data Science Specialisation - Coursera (Johns Hopkins University)**

**Time Required:** Approx. 8 months, but you can go faster or slower if you don’t want a certificate.

**Price:** $49 USD per month if you want a certificate; otherwise it’s free.

**Skills you will gain:**

R programming

Use of GitHub

Data handling and Exploratory Data Analysis

Machine Learning

How to make statistical inference

Regression Analysis

How to build data products

**Total Number of Courses in this Specialization:** 10

The Data Scientist ToolBox

R Programming

Getting and Cleaning Data

Exploratory Data Analysis

Reproducible Research

Statistical Inference

Regression Models

Practical Machine Learning

Developing Data Products

Data Science Capstone

**About:**

This Specialisation covers almost every stage of the data science pipeline. It provides both a conceptual and a practical introduction to data science, focusing on the tools and concepts you need to build a successful data science project, such as R programming, gathering data from sources like the web and databases, and presenting data. It also teaches the use of GitHub, a platform for discovering, sharing and building software. Data scientists use GitHub to collaborate and to track changes in their work, and they can even roll back changes if required. You also learn how to make inferences from data, ask data-related questions and apply the skills you learn to solve a data-related problem.

To complete this Specialisation you need to finish all the courses and finally submit a capstone project where you will apply the skills that you gained to a real-world problem (note that capstone projects are only available if you pay the subscription).

The Specialisation uses R as the programming language. You will learn about R from scratch, both general programming principles and how to use R for data science and statistical computing. The courses will also teach you how to gather data from various sources and how to process data so that it can be used in machine learning projects.

Later courses in the Specialisation teach exploratory data analysis, helping you summarise your data with visualisation and exploratory techniques. They also cover the tools and concepts required for making statistical inferences from data, regression models and practical machine learning algorithms in R. Finally, you will learn about building data products that help you tell the story derived from the data. After you learn these skills, you can submit a capstone project. These projects are based on real-world problems and are conducted together with industry, government and academic partners.
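The Specialisation teaches this material in R. As a rough Python illustration of moving from descriptive to inferential statistics, the sketch below summarises a small, made-up sample and computes an approximate 95% confidence interval for its mean. It assumes a normal approximation via `statistics.NormalDist` (Python 3.8+); for a sample this small, a t-distribution interval would be more appropriate.

```python
import math
import statistics

def summarise(sample, confidence=0.95):
    """Descriptive summary plus an approximate confidence interval for the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    standard_error = sd / math.sqrt(n)
    # Normal approximation: z-value leaving (1 - confidence) / 2 in each tail
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(sample),
        "stdev": sd,
        "ci": (mean - z * standard_error, mean + z * standard_error),
    }

# Hypothetical measurements, for illustration only
heights_cm = [162, 170, 168, 175, 181, 158, 177, 169, 172, 166]
print(summarise(heights_cm))
```

The mean, median and standard deviation describe the sample itself; the interval is the inferential step, a statement about the unknown population mean.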

**Pros:**

Good course for beginners.

No prior programming experience required.

Engages with mathematics more than other courses do.

It covers most skills required to tackle a data science project, helping you get started in the field of data science.

Capstone Projects allow you to apply your skills to a real-world problem, giving you an overview of the entire data science pipeline.

**Cons:**

Basic understanding of statistics and probability required.

**1.2 Data Science Professional Certificate - Coursera (IBM)**

**Time Required:** Approx. 3 months

**Price:** $39 USD per month if you want a certificate; otherwise it’s free.

**Skills you will gain:**

Python programming

Data visualisation

Machine Learning

Data analysis

Databases and SQL

**Total Number of Courses in this Certificate:** 9

What is Data Science

Data Science Open Source Tools

Data Science Methodology

Python for data science and AI

Database and SQL for Data Science

Data Analytics with Python

Data Visualization with Python

Machine Learning with Python

Data Science Capstone

**About:**

This is a beginner-level programme. The courses give you the basic skills you need to get started with data science. They describe the open-source tools commonly used in data science, such as RStudio, Apache Zeppelin and Jupyter Notebook. The courses also teach Python programming, the use of databases, and concepts around data analysis, data visualisation and machine learning. You will dive into machine learning algorithms such as linear and logistic regression, k-nearest neighbours, decision trees, support vector machines, partition-based clustering and hierarchical clustering. Finally, you will get hands-on experience with IBM Cloud by building a data science capstone project.
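To illustrate one of the algorithms listed, here is a minimal from-scratch k-nearest neighbours classifier in Python. It is a teaching sketch, not the course’s code; in practice you would typically use a library such as scikit-learn. The toy points and labels are made up, and ties in the vote are resolved by whichever label `Counter.most_common` returns first.

```python
import math
from collections import Counter

def knn_predict(training_points, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points.

    training_points: list of ((x, y, ...), label) pairs.
    """
    # Sort the labelled points by Euclidean distance to the query
    by_distance = sorted(training_points, key=lambda item: math.dist(item[0], query))
    # Take the labels of the k closest points and return the most common one
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data: two loose groups of 2-D points (hypothetical)
training = [
    ((0.0, 0.0), "A"), ((0.5, 0.4), "A"), ((1.0, 0.2), "A"),
    ((5.0, 5.0), "B"), ((5.5, 4.8), "B"), ((4.6, 5.3), "B"),
]
print(knn_predict(training, (0.7, 0.3)))  # → A
print(knn_predict(training, (5.1, 4.9)))  # → B
```

Note that k-NN has no training step at all: every prediction scans the stored examples, which is why it gets slow on large datasets.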

**Pros:**

The content is well designed.

Good hands-on working on fundamental concepts.

No prior programming experience required.

**Cons:**

It is only a beginner course, so you will need to study these topics in more depth afterwards.

**1.3 Path Data Scientist - DataQuest**

**Time Required:** Self-paced

**Price:** $29 USD per month (basic) and $49 USD per month (premium)

**Skills you will gain:**

Python programming for Data Science

Data Visualisation

Data cleaning and data analysis

Databases and SQL

Web Scraping

Concepts on Statistics, Probability and Linear Algebra

Machine Learning

Data Structures and Algorithms

Git and Version Control

Spark and Map Reduce

**Total Number of Courses in this Track:** 35

Python for Data Science and Fundamentals

Python for Data Science Intermediate

Pandas and NumPy Fundamentals

Exploratory Data Visualisation

Storytelling through Data Visualisation

Data Cleaning and Analysis

Data Cleaning in Python - Advanced

Data Cleaning Project Walk Through

Elements of the command line

Text Processing in the command line

SQL fundamentals

SQL Intermediate

SQL Advanced

APIs and Web Scraping

Statistics and Fundamentals

Statistics Intermediate

Probability Fundamentals

Conditional Probability

Hypothesis testing and fundamentals

Machine Learning Fundamentals

Calculus for Machine Learning

Linear Algebra for Machine Learning

Machine Learning with Python - Intermediate

Decision Trees: Python

Deep Learning Fundamentals

Machine Learning Project

Kaggle Fundamentals

Exploring topics in Data Science

Natural Language Processing

Functions: Advanced

Data Structures and Algorithms

Python Programming: Advanced

Command-line: Intermediate

Git and Version Control

Spark and Map Reduce

**About:**

This track is one of the best you can find online to learn data science. The courses are incredibly detailed and cover almost every aspect required to become a data scientist. They teach Python fundamentals and more advanced concepts, data analysis, data visualisation and how to query databases with SQL. They also go much deeper into mathematics than other courses: you will learn statistics, linear algebra, calculus and probability. The later part of the track covers machine learning algorithms in detail, data structures and a little Natural Language Processing. These chapters teach you to extract information from data and to apply machine learning algorithms effectively, based on the type of data and the organisation’s requirements. The courses also cover Spark and Map-Reduce, technologies for analysing large datasets, and how to use Git for version control to keep track of your projects.
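To give a feel for the Map-Reduce programming model, here is a single-machine Python sketch of the classic word-count example, split into map, shuffle and reduce phases. It only mimics the model: frameworks such as Hadoop or Spark distribute these phases across a cluster, and the function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine the values for each key; for word count, sum them."""
    return {key: sum(values) for key, values in grouped.items()}

documents = ["the cat sat", "the dog sat", "The end"]
# Each document could be mapped independently (and in parallel) on a different worker
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

The key design idea is that the map and reduce functions only ever see one document or one key at a time, so the framework is free to spread them over many machines.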

Note that you need a premium subscription to access all the courses in this data science track. On the plus side, a premium membership includes a monthly call with a mentor who will guide you, review your resume and give you advice.

**Pros:**

Reasonable Price for the value

Highly Detailed

Covers almost every topic required to become a data scientist.

No prior programming for data science experience required.

Includes big data handling techniques such as Map-Reduce.

Mentorship (Premium)

Guided portfolio projects, with instructions to help you develop each project (Premium)

Resume review (Premium)

Focus on the actual implementation of the math behind machine learning and not just importing a library.

**Cons:**

Chapters like Spark and Map Reduce may be difficult to grasp for beginners.

The certificate is not worth much to employers, but the track will help you build a great portfolio.

**1.4 Professional Certificate in Data Science - edX (Harvard University)**

**Time Required:** Approx. 1 year and 5 months

**Price:** $441.90

**Skills you will gain:**

R programming

Data Visualisation

Mathematics

Productivity Tools

Wrangling

Machine Learning

**Total Number of Courses in this Certification:** 9

R Basics

Data Visualisation

Probability

Inference and Modelling

Productivity Tools

Wrangling

Linear Regression

Machine Learning

Capstone

**About:**

This course covers the basic skills required to be a successful data scientist, such as R programming, statistical concepts (probability and inference), data visualisation and data wrangling, and helps you get familiar with the tools used in practising data science, such as Unix/Linux, RStudio, Git and GitHub. The course material is enough to give you the knowledge base required to tackle real-world data science problems. It includes real-world case studies to help you understand how data science is applied in practice. The case studies in this certificate are Trends in World Health and Economics, US Crime Rates, The Financial Crisis of 2007-2008, Election Forecasting, Building a Baseball Team (inspired by Moneyball), and Movie Recommendation Systems.

**Pros:**

Equips you with all the basic skills.

Includes real-world case studies.

No prior programming for data science experience required.

Good Hands-on experience

**Cons:**

Covers a narrow range of topics.

Pricey

**1.5 Become a Data Scientist - Codecademy**

**Time Required:** Approx. 35 weeks

**Price:** $19 per month

**Skills you will gain:**

SQL

Python Programming

Data Analysis

Data Visualisation

Statistics

Web Scraping

Machine Learning

**Total Number of Courses in this Track:** 26

The importance of Data and SQL Basics

SQL: Basics

SQL: Intermediate

Go Off-Platform with SQL

Analyse real data with SQL

Python functions and logic

Python Lists and Loops

Advanced Python

Python Cumulative Project

Data Analysis with Python

Data Visualisation

Visualisation Cumulative Projects

Data Visualization Capstone Project

Learn Statistics with Python

Introduction to Statistics with NumPy

Hypothesis Testing with SciPy

Practical Data Cleaning

Data Analysis Capstone Projects

Learn Web Scraping with Beautiful Soup

Machine Learning: Supervised Learning

Supervised Machine Learning Cumulative Project

Machine Learning: Unsupervised Learning

Unsupervised Machine Learning Cumulative Project

Perceptrons and Neural Nets

Machine Learning Capstone Project

Natural Language Processing

**About:**

This track is a good way to get started with the skills required for data science. The course contains well-explained content with code-alongs, guided projects and quizzes. It starts with basic SQL (syntax, tables and queries) and Python (functions, loops and data structures). It then moves on to more advanced concepts, such as using SQL to analyse real organisational data and using Python for statistics and data cleaning, which will help you tackle real-world data science projects. The course also covers data analysis, data visualisation, statistics and machine learning. A strength of this certification is that it teaches every topic through guided projects, using the Python packages you need to get hands-on with data science.
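For a taste of the basic SQL described above (creating a table, inserting rows and querying them), here is a small sketch run through Python’s built-in `sqlite3` module against an in-memory database. The `orders` table and its contents are hypothetical, and the course’s own exercises may use a different SQL environment.

```python
import sqlite3

# An in-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    # "?" placeholders let sqlite3 handle quoting, which also avoids SQL injection
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5)],
)

# Total spend per customer, largest first
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
results = conn.execute(query).fetchall()
print(results)
conn.close()
```

`GROUP BY` with an aggregate such as `SUM` is the SQL counterpart of a pivot table, and it is one of the first analytical queries most SQL courses teach.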

This track also covers machine learning in good depth, including supervised and unsupervised learning, neural networks and Natural Language Processing. You will also get hands-on with SciPy, a Python library for scientific computing that includes hypothesis tests, and with web scraping using Beautiful Soup, a Python package for parsing HTML and XML documents that is used to extract data from websites.
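Beautiful Soup is a third-party package, so to keep this sketch dependency-free it swaps in Python’s built-in `html.parser.HTMLParser` to show the same idea: walking an HTML document and pulling out the pieces you need, here the text and URL of each link. The HTML snippet and the `LinkExtractor` class are made up for illustration. On a real site you would first download the page (respecting its terms of use), and you would likely find Beautiful Soup’s API more convenient.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> tag
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).strip()
            self.links.append((text, self._current_href))
            self._current_href = None

# Hypothetical page content
page = """
<html><body>
  <h1>Datasets</h1>
  <a href="/data/iris.csv">Iris data</a>
  <p>See also <a href="https://example.com/titanic">Titanic <b>data</b></a>.</p>
</body></html>
"""
parser = LinkExtractor()
parser.feed(page)
parser.close()
print(parser.links)
```

Text inside nested tags such as `<b>` is still collected, because `handle_data` is called for every text fragment while a link is open.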

**Pros:**

Well-detailed course.

Good value for money.

Easy to use and navigate

No prior programming for data science experience required.

Good Hands-on experience with well-guided projects

**Cons:**

Free courses in this track are a bit too general.

Some topics may be a bit confusing, for example the section on hypothesis testing with SciPy.

**1.6 Complete Data Science Bootcamp - Udemy**

**Time Required:** Self-paced (29 hours of video lectures; allow extra time for the exercises)

**Price:** $199.99 (after discount: $10.99)

**Skills you will gain:**

Python programming

Mathematics (Statistics and Probability)

Statistical Analysis

Deep Learning Frameworks

Machine Learning

**Total Number of Chapters in this Course:** 8

Introduction to Data Science

Probability

Statistics

Introduction to Python

Advanced statistical methods with Python

Mathematics

Deep Learning

Case Studies

**About:**

This course is one of the best-selling courses on Udemy, with a 4.5-star average user rating and more than 201,049 students enrolled. It starts with the basics (an introduction to probability and statistics) and assumes no programming experience. You will learn Python programming and how Python is used in data science. The course also covers a great deal of mathematics that is important for a data scientist, such as the mathematics behind machine learning algorithms, probability distributions, descriptive statistics and inferential statistics. You will also learn how to apply statistical methods to your data with Python. Finally, you will use the acquired skills to work through real-world case studies.
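As one example of applying inferential statistics with Python, the sketch below runs an exact two-sided permutation test for a difference in means, using only the standard library. This is a swapped-in, assumption-light alternative to the t-tests offered by libraries such as SciPy (`scipy.stats.ttest_ind`), not necessarily the method the course teaches. The sample values are made up.

```python
from itertools import combinations
from statistics import mean

def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Returns the fraction of all possible relabellings of the pooled data whose
    absolute mean difference is at least as large as the observed one.
    Enumerates every split, so it is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance so floating-point noise does not hide ties with the observed value
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical task completion times (seconds) under two page designs
design_1 = [12.1, 11.4, 13.0, 12.7, 11.9]
design_2 = [14.2, 13.8, 15.1, 14.6, 13.5]
p_value = permutation_test(design_1, design_2)
print(f"p = {p_value:.4f}")
```

The logic is the core of any hypothesis test: if the group labels did not matter (the null hypothesis), how often would shuffling them produce a difference as large as the one observed? With two fully separated groups of five, only 2 of the 252 possible splits are that extreme, giving p ≈ 0.008.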

**Pros:**

No programming experience required

Active Q&A support

Access to future updates

Real-world case studies

**Cons:**

Does not cover topics like linear algebra, calculus, data wrangling, or the use of Git and GitHub, which are valuable in data science.

**1.7 Data Science A-Z - Udemy**

**Time Required:** Self-paced (21 hours of video lectures; allow extra time for the exercises)

**Price:** $199.99 (after discount: $10.99)

**Skills you will gain:**

Tableau

SSIS (SQL Server Integration Services, a component of the Microsoft SQL Server database software that can be used to perform a broad range of data migration tasks)

Gretl (Open Source statistical package)

SQL for data science

Data Wrangling

Data Visualisation

Communication

**Total Number of Chapters in this Course:** 4

Data Visualisation

Modelling

Data Preparation

Communication

**About:**

This is a beginner course for data science enthusiasts. You will learn about data mining with Tableau (a data visualisation and analysis tool) and how to apply machine learning algorithms such as linear and logistic regression, along with various evaluation metrics used to measure the performance of machine learning algorithms and statistical models. You will learn how to handle data with SQL and how to create basic visualisations with Tableau. You will also learn about data wrangling and manipulation. An added advantage of this course is that it teaches you how to communicate your project effectively to business stakeholders, which is one of the important skills for a data scientist.
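To illustrate the kind of evaluation metrics mentioned above for a classifier such as logistic regression, here is a short Python sketch that computes accuracy, precision, recall and F1 from true and predicted labels. The labels are hypothetical, and the course itself focuses on tools such as Tableau and Gretl rather than on Python code like this.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when there are no predicted or actual positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 1 = customer churned, 0 = stayed
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_metrics(actual, predicted))
```

Precision asks "of the customers we flagged, how many really churned?", while recall asks "of the customers who churned, how many did we flag?". Accuracy alone can hide a poor model when one class is rare.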

**Pros:**

Gives you good foundational knowledge of the topics covered.

Teaches you about various tools from the data science toolbox, such as Tableau, SSIS and Gretl.

Teaches communication, an essential skill for a data scientist.

**Cons:**

Does not cover the majority of topics required for a data scientist such as basic programming with R and Python, probability, statistics and many machine learning algorithms.

Not a course for beginners who do not have experience with basic programming, statistics and probability.

Covers little coding in languages such as R or Python; the focus is on data science tools instead.

No real-world projects or case studies.

## 2. **Books**

There are tons of books out there on data science. Some are tool-specific, such as those on using R or Python for data science, while others focus more on the core skills data science requires. These books should help you get started with data science projects, and these are the ones we recommend:

**2.1 Data Science from Scratch - O’Reilly**

**Price:** $26.48

**Content:**

Introduction to data science

Crash Course in Python

Visualizing Data

Linear Algebra

Statistics

Probability

Hypothesis and Inference

Gradient Descent

Getting Data

Working with data

Machine Learning

K-nearest neighbours

Naive Bayes

Simple Linear Regression

Multiple Regression

Logistic Regression

Decision Trees

Neural Networks

Deep Learning

Clustering

Natural Language Processing

Network Analysis

Recommender Systems

Databases and SQL

Map Reduce

Data ethics

Go forth and do data science

**About:**

This book is for people with some programming knowledge (in any language); Python is not a prerequisite, as the book starts with a crash course in Python. Most of the book focuses on machine learning algorithms, giving a good understanding of how they work along with their implementations. Rather than relying on libraries like scikit-learn, it implements each algorithm from scratch, which makes it a good book for learning how various machine learning algorithms actually work. The book also covers many basic data science topics, such as Python programming, data cleaning and visualisation, and lets you get your hands dirty with some data science projects.
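In the same from-scratch spirit, here is a minimal simple linear regression in Python, fitted by ordinary least squares with the closed-form formulas slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). This is an independent sketch, not code from the book, and the data points are made up.

```python
from statistics import mean

def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimising the sum of squared errors."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted y for a new x on the fitted line."""
    return intercept + slope * x

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)
print(predict(intercept, slope, 10))
```

Since Python 3.10, the standard library also offers `statistics.linear_regression`, which returns the same slope and intercept. Writing it out by hand, as the book encourages, shows exactly where those numbers come from.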

**2.2 Doing Data Science - O’Reilly**

**Price:** $32.99

**Content:**

Introduction to Data Science

Statistical Inference, Exploratory Data Analysis, and the Data Science Process

Algorithms

Spam Filters, Naive Bayes, and Wrangling

Logistic Regression

Time Stamps and Financial Modelling

Extracting Meaning from Data

Recommendation Engines: Building a User-Facing Data Product at Scale

Data Visualization and Fraud Detection

Social Networks and Data Journalism

Causality

Epidemiology

Lessons Learned from Data Competitions: Data Leakage and Model Evaluation

Data Engineering: MapReduce, Pregel, and Hadoop

The Students Speak

Next-Generation Data Scientists, Hubris, and Ethics

**About:**

*Doing Data Science* from O’Reilly is another great book for data science beginners. It strikes a good balance between statistics and machine learning. In this book, you will find case studies from data scientists at Google, Microsoft and eBay, as well as sample code and exercises to help you learn the concepts. It also covers topics like Map-Reduce, Hadoop and Pregel (a system for large-scale graph processing), which most other books and online courses do not cover. The initial chapters help you gain the skills required for exploring data and making inferences from it. The middle chapters focus on machine learning, recommendation systems, data visualisation and model evaluation techniques. The later chapters teach Data Engineering, an aspect of data science that focuses on the practical application of data collection and analysis; this is a very important topic, especially when you are handling enormous amounts of data. The book is written from a statistical perspective and is enough to get you started with data science.

**2.3 The Art of Data Science - Roger D. Peng and Elizabeth Matsui**

**Price:** Free

**Content:**

Data Analysis as Art

Epicycles of Analysis

Stating and Refining the Question

Exploratory Data Analysis

Using Models to Explore Your Data

Inference: A Primer

Formal Modeling

Inference vs. Prediction: Implications for Modeling Strategy

Interpreting Your Results

Communication

**About:**

This book was written as a companion to the Johns Hopkins specialisation on Coursera. It is a high-level overview of the data science workflow and a good book for students who have no practical experience in data science. The book contains R code snippets to help you along with the chapters. It does a great job of explaining data analysis and how to interpret results. Although it lacks some important data science topics, it is useful for beginners who want to learn about data science and the common pipeline used when working on a data science project.

## 3. **Additional Resources**

Here are some additional resources that can be helpful for you in your data science journey.

A comprehensive overview of data science covering data analytics, programming and business skills necessary to master the discipline.

**The Open Source Data Science Masters**

It is an open-source collection of books, courses and articles to learn data science, with content around both the theory and technologies of data science.

**Online Data Science Courses - Harvard University**

A collection of data science courses created by Harvard University.