• Soledad Galli

Best Resources to Learn Data Science


You have probably heard a lot about data science lately. And why not? It is one of the hottest jobs of the 21st century.


Do you want to be a data scientist? There are tons of courses available online that will help you to get started. But which one to choose?

In this blog we recommend some great courses and books that will help you become a data scientist. For our recommendation, we considered the cost of the course and the knowledge you get from the course, prioritising those that can be taken for free.



What is Data Science?


Data Science is a multidisciplinary field that uses algorithms, statistics and scientific methods to extract knowledge and insights from structured and unstructured data. It takes concepts and theories from multiple fields such as computer science, mathematics, machine learning, statistics and information science to solve complex problems.





Data Scientist is a term used for data professionals who are skilled in data science, i.e., in organising, analysing and drawing inferences from large amounts of data. These professionals can identify problems and relevant questions, collect and organise data from various sources to create solutions and then communicate them effectively to organisations.



What are the skills required to become a data scientist?


  • Programming - Languages like R and Python

  • Statistics - Forecasting, Descriptive and Inferential Statistics

  • Machine Learning algorithms - E.g., Regression, Decision Trees, clustering

  • Data visualisation - tools like Tableau, Excel and also Python and R

  • Database Handling - SQL, NoSQL, among others

  • Software Engineering

  • Data Wrangling (Process of cleaning, restructuring and enriching the raw data into a usable format)
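
To make the last item more concrete, here is a minimal sketch of data wrangling in plain Python. The records, the field names and the `wrangle` function are all made up for illustration. In practice you would more likely use a library such as pandas, but the three steps (clean, restructure, enrich) are the same.

```python
def wrangle(raw_rows):
    """Clean, restructure and enrich raw records (hypothetical example data)."""
    cleaned = []
    for row in raw_rows:
        # Clean: trim whitespace and normalise capitalisation.
        name = row.get("name", "").strip().title()
        age_text = row.get("age", "").strip()

        # Drop rows with a missing name or a non-numeric age.
        if not name or not age_text.isdigit():
            continue

        # Restructure: store the age as an integer instead of text.
        age = int(age_text)

        # Enrich: add a derived field that was not in the raw data.
        cleaned.append({"name": name, "age": age, "is_adult": age >= 18})
    return cleaned


raw_rows = [
    {"name": "  ada lovelace ", "age": "36"},
    {"name": "", "age": "41"},
    {"name": "Alan Turing", "age": "n/a"},
    {"name": "grace hopper", "age": " 17 "},
]

for record in wrangle(raw_rows):
    print(record)
```

Only the two rows with a usable name and a numeric age survive, and each gains the derived `is_adult` field.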


Acquiring these skills will certainly take you some time, but fortunately, there are many resources available online to get you started. How do you select the right course? How and where should you start? This is exactly what we will discuss in this blog.


Learn Data Science online


We will cover the following courses and books in this blog, along with some additional resources.


Disclaimer: Opinions stated in the article are my own and I am not financially compensated for any of the links in this blog. This blog does not contain affiliate links.
Prices are indicative and may vary based on your location and each platform’s terms and conditions.



Contents


1. Courses

  1. Coursera - Data Science Specialization by Johns Hopkins University

  2. Coursera - Data Science Professional Certificate - IBM

  3. Path Data Scientist - DataQuest

  4. edX - Professional Certificate in Data Science - Harvard University

  5. Codecademy - Become a Data Scientist

  6. Udemy - Complete Data Science Bootcamp

  7. Udemy - Data Science A-Z


2. Books

  1. O’Reilly - Data Science from Scratch

  2. O’Reilly - Doing Data Science

  3. Roger D. Peng and Elizabeth Matsui - The Art of Data Science


3. Additional Resources

  1. The Data Science Handbook

  2. The Open Source Data Science Masters

  3. Harvard University - Online Data Science Courses




1. Courses


There are many online courses to help you get the right skills for data science. You can find these courses on e-learning websites like Coursera, edX and Udemy. One good thing about online courses is that they can be highly interactive, which can help you understand and remember concepts more easily.


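To be a successful data scientist, you need to acquire the skills and knowledge listed earlier: programming, statistics, machine learning, data visualisation, database handling, software engineering and data wrangling.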





Now let’s dive in and learn about some excellent online courses out there.



1.1 Data Science Specialisation - Coursera (Johns Hopkins University)


  • Time Required: Approx. 8 months, but you can go quicker or slower if you don’t want a certificate.

  • Price: $49 USD per month if you want a certificate; otherwise it’s free.


Skills you will gain:

  • R programming

  • Use of Github

  • Data handling and Exploratory Data Analysis

  • Machine Learning

  • How to make statistical inference

  • Regression Analysis

  • How to build data products


Total Number of Courses in this Specialization: 10

  1. The Data Scientist ToolBox

  2. R Programming

  3. Getting and Cleaning Data

  4. Exploratory Data Analysis

  5. Reproducible Research

  6. Statistical Inference

  7. Regression Models

  8. Practical Machine Learning

  9. Developing Data Products

  10. Data Science Capstone


About:


This Specialisation covers almost every aspect of the data science pipeline. It provides both a conceptual and a practical introduction to data science, focusing on the tools and concepts you need to build a successful data science project, such as R programming, gathering data from sources like the web and databases, and presenting data. It also teaches the use of tools like GitHub, a platform for discovering, sharing and building software. Data scientists use GitHub to collaborate and to track changes in their work, and they can even roll back changes if required. You also learn how to make inferences from data, ask data-related questions and apply the skills you learn to solve a data-related problem.


To complete this Specialisation you need to finish all the courses and finally submit a capstone project where you will apply the skills that you gained to a real-world problem (note that capstone projects are only available if you pay the subscription).


The Specialisation uses R as the programming language. You will learn about R from scratch, both general programming principles and how to use R for data science and statistical computing. The courses will also teach you how to gather data from various sources and how to process data so that it can be used in machine learning projects.


Later courses in the Specialisation teach exploratory data analysis, helping you summarise your data with visualisation and exploratory techniques. They also cover the tools and concepts required for making statistical inferences from data, building regression models and using practical machine learning algorithms with R. Finally, you will learn about building data products that help you tell the story derived from the data. After you learn these skills, you can submit a capstone project. These projects are based on real-world problems and are conducted together with industry, government and academic partners.


Pros:

  • Good course for beginners.

  • No prior programming experience required.

  • Does not shy away from mathematics as much as other courses.

  • It covers most skills required to tackle a data science project, helping you get started in the field of data science.

  • Capstone Projects allow you to apply your skills to a real-world problem, giving you an overview of the entire data science pipeline.

Cons:

  • Basic understanding of statistics and probability required.



1.2 Data Science Professional Certificate - Coursera (IBM)


  • Time Required: Approx. 3 Months

  • Price: $39 USD per month if you want a certificate; otherwise it’s free.


Skills you will gain:

  • Python programming

  • Data visualisation

  • Machine Learning

  • Data analysis

  • Databases and SQL


Total Number of Courses in this Certificate: 9

  1. What is Data Science

  2. Data Science Open Source Tools

  3. Data Science Methodology

  4. Python for data science and AI

  5. Database and SQL for Data Science

  6. Data Analytics with Python

  7. Data Visualization with Python

  8. Machine Learning with Python

  9. Data Science Capstone


About:


This is a beginner-level set of courses that will give you the basic skills to get started with data science. It describes the open-source tools commonly used in data science, like RStudio, Apache Zeppelin and Jupyter Notebook. The courses also teach Python programming, the use of databases, and concepts around data analysis, data visualisation and machine learning. You will dive into machine learning algorithms such as Linear and Logistic Regression, K-Nearest Neighbours, Decision Trees, Support Vector Machines, Partition-based Clustering and Hierarchical Clustering. Finally, you will get hands-on experience with the IBM Cloud to build a data science capstone project.


Pros:

  • The content is well designed.

  • Good hands-on practice with fundamental concepts.

  • No prior programming experience required.


Cons:

  • This is only a beginner course, so you will need to study these topics in more depth elsewhere.



1.3 Path Data Scientist - DataQuest

  • Time Required: Self-paced

  • Price: $29 USD per month (basic) and $49 USD per month (premium)


Skills you will gain:

  • Python programming for Data Science

  • Data Visualisation

  • Data cleaning and data analysis

  • Databases and SQL

  • Web Scraping

  • Concepts on Statistics, Probability and Linear Algebra

  • Machine Learning

  • Data Structures and Algorithms

  • Git and Version Control

  • Spark and Map Reduce


Total Number of Courses in this Track: 35

  1. Python for Data Science and Fundamentals

  2. Python for Data Science Intermediate

  3. Pandas and NumPy Fundamentals

  4. Exploratory Data Visualisation

  5. Storytelling through Data Visualisation

  6. Data Cleaning and Analysis

  7. Data Cleaning in Python - Advanced

  8. Data Cleaning Project Walk Through

  9. Elements of the command line

  10. Text Processing in the command line

  11. SQL fundamentals

  12. SQL Intermediate

  13. SQL Advanced

  14. APIs and Web Scraping

  15. Statistics and Fundamentals

  16. Statistics Intermediate

  17. Probability Fundamentals

  18. Conditional Probability

  19. Hypothesis testing and fundamentals

  20. Machine Learning Fundamentals

  21. Calculus for Machine Learning

  22. Linear Algebra for Machine Learning

  23. Machine Learning with Python - Intermediate

  24. Decision Trees: Python

  25. Deep Learning Fundamentals

  26. Machine Learning Project

  27. Kaggle Fundamentals

  28. Exploring topics in Data Science

  29. Natural Language Processing

  30. Functions: Advanced

  31. Data Structures and Algorithms

  32. Python programming advanced

  33. Command-line: Intermediate

  34. Git and Version Control

  35. Spark and Map Reduce


About:


This track is one of the best you can find online to learn about data science. The courses are incredibly detailed. These courses will teach you each and every aspect required to become a data scientist. The courses will teach you Python fundamentals and more advanced concepts, Data Analysis, Data Visualisation and how to query Databases with SQL. They also dive into mathematics a lot more than other courses. You will learn about statistics, linear algebra, calculus and probability. The later part of this track covers machine learning algorithms in detail, data structures and a bit on Natural Language Processing. These chapters will teach you to extract information about data and to apply machine learning algorithms effectively, based on the data type and organisation’s requirements. These courses also cover Spark and Map-Reduce, which are technologies to analyse large datasets, and also how to use git and code version control to keep track of your projects.
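
To give a feel for the Map-Reduce idea mentioned above, here is a minimal single-machine sketch in plain Python. It is not taken from the DataQuest material and it does not use Spark or Hadoop. The function names and the word-count task are assumptions chosen for illustration. Real frameworks run the same three phases (map, shuffle, reduce) in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["big data", "Big models"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over many workers, which is what makes the approach suitable for large datasets.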


Note that you need a premium subscription to get access to all the courses of this data science track. On the plus side, with a premium membership you will get a monthly call with a mentor who will be your guide, review your resume and give you advice.


Pros:

  • Reasonable Price for the value

  • Highly Detailed

  • Covers almost every topic required to become a data scientist.

  • No prior experience of programming for data science required.

  • Includes big data handling techniques such as Map-Reduce.

  • Mentorship (Premium)

  • Guided portfolio projects, i.e., projects with instructions that help you through development (Premium)

  • Resume review (Premium)

  • Focus on the actual implementation of the math behind machine learning and not just importing a library.


Cons:

  • Chapters like Spark and Map Reduce may be difficult to grasp for beginners.

  • The certificate is not worth much to employers but will help you build a great portfolio.



1.4 Professional Certificate in Data Science - edX (Harvard University)

  • Time Required: Approx. 1 year and 5 months

  • Price: $441.90


Skills you will gain:

  • R programming

  • Data Visualisation

  • Mathematics

  • Productivity Tools

  • Wrangling

  • Machine Learning


Total Number of Courses in this Certification: 9

  1. R Basics

  2. Data Visualisation

  3. Probability

  4. Inference and Modelling

  5. Productivity Tools

  6. Wrangling

  7. Linear Regression

  8. Machine Learning

  9. Capstone


About:


This course covers the basic skills required to be a successful data scientist, such as R programming, statistical concepts (probability and inference), data visualisation and data wrangling. It also helps you get familiar with the tools required for practising data science, such as Unix/Linux, RStudio, git and GitHub. The material is enough to give you the knowledge base required to tackle real-world data science problems. The course includes real-world case studies to help you understand how data science is applied: Trends in World Health and Economics, US Crime Rates, The Financial Crisis of 2007-2008, Election Forecasting, Building a Baseball Team (inspired by Moneyball), and Movie Recommendation Systems.


Pros:

  • Equips you with all the basic skills.

  • Includes real-world case studies.

  • No prior experience of programming for data science required.

  • Good Hands-on experience

Cons:

  • A very short course in terms of topics covered.

  • Pricey



1.5 Become a Data Scientist - Codecademy

  • Time Required: Approx. 35 weeks

  • Price: $19 per month


Skills you will gain:

  • SQL

  • Python Programming

  • Data Analysis

  • Data Visualisation

  • Statistics

  • Web Scraping

  • Machine Learning


Total Number of Courses in this Track: 26

  1. The importance of Data and SQL Basics

  2. SQL: Basics

  3. SQL: Intermediate

  4. Go Off-Platform with SQL

  5. Analyse real data with SQL

  6. Python functions and logic

  7. Python Lists and Loops

  8. Advanced Python

  9. Python Cumulative Project

  10. Data Analysis with Python

  11. Data Visualisation

  12. Visualisation Cumulative Projects

  13. Data Visualization Capstone Project

  14. Learn Statistics with Python

  15. Introduction to Statistics with NumPy

  16. Hypothesis Testing with SciPy

  17. Practical Data Cleaning

  18. Data Analysis Capstone Projects

  19. Learn Web Scraping with Beautiful Soup

  20. Machine Learning: Supervised Learning

  21. Supervised Machine Learning Cumulative Project

  22. Machine Learning: Unsupervised Learning

  23. Unsupervised Machine Learning Cumulative Project

  24. Perceptrons and Neural Nets

  25. Machine Learning Capstone Project

  26. Natural Language Processing


About:


This track is a good way to get started with the skills required for data science. The course contains well-explained content with code-alongs, guided projects and quizzes. It starts with basic SQL (basic syntax, SQL tables, SQL queries) and Python (functions, loops and data structures). It then moves towards more advanced concepts, such as using SQL to analyse real organisation data and using Python for statistics and data cleaning, which will help you tackle real-world data science projects. The course also covers data analysis, data visualisation, statistics and machine learning. The good thing about this track is that it teaches all the topics through guided projects, using the Python packages that are important for hands-on data science.


This track also teaches machine learning to a very good extent, covering supervised and unsupervised learning, neural networks and Natural Language Processing. You will also get hands-on with SciPy, a Python package for scientific computing that is used here for hypothesis testing, and with web scraping using Beautiful Soup, another Python package that parses HTML and XML documents and is used to extract data from websites.
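
If hypothesis testing is new to you, the following sketch may help. It is not the Codecademy material and it deliberately avoids SciPy. It implements an exact permutation test with the standard library only, as a stand-in, so that the logic of a two-sample test is visible. The function name and the sample values are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Returns the fraction of all possible regroupings of the pooled data
    whose absolute mean difference is at least as large as the observed one.
    """
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    extreme = 0
    count = 0
    # Enumerate every way of choosing which values form the first group.
    for indices in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in indices)
        diff = abs(a_sum / n_a - (total - a_sum) / n_b)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical measurements from two groups.
p_value = exact_permutation_test([1, 2, 3, 4], [10, 11, 12, 13])
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests that the difference between the groups would be unlikely if group membership did not matter. Enumerating every regrouping is only practical for small samples, which is one reason to use a library for real analyses.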


Pros:

  • Well-detailed course.

  • Good value for money.

  • Easy to use and navigate

  • No prior experience of programming for data science required.

  • Good Hands-on experience with well-guided projects

Cons:

  • Free courses in this track are a bit too general.

  • Some topics may seem a bit confusing, for example those around hypothesis testing with SciPy.



1.6 Complete Data Science Bootcamp - Udemy

  • Time Required: Self-paced (29 Hours of video lectures, consider adding extra time to do the exercises)

  • Price: $199.99 (After Discount: $10.99)


Skills you will gain:

  • Python programming

  • Mathematics (Statistics and Probability)

  • Statistical Analysis

  • Deep Learning Frameworks

  • Machine Learning


Total Number of Chapters in this Certification: 8

  1. Introduction to Data Science

  2. Probability

  3. Statistics

  4. Introduction to Python

  5. Advanced statistical methods with Python

  6. Mathematics

  7. Deep Learning

  8. Case Studies


About:


This course is one of the best-selling courses on Udemy, with a 4.5-star average user rating and more than 201,049 students enrolled. It starts with the basics (introduction to probability and statistics) and assumes no programming experience. You will learn Python programming and how Python is used in data science. This course also covers a great deal of the mathematics that is important for a data scientist, such as the mathematics behind machine learning algorithms, probability distributions, descriptive statistics, inferential statistics, and more. You will also learn how to apply statistical methods to your data with Python. Finally, you will use the acquired skills to solve real-world case studies.


Pros:

  • No programming experience required

  • Active Q&A support

  • Access to future updates

  • Real-world case studies

Cons:

  • Does not cover topics like linear algebra, calculus, data wrangling, and the use of git and Github, which are valuable in data science.



1.7 Data Science A-Z - Udemy

  • Time Required: Self-paced (21 hours of video lectures, consider adding extra time to do the exercises)

  • Price: $199.99 (After Discount: $10.99)


Skills you will gain:

  • Tableau

  • SSIS (a component of the Microsoft SQL Server database software that can be used to perform a broad range of data migration tasks)

  • Gretl (Open Source statistical package)

  • SQL for data science

  • Data Wrangling

  • Data Visualisation

  • Communication

Total Number of Chapters in this Course: 4

  1. Data Visualisation

  2. Modelling

  3. Data Preparation

  4. Communication


About:


This is a beginner course for data science enthusiasts. You will learn about data mining with Tableau (a data visualisation and analysis tool), how to apply machine learning algorithms such as linear regression and logistic regression, and how to use various evaluation metrics to measure the performance of machine learning algorithms and statistical models. You will learn how to handle data with SQL and how to do basic visualisation with Tableau. Along with this, you will also learn about data wrangling and manipulation. An added advantage of this course is that it teaches you how to communicate your project effectively to business people, which is one of the important skills required of a data scientist.



Pros:

  • Gives you good foundational knowledge about the topics covered.

  • Teaches you about various frameworks from the data science toolbox, such as Tableau, SSIS and Gretl.

  • Teaches communication, which is an essential skill for a data scientist.


Cons:

  • Does not cover the majority of topics required for a data scientist such as basic programming with R and Python, probability, statistics and many machine learning algorithms.

  • Not a course for beginners who do not have experience with basic programming, statistics and probability.

  • Programming languages such as R or Python are barely covered; the focus is on data science frameworks instead.

  • No real-world projects or case studies.



2. Books



There are tons of books out there on data science. Some are tool-specific, like using R or Python for data science, while others focus on the basic skills required for data science. These books should be able to get you started with data science projects, and these are the ones we recommend:



2.1 Data Science from Scratch - O’Reilly


Price: $26.48


Content:

  1. Introduction to data science

  2. Crash Course in Python

  3. Visualizing Data

  4. Linear Algebra

  5. Statistics

  6. Probability

  7. Hypothesis and Inference

  8. Gradient Descent

  9. Getting Data

  10. Working with data

  11. Machine Learning

  12. K-nearest neighbours

  13. Naive Bayes

  14. Simple Linear Regression

  15. Multiple Regression

  16. Logistic Regression

  17. Decision Trees

  18. Neural Networks

  19. Deep Learning

  20. Clustering

  21. Natural Language Processing

  22. Network Analysis

  23. Recommender Systems

  24. Databases and SQL

  25. Map Reduce

  26. Data ethics

  27. Go forth and do data science


About:


This book is for people with some knowledge of programming (in any language), but Python is not a prerequisite as it starts with a crash course in Python. Most of the book is focused on Machine Learning algorithms, providing a good understanding of these algorithms along with their implementations. This book teaches the concepts, that is, the various machine learning algorithms from scratch, without using fancy Python libraries like Scikit-learn but rather implementing the algorithm manually. Therefore, it is a good book to learn about the actual working of various machine learning algorithms. The book also covers many of the basic topics for data science such as Python programming and data cleaning and visualisation, and you will be able to get your hands dirty with some data science projects.
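
To illustrate what "from scratch" means here, the sketch below fits a simple linear regression using only built-in Python, with no scikit-learn. It is not code from the book. The function name and the toy data are assumptions made for this example, and it uses the standard closed-form least-squares formulas.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)

    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data that lies exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Writing the few lines of arithmetic yourself, rather than calling a library's fit method, is the kind of exercise that shows how an algorithm actually works.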



2.2 Doing Data Science - O’Reilly


Price: $32.99



Content:

  1. Introduction to Data Science

  2. Statistical Inference, Exploratory Data Analysis, and the Data Science Process

  3. Algorithms

  4. Spam Filters, Naive Bayes, and Wrangling

  5. Logistic Regression

  6. Time Stamps and Financial Modelling

  7. Extracting Meaning from Data

  8. Recommendation Engines: Building a User-Facing Data Product at Scale

  9. Data Visualization and Fraud Detection

  10. Social Networks and Data Journalism

  11. Causality

  12. Epidemiology

  13. Lessons Learned from Data Competitions: Data Leakage and Model Evaluation

  14. Data Engineering: MapReduce, Pregel, and Hadoop

  15. The Students Speak

  16. Next-Generation Data Scientists, Hubris, and Ethics



About:


Doing Data Science from O’Reilly is another great book for data science beginners. It strikes a good balance between statistics and machine learning. In this book, you will find various case studies from data scientists at Google, Microsoft and eBay, as well as sample code and exercises to help you learn the concepts. It also covers topics like Map-Reduce, Hadoop and Pregel (a system for large scale graph processing), which are not covered in most other books or online courses. The initial chapters will help you gain the skills required for data exploration and making inferences from data. Middle chapters are more focused on machine learning, recommendation systems, data visualisation and model evaluation techniques. Later chapters teach Data Engineering, an aspect of data science that focuses on the practical application of data collection and analysis, which is a very important topic in data science, especially when you are handling enormous amounts of data. This book is written from a statistical perspective and is enough to get you started with data science.



2.3 The Art of Data Science - Roger D. Peng and Elizabeth Matsui


Price: Free


Content:

  1. Data Analysis as Art

  2. Epicycles of Analysis

  3. Stating and Refining the Question

  4. Exploratory Data Analysis

  5. Using Models to Explore Your Data

  6. Inference: A Primer

  7. Formal Modeling

  8. Inference vs. Prediction: Implications for Modeling Strategy

  9. Interpreting Your Results

  10. Communication


About:


This book was written as a companion to the Coursera Johns Hopkins specialization. It is a high-level overview of the data science workflow. It is a good book for students who have no practical experience in data science. The book contains R code snippets to help you along with the chapters. This book does a great job of explaining data analysis and how to interpret results. Though it lacks some of the important topics of data science, it is useful for beginners to learn about data science and the common pipeline used when working on a data science project.



3. Additional Resources


Here are some additional resources that can be helpful for you in your data science journey.


The Data Science Handbook

A comprehensive overview of data science covering data analytics, programming and business skills necessary to master the discipline.


The Open Source Data Science Masters

It is an open-source collection of books, courses and articles to learn data science, with content around both the theory and technologies of data science.


Harvard Data Science Courses

Collection of data science courses created by Harvard University.


© 2018 - 2020 Train In Data
