What is data science? Learning the basics



What is data science? What does a data scientist do? These are the fundamental questions we will be talking about today. Data science is a multidisciplinary field whose prime objective is to extract value, in all its forms, from data. It is also a process: when you dig into the stages of data processing, from munging data sources and cleaning data to machine learning and finally visualization, you see that specific steps are involved in turning raw data into insight.

Today marks my 5th week of #StayHome due to COVID-19, and we are getting used to this new reality, the new normal. As we are finding out, the lockdown is not going to be brief by any means, and we might be confined to working from home for several months; nobody knows. We can only pray and hope that it will be over quickly and that we can go back to our normal lives.

Data science reveals patterns and insights that companies can use to make better decisions and develop more innovative products and services. Data is the foundation stone of innovation, but its value comes from the knowledge that data scientists can draw from it and then act on.

The following infographic from Harvard professors Joe Blitzstein and Hanspeter Pfister outlines a typical data science process, which will help us answer these questions.

[Infographic: the data science process]

Data science is all about extracting information from data. It uses approaches from statistics and computer science to analyze the vast amount of data that we produce in our daily lives and to turn it into something meaningful and useful. Every time you buy something with your credit card or visit a website, you generate information. With our lives increasingly digitized, through constant smartphone use and electronic payments, we have all become data-generating machines.
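As a rough illustration of the stages mentioned earlier (cleaning, analysis, and a simple visualization), here is a minimal Python sketch. The transactions, field names, and helper functions are all made up for this example, and a real project would typically use dedicated libraries rather than plain Python.

```python
from statistics import mean

# Hypothetical raw card transactions; the field names and values are made up.
raw_transactions = [
    {"category": "groceries", "amount": "42.50"},
    {"category": "Groceries ", "amount": "17.25"},
    {"category": "travel", "amount": "120.00"},
    {"category": "travel", "amount": ""},       # missing amount
    {"category": "dining", "amount": "abc"},    # corrupted amount
    {"category": "dining", "amount": "30.00"},
]


def clean(records):
    """Data cleaning: normalize category names and drop unparseable amounts."""
    cleaned = []
    for record in records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # skip rows whose amount is missing or corrupted
        category = record["category"].strip().lower()
        cleaned.append({"category": category, "amount": amount})
    return cleaned


def average_by_category(records):
    """Analysis: average spend per category."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["category"], []).append(record["amount"])
    return {category: mean(amounts) for category, amounts in grouped.items()}


def text_bar_chart(averages, unit=10.0):
    """Visualization: one line per category, one '#' per `unit` of average spend."""
    lines = []
    for category, value in sorted(averages.items()):
        bar = "#" * round(value / unit)
        lines.append(f"{category:<10} {bar} {value:.2f}")
    return "\n".join(lines)


cleaned = clean(raw_transactions)
averages = average_by_category(cleaned)
print(text_bar_chart(averages))
```

Each function stands in for one stage, so the raw records flow through cleaning, then analysis, then a text-only chart that turns the numbers into something a person can scan.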

Data comes in many forms and, at a high level, falls into three groups:

  • structured,
  • semi-structured,
  • and unstructured

Structured data is highly organized data that exists within a repository such as a database or CSV/Excel files. The data is easily accessible, and its format makes it suitable for queries. Semi-structured data sits in between: it carries some organizing markers, such as the keys in JSON or the tags in XML, but does not fit a fixed table layout. Unstructured data lacks any content structure at all (for example, an audio stream or natural language text). Structured data is the most readily usable form of data because it can be manipulated immediately.
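To make the three groups concrete, the sketch below handles a tiny example of each using only the Python standard library. The sample values are invented for illustration; the point is how much work is needed before each form can be queried.

```python
import csv
import io
import json

# Structured: rows with named columns, as in a CSV file or database table.
csv_text = "name,city,amount\nAna,Lisbon,42.5\nBen,Oslo,17.0\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
oslo_total = sum(float(row["amount"]) for row in rows if row["city"] == "Oslo")

# Semi-structured: self-describing keys, but records need not share one schema.
json_text = (
    '[{"name": "Ana", "tags": ["vip"]},'
    ' {"name": "Ben", "phone": {"mobile": "555-0100"}}]'
)
records = json.loads(json_text)
names_with_phone = [record["name"] for record in records if "phone" in record]

# Unstructured: free text with no fields; structure has to be extracted first.
review = "Great service, great price. Delivery was slow."
word_counts = {}
for word in review.lower().replace(",", "").replace(".", "").split():
    word_counts[word] = word_counts.get(word, 0) + 1

print(oslo_total, names_with_phone, word_counts["great"])
```

The CSV rows can be filtered by column right away, the JSON records need a check for which keys are present, and the free text has to be tokenized before anything can be counted.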

Now let's turn to the Venn diagram, a famous way to explain data science.

A Venn diagram (also called a primary diagram, set diagram or logic diagram) is a diagram that shows all possible logical relations between a finite collection of different sets. These diagrams depict elements as points in the plane, and sets as regions inside closed curves. A Venn diagram consists of multiple overlapping closed curves, usually circles, each representing a set. The points inside a curve labelled S represent elements of the set S, while points outside the boundary represent elements not in the set S. The following video explains the idea:

Video on Understanding Venn Diagrams – 1st Grade Math

Drew Conway’s Data Science Venn Diagram

David Taylor wrote an excellent article on these Venn diagrams, entitled “Battle of the Data Science Venn Diagrams”. You should read it.
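The set relations that a Venn diagram depicts can also be tried out with Python's built-in sets. In this small sketch, the three set names follow the circles as they are commonly labelled in Conway's diagram (hacking skills, math and statistics knowledge, substantive expertise); the people are hypothetical and exist only to show the overlaps.

```python
# Hypothetical people, used only to illustrate set relations.
hacking_skills = {"ana", "ben", "chloe", "dev"}
math_and_stats = {"ben", "chloe", "emma"}
substantive_expertise = {"chloe", "dev", "emma", "farid"}

# Intersection: the region where all three circles overlap.
all_three = hacking_skills & math_and_stats & substantive_expertise

# Union: everyone inside at least one circle.
anyone = hacking_skills | math_and_stats | substantive_expertise

# Difference: inside one circle but outside the other two.
hacking_only = hacking_skills - math_and_stats - substantive_expertise

print(sorted(all_three))
print(sorted(anyone))
print(sorted(hacking_only))
```

Each operator corresponds to a region of the diagram: `&` picks out an overlap, `|` covers every circle, and `-` keeps the part of one circle that lies outside the others.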

You’ll Learn:

A Beginner’s Guide To Data Science …

  1. Introduction to Data Science
  2. What is the data science process?
  3. Types of data
  4. What does a data scientist do?

Resources:

To share your thoughts:

  • Leave a comment in the section below this post
  • Suggest any new topic we should cover in a future podcast
  • Join us in the Mastermind tribe
  • Share this on Twitter and Facebook if you enjoyed this episode, as we learn new technologies together

To help out this initiative:

  • Leave a candid review for the OTechTalks Podcast on iTunes! Your ratings and reviews will help the podcast on iTunes.
  • Subscribe to the podcast on iTunes to get the next sessions