Comprehensive Data Exploration Process with One Click

Exploratory Data Analysis (EDA) has become an increasingly hot topic in data science. As the name suggests, it is a process of trial and error in an uncertain space, with the goal of finding insights. It usually happens at an early stage of the data science lifecycle. Although there is no clear-cut boundary between data exploration, data cleaning and feature engineering, EDA generally sits right after the data cleaning phase and before feature engineering or model building. EDA helps set the overall direction of model selection, and it helps…


Line chart, bar chart, pie chart … they tell different stories

In this information-rich age, data visualizations are designed to make knowledge transfer between deliverers and receivers easier. Therefore, it is crucial for dashboard creators to know which chart aligns with their key delivery objectives. On the other hand, having a basic understanding of the underlying meaning of each chart also helps the audience interpret dashboards effectively. In this article, I introduce a way that may help you better understand some common charts and graphs (e.g. scatter plots, maps, pie charts and stacked bar charts) by categorising them into four main types: distribution, comparison, composition…


What if Learning Data Science is a Game

We are all familiar with modern game design, where champions or heroes are always equipped with certain attributes and specialties. For example, Dota heroes are scored on agility, intelligence and strength. To excel on the battlefield, a hero needs above-average scores in all attributes while also specializing in at least one.

So what if we thought of learning data science as playing a game in which each of us possesses multi-dimensional abilities? Playing video games demands constantly sharpening our skills with weapons, training or magic potions. …


Step-by-Step Guide from Data Preprocessing to Model Evaluation

What is Logistic Regression?

Don’t let the name “logistic regression” trick you: it usually falls under the category of classification algorithms rather than regression algorithms.

Then, what is a classification model? Simply put, a classification model predicts a categorical value, e.g. cat or dog, yes or no, true or false, and so on. In contrast, a regression model predicts a continuous numeric value.

Logistic regression makes predictions based on the sigmoid function, an S-shaped curve as shown below. …
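To make the idea concrete, here is a minimal Python sketch (not taken from the original article) of how the sigmoid turns a raw score into a probability and then into a category. The score `z`, the helper names `sigmoid` and `predict_class`, and the 0.5 threshold are illustrative assumptions.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a probability between 0 and 1."""
    # Two branches avoid overflow in math.exp for very large |z|;
    # both compute 1 / (1 + e^(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_class(z: float, threshold: float = 0.5) -> int:
    """Turn the sigmoid probability into a categorical label (1 = yes, 0 = no)."""
    return 1 if sigmoid(z) >= threshold else 0

# z stands for the linear score (weights times features plus bias) that
# logistic regression computes; these values are made up for illustration.
for z in (-4.0, 0.0, 4.0):
    print(f"z={z:+.1f}  probability={sigmoid(z):.3f}  class={predict_class(z)}")
```

Large negative scores map close to 0 and large positive scores close to 1, so the output behaves like a probability, and the threshold is what turns it into a classification.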


Machine Learning and Predictive Modelling in BigQuery

When taking the first step into machine learning, it is easy to get overwhelmed by all kinds of complex algorithms and ugly symbols. Hopefully, this article can lower the entry barrier by providing a beginner-friendly guide that lets you get a sense of achievement by building your own ML model using BigQuery and SQL. That’s right, we can use SQL to implement machine learning. If you are looking for a few lines of code to get your hands dirty in the ML field, please continue reading :)

1. Set Up the Basic Structure 📁


Sites and blogs that inspire learning.

Learning data science is a long journey, and following a rigid course curriculum inevitably makes learning a mundane task. Therefore, I have compiled a list of data science blogs that can bring you a daily dose of inspiration across various domains: AI and Machine Learning, Data Engineering, Data Visualization, and Business Acumen.

I have created an infographic as a summary; feel free to steal it at the end of this article. Additionally, if you are looking for data science podcasts or YouTubers to follow, have a read of the lists I collected :)

AI & Machine Learning

1. Towards Data Science

Towards Data Science gathers a large community…


Learn left join, inner join, self join using examples

For advanced analytical processing and data discovery, one table is often not enough to bring valuable insights, so combining multiple tables is unavoidable. SQL, as a tool for communicating with relational databases, provides the functionality to build relationships among tables. This article introduces how to use SQL to link tables together. If you want to learn more about the basics of SQL, I suggest having a read of my first article about learning SQL in everyday language, which gives a comprehensive SQL introduction for absolute beginners.

Why We Need to Learn SQL JOIN

Maybe you haven’t even realized it, but we frequently come across joins in Excel…
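As a hedged illustration of the three join types named in the title, the sketch below uses Python's built-in `sqlite3` module with two made-up tables, `customers` and `orders`. The table and column names are hypothetical, and other databases may differ slightly in syntax.

```python
import sqlite3

# Hypothetical tables for illustration: customers (with who referred them) and orders.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, referrer_id INTEGER);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cai', 1);
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# INNER JOIN: only customers that have at least one matching order.
inner = cur.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN: every customer; amount is NULL (None in Python) where no order matches.
left = cur.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.order_id
""").fetchall()

# SELF JOIN: the customers table joined to itself to see who referred whom.
self_join = cur.execute("""
    SELECT c.name, r.name
    FROM customers AS c
    JOIN customers AS r ON c.referrer_id = r.id
    ORDER BY c.id
""").fetchall()

print("inner:", inner)
print("left: ", left)
print("self: ", self_join)
conn.close()
```

The left join keeps Cai, who has no orders, which is exactly the row the inner join drops; the self join only needs two aliases (`c` and `r`) of the same table.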


Your Daily Dose of Inspiration When Unmotivated to Learn Data Science

If we only learn data science through a rigid curriculum created by universities or through online courses from Coursera or Udemy, we may find the learning process boring. If you ever find yourself losing motivation on this long journey of studying data science, you may just need some podcasts to break the routine and get some inspiration. The major difference between these two approaches to learning is that the former (courses) focuses on theory and concepts, whereas the latter (podcasts) brings in more practical experience and projects that add flesh to the bones.

Listening to podcasts is a great way to absorb knowledge…


SQL is Just Like Excel

What is SQL After All?

SQL stands for “Structured Query Language”, and it is a programming language just like Python, Java or R. What sets it apart from most common languages is that it is a declarative language, which means it tells the computer what result we want instead of how to compute it. As its name suggests, SQL is the language used to communicate with a database to request and extract the data we want. Let’s first understand what a database is. …
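One way to see the declarative idea is to compute the same result twice: once with an explicit Python loop and once with a SQL query. This is a minimal sketch using Python's built-in `sqlite3` module; the `sales` table and its rows are invented for illustration.

```python
import sqlite3

# Hypothetical sales records, used only to contrast the two styles.
rows = [("apple", 3), ("banana", 5), ("apple", 2), ("cherry", 7)]

# Imperative style (plain Python): we spell out HOW to compute each total.
totals_imperative = {}
for product, qty in rows:
    totals_imperative[product] = totals_imperative.get(product, 0) + qty

# Declarative style (SQL): we only state WHAT result we want;
# the database engine decides how to scan, group and sum the rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, qty INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
totals_declarative = dict(
    conn.execute("SELECT product, SUM(qty) FROM sales GROUP BY product").fetchall()
)
conn.close()

print(totals_imperative)
print(totals_declarative)
```

Both versions produce the same totals, but the SQL query contains no loop or accumulator: the `GROUP BY` clause describes the result and leaves the execution plan to the database.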


Solutions to Three Types of Missing Data

Missing data is one of the three most common data quality issues: Missing Values, Duplicate Values and Inconsistent Values.

  1. Missing values are the easiest to identify. They may appear in various forms, e.g. null values, blank spaces, or placeholders such as “unknown”. Applying a filter to the data makes missing values easier to spot.
  2. Duplicate values occur when several rows of data are identical, most likely because the same record was mistakenly entered multiple times.
  3. Inconsistent values usually occur when the string values of the same attribute do not follow the same naming…
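The three issues above can be flagged with a few lines of plain Python. This is a minimal sketch, not the article's own solution: the records are made up, and treating `None`, blank strings and "unknown" as missing, and differences in case or surrounding whitespace as inconsistencies, are assumptions to adjust for real data.

```python
from collections import Counter

# Hypothetical records containing all three data quality issues.
records = [
    {"id": 1, "city": "Sydney"},
    {"id": 2, "city": "unknown"},
    {"id": 3, "city": "sydney "},
    {"id": 1, "city": "Sydney"},   # the same row recorded twice
    {"id": 4, "city": ""},
]

def is_missing(value) -> bool:
    """Assumption: None, blank strings and the placeholder 'unknown' count as missing."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip().lower() in {"", "unknown"}

# 1. Missing values: filter the rows whose field holds a missing-value marker.
missing = [r["id"] for r in records if is_missing(r["city"])]

# 2. Duplicate values: count identical rows and keep those seen more than once.
row_counts = Counter(tuple(sorted(r.items())) for r in records)
duplicates = [dict(row) for row, n in row_counts.items() if n > 1]

# 3. Inconsistent values: normalise case and whitespace, then report
#    groups where several spellings collapse to the same label.
variants = {}
for r in records:
    city = r["city"]
    if not is_missing(city):
        variants.setdefault(city.strip().lower(), set()).add(city)
inconsistent = {key: sorted(vals) for key, vals in variants.items() if len(vals) > 1}

print("missing ids:", missing)
print("duplicates:", duplicates)
print("inconsistent:", inconsistent)
```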

Destin Gong

on my way to becoming a data storyteller
