With Sample Code for Enhancements to Inspire Your Charting Creativity

Note: The code for this post can be found here

Different Approaches to Visualizing a Distribution (Image by Author)

Understanding how the variables in a data set are distributed is an important step that should happen early in the Exploratory Data Analysis (EDA) process. A number of tools are available for analyzing the distribution of data. Visualization aids are likely the most popular because a well-constructed chart can quickly answer important questions about the data. For example:

  • What are the central tendencies (mean, median, and mode(s))?
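Before reaching for a chart, the central tendencies can also be checked numerically. Here is a minimal sketch using Python's standard `statistics` module; the `values` list is a hypothetical sample made up for illustration and is not taken from the post's data.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 3, 3, 5, 7, 8, 8, 9]

mean = statistics.mean(values)        # arithmetic average
median = statistics.median(values)    # middle value of the sorted data
modes = statistics.multimode(values)  # every most-frequent value, as a list

print(f"mean={mean}, median={median}, modes={modes}")
```

`multimode` is used rather than `mode` so that a distribution with more than one peak reports all of its modes.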


Explore the Motivation and Capabilities of ML Through the Game of Rock Paper Scissors

Note: The code for this post can be found here

Machine Learning with the Rock Paper Scissors Game (Image by Author)

In this article, we’re going to build a simple Rock Paper Scissors game in Python with two different approaches: Rules-Based System vs. Machine Learning. Through this comparison, I hope to convey how Machine Learning works and what motivates it. To be clear, this is not an illustration of Image Recognition or Pattern Recognition (which hand will a player choose next) but rather of Machine Learning concepts in general. As automation continues to revolutionize the future of work across industries, companies must explore different ways to streamline their operations. …
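To give a feel for the rules-based side of that comparison, here is a minimal sketch in which every rule of the game is written out by hand. The names `BEATS` and `judge` are hypothetical and chosen for illustration; this is not necessarily how the full article implements it.

```python
# Each move maps to the move it defeats (the complete rule set of the game).
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def judge(player_one: str, player_two: str) -> str:
    """Return which player wins the round, or 'tie'."""
    if player_one == player_two:
        return "tie"
    if BEATS[player_one] == player_two:
        return "player one"
    return "player two"


print(judge("rock", "scissors"))
print(judge("paper", "scissors"))
```

In a rules-based system the programmer supplies the rules explicitly, as in the `BEATS` table; a Machine Learning approach would instead try to learn them from examples of played rounds.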


Guided Example of Model Deployment using Python and Flask

Note: The code for this post can be found here

Photo by mohamed_hassan on Pixabay

The last step in the Machine Learning Life Cycle is to put the model into production, also known as “operationalizing” the model. This usually means enabling the model to generate outputs from new data. In the context of a real-world application, deploying a Machine Learning model into production means integrating it into the existing environment so that other systems can call it to make inferences. …


Tutorial on Applying Functions to Pandas DataFrames

Photo by picjumbo_com on Pixabay

Feature Engineering is an important step in the Data Science workflow. It is the process of extracting features from raw data using data mining techniques and domain knowledge. This can involve performing transformations or univariate, bivariate, and multivariate statistical analysis on existing data. These derived values can make data more intuitive for analysts and their algorithms. An experienced practitioner can quickly assess the problem and brainstorm new features to create from existing data. These features, which range in complexity, may require calculations that are easy to write (and more readable) as Lambda functions. …
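As a rough illustration of deriving features with Lambda functions, here is a minimal sketch using `DataFrame.apply` and `Series.apply`. The column names, the values, and the 100 threshold are all hypothetical and made up for this example.

```python
import pandas as pd

# Hypothetical order data, for illustration only.
df = pd.DataFrame({
    "unit_price": [10.0, 50.0, 3.0],
    "quantity": [2, 3, 3],
})

# Row-wise lambda (axis=1): derive a new feature from two existing columns.
df["total"] = df.apply(lambda row: row["unit_price"] * row["quantity"], axis=1)

# Element-wise lambda on a single column: bucket the derived value.
# The cutoff of 100 is an arbitrary assumption.
df["order_size"] = df["total"].apply(lambda t: "large" if t >= 100 else "small")

print(df)
```

For a simple product like `total`, the vectorized `df["unit_price"] * df["quantity"]` would be faster; the lambda form is shown because it generalizes to logic that is awkward to vectorize.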


Guided Example to Set Up an API using Python and Flask to Make Data Accessible to Users

Note: The code for this post can be found here

Photo by geralt on Pixabay

API stands for Application Programming Interface. It is a software intermediary that allows systems to communicate with each other. Most online businesses have likely built APIs for customers and/or for internal use. For example, when a user enters a URL into their browser, e.g. www.medium.com, they are making a request to Medium’s server. Medium will then send back a response to be interpreted and displayed in the user’s browser. Modern client-to-server communications are mostly handled by APIs. The type and response will be dependent on a set of dedicated URLs


10 Guided Examples to Help You Understand Regex

Photo by Edward Howell on Unsplash

Regular Expressions (Regex) are an essential tool for text analytics. They are powerful for searching and manipulating text strings. Compared to the traditional approach of processing strings with a combination of loops and conditionals, one line of regex can replace many lines of code. Some well-known use cases for such text processing include:

  • Input Validation such as password, email, name, etc.
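As a taste of the input validation use case, here is a minimal sketch built on Python's `re` module. The pattern is deliberately simplified for illustration and does not cover every valid email address; `EMAIL_PATTERN` and `is_valid_email` are hypothetical names.

```python
import re

# Simplified pattern (an assumption, not a complete email grammar):
# local part, "@", then a domain made of at least two dot-separated labels.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def is_valid_email(text: str) -> bool:
    # fullmatch requires the entire string to fit the pattern.
    return EMAIL_PATTERN.fullmatch(text) is not None


for candidate in ["user@example.com", "not-an-email", "a@b"]:
    print(candidate, is_valid_email(candidate))
```

The single pattern replaces what would otherwise be a series of loops and conditionals that check for the "@", the dot, and the allowed characters one at a time.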

In this article, we’ll go through the basics of how to use Regex. The focus is primarily on defining the correct patterns for the…


Making the Most out of Decision Trees

<You can find the code used for the demonstration here>

Tree-based classification models are a type of supervised machine learning algorithm that uses a series of conditional statements to partition training data into subsets. Each successive split adds some complexity to the model, which can then be used to make predictions. The resulting model can be visualized as a roadmap of logical tests that describes the data set. Decision trees are popular for small-to-medium-sized data sets because they are easy to implement and even easier to interpret. However, they are not without challenges. …
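To make the "series of conditional statements" concrete, here is a minimal sketch of what a small fitted tree boils down to at prediction time. The tree is written by hand rather than learned from data, and the feature names and the 70.0 threshold are hypothetical.

```python
def predict_play_outside(outlook: str, humidity: float, windy: bool) -> bool:
    """Hand-written stand-in for a fitted decision tree (not a trained model)."""
    # Root split: partition on the outlook feature.
    if outlook == "sunny":
        # Second-level split on humidity; 70.0 is an assumed threshold.
        return humidity <= 70.0
    if outlook == "rainy":
        # Second-level split on wind.
        return not windy
    # Remaining branch (e.g. overcast) is a leaf that always predicts True.
    return True


print(predict_play_outside("sunny", 65.0, False))
print(predict_play_outside("rainy", 50.0, True))
```

Each `if` corresponds to one split in the tree, and each `return` to a leaf, which is why such models are easy to read as a roadmap of logical tests.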


A Study on How GPUs Accelerate the Data Science Workflow

<Code for this article can be found here>

Photo by Pixabay on Pexels

Are GPUs faster than CPUs? It’s a very loaded question, but the short answer is no, not always. In fact, for most general-purpose computing, a CPU performs much better than a GPU. That’s because CPUs are designed with fewer processor cores that run at higher clock speeds than the ones found on GPUs, allowing them to complete a series of tasks very quickly. GPUs, on the other hand, have a much greater number of cores and are designed for a different purpose. The GPU was originally designed to accelerate the performance of graphics…


How Social Media Companies Use Machine Learning to Fight Fake News

Photo by TheDigitalArtist on Pixabay

With more people than ever relying on social media to stay updated on current events, hosting companies have an ethical responsibility to defend against false information. Disinformation, which is a type of misinformation that is intended to manipulate and mislead, can create unrest and panic. Other types of misinformation such as rumors and hoaxes, if left unchecked, also have the potential to bring mental and physical harm to unwary readers. The key to stopping the spread of misinformation is taking swift action against it, since it tends to travel very quickly. In fact, studies show…


What are the Most Frequently Discussed Topics in the News by Category?

<Complete code for this demonstration can be found here>

This is a continuation of the Fun with NLP series. Natural Language Processing is a fast-growing field in Machine Learning that makes it possible for computers to read, hear, speak, interpret, and generate human language. In the Fun with NLP series, I demonstrate simple tricks that I use to analyze text data. For demonstration purposes, I’m using data from the NYT Archive API to explore the question:

What are the most frequently discussed topics in the news…

Kevin C Lee
