Explore the Motivation and Capabilities of ML Through the Game of Rock Paper Scissors

Note: The code for this post can be found here

Machine Learning with the Rock Paper Scissors Game (Image by Author)

In this article, we’re going to build a simple Rock Paper Scissors game in Python with two different approaches: Rules-Based System vs. Machine Learning. Through this comparison, I hope to convey how Machine Learning works and what motivates it. To be clear, this is not an illustration of Image Recognition or Pattern Recognition (which hand will a player choose next) but rather of Machine Learning concepts in general. As automation continues to revolutionize the future of work across industries, companies must explore different ways to streamline their operations. …
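As a rough illustration of the rules-based side of that comparison, here is a minimal sketch in which every outcome is spelled out by hand. The names `BEATS` and `decide_winner` are hypothetical and are not taken from the article's code.

```python
# Hypothetical rules-based referee: each key beats the move it maps to.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def decide_winner(player_one: str, player_two: str) -> str:
    """Return 'player one', 'player two', or 'tie' using explicit rules."""
    if player_one not in BEATS or player_two not in BEATS:
        raise ValueError("moves must be rock, paper, or scissors")
    if player_one == player_two:
        return "tie"
    # Player one wins only if their move beats player two's move.
    return "player one" if BEATS[player_one] == player_two else "player two"


if __name__ == "__main__":
    print(decide_winner("rock", "scissors"))
```

A Machine Learning approach would instead try to learn this mapping from labeled examples of past rounds rather than having it written out in advance.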


Guided Example of Model Deployment using Python and Flask

Note: The code for this post can be found here

Photo by mohamed_hassan on Pixabay

The last step in the Machine Learning Life Cycle is to put the model into production, also known as “operationalizing” the model. It often means enabling the model to generate outputs based on new data. In the context of a real-world application, deploying the Machine Learning model into production means integrating it into the existing environment so that other systems can call it to make inferences. …


Tutorial on Applying Functions to Pandas DataFrames

Photo by picjumbo_com on Pixabay

Feature Engineering is an important step in the Data Science workflow. It is the process of extracting features from raw data using data mining techniques and domain knowledge. This can involve performing transformations or univariate, bivariate, and multivariate statistical analysis on existing data. These derived values can make data more intuitive for analysts and their algorithms. An experienced practitioner can quickly assess the problem and brainstorm new features to create based on existing data. These features, ranging in complexity, may require calculations that are easily done (and more readable) using Lambda functions. …


Guided Example to Set Up an API using Python and Flask to Make Data Accessible to Users

Note: The code for this post can be found here

Photo by geralt on Pixabay

API stands for Application Programming Interface. It is a software intermediary that allows systems to communicate with each other. Most businesses online have likely built APIs for customers and/or for internal use. For example, when a user enters a URL into their browser, e.g. www.medium.com, they are making a request to Medium’s server. Medium will then give back a response to be interpreted and displayed in the user’s browser. Modern client-to-server communications are mostly handled by APIs. The type and content of the response depend on a set of dedicated URLs (endpoints) and the request type. …
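To make the idea of endpoints and request types concrete, here is a minimal sketch in plain Python. It does not use Flask; it only mimics how a response can be chosen from the request type and the URL path. The routes, handlers, and payloads are all hypothetical.

```python
# Hypothetical handlers: each returns (status code, response body).
def list_articles():
    return 200, {"articles": ["regex", "decision-trees"]}


def create_article():
    return 201, {"status": "created"}


# Hypothetical routing table: (request type, endpoint) -> handler.
ROUTES = {
    ("GET", "/articles"): list_articles,
    ("POST", "/articles"): create_article,
}


def handle_request(method: str, path: str):
    """Dispatch a request to its handler, or return 404 if no route matches."""
    handler = ROUTES.get((method.upper(), path))
    if handler is None:
        return 404, {"error": "not found"}
    return handler()


if __name__ == "__main__":
    print(handle_request("GET", "/articles"))
    print(handle_request("DELETE", "/articles"))
```

The same endpoint returns different responses for GET and POST, which is the behavior a web framework's routing layer provides in a real API.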


10 Guided Examples to Help You Understand Regex

Photo by Edward Howell on Unsplash

Regular Expressions (Regex) are an essential tool for text analytics. They are powerful for searching and manipulating text strings. Compared to the traditional approach of processing strings with a combination of loops and conditionals, one line of regex can replace many lines of code. Some well-known use cases for such text processing include:

  • Input Validation such as password, email, name, etc.
  • Transformation such as splitting, replacements, etc.
  • Data Mining via web scraping, log searches, etc.

In this article, we’ll go through the basics of how to use Regex. The focus is primarily on defining the correct patterns for the search. …
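The three use cases listed above can each be sketched with Python's built-in `re` module. The patterns and function names below are illustrative assumptions, not the article's ten examples, and the email pattern is deliberately simple rather than standards-complete.

```python
import re

# Input validation: a simple (not RFC-complete) email check.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def is_valid_email(text: str) -> bool:
    """True if the whole string looks like name@domain.tld."""
    return EMAIL.fullmatch(text) is not None


# Transformation: split on commas or semicolons, ignoring nearby spaces.
def split_fields(text: str) -> list:
    return re.split(r"\s*[,;]\s*", text.strip())


# Data mining: pull three-digit status codes out of log text.
# The "status=NNN" log format is a hypothetical example.
def extract_status_codes(log: str) -> list:
    return re.findall(r"\bstatus=(\d{3})\b", log)


if __name__ == "__main__":
    print(is_valid_email("user@example.com"))
    print(split_fields("a, b;c ,d"))
    print(extract_status_codes("GET /home status=200\nGET /x status=404"))
```

Each function replaces what would otherwise be a loop with several conditionals, which is the trade-off the introduction describes.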


Making the Most out of Decision Trees

<You can find the code used for this demonstration here>

Tree-based classification models are a type of supervised machine learning algorithm that uses a series of conditional statements to partition training data into subsets. Each successive split adds some complexity to the model, which can be used to make predictions. The resulting model can be visualized as a roadmap of logical tests that describes the data set. Decision trees are popular for small-to-medium-sized data sets because they are easy to implement and even easier to interpret. However, they are not without challenges. …
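A single split of the kind described above can be sketched in a few lines. This example assumes Gini impurity as the split criterion, a common choice that the text does not name, and works on one numeric feature only. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the subset is pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one numeric feature that minimizes weighted Gini."""
    best_threshold, best_score = None, float("inf")
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


if __name__ == "__main__":
    values = [1, 2, 3, 10, 11, 12]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(best_split(values, labels))
```

A full tree repeats this search recursively on each resulting subset, which is where the added complexity of each successive split comes from.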


A Study on How GPUs Accelerate the Data Science Workflow

<Code for this article can be found here>

Photo by Pixabay on pexels

Are GPUs faster than CPUs? It’s a very loaded question, but the short answer is no, not always. In fact, for most general-purpose computing, a CPU performs much better than a GPU. That’s because CPUs are designed with fewer processor cores that have higher clock speeds than the ones found on GPUs, allowing them to complete a series of tasks very quickly. GPUs, on the other hand, have a much greater number of cores and are designed for a different purpose. The GPU was originally designed to accelerate graphics rendering. It did so by allowing the CPU to offload burdensome calculations and free up processing power. A GPU renders images more quickly than a CPU because of its parallel processing architecture, which allows it to perform multiple calculations across streams of data simultaneously. The CPU is the brain of the operation, responsible for giving instructions to the rest of the system, including the GPU(s). Today, with the help of additional software, GPUs’ capabilities have expanded to significantly reduce the time it takes to complete certain types of computations required at the different stages of Data Science. …


How Social Media Companies Use Machine Learning to Fight Fake News

Photo by TheDigitalArtist on pixabay

With more people now than ever relying on social media to stay updated on current events, there is an ethical responsibility for hosting companies to defend against false information. Disinformation, which is a type of misinformation that is intended to manipulate and mislead, can create unrest and panic. Other types of misinformation such as rumors and hoaxes, if left unchecked, also have the potential to bring mental and physical harm to unwary readers. The key to stopping the spread of misinformation is taking swift action against it, since it tends to travel very quickly. In fact, studies show that falsehood spreads exponentially faster than the truth (source). Social media companies have put in place protocols to limit the virality of inaccurate content, but they only take effect once the content has been reviewed by third-party fact-checking partners. Therefore, the focus is on rapid assessment of veracity. We’ve seen remarkable ingenuity from technology companies in this capacity, namely the use of Machine Learning algorithms to complement fact-checking programs in identifying inaccurate content. However, this is not yet a complete solution. …


What are the Most Frequently Discussed Topics in the News by Category?

<Complete code for this demonstration can be found here>

This is a continuation of the Fun with NLP series. Natural Language Processing is a fast-growing field in Machine Learning that makes it possible for computers to read, hear, speak, interpret, and generate human language. In the Fun with NLP series, I demonstrate simple tricks that I use to analyze text data. For demonstration purposes, I’m using data from the NYT Archive API to explore the question:

What are the most frequently discussed topics in the news by category?
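One simple way to approach that question is to count terms per category. The sketch below assumes the data has already been reduced to `(category, headline)` pairs; it does not call the NYT Archive API, and the stopword list, function name, and sample headlines are invented for illustration.

```python
import re
from collections import Counter, defaultdict

# A tiny, hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "for", "on", "as"}


def top_terms_by_category(articles, n=3):
    """Return the n most frequent non-stopword terms for each category."""
    counts = defaultdict(Counter)
    for category, headline in articles:
        words = re.findall(r"[a-z']+", headline.lower())
        counts[category].update(w for w in words if w not in STOPWORDS)
    return {cat: [w for w, _ in c.most_common(n)] for cat, c in counts.items()}


if __name__ == "__main__":
    # Invented sample headlines, not real NYT data.
    sample = [
        ("Sports", "Team wins the final"),
        ("Sports", "Final goes to the home team"),
        ("Tech", "New phone launch"),
    ]
    print(top_terms_by_category(sample, n=2))
```

Raw counts like these are only a starting point; they favor common words, so the choice of stopwords strongly shapes which topics surface.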


And No, It’s Not Just Machine Learning and Artificial Intelligence

With all the buzz around the topic, many people are considering a career in Data Science. But what is the essential skill set for this field? Unfortunately, there isn’t an obvious answer because the knowledge Data Scientists possess is spread across multiple traditional disciplines. In this article, I address this question the best way I know how — by using Data Science. My goal is to discuss the technical and non-technical skills that are critical for success in Data Science. This material was also shared at the panel discussion I spoke at.

Word cloud of prominent terms in Data Science job posting descriptions

I’ve posted the code for this analysis here — it can be used for any job title. …

Kevin C Lee
