Barbarian Meets Coding

WebDev, UX & a Pinch of Fantasy

4 minutes read · machine-learning · draft

Machine Learning

A robot bear learning the mysteries of life
Machine learning is a field of study within Artificial Intelligence devoted to understanding and developing programs that learn to perform tasks using data.

What is machine learning?

Machine Learning is a field within Artificial Intelligence that focuses on understanding and building methods that learn from data and use that learning to perform tasks (like labelling images, recommending your favorite songs, writing an article, drawing a piece of art or composing some music).

By "learn" we mean that a machine learning program can process lots of data and build a model from that data, which we can then use to perform useful tasks. This concept is easier to grasp with an example, and in contrast to traditional programming.

In traditional programming (or rule-based programming) we write a program that encodes a set of rules to perform a task. For instance, we might write a program for an activity tracker that, based on your speed, records whether you’re walking, jogging or running:

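// Label an activity from a speed reading using hand-written rules.
function labelActivity(speed: number): string {
  if (speed < 5) return 'walking'
  if (speed < 10) return 'jogging'
  if (speed < 20) return 'running'
  // etc...
  return 'unknown' // fallback so that every code path returns a label
}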

In this context, given some input (speed) and some business logic (written by ourselves as programming rules), we get some desired output (whether the person is walking, jogging or running).

In traditional programming, given an input and rules we get a result:

input ------>     |   rules  |   ------> output ???

In machine learning, instead of providing the rules ourselves, we provide the machine learning algorithm with lots of data that it can use to derive the rules of the system itself. Once it learns these rules, in the form of a Machine Learning model, it can make predictions about new data. In the case of the activity tracker, one could provide the machine learning algorithm with a training set of sensor data labelled as belonging to different activities (what we call supervised learning).

In machine learning, given lots of inputs and outputs (a training set) we
derive the rules of the system (as an ML model) which we can then apply to
novel inputs.

Phase 1) Training: learn the rules of the system

training set input ------>     |   rules???  |   ------> training set output

Phase 2) Inference: apply the learned rules to make decisions

novel input ------>     |   learned rules  |   ------> output ???
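The two phases above can be sketched in a few lines of code. This is a minimal illustration rather than how a real activity tracker works: the training data is made up, and the "model" is just the average speed seen for each label (a nearest-average classifier chosen here for brevity). The names `LabelledReading`, `train` and `predict` are hypothetical.

```typescript
// One labelled example: a speed reading (input) and its activity (output).
type LabelledReading = { speed: number; label: string }

// The learned "rules": the average speed observed for each label.
type Model = Record<string, number>

// Made-up training set, for illustration only.
const trainingSet: LabelledReading[] = [
  { speed: 3, label: 'walking' },
  { speed: 4, label: 'walking' },
  { speed: 7, label: 'jogging' },
  { speed: 9, label: 'jogging' },
  { speed: 14, label: 'running' },
  { speed: 18, label: 'running' },
]

// Phase 1) Training: derive the rules from inputs and outputs.
function train(examples: LabelledReading[]): Model {
  const totals: Record<string, number> = {}
  const counts: Record<string, number> = {}
  for (const { speed, label } of examples) {
    totals[label] = (totals[label] || 0) + speed
    counts[label] = (counts[label] || 0) + 1
  }
  const model: Model = {}
  for (const label of Object.keys(totals)) {
    model[label] = totals[label] / counts[label]
  }
  return model
}

// Phase 2) Inference: label a novel input with the closest learned average.
function predict(model: Model, speed: number): string {
  let best = 'unknown'
  let bestDistance = Infinity
  for (const label of Object.keys(model)) {
    const distance = Math.abs(speed - model[label])
    if (distance < bestDistance) {
      best = label
      bestDistance = distance
    }
  }
  return best
}

const model = train(trainingSet)
console.log(predict(model, 6))
```

Note that nobody wrote `speed < 5` anywhere: the boundaries between activities fall out of the data, so a different training set would produce different rules.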

Given the above, machine learning is especially useful for solving problems that are too complex to reduce to a set of rules that could be programmed by a human.

Types of machine learning systems

  • Supervised learning: Supervised learning consists of training models with labelled data so that these models can infer the rules of a system and apply those rules to make predictions for new data. The two most common use cases for supervised learning are regression and classification.
    • Regression models predict numeric data, like a weather model that predicts the amount of rain.
    • Classification models predict the likelihood of something belonging to a category (of a known set of categories). If there are two categories we call the model binary (e.g. rain or no rain), otherwise we have a multi-class model (rain, hail, snow, etc).
  • Unsupervised learning: Unsupervised learning consists of training models with unlabelled data in a manner that allows models to identify meaningful patterns in the data that can be used to make predictions. A common unsupervised learning technique is clustering, which groups similar data into natural groupings (or clusters).
  • Reinforcement learning: Models that use reinforcement learning are trained by getting rewards or penalties based on actions performed within an environment. Through this process of learning, the model derives a policy to maximize the rewards and minimize the penalties. Reinforcement learning is used to train robots to perform tasks like walking inside a room or programs like AlphaGo to learn Go.
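To make the clustering idea from the list above more concrete, here is a small sketch of one common clustering algorithm, k-means, reduced to one-dimensional data. The article does not name a specific algorithm, so k-means is an assumption made for illustration, as are the function name `kMeans1D`, the starting centres and the made-up speed readings. Notice that the data carries no labels: the groups emerge from the values alone.

```typescript
// Unlabelled data: speed readings with no activity attached (made-up values).
const speeds = [3, 4, 5, 14, 15, 16]

// Group values into clusters by repeatedly:
//   1. assigning each value to its nearest centre, and
//   2. moving each centre to the mean of the values assigned to it.
function kMeans1D(values: number[], initialCentres: number[], iterations = 10): number[] {
  let centres = initialCentres.slice()
  for (let step = 0; step < iterations; step++) {
    const sums = centres.map(() => 0)
    const counts = centres.map(() => 0)
    for (const value of values) {
      let nearest = 0
      for (let i = 1; i < centres.length; i++) {
        if (Math.abs(value - centres[i]) < Math.abs(value - centres[nearest])) {
          nearest = i
        }
      }
      sums[nearest] += value
      counts[nearest] += 1
    }
    // Keep the old centre if no value was assigned to it.
    centres = centres.map((centre, i) => (counts[i] > 0 ? sums[i] / counts[i] : centre))
  }
  return centres
}

// Ask for two clusters by providing two starting centres.
const learnedCentres = kMeans1D(speeds, [0, 20])
console.log(learnedCentres)
```

With this data the two centres settle around the slow readings and the fast readings. The algorithm has no idea that one group might be "walking" and the other "running"; attaching that meaning is left to us.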

Supervised learning

Supervised learning tasks are well-defined, which makes supervised learning a great machine learning technique with many practical applications, like identifying spam or predicting the weather.

The foundation of any supervised learning model is data. Data can come in many shapes and forms: text, numbers, tables, pixels, audio or video. We typically store related data in datasets: images of cats, dogs, articles of clothing, weather information, domain pricing, etc.

In Machine Learning parlance, datasets are made up of individual examples that contain features and labels:

  • An example is a single sample or singular entity in a dataset
  • A feature is a characteristic or value of that example that the model will use to predict a label
  • A label is the result that we want the model to predict
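The three terms above map naturally onto types. The sketch below assumes a hypothetical weather dataset; the field names (`temperatureCelsius`, `humidityPercent`, `pressureHectopascals`), the helper `splitFeaturesAndLabels` and all the values are invented for illustration.

```typescript
// Features: the characteristics the model reads to make a prediction.
type WeatherFeatures = {
  temperatureCelsius: number
  humidityPercent: number
  pressureHectopascals: number
}

// An example pairs one set of features with the label we want predicted.
type WeatherExample = {
  features: WeatherFeatures
  label: 'rain' | 'no rain'
}

// A dataset is a collection of examples (made-up values).
const dataset: WeatherExample[] = [
  {
    features: { temperatureCelsius: 12, humidityPercent: 91, pressureHectopascals: 998 },
    label: 'rain',
  },
  {
    features: { temperatureCelsius: 24, humidityPercent: 40, pressureHectopascals: 1021 },
    label: 'no rain',
  },
]

// Separate what the model sees (features) from what it should learn (labels).
function splitFeaturesAndLabels(examples: WeatherExample[]) {
  return {
    features: examples.map((example) => example.features),
    labels: examples.map((example) => example.label),
  }
}

const { features, labels } = splitFeaturesAndLabels(dataset)
console.log(features.length, labels)
```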

A dataset itself is characterized by its size and diversity. Size indicates how big the dataset is, that is, how many examples it has. Diversity indicates the range those examples cover in relation to the entire problem space. A good training dataset is both large (has many examples) and diverse (of many different types), so that it becomes representative of the problem space as a whole.
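A quick way to get a feel for these two properties is to count the examples and look at how they spread across labels. This is only a rough sketch: the helper `summarize` is hypothetical, the data is made up, and counting labels is a crude proxy for diversity, since it says nothing about how varied the features are within each label.

```typescript
type LabelledExample = { label: string }

// Report the dataset size and how many examples carry each label.
function summarize(examples: LabelledExample[]) {
  const countsByLabel: Record<string, number> = {}
  for (const { label } of examples) {
    countsByLabel[label] = (countsByLabel[label] || 0) + 1
  }
  return { size: examples.length, countsByLabel }
}

// Made-up dataset that is heavily skewed towards one label.
const summary = summarize([
  { label: 'rain' },
  { label: 'rain' },
  { label: 'rain' },
  { label: 'snow' },
])
console.log(summary)
```

A summary like this one, with three `rain` examples, a single `snow` example and no `hail` at all, hints that the dataset may not be representative of every kind of weather we want the model to handle.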

// more


Jaime González García

Written by Jaime González García, dad, husband, software engineer, ux designer, amateur pixel artist, tinkerer and master of the arcane arts. You can also find him on Twitter jabbering about random stuff.