What is machine learning?

David Cruwys Editorial

What is machine learning (ML)?

Machine learning is the semi-automated extraction of knowledge from data.

  • Knowledge from data starts with a question that might be answerable using data.
  • Automated extraction via data mining tools provides useful data sets to drive machine learning.
  • ML is semi-automated, meaning a human must make smart decisions at various points in the data pipeline.


Find the original article here: Intro to machine learning with scikit-learn


Supervised learning is the first category of ML

Making predictions using data

Example: When given a set of emails, can we predict whether an email is “SPAM” or “Content”?

There is a specific outcome we are trying to predict
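
To make the spam example concrete, here is a minimal sketch of supervised learning using only the Python standard library. It is a small naive Bayes word-count classifier, chosen for illustration; it is not the method from the original article, which works with scikit-learn. The training emails and the function names `train` and `predict` are made up for this sketch.

```python
import math
from collections import Counter

# Hypothetical, hand-labeled training emails (invented for illustration).
TRAINING_EMAILS = [
    ("win a free prize now", "SPAM"),
    ("free money claim your prize", "SPAM"),
    ("meeting notes for the project", "Content"),
    ("project schedule and meeting agenda", "Content"),
]


def train(examples):
    """Count words per label and how many emails carry each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, label_counts


def predict(model, text):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, label_counts = model
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_emails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the prior: how common this label is in the training set.
        score = math.log(label_counts[label] / total_emails)
        total_words = sum(counts.values())
        for word in text.split():
            # Add-one smoothing so an unseen word does not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_EMAILS)
prediction_spam = predict(model, "claim your free prize")
prediction_content = predict(model, "agenda for the project meeting")
```

The labels in `TRAINING_EMAILS` are the "specific outcome" the model learns to predict. With only four training emails this is a toy; a real spam filter needs far more labeled data.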

Unsupervised learning is the second category of ML

This is about extracting structure out of data

Example: Segment grocery store shoppers into clusters that exhibit similar behaviors. There is no right answer.
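
As a rough illustration of the shopper example, the sketch below runs a plain k-means clustering in standard-library Python. The shopper data, the two features, and the hand-picked starting centroids are all assumptions made for this example. Note that no labels are supplied; the algorithm only groups points that sit close together.

```python
# Hypothetical shoppers: (visits per month, average basket in dollars).
SHOPPERS = [(2, 15.0), (3, 18.0), (2, 20.0), (12, 95.0), (11, 110.0), (13, 100.0)]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(values) / len(points) for values in zip(*points))


def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-center."""
    centroids = list(initial_centroids)
    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = mean_point(members)
    return assignments, centroids


# Starting centroids are picked by hand here so the run is repeatable.
assignments, centroids = k_means(SHOPPERS, [SHOPPERS[0], SHOPPERS[3]])
```

`assignments` holds a cluster index for each shopper. What the clusters mean (for instance "occasional" versus "frequent" shoppers) is an interpretation a human adds afterwards, and in practice the features would usually be scaled first so that one of them does not dominate the distance.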

At JobGetter.com we use both techniques to segment job postings into clusters based on roles such as Barista, Waiter or Chef.


How does machine learning “work”?

  1. First, train a machine learning model using data labeled with the outcome
    • The “machine learning model” learns the relationship between the attributes of the data and its outcome.
    • At JobGetter.com we have a human-categorized list of more than 100,000 roles and skills, which makes it easy to train our machine learning models.
  2. Second, make predictions on new data for which the label is unknown
    • The primary goal of supervised learning is to build a model that “generalizes”: it accurately predicts the future rather than the past.
    • We use our past manual categorization of roles, skills (hard & soft) and qualifications to predict new values as job postings hit our system.
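
The two steps above can be sketched in a few lines of standard-library Python. This is an illustrative nearest-neighbor classifier over word overlap, not a description of JobGetter's actual system. The postings, the role labels, and the helper names `fit`, `predict` and `accuracy` are all hypothetical. A small held-out set, which the model never sees during training, gives a rough estimate of how well it generalizes.

```python
# Hypothetical labeled postings (text, role); invented for illustration only.
LABELED = [
    ("espresso coffee barista morning shift", "Barista"),
    ("barista latte art coffee cart", "Barista"),
    ("waiter table service fine dining", "Waiter"),
    ("waiter evening table service", "Waiter"),
    ("chef kitchen grill menu", "Chef"),
    ("sous chef kitchen prep", "Chef"),
]

# Held-out postings with known labels, used only to check generalization.
HELD_OUT = [
    ("coffee barista weekend", "Barista"),
    ("table service waiter lunch", "Waiter"),
    ("kitchen chef pastry", "Chef"),
]


def fit(examples):
    """Step 1: 'training' a nearest-neighbor model stores each example as a word set."""
    return [(set(text.split()), label) for text, label in examples]


def jaccard(a, b):
    """Word overlap between two sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b)


def predict(model, text):
    """Step 2: label new text with the role of its most similar training example."""
    words = set(text.split())
    _, label = max(model, key=lambda item: jaccard(words, item[0]))
    return label


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the known label."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


model = fit(LABELED)
held_out_accuracy = accuracy(model, HELD_OUT)
new_label = predict(model, "grill chef wanted")  # a posting with no known label
```

Measuring accuracy on `HELD_OUT` rather than on `LABELED` matters: a nearest-neighbor model scores perfectly on the examples it has memorized, which says nothing about how it handles new postings.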

Watch the full video below or read the Intro to Machine Learning article to hear more about:

  • How do I choose which attributes of my data to include in the model?
  • How do I choose which model to use?
  • How do I optimize this model for best performance?
  • How do I ensure that I’m building a model that will generalize to unseen data?
  • Can I estimate how well my model is likely to perform on unseen data?


