Machine Learning is a topic that I’ve been interested in for years, but have never taken the time to learn. I’ve recently found some articles and projects that looked interesting, which I link to later. But before I really jump into any of those I thought I should start at the beginning with an introduction to machine learning.

The Machine Learning course taught by Andrew Ng seems to be the seminal course. I also found Intro to Machine Learning on Udacity, taught by Sebastian Thrun and Katie Malone.

I decided to try out the Udacity course since I’ve never really tried their courses before.

Here are links to articles, blog posts and projects that look interesting:

My plan is to update this page from time to time with notes and my status on my journey to grokking Machine Learning.


  • March 01, 2017 - Started Intro to Machine Learning
  • March 03, 2017 - Completed Naive Bayes section of course
    • machine learning finds a decision surface (aka decision boundary) that separates one class from another, which is then used to generalize to new data points
    • algorithms are divided into two types: supervised vs unsupervised
    • Bayes’ rule
    • Prior Probability x Test Evidence -> Posterior Probability (multiply the prior by the probability of the evidence under each hypothesis, then normalize so the posteriors sum to 1)
    • Naive Bayes algorithm
      • supervised
      • usually used for text learning
      • looks at word frequencies, not word order
      • given each person’s word frequencies, multiply together the probabilities of the words in the message, then multiply by that person’s prior probability. Do this for each person. For example, with two people you compute this product for Person A and for Person B; comparing the two gives you the probability that the message came from Person A vs Person B (a worked version of this is in the first sketch at the end of this page)
      • it is good for classifying text because it is simple and treats the features (words) as independent (see the text-classification sketch at the end of this page)
      • an example where it fails: “Chicago Bulls”. Since the algorithm ignores word order it treats the phrase as the separate words “Chicago” and “Bulls” and misses that together they mean something else entirely
  • March 07, 2017

    • Support Vector Machine (SVM)
    • find the separating line between data of two different classes; in higher dimensions this separating boundary is called a hyperplane
    • the “best” line maximizes the distance to the nearest points of either class; this distance is called the margin
    • when a cleanly separating line can’t be drawn (because of outliers), the SVM should “do the best it can” and tolerate some misclassified points
    • sometimes you have to add a new feature (such as x^2+y^2, or the absolute value of x, |x|, etc) so the SVM can linearly separate the two classes of data (see the feature-augmentation sketch at the end of this page)
    • kernel trick – a kernel implicitly maps the features into a higher-dimensional space where the classes become linearly separable, without computing the new features explicitly
    • there are often parameters you can set when creating your classifier; they can make a big difference when training the model (see the parameter sketch at the end of this page)
      • kernel (e.g. linear, rbf)
      • C – controls the tradeoff between a smooth decision boundary and classifying training points correctly
      • gamma – controls how far the influence of a single training example reaches
    • SVMs work well in complicated domains where there is a clear margin of separation
    • they don’t perform well on very large datasets because of the training time, or when there is lots and lots of noise (where the classes overlap heavily, Naive Bayes would be better)
    • you want to prevent overfitting, which shows up as a very complicated decision boundary; tune the parameters (kernel, C, gamma) to avoid it
  • March 14, 2017

    • Next up, Decision Trees
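
Below are a few small Python sketches of the ideas in these notes. They are my own illustrations, not course material: the library choice (scikit-learn) and all of the numbers and toy data are made up just for the examples.

First, a worked version of the “Prior Probability x Test Evidence -> Posterior Probability” multiplication from the Naive Bayes notes. The word frequencies and priors below are invented for illustration.

```python
# A worked version of "prior x test evidence -> posterior" for two authors.
# All numbers here are made up for illustration; they are not from the course.

word_freq = {                      # P(word | person)
    "A": {"deal": 0.8, "love": 0.1, "life": 0.1},
    "B": {"deal": 0.2, "love": 0.5, "life": 0.3},
}
prior = {"A": 0.5, "B": 0.5}       # P(person) before seeing any words

def score(person, words):
    """Multiply the prior by P(word | person) for each word in the message."""
    p = prior[person]
    for w in words:
        p *= word_freq[person][w]
    return p

message = ["love", "deal"]
scores = {person: score(person, message) for person in prior}

# Normalizing turns the two scores into posterior probabilities that sum to 1.
total = sum(scores.values())
for person, s in scores.items():
    print(f"P({person} | {' '.join(message)}) = {s / total:.2f}")
```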
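
Next, a minimal text-classification sketch. The notes don’t name a library; this assumes scikit-learn, whose CountVectorizer counts word frequencies and ignores word order – the bag-of-words behaviour described above.

```python
# Minimal Naive Bayes text-classification sketch with scikit-learn
# (my library choice; the notes do not name one).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [                        # tiny made-up training set
    "let's close the deal today",
    "great deal, sign the contract",
    "love this weather, life is good",
    "sending love to the family",
]
authors = ["A", "A", "B", "B"]      # supervised: labels are provided

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)   # word-count features, order ignored

clf = MultinomialNB().fit(X, authors)

test = vectorizer.transform(["what a deal"])
print(clf.predict(test))                 # ['A'] on this toy data
```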
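
A sketch of the “add a new feature” idea from the SVM notes: two concentric rings of points aren’t linearly separable in (x, y), but adding x^2+y^2 as a third feature lets a plain linear SVM separate them. The data set is generated here purely for illustration.

```python
# The "add a new feature" trick: two concentric rings are not linearly
# separable in (x, y), but adding x^2 + y^2 lets a linear SVM separate them.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

r2 = (X ** 2).sum(axis=1).reshape(-1, 1)   # new feature: x^2 + y^2
X_aug = np.hstack([X, r2])

print("linear SVM on (x, y) only:", SVC(kernel="linear").fit(X, y).score(X, y))
print("linear SVM with x^2 + y^2:", SVC(kernel="linear").fit(X_aug, y).score(X_aug, y))
```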
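
Finally, a sketch of how the classifier parameters from the notes (kernel, C, gamma) are passed in, again assuming scikit-learn. The parameter values and data set are made up for illustration.

```python
# Passing kernel, C and gamma to scikit-learn's SVC on a noisy toy data set.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C, gamma in [(1.0, 1.0), (10000.0, 100.0)]:
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
    print(f"C={C}, gamma={gamma}: "
          f"train accuracy={clf.score(X_train, y_train):.2f}, "
          f"test accuracy={clf.score(X_test, y_test):.2f}")
```

With the larger C and gamma the training accuracy should be close to perfect while the test accuracy drops – the overly complicated decision boundary (overfitting) the notes warn about.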