Demystifying AI for the intelligently curious

Item Response Theory: How Smart ARE You?

May 15, 2016

Psychometrics is all about measuring the psychological characteristics of people; for example, scholastic aptitude. How is this done? Tests, of course! But there's a chicken-and-egg problem here: you need to know both how hard a test is and how smart the test-taker is in order to get the results you want. How do you solve this problem, one equation with two unknowns? Item response theory--the data science behind tests such as the GRE.

Relevant links:
https://en.wikipedia.org/wiki/Item_response_theory
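
To make the chicken-and-egg problem concrete, here is a minimal sketch of the simplest item response model, the one-parameter logistic (Rasch) model, in which the chance of a correct answer depends only on the gap between the test-taker's ability and the item's difficulty. The function name `p_correct` and the sample numbers are illustrative assumptions, not something taken from the episode.

```python
import math

def p_correct(ability, difficulty):
    """Rasch (1PL) model: probability that a test-taker answers an item correctly.

    Both arguments live on the same arbitrary scale; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# When ability equals difficulty, the odds are even.
print(p_correct(0.0, 0.0))
# A stronger test-taker facing the same item is more likely to get it right.
print(p_correct(1.5, 0.0) > p_correct(0.0, 0.0))
```

In practice both the abilities and the difficulties are unknown and get estimated together from a table of who answered what correctly, which is the "two unknowns" part of the problem.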

Go!

May 15, 2016

As you may have heard, a computer beat a world-class human player in Go last week. As recently as a year ago the prediction was that it would take a decade to get to this point, yet here we are, in 2016. We'll talk about the history and strategy of game-playing computer programs, and what makes Google's AlphaGo so special.

Relevant link:
http://googleresearch.blogspot.com/2016/01/alphago-mastering-ancient-game-of-go.html

Great Social Networks in History

May 15, 2016

The Medici were one of the great ruling families of Europe during the Renaissance. How did they come to rule? Not through power, or money, or armies, but through the strength of their social network. And speaking of great historical social networks, analysis of the network of letter-writing during the Enlightenment is helping humanities scholars track the dispersion of great ideas across the world during that time, from Voltaire to Benjamin Franklin and everyone in between.

Relevant links:
https://www2.bc.edu/~jonescq/mb851/Mar12/PadgettAnsell_AJS_1993.pdf
http://republicofletters.stanford.edu/index.html

How Much to Pay a Spy

May 15, 2016

A few small encores on auction theory, and then--how can you value a piece of information before you know what it is? Decision theory has some pointers. Some highly relevant information if you are trying to figure out how much to pay a spy.

Relevant links:
https://tuecontheoryofnetworks.wordpress.com/2013/02/25/the-origin-of-the-dutch-auction/
http://www.nowozin.net/sebastian/blog/the-fair-price-to-pay-a-spy-an-introduction-to-the-value-of-information.html
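
One standard decision-theory answer to "what is information worth before you know it?" is the expected value of perfect information: the gap between the best average payoff when you learn the state of the world first and the best average payoff when you have to act blind. The sketch below is a minimal illustration; the function name, the payoff table layout, and the numbers in the usage note are hypothetical.

```python
def expected_value_of_perfect_information(probs, payoffs):
    """probs[s] is the probability of state s; payoffs[a][s] is the payoff
    of taking action a when the world turns out to be in state s."""
    # Best expected payoff if you must pick one action without the information.
    best_without = max(
        sum(p * row[s] for s, p in enumerate(probs)) for row in payoffs
    )
    # Expected payoff if you learn the state first and then pick the best action.
    best_with = sum(
        p * max(row[s] for row in payoffs) for s, p in enumerate(probs)
    )
    return best_with - best_without
```

For example, with two equally likely states and a choice between a gamble paying 10 or -10 and a safe action paying 0, `expected_value_of_perfect_information([0.5, 0.5], [[10, -10], [0, 0]])` comes out to 5, so under these assumptions 5 is the most the spy's report could be worth.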

Sold! Auctions Part 2

May 15, 2016

The Google ads auction is a special kind of auction, one you might not know as well as the famous English auction (which we talked about in the last episode). But if it's what Google uses to sell billions of dollars of ad space in real time, you know it must be pretty cool.

Relevant links:
https://en.wikipedia.org/wiki/English_auction
http://people.ischool.berkeley.edu/~hal/Papers/2006/position.pdf
http://www.benedelman.org/publications/gsp-060801.pdf
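
The papers linked above study the generalized second-price auction for ad positions. As a rough sketch of the core pricing rule only: bidders are ranked by bid, the top bidders win the slots, and each winner pays the bid of the bidder ranked just below. This simplified version ignores quality scores, reserve prices, and ties, and the function name and inputs are assumptions for illustration, not Google's actual system.

```python
def gsp_allocate(bids, num_slots):
    """bids maps bidder name -> bid per click.

    Returns a list of (bidder, price_per_click), best ad slot first.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for i, (bidder, _) in enumerate(ranked[:num_slots]):
        # Each winner pays the next-highest bid, or nothing if no one is below.
        price = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        results.append((bidder, price))
    return results
```

With bids of 3, 2, and 1 for bidders a, b, and c and two slots, a wins the top slot and pays 2 per click, while b takes the second slot and pays 1.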

Going Once, Going Twice: Auctions Part 1

May 15, 2016

The Google AdWords algorithm is (famously) an auction system for allocating a massive amount of online ad space in real time--with that fascinating use case in mind, this episode is part one in a two-part series all about auctions. We dive into the theory of auctions, and what makes a "good" auction.

Relevant links:
https://en.wikipedia.org/wiki/English_auction
http://people.ischool.berkeley.edu/~hal/Papers/2006/position.pdf
http://www.benedelman.org/publications/gsp-060801.pdf

Chernoff Faces and Minard Maps

May 15, 2016

A data visualization extravaganza in this episode, as we discuss Chernoff faces (you: "faces? huh?" us: "oh just you wait") and the greatest data visualization of all time, or at least the Napoleonic era.

Relevant links:
http://lya.fciencias.unam.mx/rfuentes/faces-chernoff.pdf
https://en.wikipedia.org/wiki/Charles_Joseph_Minard

t-SNE: Reduce Your Dimensions, Keep Your Clusters

May 15, 2016

Ever tried to visualize a cluster of data points in 40 dimensions? Or even 4, for that matter? We prefer to stick to 2, or maybe 3 if we're feeling well-caffeinated. The t-SNE algorithm is one of the best tools on the market for doing dimensionality reduction when you have clustering in mind.

Relevant links:
https://www.youtube.com/watch?v=RJVL80Gg3lA

The [Expletive Deleted] Problem

May 15, 2016

The town of [expletive deleted], England, is responsible for the clbuttic [expletive deleted] problem. This week on Linear Digressions: we try really hard not to swear too much.

Unlabeled Supervised Learning--whaaa?

May 15, 2016

In order to do supervised learning, you need a labeled training dataset. Or do you...?

Relevant links:
http://www.cs.columbia.edu/~dplewis/candidacy/goldman00enhancing.pdf

Hacking Neural Nets

May 15, 2016

Machine learning: it can be fooled, just like you or me. Here's one of our favorite examples, a study into hacking neural networks.

Relevant links:
http://arxiv.org/pdf/1412.1897v4.pdf

Zipf's Law

May 15, 2016

Zipf's law describes how word usage is distributed: a word's frequency is roughly inversely proportional to its rank, so the second most common word shows up about half as often as the first. As it turns out, this is also strikingly reminiscent of how income is distributed, and populations of cities, and bug reports in software, as well as tons of other phenomena that we all interact with every day.

Relevant links:
http://economix.blogs.nytimes.com/2010/04/20/a-tale-of-many-cities/
http://arxiv.org/pdf/cond-mat/0412004.pdf
https://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/
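
If you want to check Zipf's law on a text of your own, a minimal sketch looks like this: count the words, rank them by frequency, and compare each count with the top count divided by the rank. The helper names and the whitespace tokenization are simplifying assumptions.

```python
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(text.lower().split())
    return [
        (rank, word, n)
        for rank, (word, n) in enumerate(counts.most_common(), start=1)
    ]

def zipf_expected(top_count, rank, s=1.0):
    """Count an idealized Zipf distribution with exponent s predicts at a rank,
    given the observed count at rank 1."""
    return top_count / rank ** s
```

On a real corpus the match is only approximate, and the exponent `s` is usually fit to the data rather than fixed at 1.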

Indie Announcement

May 15, 2016

We've gone indie! Which shouldn't change anything about the podcast that you know and love, but we're super excited to keep bringing you Linear Digressions as a fully independent podcast.

Some links mentioned in the show:
https://twitter.com/lindigressions
https://twitter.com/benjaffe
https://twitter.com/multiarmbandit
https://soundcloud.com/linear-digressions
http://lineardigressions.com/

The Cocktail Party Problem

May 15, 2016

Grab a cocktail, put on your favorite karaoke track, and let’s talk some more about disentangling audio data!

Links:
Deep learning machine solves the cocktail party problem
The cocktail party effect

Portrait Beauty

May 15, 2016

It's Da Vinci meets Skynet: what makes a portrait beautiful, according to a machine learning algorithm. Snap a selfie and give us a listen.

Link: The beauty of capturing faces: rating the quality of digital portraits

A Criminally Short Introduction to Semi-Supervised Learning

May 15, 2016

Because there are more interesting problems than there are labeled datasets, semi-supervised learning provides a framework for getting feedback from the environment as a proxy for labels of what's "correct." Of all the machine learning methodologies, it might also be the closest to how humans usually learn--we go through the world, getting (noisy) feedback on the choices we make and learning from the outcomes of our actions.

Link: David Silver's Reinforcement Learning course

Thresholdout: Down with Overfitting

May 15, 2016

Overfitting to your training data can be caught by evaluating your machine learning algorithm on a holdout test dataset, but what about overfitting to the test data? Turns out that's easy to do, and you have to be very careful to avoid it. But an algorithm from the field of privacy research shows promise for keeping your test data safe from accidental overfitting.

Link: The Reusable Holdout: preserving validity in adaptive data analysis

The State of Data Science

May 08, 2016

How many data scientists are there, where do they live, where do they work, what kind of tools do they use, and how do they describe themselves? RJMetrics wanted to know the answers to these questions, so they decided to find out and share their analysis with the world. In this very special interview episode, we welcome Tristan Handy, VP of Marketing at RJMetrics, who will talk about "The State of Data Science Report."

Link: The State of Data Science

Data Science for Making the World a Better Place

May 08, 2016

There's a good chance that great data science is going on close to you, and that it's going toward making your city, state, country, and planet a better place. Not all the data science questions being tackled out there are about finding the sleekest new algorithm or billion-dollar company idea--there's a whole world of social data science that just wants to make the world a better place to live in.

Link: Driven Data

Kalman Runners

May 08, 2016

The Kalman Filter is an algorithm for taking noisy measurements of dynamic systems and using them to get a better idea of the underlying dynamics than you could get from a simple extrapolation. If you've ever run a marathon, or been a nuclear missile, you probably know all about these challenges already. By the way, we neglected to mention in the episode: Katie's marathon time was 3:54:27!

Link: How a Kalman filter works, in pictures
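
For a feel of the predict-and-update loop, here is a minimal one-dimensional sketch. It assumes the hidden quantity (say, a runner's pace) drifts slowly as a random walk and that every measurement carries the same noise variance; the function name, parameters, and defaults are illustrative assumptions, and a real tracker would use a fuller motion model with matrices.

```python
def kalman_1d(measurements, meas_var, process_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a slowly drifting value.

    x is the current estimate, p is the variance (uncertainty) of that estimate.
    Returns the estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the value is assumed to stay put, but uncertainty grows.
        p = p + process_var
        # Update: the gain k weighs the new measurement against the prediction.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates
```

A larger `meas_var` makes the filter trust its own prediction more and smooth harder; a larger `process_var` makes it chase the measurements more closely.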