Lenny Distilled

Ronny Kohavi

Consultant, Former VP at Airbnb, Microsoft, Amazon

11 quotes across 1 episode

The ultimate guide to A/B testing

You have to sometimes allocate to these high-risk, high-reward ideas. We're going to try something that's most likely to fail. But if it does win, it's going to be a home run. And you have to be ready to understand and agree that most will fail.

A surprising experiment is one where the estimated result beforehand and the actual result differ by a lot, so the absolute value of the difference is large.

Twyman's law, the general statement, is that any figure that looks interesting or different is usually wrong. If the result looks too good to be true, say your normal movement in an experiment is under 1% and you suddenly have a 10% movement, hold the celebratory dinner.

Many people interpret one minus the p-value as the probability that your treatment is better than control. That is wrong.

At Airbnb, where the success rate is only 8%, if you get a statistically significant result with a P value less than 0.05, there is a 26% chance that this is a false positive result. It's not 5%, it's 26%.
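The 26% figure follows from applying Bayes' rule to the prior success rate of ideas. A minimal sketch of that calculation, assuming a two-sided alpha of 0.05 (so 0.025 in the winning direction) and 80% power; those two parameter values are assumptions for illustration, not stated in the quote:

```python
# False positive risk via Bayes' rule (alpha and power are assumed values)
alpha = 0.05   # two-sided significance level
power = 0.80   # assumed statistical power of the experiment
prior = 0.08   # Airbnb-style prior: only ~8% of ideas are real wins

# Chance a no-effect idea comes out significant in the winning direction
false_pos = (alpha / 2) * (1 - prior)
# Chance a genuinely good idea is detected
true_pos = power * prior

fpr = false_pos / (false_pos + true_pos)
print(f"False positive risk: {fpr:.0%}")  # ~26%, not 5%
```

The lower the prior win rate, the larger the share of "significant" results that are false positives, which is exactly why the number climbs from Microsoft to Bing to Airbnb.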

Unless you have at least tens of thousands of users, the math, the statistics just don't work out for most of the metrics that you're interested in. Start experimenting when you're in the tens of thousands of users. Below that, start building the culture, start building the platform, start integrating.
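Why tens of thousands: a common rule of thumb (used in Kohavi's A/B testing book) sizes each experiment arm at roughly 16 sigma^2 / delta^2 for about 80% power at alpha = 0.05. A sketch with an assumed 5% baseline conversion rate and a 10% relative lift to detect; both inputs are illustrative assumptions:

```python
# Rule-of-thumb sample size per arm: n ~ 16 * sigma^2 / delta^2
# (roughly 80% power at alpha = 0.05; inputs below are assumed)
baseline = 0.05        # assumed 5% baseline conversion rate
relative_lift = 0.10   # smallest relative change worth detecting

delta = baseline * relative_lift      # absolute effect size: 0.005
variance = baseline * (1 - baseline)  # Bernoulli variance of conversion

n_per_arm = 16 * variance / delta ** 2
print(f"~{n_per_arm:,.0f} users per arm")  # ~30,400 per arm
```

With two arms that is over 60,000 users for a single experiment, which is why smaller products should first invest in culture and platform rather than in running underpowered tests.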

To me, the key phrase is lifetime value: you have to define the OEC (Overall Evaluation Criterion) such that it is causally predictive of the lifetime value of the user.

At Microsoft, about 66%, two thirds of ideas fail. At Bing, which is a much more optimized domain after we've been optimizing it for a while, the failure rate was around 85%. And then at Airbnb, this 92% number is the highest failure rate that I've observed.

I'm very clear that I'm a big fan of "test everything," which is that any code change you make, any feature you introduce has to be in some experiment. Because again, I've observed this sort of surprising result that even small bug fixes, even small changes can sometimes have surprising, unexpected impact.

Find a place, find a team where experimentation is easy to run. Don't go with the team that launches every six months, or Office used to launch every three years. Go with the team that launches frequently.

When you think about return on investment, we could get the data by having some engineers spend a couple of hours implementing it. And that's exactly what happened: somebody at Bing kept seeing this in the backlog and said, "My God, we're spending too much time discussing it. I could just implement it."