Lenny Distilled

Ronny Kohavi

Consultant, Former VP at Airbnb, Microsoft, Amazon

11 quotes across 1 episode

The ultimate guide to A/B testing

You have to sometimes allocate to these high-risk, high-reward ideas. We're going to try something that's most likely to fail. But if it does win, it's going to be a home run. And you have to be ready to understand and agree that most will fail.

A surprising experiment is one where the estimated result beforehand and the actual result differ by a lot, so the absolute value of the difference is large.

Twyman's law, the general statement, is that any figure that looks interesting or different is usually wrong. If the result looks too good to be true, say your normal movement in an experiment is under 1% and you suddenly have a 10% movement, hold the celebratory dinner.

Many people interpret one minus the p-value as the probability that your treatment is better than control. That is wrong.

At Airbnb, where the success rate is only 8%, if you get a statistically significant result with a P value less than 0.05, there is a 26% chance that this is a false positive result. It's not 5%, it's 26%.
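The 26% figure follows from applying Bayes' rule to the prior success rate of ideas. A minimal sketch of that calculation, assuming a two-sided alpha of 0.05 (so 0.025 in the winning direction) and 80% power; those two parameter values are assumptions for illustration, not stated in the quote:

```python
# False positive risk via Bayes' rule (alpha and power are assumed values)
alpha = 0.05   # two-sided significance level
power = 0.80   # assumed statistical power of the experiment
prior = 0.08   # Airbnb-style prior: only ~8% of ideas are real wins

# Chance a no-effect idea comes out significant in the winning direction
false_pos = (alpha / 2) * (1 - prior)
# Chance a genuinely good idea is detected
true_pos = power * prior

fpr = false_pos / (false_pos + true_pos)
print(f"False positive risk: {fpr:.0%}")  # ~26%, not 5%
```

The lower the prior win rate, the larger the share of "significant" results that are false positives, which is exactly why the number climbs from Microsoft to Bing to Airbnb.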

Unless you have at least tens of thousands of users, the math, the statistics just don't work out for most of the metrics that you're interested in. Start experimenting when you're in the tens of thousands of users. Below that, start building the culture, start building the platform, start integrating.
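Why tens of thousands: a common rule of thumb (used in Kohavi's A/B testing book) sizes each experiment arm at roughly 16 sigma^2 / delta^2 for about 80% power at alpha = 0.05. A sketch with an assumed 5% baseline conversion rate and a 10% relative lift to detect; both inputs are illustrative assumptions:

```python
# Rule-of-thumb sample size per arm: n ~ 16 * sigma^2 / delta^2
# (roughly 80% power at alpha = 0.05; inputs below are assumed)
baseline = 0.05        # assumed 5% baseline conversion rate
relative_lift = 0.10   # smallest relative change worth detecting

delta = baseline * relative_lift      # absolute effect size: 0.005
variance = baseline * (1 - baseline)  # Bernoulli variance of conversion

n_per_arm = 16 * variance / delta ** 2
print(f"~{n_per_arm:,.0f} users per arm")  # ~30,400 per arm
```

With two arms that is over 60,000 users for a single experiment, which is why smaller products should first invest in culture and platform rather than in running underpowered tests.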

To me, the key phrase is lifetime value: you have to define the OEC (Overall Evaluation Criterion) such that it is causally predictive of the lifetime value of the user.

At Microsoft, about 66%, two thirds of ideas fail. At Bing, which is a much more optimized domain after we've been optimizing it for a while, the failure rate was around 85%. And then at Airbnb, this 92% number is the highest failure rate that I've observed.

I'm very clear that I'm a big fan of "test everything," which is that any code change you make, any feature you introduce has to be in some experiment. Because again, I've observed this sort of surprising result that even small bug fixes, even small changes can sometimes have surprising, unexpected impact.

Find a place, find a team where experimentation is easy to run. Don't go with the team that launches every six months, or Office used to launch every three years. Go with the team that launches frequently.

When you think about return on investment, we could get the data by having some engineers spend a couple of hours implementing it. And that's exactly what happened: somebody at Bing kept seeing this in the backlog and said, "My God, we're spending too much time discussing it. I could just implement it."