Making the Netflix API More Resilient

A new Netflix Tech Blog post by my manager (Ben Schmaus) discusses how we've been making the Netflix API more resilient through the use of circuit breakers, bounded thread pools and realtime decision-making:

> Here are some of the key principles that informed our thinking as we set out to make the API more resilient.
>
> 1. A failure in a service dependency should not break the user experience for members
> 2. The API should automatically take corrective action when one of its service dependencies fails
> 3. The API should be able to show us what’s happening right now, in addition to what was happening 15-30 minutes ago, yesterday, last week, etc.
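The first two principles line up with the mechanisms named above. A bounded thread pool keeps one slow dependency from tying up every request thread, a circuit breaker supplies the automatic corrective action, and a fallback keeps the member-facing response intact. The Python sketch below shows one way those pieces can fit together; it is an illustration, not Netflix's implementation. `DependencyGuard`, its parameters, and its policy (trip after `failure_threshold` consecutive failures, let traffic through again after `reset_after_s` seconds) are hypothetical choices made for this example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class DependencyGuard:
    """Hypothetical guard around a single service dependency (illustrative only)."""

    def __init__(self, max_concurrent=10, failure_threshold=5,
                 reset_after_s=5.0, timeout_s=1.0):
        # Bounded pool: at most max_concurrent calls in flight for this dependency.
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent)
        self._permits = threading.BoundedSemaphore(max_concurrent)
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._timeout_s = timeout_s
        self._lock = threading.Lock()
        self._consecutive_failures = 0
        self._opened_at = None  # None means the circuit is closed

    def _circuit_open(self):
        with self._lock:
            if self._opened_at is None:
                return False
            if time.monotonic() - self._opened_at >= self._reset_after_s:
                # Cool-down elapsed: close the circuit, but one more failure re-opens it.
                self._opened_at = None
                self._consecutive_failures = self._failure_threshold - 1
                return False
            return True

    def _record(self, success):
        with self._lock:
            if success:
                self._consecutive_failures = 0
            else:
                self._consecutive_failures += 1
                if self._consecutive_failures >= self._failure_threshold:
                    self._opened_at = time.monotonic()  # trip the breaker

    def call(self, fn, fallback):
        """Run fn() on the bounded pool; return fallback() if it cannot or does not succeed."""
        if self._circuit_open():
            return fallback()  # don't call a dependency that is known to be failing
        if not self._permits.acquire(blocking=False):
            return fallback()  # pool saturated: reject instead of queueing behind a slow dependency
        try:
            future = self._pool.submit(fn)
        except RuntimeError:
            self._permits.release()  # pool was shut down; give the permit back
            raise
        # Hold the permit until the task really finishes, even if we stop waiting for it.
        future.add_done_callback(lambda _f: self._permits.release())
        try:
            result = future.result(timeout=self._timeout_s)
        except Exception:  # the dependency raised, or it exceeded timeout_s
            self._record(success=False)
            return fallback()
        self._record(success=True)
        return result


if __name__ == "__main__":
    guard = DependencyGuard(failure_threshold=3, reset_after_s=30.0)

    def flaky_ratings_service():  # hypothetical failing dependency
        raise ConnectionError("ratings service unavailable")

    def default_ratings():  # degraded, but the page still renders
        return []

    for _ in range(3):
        guard.call(flaky_ratings_service, default_ratings)
    # The circuit is now open, so the dependency is not called at all.
    print(guard.call(lambda: ["not called"], default_ratings))  # prints []
```

Rejecting calls once the permits run out, instead of letting them queue, is what stops latency in one dependency from spreading to every caller; the trade-off is that bursts above `max_concurrent` receive the fallback. The half-open step here is deliberately simple and may let several trial requests through at once.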

A video showing the realtime monitoring dashboard is on Vimeo:

Netflix API Circuit Dashboard from Ben Christensen on Vimeo.
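The third principle asks for a view of what is happening right now rather than only 15-30 minute aggregates. A common way to get that, assumed here rather than taken from the post, is to count outcomes in short time buckets and keep only the most recent few, so a dashboard or a circuit breaker can read an error rate covering roughly the last several seconds. `RollingCounter` and its parameters are hypothetical; the `clock` argument exists only so the window can be tested deterministically.

```python
import threading
import time
from collections import deque


class RollingCounter:
    """Hypothetical rolling-window counter of successes and failures (illustrative only)."""

    def __init__(self, window_s=10.0, bucket_s=1.0, clock=time.monotonic):
        self._bucket_s = bucket_s
        # Window = current bucket plus the previous (num_buckets - 1) buckets.
        self._num_buckets = max(1, round(window_s / bucket_s))
        self._clock = clock
        self._lock = threading.Lock()
        self._buckets = deque()  # entries: [bucket_index, successes, failures]

    def _evict(self, idx):
        # Drop buckets that have slid out of the window.
        while self._buckets and self._buckets[0][0] <= idx - self._num_buckets:
            self._buckets.popleft()

    def record(self, success):
        with self._lock:
            idx = int(self._clock() // self._bucket_s)
            if not self._buckets or self._buckets[-1][0] != idx:
                self._buckets.append([idx, 0, 0])  # start a new bucket
            self._evict(idx)
            self._buckets[-1][1 if success else 2] += 1

    def snapshot(self):
        """Totals for the current window, cheap enough to poll every second."""
        with self._lock:
            self._evict(int(self._clock() // self._bucket_s))
            ok = sum(b[1] for b in self._buckets)
            failed = sum(b[2] for b in self._buckets)
        total = ok + failed
        error_pct = 100.0 * failed / total if total else 0.0
        return {"requests": total, "errors": failed, "error_pct": error_pct}


if __name__ == "__main__":
    counter = RollingCounter(window_s=10.0)
    counter.record(success=True)
    counter.record(success=False)
    print(counter.snapshot())  # {'requests': 2, 'errors': 1, 'error_pct': 50.0}
```

Old buckets simply fall off the front of the deque, so memory stays bounded by the number of buckets and the reported error rate reacts within one bucket width.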