PyCon Australia 2013

Using Vaurien and Marteau to Test for Failures in Your Applications

Ryan Kelly

Transcript

Excerpt from the automatic transcription of the video, generated by YouTube.

Now we have Ryan Kelly. The speaker is one of the certified evil geniuses of the Australian Python community. He's currently working on building high-performance web APIs for the Mozilla Services team, and he's going to talk to us about testing

for graceful failure. Please welcome Ryan Kelly. Cool, thanks for coming out, everyone. Yes, I am here to talk about testing. Now, it is 2013, so I hope I don't need to stand up here and try and tell you all about the virtues of testing your code, right? Testing, and

automated testing, is one of the best, and possibly one of the only, ways that we as professionals can build software that works reliably. But the funny thing about testing is that no matter how much of it you do, it's never quite enough, right? You can have

huge volumes of unit tests, and a hundred percent branch coverage, and a great chunk of integration and functional tests, and you put your code out there in the world, where it has to interact with other things, and stuff will still go wrong. So I'm kind of

here to talk about how we can poke at the things that go wrong and figure out how our code will cope when they inevitably do. So I'm going to assume that we're actually starting from a very happy place. I assume you have a web application, which is what

I do, and you're actually confident that this thing works. You know, you have tests, it does what it's supposed to do, and you've also figured out how you're going to deploy this application into the world and how you're going to check on it to

make sure it's still behaving as you intend. Now, for a web project this is a very happy place to be; a lot of projects won't even get to this point. Nonetheless, if your application is really good and you make the front page of

Reddit or Hacker News or something, or if you're being cool and deploying into the cloud and your cloud vendor has some sort of network outage, your database goes down, it starts dropping packets to, you know, to your applications and stuff like that, you're

going to take a hit. Your application is going to start failing. Now the question is: how gracefully can you cope with these kinds of failures? If you have really super important code, maybe you want to be sure that your code can stay standing and keep operating

even in the face of failures in its environment. If it's less important, maybe you just want to check that it can do as good a job as it can, and maybe take a small hit rather than a big hit. So I come to this field from the Mozilla Services team, where

I work with a lot of people who know a great deal more about this stuff than me. We're responsible for running things like the servers behind Firefox Sync and the Marketplace app that's going to support the new Firefox phone, and making sure that these services

will stay standing in the event of failure. So I'm going to give you a little bit of a demo of some of the processes and tooling that we've built up around doing this. It's not a tutorial; if you're interested in this stuff in more detail, you'll

have to read the docs, or come and find me afterwards and we'll chat. But I want to give you an idea of the sort of failures I'm interested in. This is a very simplified diagram of the Firefox Sync server, right? It's a little WSGI application.

It takes some data in from Firefox and it dumps it into MySQL. Because we have a fairly large user base, it actually does some app-level sharding, right? Each user is assigned to a particular database, and the web app is responsible for sending data to the

correct place. If one of these databases starts having a problem, or the network goes down and packets aren't getting through, now, according to this diagram, what should happen is that users destined for this particular unhealthy database will get an error.
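The app-level sharding just described can be illustrated with a minimal sketch. It is not the Sync server's actual code: the database names, the CRC32-based assignment, and the 503 status are all assumptions chosen for illustration. It shows the behaviour the diagram promises, where only users on the unhealthy database see an error.

```python
import zlib

# Hypothetical shard names; the real deployment's databases are not named in the talk.
DATABASES = ["db0", "db1", "db2"]


def shard_for(user_id):
    """Assign a user to one database.

    Assumption: a CRC32 of the user id picks the shard, which keeps the
    mapping stable across runs. The talk only says each user is assigned
    to a particular database, not how.
    """
    return DATABASES[zlib.crc32(user_id.encode("utf-8")) % len(DATABASES)]


def handle_request(user_id, unhealthy=frozenset()):
    """Return (status, database) for one request, as the diagram predicts.

    Users whose database is in `unhealthy` get an error (503 here, an
    assumed status code); everyone else is served normally.
    """
    db = shard_for(user_id)
    if db in unhealthy:
        return 503, db
    return 200, db


if __name__ == "__main__":
    for user in ["alice", "bob", "carol", "dave"]:
        print(user, handle_request(user, unhealthy={"db1"}))
```

In this idealised model a failure of one database never affects users assigned to the others.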

Well, they might time out or something. Users destined for these other databases, they should be fine, right? There's no problem. Now, we should be able to keep going even in the event of this partial failure. Unfortunately, the real world is never as simple as

the diagrams would have you believe, and what actually happens is that your web server has this little pool of worker processes inside it. Every time one of these worker processes goes to talk to the database that's having problems, it doesn't come back.
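A small simulation can make this worker behaviour concrete. This is a sketch under stated assumptions: threads stand in for the web server's worker processes, a short sleep stands in for a database call that hangs, and the pool size and durations are made up. It measures how long a request for a healthy database waits when stuck requests are queued ahead of it.

```python
import time
from concurrent.futures import ThreadPoolExecutor

HANG_SECONDS = 0.2  # assumed stand-in for a database call that "doesn't come back"
POOL_SIZE = 2       # hypothetical number of web workers


def handle_request(shard_is_healthy):
    """Simulate one web request; an unhealthy shard ties up the worker."""
    if not shard_is_healthy:
        time.sleep(HANG_SECONDS)
        return "error"
    return "ok"


def healthy_request_latency(num_stuck_requests):
    """Queue stuck requests, then one healthy request, and time the healthy one."""
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        start = time.monotonic()
        for _ in range(num_stuck_requests):
            pool.submit(handle_request, False)
        healthy = pool.submit(handle_request, True)
        result = healthy.result()  # blocks until a worker is free to run it
        latency = time.monotonic() - start
    return result, latency


if __name__ == "__main__":
    print("no stuck requests:  ", healthy_request_latency(0))
    print("four stuck requests:", healthy_request_latency(4))
```

The healthy request itself does no slow work, yet it has to wait for a free worker, so trouble in one database leaks into requests that never touch it.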

If you have high enough traffic going on all at once, and these requests are taking long enough on the backend, you start discovering that all of your workers are disappearing, and eventually the whole system comes crashing down. This actually happened to our

servers. That's bad. I wouldn't call that a graceful failure; we haven't done the best that we could do under the circumstances. The thing to do, though, as I say, is actually stand up your system in a state that approximates what it's going

to look like in production. You deploy it in a state that kind of looks like production, put it under the kinds of stresses and loads that it's going to be experiencing in production, and then try and break stuff. Make the environment around your application

fail. Let's see what happens; see if you can figure out how it's going to handle those failures. The rest is up to you. So, step one: deploy it. And this is going to depend heavily on, you know, your own setup. You might be deploying into the cloud; for Firefox

Sync we have some actual physical staging hardware that we deploy our stuff on. For the purpose of this talk I've stood up a little demo server, and I'll just show you real quick how it looks using requests, because everybody should be using requests,

and I'll put in my actual username and password here, so be kind, don't hack me. So I have, you know, PyCon AU [unclear] services.mozilla.com. I can go ahead and have a look at the data that I've got synced there. So I have a couple of sync collections.
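The kind of query shown in this demo might look like the sketch below. It is an illustration, not a record of what was typed on stage: the server address is a placeholder, and the URL layout and use of HTTP Basic auth are assumptions modelled on the Sync 1.1 storage API. The helper names `info_collections_url` and `fetch_collections` are hypothetical.

```python
from urllib.parse import quote


def info_collections_url(server, username, api_version="1.1"):
    """Build the URL that lists a user's sync collections.

    Assumption: the demo server follows a
    <server>/<version>/<username>/info/collections layout.
    """
    return "{}/{}/{}/info/collections".format(
        server.rstrip("/"), api_version, quote(username, safe="")
    )


def fetch_collections(server, username, password, timeout=10):
    """Fetch the collection listing with HTTP Basic auth (assumed scheme)."""
    # requests is a third-party package; it is imported here so the URL
    # helper above stays usable without it.
    import requests

    response = requests.get(
        info_collections_url(server, username),
        auth=(username, password),
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    # Placeholder host; the real demo hostname is not recoverable from the transcript.
    print(info_collections_url("https://sync.example.com", "someuser"))
```

Passing an explicit `timeout` matters for the theme of the talk, since without one a request to an unresponsive server can block indefinitely.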

I've got my history, my bookmarks and so forth. You know, I can make requests to this system and get at all of my sync data. That's our starting point: we have a system standing up that works. Now what are we going to do? Let's stress it out. Our go-to

tool for this has been a tool called FunkLoad, which is basically a functional and load testing tool. It has some nice facilities for writing your tests, and it has some nice facilities for generating reports. It's a little bit clunky, to be honest, which you might

[ ... ]

Note: the remaining 3,111 words of the full transcript have been omitted to comply with YouTube's "fair use" guidelines.