PyCon Australia 2013

Limiting execution time through interrupt-driven programming

Erik van Zijst

Transcript

Excerpt from YouTube's automatic transcription of the video.

Senior developer on the Bitbucket team at Atlassian, focusing mostly on the server-side stack. He has a special interest in performance, elegant algorithms and operating systems. To explain interrupt-driven development to us, please welcome Erik van Zijst. Thank you.

Alright, so yeah, I'm Erik. I work at Atlassian on Bitbucket, and for those who don't know, Bitbucket is a free code hosting site for Git and Mercurial, private and public repos, and today I want to talk about the importance of responsive web apps.

Not specifically web apps, though: I'm going to talk about it from the background and context of my work, which is still web apps, so the examples I'm going to use will be from Bitbucket, but the techniques I'm going to talk about apply

equally well to non-web apps. I want to start off with this tweet, which showed up on my timeline recently and seems to suggest that we are significantly slower than the competition, which took me by surprise. Now, this is a tweet, so obviously there's

no context, there's no detail, there's nothing at all to substantiate what this appears to claim. But at the same time, I'm sure that Stephen wasn't making this up and was actually measuring something, so I quickly went over to our

performance dashboard. You can see at the top there, that's Bitbucket, and for that entire week I looked at our average response times. As you can see, for that week our average response time was 123 milliseconds on the server. Then I

hopped over to GitHub and looked at their publicly published numbers, and for that same week they claimed 139 milliseconds. So that's near enough to equal, and there's nothing in here either to suggest that we are so

much slower. So I had a closer look and discovered that Stephen works for a company that writes a product that integrates with both these services, and on our end they rely on a small number of APIs, one of which admittedly is indeed very slow. Now, luckily, that

API endpoint doesn't get a lot of traffic, and maybe that's actually not so lucky, but as a result the poor performance doesn't really have much of an effect on our average response time. So what he was seeing was real,

but it wasn't really representative of our service. Unfortunately, it is that small portion of really slow requests that in some ways has the biggest effect on the usability of your site. Your average

may say that you're doing well, but it's the slow requests that get people in trouble: those are the ones that break pages, those are the ones that break integrations with external applications and, evidently,

get people to rant about it on Twitter. So it goes to show that average numbers, when it comes to response times, are really misleading. It's not the fast requests that people complain about, but for every couple of fast requests you've got

a really slow request, and those are the ones that break things. So looking at those average numbers gives you a false sense of accomplishment. It's a little bit like looking at a rainbow and saying, well, the average color of the

rainbow is whitish, which may be true, but it's a pretty pointless way of looking at a rainbow. It's similar with slow requests: they have the biggest effect on your site, so you should be looking at those. Forget the average; look at the 98th percentile,

maybe even the 99th percentile. In fact, it's even worse: some requests literally run forever. If they're stuck in an infinite loop or in a deadlock, they're never going to break out of that, or maybe they're firing off a stupendous number of redundant database queries.
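To make the average-versus-percentile point concrete, here is a small sketch with made-up latency numbers (not Bitbucket's real data): a single pathological request leaves the mean looking tolerable, while the 99th percentile exposes it immediately.

```python
import statistics

# Made-up latency sample (milliseconds), loosely mirroring the story:
# lots of fast requests plus one pathological 12-second endpoint hit.
latencies_ms = [100] * 98 + [120, 12000]

def percentile(samples, pct):
    """Simple empirical percentile: sort the samples, then index."""
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, pct * len(ordered) // 100)]

print("mean:", statistics.mean(latencies_ms), "ms")  # ~219: looks tolerable
print("p50: ", percentile(latencies_ms, 50), "ms")   # 100: looks great
print("p99: ", percentile(latencies_ms, 99), "ms")   # 12000: the real story
```

The mean barely hints at a problem, and the median hides it entirely; only the tail percentiles show the requests that actually break pages.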

Redundant queries like that are, by the way, a very common problem in database-driven applications. Every time you hit one of those requests, it takes out one of your threads or one of your worker processes on your web server, and after a while you will

have nothing left to serve the other requests, and you've basically got a DoS on your own site. To prevent this scenario, many web servers have a built-in failsafe, like a watchdog that monitors long-running requests: whenever

they exceed a certain timeout, it kills those requests to prevent, you know, a DoS. Now, we run Gunicorn with the Gunicorn sync worker, which comes with a default 30-second timeout: after 30 seconds, if a request hasn't returned yet, it hard-kills

that worker process, and it does that by sending a KILL signal, which is equivalent to typing kill -9. It's a signal that you can't catch or ignore; your application can't even detect it, and so it always works, which is very good, of course, if you use it as a fail-safe.
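What "can't catch or ignore" means concretely: on POSIX systems the kernel refuses to install a handler for SIGKILL at all, so a process killed this way gets no chance to run any cleanup or logging code. A minimal sketch (assuming a Unix-like OS):

```python
import signal

# SIGTERM can be trapped: a handler could log state before exiting.
signal.signal(signal.SIGTERM, lambda signum, frame: None)

# SIGKILL (what "kill -9" sends) cannot: the kernel rejects any attempt
# to install a handler, so the killed process never gets to log anything.
kill_error = None
try:
    signal.signal(signal.SIGKILL, lambda signum, frame: None)
except OSError as exc:
    kill_error = exc

print("installing a SIGKILL handler failed:", kill_error)
```

This is exactly why it works so reliably as a watchdog, and exactly why the killed worker leaves no trace of what it was doing.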

But it has the consequence that the program being killed has no ability to log any useful information. So in our case we don't even know these things happen, because we don't log the time it takes to run these

things, we don't know what they were stuck on, so we have very little to go on to fix them. But they are the most important queries, or requests, on our site, so we needed a way to make these things visible at the very least, so that we know that they occur, we

know what they were doing and what they're stuck on, and then we can do something about them. So what we did was experiment with an alarm signal handler: at the start of every request we set an alarm signal for 28 seconds.
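A minimal sketch of that alarm-signal technique (my reconstruction under stated assumptions, not Bitbucket's actual code): install a SIGALRM handler that raises an exception, arm the alarm at the start of the request, and disarm it when the request finishes. Because the handler runs on whatever frame was executing when the alarm fired, the exception's traceback shows where the request was stuck, which can be logged before Gunicorn's 30-second SIGKILL would have hit. The `handle_request` function below is a hypothetical stand-in for a stuck view; the demo uses a 1-second alarm instead of 28 to keep it short.

```python
import signal
import time
import traceback

class RequestTimeout(Exception):
    """Raised from the SIGALRM handler to abort a stuck request."""

def alarm_handler(signum, frame):
    # The handler runs on whatever frame was executing when the alarm
    # fired, so raising here yields a traceback of where we were stuck.
    raise RequestTimeout("request exceeded its time budget")

signal.signal(signal.SIGALRM, alarm_handler)

def handle_request():          # hypothetical stand-in for a stuck view
    time.sleep(60)

timed_out = False
signal.alarm(1)                # the talk uses 28 s; 1 s keeps the demo short
try:
    handle_request()
except RequestTimeout:
    timed_out = True
    traceback.print_exc()      # in production: log this instead
finally:
    signal.alarm(0)            # always disarm once the request completes

print("timed out:", timed_out)
```

The key design point is choosing a budget (28 s) just under the watchdog's hard kill (30 s), so the application gets a window to log and clean up before the uncatchable SIGKILL arrives.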

[ ... ]

Note: the remaining 2,771 words of the full transcript have been omitted to comply with YouTube's "fair use" rules.