PHP UK Conference 2013

Designing a dashboard for monitoring large applications

Lorenzo Alberton

Slides

Video

Transcript

Excerpt from the automatic transcript of the video generated by YouTube.

I don't know if you agree with me, but I think this has been a really good conference so far. We learned new tricks, we learned how to design better software and how to make it more resilient, and now we are going to see what happens next: how to keep the system running. Usually that is done just by observing how the system behaves and making sure that nothing is missing from our outlook.

A very quick word about myself. I work at DataSift, where we ingest, process, augment and filter about 10,000 messages per second. We get the Twitter firehose in real time, and many other sources. If you are interested in knowing about what we do, the link there explains our architecture; it's a bit old but still relevant.

In the past few years I've been really lucky to be on several large-scale projects, and from those projects and from the people I work with I learned a lot about what it means to keep those complex systems up. I now understand the importance of monitoring and good reporting.

We live in an age where everything is big and everything is complex, and to handle the complexity we have a lot of parts that form our overall platform. This means that any downtime can be a really big problem for the business. If we start seeing problems, like not having any output from our system, then it's difficult to find out where the actual problem is in the architecture; there can be a thousand things that can go wrong.

So it's very important to understand that observability is a critical feature: you have to put many hooks into your application to collect all the metrics you can. Of course, collecting metrics is just one part of the overall problem, because if you don't look at them they are pretty useless.
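The kind of hook mentioned here could look roughly like the following. This is only a minimal sketch in Python, not the approach used in the talk: the `instrumented` decorator, the in-memory `counters` and `timings` stores and the `filter_message` function are all hypothetical names invented for illustration, and a real system would ship the numbers to a metrics backend instead of keeping them in memory.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory stores, used here only to keep the sketch self-contained.
counters = defaultdict(int)
timings = defaultdict(list)


def instrumented(name):
    """Hypothetical hook: count calls and errors, and record the duration of each call."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                counters[name + ".errors"] += 1
                raise
            finally:
                # Runs on success and on failure, so every call is measured.
                counters[name + ".calls"] += 1
                timings[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("filter_message")
def filter_message(message):
    # Stand-in for real application work.
    return "spam" not in message


results = [filter_message(m) for m in ["hello", "spam offer", "status update"]]
```

After the three calls above, `counters["filter_message.calls"]` holds 3 and `timings["filter_message"]` holds three durations, which is the raw material a dashboard would then have to present.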

Good reporting is usually really, really difficult, especially when you have to handle millions and millions of events per second: spotting the important ones is really hard. What I came to realize is that the dashboards and the BI tools of the past 20-30 years have not done a very good job at helping us with this task. They probably missed the point entirely, focusing on all sorts of wrong problems, and they fail at communicating, which is the most important feature we want from a good monitoring dashboard. I think that monitoring dashboards are too important to be left in the hands of pie chart lovers.

So let's see a few things that we can do to make dashboards better. I will do it by taking a step back and understanding who is going to use dashboards, and that's humans, not machines. As was said in yesterday's keynote, we need to think about humans first. So we need to understand a little bit about how we think and how we make decisions; that's what we usually call cognitive science.

This whole talk is heavily inspired by a book I read last year. It's about statistics, economics, how we think and how we make decisions, and I thought that many of the things in this wonderful book apply to system monitoring as well.

This book is a tale of two systems, and those two systems are the modes that our brain operates in. There is one mode which is fast and apparently doesn't require any effort or strain: 2+2, we know it's four, we don't need to think about the operation itself. We can call this System 1 many things: intuition, instinct, gut feeling. Then we have a slower thinking mode, something that requires a lot of attention and thought, like computing the other operation. It's not impossible, but we need to recollect all the steps on how to solve it, then we need to keep in mind all the intermediate results, and finally, after a lot of effort, we can answer the question.

Most of the time we give up here, so we don't bother. The reason why the distinction is very important is that we have only a finite capacity for the second system: during the day we can only allocate so much attention to tasks. So it's very, very important that we preserve our attention as much as we can for when we need it, and in normal mode we should rely on our intuition, our instinct, as much as possible. Mind you, System 2, the slow thinking, is always there to double-check, to regulate what the intuition is coming up with, but as I said it is limited and can only be used so much.

The first system, System 1, fast thinking, is there primarily for one function: to model what we consider normality, to qualify the environment around us under normal conditions. Once we know what is normal, then we can be surprised by what is not normal, by what can be a threat to our environment. That's when we switch on our attention and make System 2 really do the hard work, because that's when we really need to spend our energy.

So how does this relate to monitoring dashboards? Well, in my view monitoring dashboards should be there in the background, giving us, even after a very quick look, a warm feeling that everything is normal. Then we have alerts that can trigger something else: they can create a surprise, so we can switch on our attention, focus on the problem that was recognized by the system, and then spend the energy actually debugging the problem.

This is a picture of our monitoring wall, and as you can see we also have a siren over there connected to a Zenoss alert: whenever there is something wrong, it sounds. It's a very effective way of attracting attention.

There are many ways we can improve how we attract attention, by not wasting our CPU cycles and only calling attention to the important stuff. Colour and heavy imagery definitely make it harder to spot important information. So take this graph, and take this one: now it's immediate that this is what we should focus on. We only highlight things that are problematic. We shouldn't really have to care about and analyze what's on the board.
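The idea of highlighting only what is problematic can be sketched in a few lines of Python. This is an illustrative assumption, not code from the talk: the `Metric` class, the `needs_attention` and `render` functions, the metric names and all the "normal" ranges below are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    value: float
    normal_low: float   # assumed lower bound of what counts as normal
    normal_high: float  # assumed upper bound of what counts as normal


def needs_attention(metric):
    """Return True when the value falls outside its assumed normal range."""
    return not (metric.normal_low <= metric.value <= metric.normal_high)


def render(metrics):
    """Return one text line per metric; only abnormal metrics get a loud marker."""
    lines = []
    for m in metrics:
        marker = "!!" if needs_attention(m) else "  "
        lines.append(f"{marker} {m.name}: {m.value}")
    return lines


# Made-up sample values, for illustration only.
metrics = [
    Metric("messages_per_second", 9800, 8000, 12000),
    Metric("queue_depth", 45000, 0, 10000),
    Metric("error_rate_percent", 0.2, 0.0, 1.0),
]
dashboard = render(metrics)
alerts = [m.name for m in metrics if needs_attention(m)]
```

With these sample values only `queue_depth` is marked, so a quick glance at the output shows one line that stands out and two that stay in the background.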

The dashboard should tell us immediately; it should really call on our ability to process information visually. The other mistake we often make is that we add a lot of data onto our dashboards, and we are often required to read long descriptions just to make sense of what's in the graph.

This is a beautiful infographic, drawn almost 150 years ago, and it represents the march of Napoleon's army on Moscow. I believe it's a masterpiece of information density. It shows the path of the army, town after town, up to Moscow; it shows the size of the army as it reaches Moscow; and the black line is the size of the army that comes back home from Moscow. You can also see here the temperature at the time whilst the army was in each

[ ... ]

Note: the other 3,430 words of the full transcript have been omitted to comply with YouTube's "fair use" rules.