Northeast PHP 2013

Scaling PHP to 40 Million Uniques

Jonathan Klein

Slides

Video

Transcript

Excerpt of YouTube's automatic transcript of the video.

Are we all set with the camera as well? All right, I think we're good, so let's get going. This is the last talk of the day, so I'm going to try to keep you guys awake. This talk was titled "Scaling PHP to 40 Million Uniques." That's actually a lie.

When I created this talk it was correct, but unfortunately that's no longer true: it's now 60 million uniques that Etsy serves per month, so it's going to be a little bit more interesting, hopefully. All right, so who am I? We already got this; it's all recap.

I work at Etsy, I organize the Boston Web Performance Meetup Group, and before that I was at Wayfair, where I led a team to convert stuff to PHP. Pretty simple. If you want to see all the slides and links from the presentation, they're available right now at the short URL. Don't skip too far ahead, there are some surprises, but you can check that out right now and not have to take pictures of the slides or anything like that.

I'm going to reference Code as Craft throughout the talk. It's a blog that Etsy runs, all written by engineers at Etsy, and it's good to know about. I'm going to talk about it a lot, so just so you're aware, that's our Etsy engineering blog.

Before we get into this, some quick stats about Etsy. We get around one and a half billion page views per month, we did about a billion dollars in sales last year, we have over a million lines of PHP, and, as I mentioned, we get about 60 million unique monthly visitors. This is a graph of traffic in 2012, and the 2013 traffic is following a similar pattern. So this is us here in August, and you can see up here, this is about fifty percent higher. I pulled the y-axis off this graph, but in absolute terms the peak of our holiday season is fifty percent higher than our traffic in August. So what this means is that this holiday season it's not going to be 60 million uniques, it's going to be 90 million uniques, and it's not going to be one and a half billion page views, it's going to be two and a quarter billion page views.
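The holiday projection simply scales both monthly figures by the roughly 50% peak over August described above. It treats the current monthly numbers as the August baseline, as the talk does:

$$
60\,\text{M uniques} \times 1.5 = 90\,\text{M uniques}, \qquad 1.5\,\text{B page views} \times 1.5 = 2.25\,\text{B page views}
$$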

So that's a lot of cats and Santa costumes. Something we think about all the time is not just cats but scaling: scaling for our users, for our sellers and buyers as well, as the holiday season approaches, which it is right now.

One of the interesting things about Etsy is that our architecture is pretty simple. This may look kind of complicated, but there are just a lot of lines connecting different boxes. It's actually your standard three-tier web application: you've got a request coming in at the top to a load balancer, and then you've got some web servers, memcache boxes, and MySQL. You can see the numbers on the right there: fewer than 100 web servers, tens of memcache boxes, and tens of MySQL shards, which are master-master pairs. We have a few other pieces as well, so that doesn't cover the entire stack. We also have some search boxes running Solr, we have Gearman, which I'll talk more about in a bit, we do have some Redis, and we have Postgres, a legacy database we're trying to get rid of at the moment.

Let's start off by talking about the web tier. This is the part that probably most people in this room care about: we're running Apache and PHP. It'll be the most relevant, and we'll spend the most time on it today. First of all, the hardware we use is mostly Supermicro. Each box has two 8-core Intel processors, 24 gigs of RAM, and a 160 gig SSD. Some of you might be thinking, why put an SSD in a web server? Web servers don't tend to be I/O bound.

They usually just need CPU and RAM. Anybody have any guesses for why you might have solid-state drives in web servers? Yeah? Okay, so the answer was "for when we're swapping." That's actually not the main reason. Anybody else? Statically cached files? Those are served from different boxes, actually, so it's not those. Reliability? One more try. PHP includes and requires, I/O? No, those are all going to be cached. Nope. One more. Heat, there we go! All right, heat and power. We pay for power in the data center, and we pay for cooling in the data center. SSDs consume less power and generate less heat. So over the life of a drive, a small SSD actually isn't that much more expensive than a spinning drive, but you save money both on heat (well, cooling) and on power. On top of that, we do things like logging to the drives before the logs get shipped off to a central server, so it helps with logging as well. But the primary reason is to save costs in the data center.

All right, on these boxes we're running Apache 2.2; we like to keep things stable. We use prefork as the process manager, and the main reason for this is that prefork is the best at isolating requests from each other. I want to make sure that if one request is having issues, it doesn't influence the rest of the processes: it just does its thing and goes along its way, and the other requests are fine. We just use mod_php, nothing too fancy. And then here are the critical numbers for our Apache setup. I don't want to focus too much on these numbers; the most important one to call out is MaxRequestsPerChild set to zero. This means that the Apache children don't die automatically; they live essentially forever. But we do restart Apache on all the web servers every night as part of our log shipping process, so the maximum time an Apache child is going to live is 24 hours, between log rotations.

What about PHP? This is a PHP conference. As I mentioned, we're running PHP 5.4 with Zend OPcache. Anybody know what this is? Maybe five or ten people. Who's heard of APC? All right, that's most of the room. Zend OPcache was recently open sourced by Zend. It's essentially a replacement for APC, but whereas APC has both a user cache and an opcode cache, Zend OPcache has no user cache. There's no key-value store in it; it's just an opcode cache. Basically, it takes your PHP code in its compiled form, as opcodes, and puts it in memory. That's why we have that three gigabyte memory segment. Our translations and templates contain different phrases in different languages, and we have to cache all of that in memory. We don't want to hit disk to get any code, so all the code is cached in memory in Zend OPcache. Per process, we set a memory limit of 128 megs and a max execution time of 30 seconds.

So let's talk about optimizing PHP. Who was here for my talk last year called "High Performance PHP"? About a dozen or so. There's a link to that presentation at the bottom of the slide and also at the short URL. The quick recap is: use an opcode cache, which we just talked about.
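To make the opcode-cache idea concrete, here is a small analogue sketched in Python rather than PHP. This is an assumption made for brevity, and it is not how Zend OPcache is implemented. Each source file is compiled to bytecode once per process, and the compiled form is reused on every later include. Repeat requests therefore skip both the disk read and the compile step. `OpcodeCache`, `get` and `include` are hypothetical names.

```python
import os
import tempfile


class OpcodeCache:
    """Toy analogue of an opcode cache: compile each file once, reuse it."""

    def __init__(self):
        self._code = {}        # absolute path -> compiled code object
        self.compilations = 0  # number of times a file was actually compiled

    def get(self, path):
        path = os.path.abspath(path)
        code = self._code.get(path)
        if code is None:
            # Cache miss: read the source from disk and compile it once.
            with open(path, encoding="utf-8") as f:
                code = compile(f.read(), path, "exec")
            self._code[path] = code
            self.compilations += 1
        return code

    def include(self, path):
        """Run a cached file in a fresh namespace, loosely like PHP's include."""
        namespace = {}
        exec(self.get(path), namespace)
        return namespace


# Usage: three "requests" include the same file, but it is compiled only once.
cache = OpcodeCache()
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "greeting.py")
    with open(src, "w", encoding="utf-8") as f:
        f.write("message = 'hello'\n")
    for _ in range(3):
        ns = cache.include(src)
    print(ns["message"], cache.compilations)  # hello 1
```

This sketch never invalidates an entry once it is cached. A real opcode cache also has to decide when a file that changed on disk should be recompiled.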

Use tools like XHProf or Xdebug to profile your code, use StatsD and Graphite to monitor it, find your hot spots, and finally, upgrade PHP. I don't want to rehash all that today; instead we're going to talk about some new optimizations for PHP. If you want to hear about the old material, come find me at the party or go look at last year's talk.

Some new things we've done at Etsy recently that have been big wins for us: first, creating static arrays. This sounds kind of crazy, like, what do you mean? Well, here's an example. Etsy is now served in, I believe, nine different languages, and all of our translations go into static PHP files. Each one is an associative array: there's a hash, there's the content (the actual translated string, in this case German), and then essentially a location where that string gets substituted. We have tons of these translation files, and they're automatically generated from a database. They get deployed with the site, but as a separate deploy process. We don't want to generate and deploy the translation files on every single deploy, so it's a manual step to click "yes, I want translations deployed." The benefit is that these translation files get put into the opcode cache memory segment, and they're extremely fast. PHP is extremely good at accessing PHP arrays in memory. It's way faster than memcache, faster than or maybe on par with an APC key-value store, and obviously faster than going to a database. This is the most performant, scalable way we've found to handle high-frequency data in our stack, and it's not just translations; we do this for all sorts of data. If we have a piece of data, maybe a small array of only 100 elements, that only changes a few times a day, we'll put it into a static array and deploy it with the site. Because we're deploying the site 30 to 50 times a day, it's not a problem to do a deployment just to change a static array. This is the kind of thing you might previously have put in memcache, but for really heavily accessed values like these, memcache uses up a ton of bandwidth in your data center.
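The static-array approach amounts to code generation. Data that changes rarely is written out as source code, deployed with the site, and then read from the opcode cache like any other code. The sketch below shows the shape of that pipeline in Python instead of PHP. The talk doesn't show Etsy's actual generator, so the file layout, the hash keys, and the names `generate_static_module`, `load_static_module` and `TRANSLATIONS` are all hypothetical.

```python
import importlib.util
import os
import pprint
import tempfile


def generate_static_module(rows, path):
    """Write (hash, translated_string) rows out as source code.

    The generated file contains a single dict literal, the Python analogue
    of a generated PHP file that holds a static array.
    """
    table = dict(rows)
    with open(path, "w", encoding="utf-8") as f:
        f.write("# Generated from the database; do not edit by hand.\n")
        f.write("TRANSLATIONS = " + pprint.pformat(table) + "\n")


def load_static_module(path, name):
    """Load the generated file as a module (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Usage with a made-up German row; the hash key is illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "translations_de.py")
    generate_static_module([("9f86d081", "In den Warenkorb")], path)
    de = load_static_module(path, "translations_de")
    print(de.TRANSLATIONS["9f86d081"])  # In den Warenkorb
```

In PHP the generated file would hold a literal array. Once the opcode cache has compiled it, each lookup is a plain in-memory array access with no network round trip, which is the advantage over memcache the talk describes.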

You're also using memory in memcache that you don't need to use. So this has been a big win for us.

Another thing we use a lot is Gearman. Who's heard of Gearman? Maybe 15, 20 people. It's a job server. At a very basic, high-level view, we take anything that's going to take a long time, farm it out to Gearman, and return control to PHP as fast as possible, so these things happen asynchronously. This might be resizing images, or batch processing some listing data for a seller. These jobs get shipped off to Gearman, and Gearman churns away on a whole separate pool of servers. Meanwhile PHP can return as quickly as possible, and people can get back to doing what's important, like buying Santa costumes for their dogs.

As I mentioned, we deploy pretty often. One thing we rolled out recently was an atomic deploy strategy. Atomic deploy simply means that at the moment of deployment you switch from one version of the code base to the new version of the code base.
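The excerpt doesn't show the mechanics of the switch itself. One common way to get it, assumed here rather than confirmed by the talk, is to put each release in its own directory and repoint a `current` symlink with a single atomic rename. The second helper pins a request to whatever release the link resolved to when the request started. That is the problem the talk's mod_realdoc and incpath address in Apache and PHP. `activate_release` and `resolve_docroot` are hypothetical Python stand-ins, not those tools' APIs.

```python
import os
import tempfile


def activate_release(release_dir, current_link):
    """Repoint current_link at release_dir in one atomic step.

    The new symlink is created under a temporary name and then renamed over
    the old one. On POSIX systems rename() replaces the destination
    atomically, so a reader sees either the old release or the new one.
    """
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # leftover from an interrupted deploy
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, current_link)


def resolve_docroot(current_link):
    """Resolve the symlink once, when a request starts (hypothetical helper).

    Every include in that request should be built from this absolute path,
    so a deploy that flips the link mid-request cannot mix releases.
    """
    return os.path.realpath(current_link)


# Usage: a request pins v1, then a deploy flips the link to v2.
with tempfile.TemporaryDirectory() as root:
    v1 = os.path.join(root, "releases", "v1")
    v2 = os.path.join(root, "releases", "v2")
    os.makedirs(v1)
    os.makedirs(v2)
    current = os.path.join(root, "current")
    activate_release(v1, current)
    pinned = resolve_docroot(current)   # request starts on v1
    activate_release(v2, current)       # deploy happens mid-request
    print(os.path.basename(pinned), os.path.basename(resolve_docroot(current)))  # v1 v2
```

Creating the link under a temporary name and renaming it over the old one is what makes the switch atomic. Deleting the old link and then creating the new one would leave a brief window in which the path does not exist at all.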

It happens in a single instant: there are no mixed files. You never get file A from the previous deploy plus file B from the latest deploy. If you do get mismatches like that on a high-traffic site like ours, you're going to wind up calling functions that don't exist yet, and things like that, which is really bad. So you want your deploys to be as atomic as possible, and we do this with a few different techniques. We wrote a new Apache module called mod_realdoc. mod_realdoc has a pretty simple role: very, very early in Apache's request processing, it gives you the absolute path to the code on your server. So it knows exactly where the code is actually being served from, and that's going to come in handy in a minute. We also wrote a PHP extension in house called incpath (both are on GitHub, by the way). It essentially lets you set, at a higher level in the extension, what the include path is for all of your PHP includes. So this means if you change

[ ... ]
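For the Gearman pattern described earlier in this excerpt, here is a minimal in-process sketch in Python. The pattern is to hand slow work to a separate pool of workers and return to the user right away. The sketch uses the standard library's `queue` and `threading` as a stand-in for a job server and is not Gearman's client or worker API. `handle_upload` and the halving "resize" are hypothetical placeholders.

```python
import queue
import threading

# In-process stand-in for a job server such as Gearman (NOT Gearman's API).
jobs = queue.Queue()
results = {}


def worker():
    """Drain the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job_id, width, height = job
        # Placeholder for slow work such as resizing an image.
        results[job_id] = (width // 2, height // 2)
        jobs.task_done()


def handle_upload(job_id, width, height):
    """Hypothetical request handler: queue the slow part, respond at once."""
    jobs.put((job_id, width, height))
    return {"status": "accepted", "job": job_id}


# Usage
t = threading.Thread(target=worker, daemon=True)
t.start()
print(handle_upload("img-1", 1200, 800))  # {'status': 'accepted', 'job': 'img-1'}
jobs.join()               # demo only: wait for the background job to finish
print(results["img-1"])   # (600, 400)
jobs.put(None)            # stop the worker
t.join()
```

In the setup the talk describes, the queue lives in a separate job server and the workers run on a separate pool of servers. A slow job therefore never ties up a web server process.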

Note: the remaining 5,615 words of the full transcript have been omitted to comply with YouTube's "fair use" rules.