PyCon Australia 2013

Managing scientific simulations with Python and Redis Queue

Andrew Walker

Transcript

Excerpt from the automatic transcript of the video generated by YouTube.

Okay, welcome to our last session for today. Our next presenter is currently employed as a research scientist specializing in modeling and simulation of physical systems, with robotics being his first love. In 2011 he finished his dissertation on hard real-time

motion planning for autonomous vehicles. Please welcome Andrew Walker. Thanks very much, everybody. So my talk today is about managing scientific simulations with Redis Queue, and I have two kind of disclaimers at the start of my talk. The first is that this is

most definitely a beginner talk: if you've done any parallel computing or distributed computing, all of this should be pretty much old hat. The other caveat is that I'm certainly not the developer of Redis Queue, but I certainly think you

should go and check it out. It's a fantastic module that's really useful for doing this kind of stuff. So, just to give you a quick outline of the stuff that I want to talk about today: I'm going to give you a quick rundown on some of my thoughts on

scientific simulation, talk about some Python tools for doing that kind of stuff, and talk about Redis and Redis Queue. But probably more importantly, for those of you who don't do a lot of this type of work, I want to make it very clear what the difference

between parallel simulations and distributed simulations is. The picture up the top is very much the parallel case, where you have a single Python process that will spin up a number of worker nodes, and that's all running on a single machine, on a single

physical machine. The second diagram is very much the distributed simulation case, where you've got a single Python process that spins up multiple worker nodes, and some of those nodes could potentially be on a physically different box. I'll also give you a quick rundown

on some of the things that relate to building robust simulations, although there's probably not time to do too much of that, and talk about some of the caveats of solving this problem the way that I have. So, I'm lucky to have the position that I do: I

get to work with some absolutely amazing scientists, and they work in a huge number of fields. I work with chemists, biologists, physicists, mathematicians, engineers, and a few computer scientists (not too many), but the one thing that they all have in common

is that they solve big problems: not necessarily computationally big problems all the time, but they're certainly big science problems. One thing that I have learnt is that scientists really just want to do science, and they certainly don't care how I

help them, when we collaborate together, get solutions to their problems, which is fantastic for me because it lets me pick the tools that help me solve my problems in the simplest way. So what's a scientific simulation? This is the Wikipedia definition:

simulation is the imitation of the operation of a real-world process or system over time. At the simplest level you can think of a simulation like flipping a coin, and you can imagine some kinds of things that you might want to observe in such a simulation,

like how often does it come up heads, or on average how many coins do I need to toss before I'll see three heads come up in a row. At the very other end of the spectrum, simulation covers things like: say I want to spend several billion dollars building a

Large Hadron Collider; you know, I really want to know before I go building something like that that it's absolutely, definitely going to work. Okay, so this class of solutions using Redis Queue solves a fairly niche problem. We're really not interested

in the case where you can solve problems with a single Python process, although Python is great for doing that already: we've got tools like NumPy, SciPy, Cython and Numba, and a whole host of other optimization and profiling tools. We're not even interested

in that single-machine case, although that's mainly what my demonstration will focus on today. Python ships with the multiprocessing module, so batteries included, and there's not a whole lot of need to talk too much about that. Coming from a predominantly

academic background, and I'm sure there's people in the room who come from similar kinds of backgrounds, I'm not interested in talking about supercomputers either. Most supercomputers rely on very specialized scheduling tools and job queues, and at

least from my perspective, for a lot of the problems that I need to deal with it's not really appropriate to solve them in the cloud, mainly for business reasons that relate to IP or security issues. So this is the case that I find myself in a lot of

the time: we've got about 20 to 50 cores on something like 5 to 10 physical machines. So if you've got a rack, you can imagine that's about one rack's worth of gear, or, even if you've got desktop machines, you can imagine that

fitting on one or two desks. All right, so what's out there for doing this in Python? Has anybody used IPython parallel? Okay, one person. If you're a science person, the absolute first place to go to do any parallelization, if you're an experienced

[ ... ]

Note: the other 2,502 words of the full transcript have been omitted to comply with YouTube's "fair use" guidelines.
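
Editor's sketch (not from the talk): the coin-flip simulation and the single-machine parallel case described in the transcript can be illustrated with the standard-library multiprocessing module that the speaker mentions. The function names `tosses_until_streak` and `mean_tosses`, the use of integer seeds, and the `processes=0` serial fallback are assumptions made for this example, not code shown by the speaker. It estimates how many tosses are needed, on average, before three heads come up in a row.

```python
import random
from multiprocessing import Pool


def tosses_until_streak(seed, streak=3):
    """Toss a fair coin until `streak` heads appear in a row; return the toss count."""
    rng = random.Random(seed)  # per-task generator, so each run is reproducible
    tosses = 0
    run = 0  # length of the current run of heads
    while run < streak:
        tosses += 1
        if rng.random() < 0.5:  # treat this outcome as heads
            run += 1
        else:  # tails resets the run
            run = 0
    return tosses


def mean_tosses(seeds, processes=None):
    """Run one simulation per seed and return the mean toss count.

    processes=None lets Pool use every available core (hypothetical
    convention for this sketch: processes=0 runs serially in this process).
    """
    seeds = list(seeds)
    if processes == 0:
        results = [tosses_until_streak(s) for s in seeds]
    else:
        # Parallel case: one parent process, several workers, one machine.
        with Pool(processes) as pool:
            results = pool.map(tosses_until_streak, seeds)
    return sum(results) / len(results)


if __name__ == "__main__":
    print(mean_tosses(range(10000)))
```

For a fair coin the analytic expectation is 14 tosses, so the printed estimate should land close to that. Each simulation is independent of the others, which is what makes this kind of workload easy to spread over worker processes and, in the distributed case the talk goes on to describe, over a job queue feeding workers on several machines.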