Usando estructuras inmutables de datos en aplicaciones Ruby (RubyConf 2015)

Presentación

HTML (pincha para descargar)

Vídeo

Transcripción

Extracto de la transcripción automática del vídeo realizada por YouTube.

- Welcome, how's everyone's morning? Good? Cool. So my name is Brad Urani. This is a talk about immutable data structures. Immutable data structures are an idea from functional programming and I gave this talk because I got interested in this because I took a course on Coursera called The Principles of Functional Programming in Scala.

It's taught by Martin Orderski, who's the inventor of Scala. A very, very, very great course, highly, highly recommend you take it. It got me thinking about how we can do some types of functional programming in Ruby. And what do I mean by functional programming? Functional programming is the techniques that are used to isolate state change, basically, and it's what you get when you combine a number of techniques and also use some discipline.

It's what you do and also what you don't do. Ruby has a lot of the stuff that you need for functional programming built-in. Immutable data structures has that yellow squiggle there because we get that with a third-party gem. Higher order functions would be like passing lambdas around.

Partial function application, that's like currying. We don't do that a lot in Ruby. We do have lazy sequences as of Ruby 2. 0. Recursion, you can always recurse in Ruby. We don't have lazy evaluation and pattern matching. So, if you combine all this stuff, you kind of get functional programming, but the truth is most of functional programming is a poor fit for Ruby.

Ruby's an object-oriented language and I have tried to do pure functional programming in Ruby and you can do it, but I don't really recommend it, at least not in production. But these immutable data structures I thought were really, really interesting because they are a piece that we can take that I think do have some really, really interesting use cases.

And also just a really, really neat topic with some really cool ah ha moments in them. So we do have this immutability built into Ruby. We have this dot freeze method. So, we take this array, we call dot freeze, and then what happen is if you to change this array, if you try to update, you know, add an element or delete an element, this will raise an error.

The reason we have this is to prevent certain types of bugs. So, if you have this array and you're passing it to some function that the success is predicated on this not changing, you can do this to avoid certain bugs. If you jump into Rails source code, you'll often see string dot freeze, right? That's a performance enhancement.

It's a memory-saving device. Freezing an array does not have the same performance enhancement as far as I know. It's for, you know, not running into certain bugs. I guess that's useful. It's not really that interesting and it's not what this talk is about.

That's because this data structure is immutable, but it is not persistent. These are two different things. Persistent is a data structure with some special qualities and that's what I'm gonna be explaining today. We get persistent data structures with a third-party gem, it's called Hamster.

It's a really, really awesome gem. Here I've made a list just containing the items one, two, and three. Hamster is really, really awesome, it's this great gem, it's got this whole collection of persistent, immutable data structures. Is anyone here a maintainer or has anyone worked on it? Maybe not, well, if you're watching my video later, shout out to you, this is a great gem.

Great work. So we get this list, one, two, and three, and what actually is the difference here? Let me show you really fast. So, make sure this is big enough. All right, so if I make an array in Ruby, one, two, three. So there's l. l is one, two, and three.

So if I do l2 equals l dot unshift, zero, and unshift, if you don't remember, is actually the list operation that pushes something onto the front of an array, I get zero, one, two, three. Okay. What is l right now? What happens if I print out l? Is it one, two, three? Is it zero, one, two three? It's zero, one, two, three, isn't it? Why is that? That's because both of these references, l and l2, have a reference to the same array.

There's only one array in memory and they both point to the same one, so editing it in one place, changes it for all references that have it. Hamster is a little different, hamster list. So if I do l equals Hamster, List, one, two, three. There we go. All right, sorry about that.

So I have l, which is one, two, three, and if I say l2 equals l dot cons, and cons is the same as unshift, it pushes something onto the front. Why is it called cons? It's tradition from functional programming, dating back 40 years, but same thing. I con something on the front, I get zero, one, two, three.

What is l now? l is still one, two, three. So that's the difference here. That's the difference between just a normal array and a hamster list, is that when I edit Hamster list, anything with the reference to the original stays the same. It remains unchanged.

That's why it's immutable. So a persistent data structure is immutable, but immutable data structures are not necessarily persistent. Persistent data structures have these special qualities. So how do they implement that? All right, let's think about this for a second.

Say you wanted to make that Hamster list class yourself, how are you gonna go about and do that? Hm, well, the the first thing we can think of is, all right, when I update the list, I could just clone it, couldn't I? I could just make a whole dang copy right in my memory and then I would achieve what I wanted, right? No problem, that makes sense.

Interesting example from everyone's favorite language, PHP, right? Is that PHP actually does this. You take this array, which is set to two, three. You can actually swap references like that, like just set array two equals array one, then you can add four onto array two, and actually array one is still unchanged.

This is called copy on write. The PHP runtime is kind of smart. Oh my god, did I just use smart talking about PHP? Oh, what's wrong with me? But yeah, it actually clones things under the hood, keeps track of these reference and clones things under the hood for you.

That's kind of interesting, but it's not very efficient, isn't it? That's not very fast. You wouldn't expect that to be very performant. So, how would we implement that immutable array, that persistent array in a way that's performant, efficient, and fast? It turns out, a functional persistent list is actually a linked list.

Did any of you ever take an algorithms class or something, where you learn stacks and queues and trees and linked lists and you're like, wow, why would I ever used a linked list? Well, it turns out that they do actually have a use case and that's in this.

So we have this list. So if I do this, I take one, two, and three, and then I use cons to push something onto the front, you can see that it's persistent. I have one, two, three, and I have the new list. What I've actually done is this. So I have a single linked list and I have two references to the same list, to two different nodes in the same list.

That's actually awfully clever, isn't it? Because I didn't have to clone anything, the one, two, three part is shared by the two lists, only to the references list and new list, it looks like two entirely different lists. It looks immutable, doesn't it? That's actually really clever because it's fast.

[ ... ]

Nota: se han omitido las otras 3.562 palabras de la transcripción completa para cumplir con las normas de «uso razonable» de YouTube.