>>Ninh Bui: So hi guys. We're Phusion, a Ruby and Rails web application deployment
shop in Amsterdam. Today we're gonna talk a little bit about
building a more efficient Ruby Interpreter. With me here today is Hongli Lai. I myself
am Ninh Bui. But we first, before we go on to talk about
building a more efficient Ruby Interpreter, it might be interesting to talk a little bit
about Ruby itself. So for those who are not familiar with Ruby,
it's a dynamic language, highly, it's actually highly dynamic, strongly typed. You can, you
have closures and other features like that as well.
It resembles Python somewhat, you could say; and it's still a growing language. By
the time of this presentation, actually, there's an estimate of about 400,000 Ruby
programmers out there, and they expect this number to grow to 5 million
by 2013. So yeah, so it's still growing rapidly and
we believe that this is in part actually thanks to Rails, which is a popular web application
development framework which allows you to create, well, stunning web applications in
a model-view-controller manner. So there are several Ruby implementations
out there, and the main one, or the best well known one and most widely deployed one, is
actually Matz's Ruby Interpreter. Also abbreviated as MRI. It was named after its creator, Yukihiro
Matsumoto. And even though it does a great job in many things, it has some issues though.
It has a reputation of being slow - [laughter]
like some other virtual machines. And it has a reputation of being a memory hog, basically.
So we wanted to make Ruby better. And in particular we wanted to make it better for servers because
we enjoy web development and we enjoy working with Rails.
So in 2007 we started hacking on the Ruby garbage collector in order to at least make
it more efficient for budget servers that we had to use back then. Nowadays that's a
totally different story. But, so yeah, we started hacking on the garbage
collector, and this resulted in a series of patches that we eventually molded into a usable
product. And this usable product is called Ruby Enterprise Edition.
And today Ruby Enterprise Edition includes various patches from other contributors as
well, who share this passion for making MRI suitable for server environments.
So in essence you can basically consider Ruby Enterprise Edition as being an MRI optimized
for server environments. So now let's talk a little bit about actually
tweaking the garbage collector, and I'd like to give Hongli the honors to do that.
>>Hongli Lai: So hi, guys. So our initial motivation for making the Ruby
garbage collector copy-on-write friendly was to optimize the memory usage of Rails.
But before we discuss that, let's see how Rails applications work.
So actually a Rails application has an architecture that resembles something
like this: there are multiple Rails processes and there's a single web server, or maybe
multiple web servers and load balancers or proxies in front. And the web server, it will
forward HTTP requests to one of the Rails processes. So here we see it working.
Then one of those Rails processes will generate a certain response and then this response
is then sent back to the web browser. And if you look inside the Rails processes then
you will see that each Rails process is actually single-threaded and handles one concurrent
request. So if you want to achieve true concurrency,
then you have to run multiple Rails processes. And this is actually no longer strictly true,
because since Rails 2.2, Rails has become thread safe. So these days you can actually
run multiple threads in a single Rails process. But what we talked about here is still true
for most setups out there. And if you look at how much the single "Hello
World" Rails application weighs, and it actually weighs about 25 to 30 megabytes, and memory
usage increases linearly with each process. So if you want high concurrency and you have
to spawn lots of Rails processes, then memory usage can quickly add up.
So we looked at how we can reduce memory usage and make Rails less bloated. There are several
ways to do that. For example, you could optimize Rails, or you could optimize Ruby.
But both of these ways are very tedious; you have to put a lot of work into them; so we
thought: "Maybe we can cheat. Maybe we can use the fork system call and copy-on-write."
And that works like this. Suppose that you have a parent process with some data, say
a variable A with value 42. And then you fork a child process. And then, initially that
child process will share all of its memory with the parent process. So if you access
the value of A in child process, then you're actually referring to the same memory page
as the one in the parent process. And when either the parent or the child writes
to that memory page, then that page is copied by the operating system and then written to.
We say that this page is made dirty and the child now still shares most of its memory
with the parent, but not that one memory page that it just wrote to.
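The copy-on-write behaviour just described can be sketched with a minimal fork(2) demo (POSIX only; the function and variable names here are ours, not from the talk):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that reads, then overwrites, an inherited value.
 * The child's write triggers copy-on-write: the kernel copies just
 * that page for the child, so the parent's copy of `a` is untouched.
 * Returns the parent's value of `a` after the child has exited. */
static int cow_demo(void) {
    int a = 42;                  /* shared (read-only) after fork */
    pid_t pid = fork();
    if (pid == 0) {              /* child */
        int seen = a;            /* reads the shared page: still 42 */
        a = 1000;                /* write -> kernel copies the page */
        _exit(seen == 42 ? 0 : 1);
    }
    int status;
    waitpid(pid, &status, 0);
    if (WEXITSTATUS(status) != 0) return -1;
    return a;                    /* parent's page was never touched: 42 */
}
```

The child's write to `a` forces the kernel to copy only that one page; everything else stays shared with the parent.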
And our past experience with dynamic languages, mostly Perl, has shown that most memory is
actually occupied by storing code; for storing the parse tree, for example.
And Perl already uses copy-on-write to save memory, and it works like this: it
loads as many Perl modules in the Apache parent process as possible; then when Apache forks
child processes the memory occupied by those pre-loaded Perl modules will be shared among
all Apache worker processes. [pause]
And then we thought: "Can we do the same thing with Rails?" Well that really depends on how
much memory in Rails application is occupied by actually storing Rails code. So let's see.
"Hello World" in Ruby – it's about half a megabyte of memory: its RSS is about one and
a half megabytes and about one megabyte of that is shared memory, so we end up with about
half a megabyte of actual unshared memory. And if we load in the Rails libraries,
then we end up with a process that eats about 25 megabytes and that's already with shared
memory taken into account. So we had already measured how the "Hello
World" Rails application eats about 25 to 30 megabytes, so it seems plausible that we
can use copy-on-write to save memory. Well, but, there's a problem, because unfortunately
Ruby's garbage collector – it's not quite copy-on-write friendly. And every time that
Ruby runs its garbage collector, then old memory pages will be written to and that causes
copy-on-write. So if you take a look at Ruby's garbage collector,
it's actually a simple mark and sweep system and here's an example. Suppose you create
a foo object and then you refer to it from a local variable which is part of the root
set. [pause]
And then you create a bar object and you change the reference to bar. Then when the garbage
collector is invoked, the garbage collector will follow all pointers that are reachable
from the root set and then it marks all objects that it encounters while doing this. And this
marking is done by setting the FL_MARK flag on a bit field inside the object.
And then next comes the sweep phase of garbage collection, and in this phase, garbage collector
will free all objects that are not marked because apparently you cannot reach them anywhere
from within the program. And the thing is, setting this FL_mark thing
inside the object, it will actually make the entire memory page of that object dirty, because
it writes to that page. And everything that's nearby is affected too.
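A toy mark-and-sweep over a fixed object pool makes this concrete: the mark is a flag written into each object's header, so a collection cycle writes to (and thereby dirties) every page that holds a live object. This layout is our simplification, not MRI's actual RVALUE structure:

```c
#include <stddef.h>

#define FL_MARK   0x01
#define POOL_SIZE 4

typedef struct obj {
    unsigned long flags;   /* writing FL_MARK here dirties the object's page */
    struct obj *ref;       /* single outgoing reference, for simplicity */
    int in_use;
} obj_t;

static obj_t pool[POOL_SIZE];

/* Mark phase: follow references, set the in-object flag. */
static void mark(obj_t *o) {
    while (o && !(o->flags & FL_MARK)) {
        o->flags |= FL_MARK;   /* the copy-on-write-unfriendly write */
        o = o->ref;
    }
}

/* Sweep phase: free everything unmarked, clear marks for the next
 * cycle. Returns the number of objects swept. */
static int sweep(void) {
    int swept = 0;
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool[i].in_use && !(pool[i].flags & FL_MARK)) {
            pool[i].in_use = 0;
            swept++;
        }
        pool[i].flags &= ~FL_MARK;
    }
    return swept;
}
```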
So the Ruby AST nodes, they are also garbage collected and so they too will be copied when
you run the garbage collector. And the fix to make sure that this doesn't happen is to
move the marking data away from the bit field inside objects and to a separate memory region,
like a mark table. This sounds easy, but it's actually trickier than
one might think. And if we go back to the example with the processes, then you see,
for example, a parent process and a child process and they refer to some bar data. And
this, the bar data could be Rails AST nodes. Then, whenever the garbage collector runs,
the garbage collector will mark all those AST nodes and this causes copy-on-write, so
you end up with a copy of bar data, data that's actually identical. You don't really need
to create this copy, it just wastes memory. [pause]
And it's all because of this FL_MARK flag. So when trying to make the garbage collector
copy-on-write friendly, we, we encountered some caveats. For example, like how to measure
the dirty memory pages in the first place. Because on Linux we have this /proc/self/smaps,
and this is a virtual file made by the kernel that allows us to inspect a process's total
private dirty memory. But there are no tools to see which individual
pages are dirty. And other operating systems are even worse, because they don't even seem
to allow viewing your total private dirty memory usage on the processes. So reducing
the dirty pages involves a lot of guesswork. [laughter]
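On Linux, that total can be obtained by summing the Private_Dirty fields of /proc/self/smaps. Here is a hedged sketch of such a parser; it reads from a string so the sketch stays portable, but in practice the input would come from the smaps file:

```c
#include <stdio.h>
#include <string.h>

/* Sum the Private_Dirty fields (in kB) from smaps-formatted text.
 * On Linux the real input would be the contents of /proc/self/smaps. */
static long total_private_dirty_kb(const char *smaps_text) {
    long total = 0, kb;
    const char *line = smaps_text;
    while (line) {
        /* only lines that literally start with "Private_Dirty:" match */
        if (sscanf(line, "Private_Dirty: %ld kB", &kb) == 1)
            total += kb;
        line = strchr(line, '\n');
        if (line) line++;
    }
    return total;
}
```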
And we used the following test script to measure the effectiveness of copy-on-write, because,
what this test script does is, it first loads Rails and then after loading Rails it runs
the garbage collector, so that whatever garbage was created during all this time is freed
before the fork rather than in the child. And then it forks a child process, and in the
child process it runs the garbage collector again to see whether the garbage collector is
copy-on-write friendly, and then it measures the process's private dirty RSS.
[pause]
So during our first attempt at making the garbage collector copy-on-write friendly, we did
not know how much effect it was going to have. We initially used a hash table to implement
the mark table, and everything that is in this
table is considered marked. Luckily Ruby had a built-in generic hash table
implementation, so we did not have to write our own.
And it took a while to make it all work, but we eventually succeeded. It saved about 15
to 20 percent memory in the child process, though we were a little bit disappointed because
the garbage collector also became about 30 to 40 percent slower.
[pause] So we tried to optimize this thing and it
turns out that the mark table itself is very large and we did not expect this. When Rails
is loaded there are about 150,000 objects even after garbage collection.
So each hash table entry occupies about four words and on x86 this is about 16 bytes. And
if you count in malloc overhead too, that's eight bytes, then you end up with 24 bytes
per entry. And multiply, multiply that, by the number of objects that we have, then we
get about three and a half megabytes and the hash table's bucket allows a depth of five
before resizing. So if we count in that overhead too, then we end up with about this number,
3.7 megabytes, and that's just to mark the objects. And what's more, this 3.7 megabytes consists
of many small objects, so all this memory is actually not returned to the operating
system when you free them; malloc just doesn't do that.
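A quick back-of-the-envelope check of those numbers, using the figures from the talk (the bucket-array overhead estimate is ours):

```c
/* Mark-table-as-hash-table cost on 32-bit x86, per the talk:
 * ~150,000 live objects after loading Rails, 4 words (16 bytes) per
 * hash entry, ~8 bytes malloc overhead per allocation, and roughly
 * one bucket pointer per 5 entries before the table resizes. */
static long mark_hash_bytes(long objects) {
    long per_entry = 16 + 8;              /* entry + malloc overhead   */
    long buckets   = (objects / 5) * 8;   /* bucket array (estimate)   */
    return objects * per_entry + buckets;
}
```

For 150,000 objects this comes to 3,840,000 bytes, i.e. roughly the 3.7 megabytes quoted above.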
So the result of all this is whenever you run the garbage collector, it generates 3.7
megabytes of dirty pages. And then we realized, yeah, we did not really
need a full hash table, because we just want to know when an object is marked, so we just
need a set. [pause]
And a hash table entry consists of these members: you have hash, key, record, next. And
we can get rid of record because we were only mapping object addresses to true.
And hash is only used to speed up hash table resizing so we can get rid of that too in
return for making resizing a little bit slower. And then this new data structure, we call
it PointerSet. And entries are now only 16 bytes on x86, and that's including malloc
overhead. So if you multiply that by the number of objects and count in the bucket
overhead too,
then we end up with about 2.4 megabytes of memory.
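A minimal pointer-set sketch along those lines, with two-word chained entries (this is our illustration, not the actual Ruby Enterprise Edition code):

```c
#include <stdlib.h>

/* A PointerSet entry needs only the pointer itself and a chain link:
 * two words, versus the hash table's four (hash, key, record, next). */
typedef struct ps_entry {
    void *ptr;
    struct ps_entry *next;
} ps_entry;

#define PS_BUCKETS 1024

typedef struct {
    ps_entry *buckets[PS_BUCKETS];
} pointer_set;

static size_t ps_hash(void *p) {
    /* objects are aligned, so discard the low bits before bucketing */
    return ((size_t)p >> 4) % PS_BUCKETS;
}

static int ps_contains(pointer_set *s, void *p) {
    for (ps_entry *e = s->buckets[ps_hash(p)]; e; e = e->next)
        if (e->ptr == p) return 1;
    return 0;
}

static void ps_add(pointer_set *s, void *p) {
    if (ps_contains(s, p)) return;       /* set semantics: no duplicates */
    ps_entry *e = malloc(sizeof *e);
    e->ptr = p;
    e->next = s->buckets[ps_hash(p)];
    s->buckets[ps_hash(p)] = e;
}
```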
So the hash table, I mean sorry, I mean the, the garbage collector now only uses 2.4
megabytes of memory for the mark table. And the garbage collection speed did not change,
but the copy-on-write
efficiency went up to about 30 percent. So this is definitely an improvement. But 30
percent is still a bit, maahh, not very good. So we tried to optimize this even further.
And if you recall that all the set entries are allocated with malloc, and we know that
there are, there's a lot of them, then we thought: "Well, maybe you can optimize this
by using a memory pool." By using a memory pool, we not only get rid
of the malloc space overhead, but it also allows fast allocation because each entry
in the pool has a constant size. And this allows our pool to use a simpler algorithm
than malloc does. And the pool is allocated with mmaps so that
old memory can be released back to the OS and the results were pretty encouraging because
copy-on- write savings went up to about 40 percent, and we even got a 15 percent performance
improvement. So this is really nice. But, yeah, we were not satisfied, yet. Because
we thought: "Yeah, this can be better." And then we thought: "Hum, maybe we can save even
more memory by not using a set but actually a bit field as a mark table."
[pause] So if you look at how Ruby objects are stored
in memory - Ruby objects they are allocated on so-called Ruby heaps. And Ruby heaps are
not the same as the system heap that malloc uses, but Ruby heaps are themselves allocated
on the system heap. So a Ruby heap consists of multiple equally sized slots and each slot
is capable of storing a single Ruby object. And we could add a bit field at the beginning
of each Ruby heap and this bit field is then used as mark table. So this bit field would
have the same number of bits as the number of slots in the Ruby heap. And a one in the
bit field just means that the object at that slot in that heap is marked.
Now, not all slots are occupied by objects.
There were about 250,000 objects in our test script after loading Rails and after running
the garbage collector. And, but even so, altogether the bit fields only consumed 31 megabytes,
sorry, 31 kilobytes of memory and that's a far, far cry from all previous attempts.
With this, copy-on-write savings went up to 70 percent. And garbage collector performance
also improved by 60 percent compared to previous attempts. So this is a lot better.
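A sketch of the bit-field mark table idea (our simplified layout, not MRI's): one bit per slot, kept at the start of the heap, so marking writes only to these few words instead of to every object's page.

```c
#include <string.h>

#define SLOTS_PER_HEAP 10000
#define BITS_PER_WORD  (8 * sizeof(unsigned long))

/* One mark bit per heap slot, stored in a bit field at the start of
 * the heap instead of inside each object. */
typedef struct {
    unsigned long marks[(SLOTS_PER_HEAP + BITS_PER_WORD - 1) / BITS_PER_WORD];
    /* ...followed by SLOTS_PER_HEAP object slots in the real layout... */
} heap_header;

static void mark_slot(heap_header *h, int slot) {
    h->marks[slot / BITS_PER_WORD] |= 1UL << (slot % BITS_PER_WORD);
}

static int is_marked(heap_header *h, int slot) {
    return (h->marks[slot / BITS_PER_WORD] >> (slot % BITS_PER_WORD)) & 1UL;
}

static void clear_marks(heap_header *h) {
    memset(h->marks, 0, sizeof h->marks);
}
```

For 250,000 slots this needs 250,000 / 8 = 31,250 bytes, which matches the roughly 31 kilobytes mentioned above.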
[pause] And the 70 percent savings that we saw in
our test script, is a theoretical maximum because real life applications, they usually
create a lot of garbage for their own stuff, so the savings in practice are a little less
spectacular. At this point, we were able to save about
15 to 20 percent memory in real life applications, on average, against a five percent
overall performance overhead. But we can do even better, because there's
still some dirty pages left; but where are they? They are very hard to find. And after
a lot of pulling hair out of our heads and stuff like that, it turned out that the glibc
memory allocator was partially to blame. That's ptmalloc2.
Because if you take a look at this C code, it allocates one kilobyte of memory. A real
child process initially has a private dirty RSS of 125 kilobytes. But after executing
this code then the private dirty RSS suddenly jumps up to seven megabytes. Something is
definitely wrong here. And we suspect that ptmalloc2 makes a lot
of internal bookkeeping structures dirty, and that's what causing all these dirty pages.
So we researched other memory allocators that we could use. We tried nedmalloc first, but
we couldn't get it to work. Then we tried jemalloc. This is the memory allocator used
by FreeBSD and Firefox. But this did not seem to reduce the memory usage at all.
So eventually we settled for Google's tcmalloc and with this, practical copy-on-write
savings went up to 33 percent. And overall performance went up about 20 percent. So it's even
faster than normal Ruby. And we concluded that tcmalloc is faster than ptmalloc2 for
our workloads. The garbage collector is actually still slower
than normal Ruby, but because of the allocator we still have an overall positive performance
difference. [pause]
And the 33 percent memory savings in practice is confirmed by other parties as well. For
example, Shopify, an ecommerce shopping cart service. They testified that they saved a
lot of hardware resources with this. And we've also integrated this copy-on-write technology
with Phusion Passenger, a Ruby web application deployment software.
If you use Phusion Passenger with Ruby Enterprise Edition, then it will automatically use the
copy-on-write savings. So to sum up, by using a bit field as a mark
table for the garbage collector, and by using tcmalloc as memory allocator, we were not
only able to save 33 percent of memory on average in Rails apps, but we were also able
to make Ruby about 20 percent faster. And this concludes our part about the garbage
collector. Next up is Ninh about threading improvement.
[applause] >>Ninh Bui: So next I'd like to talk to you
guys a little about, a little bit about improving threading performance. And this is actually
a patch that was contributed by Joe Damato and Aman Gupta, who are the authors of
EventMachine, which is an IO library mainly used in Ruby.
And this is actually a good example, we believe, of a patch or contribution that allows us
to optimize MRI for server environments. So first off, let's go over a bit on, you
know, the threading model of Ruby 1.8. So as you can see, this is a typical green thread
setup, where you have n userspace threads all mapping to one kernel thread. And this
has some advantages as well as disadvantages. One of the biggest disadvantages of this in
this multi-core era is probably that the userspace threads can't utilize
symmetric multi-processing. Ruby 1.9, however, uses kernel threads. But
unfortunately you have a global interpreter lock which still prevents you from using multiple
cores. You know, Implementations such as Mac Ruby,
however, have been able to remove the global interpreter lock allowing for true utilization
of the hardware. And if you want to know more about that you
should probably speak to Laurent Sansonetti, who is sitting over there. He can tell you
probably a lot more about this than I can. So as for the scheduling, it is a pre-emptive
scheduler, which basically means that every userspace thread gets a certain amount of
time to complete its task before it gets preempted, and another thread will be scheduled
in its place. And this is done either through an itimer, a signal, or through a timer
thread, depending on the platform that you're using.
Another important thing about the Ruby 1.8
scheduler is that you can apply explicit yielding. So basically what you can do is if one
userspace thread is executing, you can explicitly yield your execution. So you can force
a context switch by using Thread.pass.
however, if we do a blocking call in one of these userspace threads, we are actually blocking
the entire kernel thread as well, and basically blocking all the userspace threads here as
well. And we need to solve this, in particular for
IO, but we'll get there in a few seconds actually. So to illustrate this, if you have a blocking_call
for example on the left hand side thread over there, then all the other userspace threads
will have to wait until that blocking_call finishes. And this may, may or may not be
forever. So basically you are blocking the whole system.
So this is important to fix for IO in particular, because you can have a certain
read system call which has to wait, for example, for data to come in.
And Ruby solved, solves this by using non-blocking IO. And the way it does that is when it detects
a blocking operation, for example, if you are reading from a file descriptor where there's
no data to be read from, then basically it will just schedule another thread for execution
in its place. So like I said, there are some advantages
of using userspace threads over kernel threads. It's a trade-off that you need to make.
And some of the advantages in this scenario are actually that they're fast and they are
cheap to spawn because everything happens in userspace so there's no kernel involvement
at all. This also means that you can shut them down
as fast as you want. And basically you have granular control over the context switching
as well, because you implement this yourself, so you can make it as fast or slow as you
want, basically.
But this is not particularly the case for Ruby, unfortunately. And this is what the
guys from EventMachine found out when they worked on improving this.
Because EventMachine allocates some data on the stack to read and write from, and they
noticed that the larger the stack became, the slower the context switch became.
And as an analogous example, I've put up here a C function that should illustrate the
problem that they encountered.
So first off, you see here that we allocate
50 kilobytes of data on the stack and we use memset here just to zero fill it so that GCC
doesn't perform any optimization such as removing this piece of data, because it's unreferenced
in later code. Basically what this, this C Function does
is basically just allocating 50 kilobytes on the stack and invoking the, the block that
you give it, as you can see here. So if we were to try to invoke the C extension
function inside Ruby, in this scenario with threads, then we see that here we allocate
50 kilobytes on the stack first; we do some silly calculations for 200,000 times and after
each iteration we explicitly force a context switch; then joining all these threads takes
about 13 seconds. Now the interesting part comes in, in play
when we actually remove the invocation of the C extension method, thus actually removing
the execution of, or actually, the allocation of 50 kilobytes on the stack.
Then suddenly the execution time drops down to 4.2 seconds. So there is actually a weird
correlation going on here, and definitely something going on with regards to allocating
stuff on the stack and what consequences it has on context switching.
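The transcript doesn't include the slide's code, but the function described above would look roughly like this (a hedged reconstruction; the names are ours, and in the real C extension the callback would be Ruby's rb_yield):

```c
#include <string.h>

static int calls = 0;
static void count_call(void) { calls++; }   /* stand-in for rb_yield */

/* Burn ~50 KB of C stack, then invoke the given callback. The memset
 * keeps GCC from optimising the otherwise-unreferenced buffer away. */
static void big_stack_frame(void (*callback)(void)) {
    char buf[50 * 1024];
    memset(buf, 0, sizeof buf);
    callback();
}
```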
So profiling the results in an EventMachine scenario with threads, which is very similar
to the example that we just showed you with Google perftools, you get something like this.
And here we've ordered it in, in order of, you know, frequency of invocations.
So first off you see the Ruby time slice handler; nothing weird going on there. At some later
point apparently they're doing a hash table lookup, then the Ruby scheduler, and at some
other later point you get your Ruby AST interpreter. This is not weird at all. However, this guy
over here is. Because apparently at some particular point it needs to invoke memcpy, and
in order to understand what it's using it for, we better take a look at what happens when
a context switch takes place.
it needs to store the state of the current thread, basically.
So what it does first is save the CPU registers with setjmp. Then apparently it
saves the stack frames to the heap. So basically it's taking the stack frames from the C stack
and it's copying it actually to the heap. That's kind of a WTF, but we'll get to that
a little later. At some later point it will save some VM globals
and for restoring it's pretty much doing the reverse of this. So, it's restoring the VM
globals; it's restoring the stack frames using memcpy, by the way; and then it's restoring
the CPU registers. And this has some implications actually, because
it's using memcpy to copy the stack data from and to the heap. And basically what this
means is if you have a larger C stack, then a context switch will take longer.
So that's actually the symptom that we saw over there when we had this 50 kilobyte stack;
then you saw that it drastically increased the context switch time.
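A stripped-down sketch of the save half of that dance (our simplification of the green-thread machinery, not MRI's code): registers go into a jmp_buf, and the live stack region is memcpy'd to a heap buffer, which is exactly why switch cost grows with stack size.

```c
#include <stdlib.h>
#include <string.h>
#include <setjmp.h>

/* Saved state of one green thread: CPU registers plus a heap copy of
 * its live C stack region. */
typedef struct {
    jmp_buf regs;          /* filled by setjmp in the real scheduler */
    char *stack_copy;      /* heap buffer holding the saved stack */
    size_t stack_len;
} saved_thread;

/* Copy the live stack region [stack_lo, stack_lo + len) to the heap.
 * The memcpy makes every context switch cost O(len). */
static void save_stack(saved_thread *t, const char *stack_lo, size_t len) {
    t->stack_copy = realloc(t->stack_copy, len);
    t->stack_len = len;
    memcpy(t->stack_copy, stack_lo, len);
}
```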
Now Ruby 1.9 is unaffected by this problem because it uses native threads, but Ruby 1.8
is still the most widely deployed version at this time.
So we wanted to solve this. So in order to understand how, how this can
be solved, you need to see how Ruby uses the stack. And an important element in that is
to know that the Ruby stack frame is basically just the C stack frame; it's sharing this.
And in particular you can determine what the stack frame size of this thing is by, for
example, using gdb and subtracting the base pointer from the stack pointer. That gives
you the frame size. Every function call, as a result, will put
roughly around one kilobyte on the stack. So if you have some Ruby code on the left
hand side, it will allocate one kilobyte for each of these invocations on the stack.
And it's important to fix this, we believe, because a typical real stack trace looks kind
of like this; and it's pretty much 65 levels deep, basically. So if we can solve this problem,
then for future versions that may use threading, for example, inside Rails - and as you can
see Rails uses the stack extensively - then context switching should be much faster.
This has some unexpected consequences during context switches as well, because the stack
contents can change ad hoc during run time, which means that it's very unsafe to refer
to a value on the stack from a native thread, because at a context switch the stack
contents can be changed.
So you can see that here, for example, if we have a value on stack here in this function
and we were to pass it, pass its address on to a thread function and spawn the thread,
then it would not be very safe to refer to that value because when the context switch
takes place, its contents could change drastically. So the fix for this is of course, to instead
of copying the current stack from and to the heap - like so - we can just not copy that
and just change the current stack pointer. So upon a context switch you just basically
change the stack pointer to point to the stack that you want to use on the heap. But this
requires
some platform specific code. [pause]
And before we dive into that, it's good to do a little refresher course here. Because
on x86_64 you have to remember that the heap grows from, grows upward. So from low address
numbers to high address numbers. And the stack, however, grows from high address numbers to
low address numbers. So as you can see the heap grows upwards whereas
the stack grows downwards. So if we were to allocate some memory on the
heap, then actually the result using either malloc or mmap will give us a pointer
to that address. And that address is actually the stack top.
This is a small implementation detail. And so to solve this, for example, for x86_64
you can use the following inline assembly that GCC supports. So as you can see below,
the last two lines are actually the input/output list, and we're referring to the
input/output list in the assembly string in a tokenized manner, similar to printf, if
you are familiar with that. So percent zero will refer to that value and
percent one will refer to the other value. So what this code does basically is it's,
it's moving the stack pointer to the value of stk_base which is kind of - we'll be able
to discuss a little bit about that later, if you have some questions on that.
And eventually it will invoke the function rb_thread_start with this new stack.
So there are some caveats to this actually,
because native stacks grow automatically. Our operating system usually takes care of
this, so it grows as it needs to. Our stack doesn't, however, because we're
allocating this on the heap and we need to manage this ourselves.
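The patch does this with inline assembly, but the same idea can be sketched portably with the POSIX ucontext API: run a function on a heap-allocated stack, then swap back. This is our illustration (works on Linux), not the patch's code:

```c
#include <ucontext.h>
#include <stdlib.h>

static ucontext_t main_ctx, thread_ctx;
static int ran_on_heap_stack = 0;

static void thread_body(void) {
    ran_on_heap_stack = 1;   /* runs with the stack pointer inside the heap block */
}

/* Allocate a stack on the heap, point a context at it, and run
 * thread_body there; uc_link brings control back here afterwards. */
static int run_on_heap_stack(size_t stack_size) {
    void *stack = malloc(stack_size);
    if (!stack) return 0;
    getcontext(&thread_ctx);
    thread_ctx.uc_stack.ss_sp = stack;
    thread_ctx.uc_stack.ss_size = stack_size;
    thread_ctx.uc_link = &main_ctx;
    makecontext(&thread_ctx, thread_body, 0);
    swapcontext(&main_ctx, &thread_ctx);   /* like the patch's stack-pointer swap */
    free(stack);
    return ran_on_heap_stack;
}
```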
So this raises a question, like how large must our thread stacks be? Also, how do we
allocate these thread stacks? Do we use mmap or do we use malloc? And lastly, how do we
handle stack overflows? Because it's very easy to fall off your stack, for example,
and all bad things could happen, basically. So this led to some decisions and some special
cases. First off, it was decided upon that the Ruby thread stack size would default to
one megabyte, which is about a thousand function calls deep, which we believe
is adequate for most use cases. The size for this is configurable during run
time for advanced users if they should need to do that.
Also the decision has been made to use mmap instead of malloc so that we don't have to
incur the malloc overhead that Hongli just talked about; and so that the
memory is guaranteed to be released back to the operating system as well.
As for stack overflows, we fixed this problem by putting a guard page with PROT_NONE
at the end of the stack to catch potential overflows. So if you try to read or write from
this guard page then you will get a segmentation fault.
As a result of that as well, the signal handler must actually run on a separate stack,
because a signal handler is just a function and can also reference the stack, and it
could also reference that PROT_NONE page that caused the signal handler to fire in the
first place. So you need something like sigaltstack to prevent that.
So in terms of benchmarks to see what the
results of this are, we can use the Alioth thread-ring test and instead of trying to
explain to you guys what this boring code does, it's probably better to do this through
a small animation. I love Keynote, by the way.
So what this benchmark basically does, it initializes a number to, for example, 50 million;
it then spawns 403 threads and runs all these threads sequentially in such a way
that each will subtract one from the number, and it will continue
to do this until at some particular point, the number will be zero.
And the next thread to be spun up will notice this, and for example, if this were to be
like, for example, thread number 13, then it will print out this number to the console
and it will exit. So the results of this benchmark are pretty
self-explanatory. As you can see, Ruby 1.8, the original version, takes about 1400 seconds
to execute this; whereas Ruby 1.9, takes about half of this.
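Stripped of the threads, the ring logic amounts to this (a sequential simulation of ours, useful only for sanity-checking who ends up with the token; the benchmark's actual cost is the context switching):

```c
/* A token holding a counter is passed around `nthreads` threads
 * numbered 1..nthreads; each pass decrements the counter, and the
 * thread holding the token when it reaches zero is the "winner". */
static int thread_ring_winner(long counter, int nthreads) {
    int id = 1;
    while (--counter > 0)
        id = (id % nthreads) + 1;   /* pass the token to the next thread */
    return id;
}
```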
The patched version of Ruby 1.8, however, is very similar to the Ruby 1.9 version,
and so we can infer that there is a delta there of about two point three times faster.
So things are definitely looking to shape
up here. But to truly understand how this improves the threading performance, or actually
the scheduling, we could spice things up a little bit by introducing this function, which
just recursively calls itself about a hundred times and yields the block that you
give to it. And as we already discussed earlier on, every method invocation that you do in
Ruby, allocates about a kilobyte on the C stack.
So basically what this code does it will grow the stack by about 100 kilobytes.
So when we rerun the test with this function that we just introduced, then you can see
that Ruby 1.8 will now take two hours to finish this code. Whereas Ruby 1.9 will take
12 minutes, and Ruby 1.8, the patched version, is still similar to Ruby 1.9 actually,
at 13 minutes.
But as you can see from this example the Delta now is nine point four times so it's, it's
definitely a cool patch, we believe. So as for threading in its current situation
in Ruby 1.8 and in particular in Ruby Enterprise Edition, the patch that Joe and Aman have
contributed is currently in Ruby Enterprise Edition and it's available for you now.
And it's currently only available on x86 and x86_64, but other platforms could be
supported provided that you supply the assembly and the proper stack growth calculations.
So basically, in conclusion, Ruby Enterprise Edition is not a fork, but it's a branch,
so regular merging does occur with upstream. Contrary to Ruby proper, we have
a more liberal patch acceptance policy, so if you guys have a cool patch that could
contribute to Ruby in a server environment, then we'd be more than happy to take it into
consideration.
And eventually, of course, at one particular point we do really hope that these patches
will find their way back to upstream; but until that point we're, we're maintaining
Ruby Enterprise Edition. So there are some other interesting Ruby Enterprise
Edition patches as well, by the community. RailsBench, for example, and GC statistics
and MBARI patches in particular, which improve the, the conservative garbage collector of
Ruby. Ruby has a conservative garbage collector
which scans the entire stack to determine whether or not a value on that stack is a
pointer to something on the Ruby heap. The MBARI patch has actually optimized finding
correct pointers to the Ruby heap. And lastly, we have a nice one from Philippe
Hanrigou: caller-for-all-threads, which will definitely help you when debugging.
So that's basically it, and if you have any questions, we'd be more than happy to try
and answer them. [pause]
>>Hongli Lai: Well, that's a surprise. >>Ninh Bui: Nobody's gonna ask if we're gonna
have beers afterwards. Because - [laughter]
Okay. Well, that's basically it. Thank you for listening and we hope you enjoyed it.
[applause] [techno music]