Guerrilla's Guide to Optimizing Rails Applications
The Battle For Performance
Rails is slow. Your rails application which does fairy simple database operations will be born slow and will eventually become slower and slower while still not doing anything fancy. That's like getting a new powerful locomotive (on Rails) together with a free bonus of overladen freight cars.
Having said that, I can virtually hear the herd of voices screaming "dude, optimize your own application and don't blame Ruby and Rails for everything". Sure thing, our application isn't perfect and I do believe it can be optimized but you can't win the performance battle just by that.
The winning tactics for this battle is guerrilla tactics. It turns out you can guerrilla-patch (aka monkey-patch) Ruby and Rails and easily get up to 3x speedup in some cases. We started doing that to our own application - Acunote and got amazing results so far. Read on to find out how.
Ruby-Prof: The Guerrilla's Weapon
Acunote, our pet rails application, often need to copy a bunch of ActiveRecord (AR) objects. As any other project management tools, our application works with tasks organized into iterations (sprints). Sometimes you want to copy tasks from current iteration/sprint to the next iteration/sprint. And boy, that copy can become really slow. As an example, let's look at the performance and profiling data of one request to copy 120 tasks.
Let's measure 3 things:
- the real time request takes in production environment;
- processing time profile information using the excellent ruby-prof profiler by Charlie Savage and Shugo Maeda;
- memory profile information gathered with patched ruby-prof (I took Sylvain Joyeux's memory allocations measurement mode patches and hacked together my own ruby-prof memory size measurement mode).
So, for "copy 120" request we get:
Request time (without profiler): 13.6 sec.
Time:
Total: 16.78
%self total self wait child calls name
18.36 3.34 3.08 0.00 0.26 20766 Kernel#clone
...
Allocated Memory Size (in KB):
Total: 1 001 730.00
%self total self wait child calls name
61.87 646795.00 619745.00 0.00 27049.00 20766 Kernel#clone
...
"WTF" politely describes my initial reaction when I first saw that. How on earth we can spend 17 seconds and consume 1 Gigabyte of memory by just copying 120 ActiveRecord objects? Ok, ok, we not only copy them but store some more information to the database like task tags, copy events and so on. Therefore we end up saving actually 360 AR objects instead of 120. Also the memory profiler is inaccurate and usually shows a bit more memory than the process actually takes.
Still, what is going on during those 17 seconds? Why do we need a gig of RAM? Gee, on the same machine I can compile the whole ruby interpreter in 57 seconds and the compiler will never take more than 130M...
Heh, let's calm down and see what profiler says. The obvious suspect is Kernel#clone method. The (stripped) call tree for it as reported by ruby-prof is
Task#save
...
ActiveRecord::Base#create_without_callbacks
ActiveRecord::Base#attributes_with_quotes
ActiveRecord::Base#attributes
ActiveRecord::Base#clone_attributes
ActiveRecord::Base#clone_attribute_value
Kernel#clone
From Charlie's blog we know that AR::Base::attributes method is evil. It clones attributes before returning them and Charlie gave a wise advice to not use it. And indeed in your application is a good idea to call, for example, task['foo'] instead of task.foo or task.attributes['foo']. But here AR::Base::create itself does the evil thing.
Let me cite Rails code (from rails/activerecord/lib/activerecord/base.rb):
def update
connection.update(
"UPDATE #{self.class.table_name} " +
"SET #{quoted_comma_pair_list(connection, attributes_with_quotes(false))} " +
"WHERE #{self.class.primary_key} = #{quote_value(id)}",
"#{self.class.name} Update"
)
end
def create
if self.id.nil? && connection.prefetch_primary_key?(self.class.table_name)
self.id = connection.next_sequence_value(self.class.sequence_name)
end
self.id = connection.insert(
"INSERT INTO #{self.class.table_name} " +
"(#{quoted_column_names.join(', ')}) " +
"VALUES(#{attributes_with_quotes.values.join(', ')})",
"#{self.class.name} Create",
self.class.primary_key, self.id, self.class.sequence_name
)
@new_record = false
id
end
def quoted_column_names(attributes = attributes_with_quotes)
attributes.keys.collect do |column_name|
self.class.connection.quote_column_name(column_name)
end
end
def attributes_with_quotes(include_primary_key = true)
attributes.inject({}) do |quoted, (name, value)|
if column = column_for_attribute(name)
quoted[name] = quote_value(value, column) unless !include_primary_key && column.primary
end
quoted
end
end
Great! AR::create clones all attributes twice (first with quoted_column_names and then with attributes_with_quotes). AR::update is a nicer guy, it clones attributes only once. For each of those 120 copied tasks we use AR::create twice and AR::update once. Therefore we call clone_attributes not less than 120*2*2+120 = 600 times.
Profiler says we lose 3 seconds and 650M memory because of that:
ActiveRecord::Base#clone_attributes:
Time: 3.21 sec
Memory: 650 301.00 K
Ok. Time to employ our promised guerrilla tactics, but first let's talk about memory.
Lyrical Digression: Why Do We Care About Memory?
While guerrilla warriors take a nap before the battle, let's digress and look at why we measure the size of allocated memory. Sure thing, allocating literally tons of memory is bad. But we're using Ruby and Ruby has its own guerrilla warrior which strikes our application when it innocently eats its tasty memory.
The name of our enemy is "Ruby Garbage Collector" and the cunning enemy it is. As ruby's gc.c source says and Why the Lucky Stiff explains, ruby GC kicks in when you allocate more memory than is available on the heap and heap boundary is 8M. When we sequentially allocate 650 megabytes there's a chance that GC will be called 650/8=81 times. Each GC call accounts for approximately 70-80ms which in our case should add roundabout 5 seconds to the running time.
Let's do some math... Kernel#clone itself takes 3 seconds and we expect garbage collection to take 5 more seconds. In total, 8 seconds are wasted because of cloning. See the trend? The more memory we allocate the worse our performance is.
Here's the morale: "watch your memory". Large memory consumption costs you more than you'd otherwise think because of garbage collection.
Guerrilla patching AR::create and AR::update
Now it's time to employ our guerrilla tactics and fix ActiveRecord. It's easy to rewrite attributes_with_quotes and avoid cloning the attributes:
module ActiveRecord
class Base
private
def attributes_with_quotes(include_primary_key = true)
result = {}
@attributes.each_key do |name|
if column = column_for_attribute(name)
result[name] = quote_value(read_attribute(name), column) unless !include_primary_key && column.primary
end
end
result
end
end
end
Now let's put that to environment.rb and see what we get:
Request time (without profiler): 6.7 sec.
Time:
Total: 9.5
%self total self wait child calls name
4.84 0.69 0.46 0.00 0.23 24023 ActiveRecord::ConnectionAdapters::Quoting#quote
...
2.22 0.52 0.18 0.00 0.34 2779 <Module::Benchmark>#realtime
...
0.00 0.00 0.00 0.00 0.00 6 Kernel#clone
...
Allocated Memory Size (in KB):
Total: 350 966.00
%self total self wait child calls name
34.31 125655.00 120420.00 0.00 5235.00 2779 <Module::Benchmark>#realtime
...
0.00 0.00 0.00 0.00 0.00 6 Kernel#clone
...
Ok, much better: 650 megabytes of memory saved with that little monkey patch! We estimated that will save 8 seconds. Let's look:
Original request time: 13.6 sec
Request time after AR patch: 6.7 sec
Time saved: 6.9 sec
It turns out our estimation was almost correct. We indeed saved not only the time taken by Kernel#clone but also the time spent by garbage collector and gained 2x speedup for free. Nice! Our guerrilla tactics clearly works and we can for sure do better than 2x.
Guerrilla-patching Benchmark.realtime
Next suspect, as seen from memory profile above, is Benchmark#realtime. From time profiler the method doesn't look bad. We spend only 180 milliseconds in it but look at memory profiler! It 180ms our Benchmark#realtime friend allocates 120 megabytes of memory. Guess what? Ruby garbage collector will be happy to kick our ass again at least 15 times and that estimates to about 1 second loss.
Let's see what's wrong in benchmark.rb:
def measure(label = "")
t0, r0 = Benchmark.times, Time.now
yield
t1, r1 = Benchmark.times, Time.now
Benchmark::Tms.new(t1.utime - t0.utime, t1.stime - t0.stime,
t1.cutime - t0.cutime, t1.cstime - t0.cstime, r1.to_f - r0.to_f, label)
end
def realtime(&blk)
Benchmark::measure(&blk).real
end
Great, we calculate lots of useless information (like cpu time, user time, etc) in Benchmark#measure and then just throw it away. Let's do better, simpler and more memory efficient:
module Benchmark
def realtime
r0 = Time.now
yield
r1 = Time.now
r1.to_f - r0.to_f
end
module_function :realtime
end
That's it. No extra object creation. And here's our income:
Request time (without profiler): 5.8 sec.
Time:
Total: 8.63
%self total self wait child calls name
4.29 0.69 0.46 0.00 0.23 24023 ActiveRecord::ConnectionAdapters::Quoting#quote
...
0.35 0.19 0.03 0.00 0.16 2779 <Module::Benchmark>#realtime
...
Allocated Memory Size (in KB):
Total: 225 668.00
%self total self wait child calls name
43.78 98805.00 98805.00 0.00 0.00 5280 Kernel#binding
0.00 1918.00 2.00 0.00 1916.00 2779 <Module::Benchmark>#realtime
...
Whoa! 120 megabytes and 1 second were saved with 9-liner guerrilla patch. Nice, isn't it? Instead of original 13.6 sec our request now runs in 5.8 sec. That's more than 2x improvement already and we can do even better, but I'll leave that for the next blog.
When Guerrilla Tactics Help
Nothing with Ruby and Rails is wrong until you do things in the loop. For example, Benchmark#realtime doesn't have any impact if all you do is save one AR object. You just allocate 25 extra objects that take 45 extra kilobytes. But you're doomed once you do that in a loop. Do the benchmark 200 times and you'll guaranteed the garbage collector will happily shoot you in the back.
Also bear in mind that garbage collector should not be running while you profile your code. With GC you'll never get repeatable results because it will kick in at different times randomly increasing the time and memory consumption of your methods.
Make Your Application Fast: Optimize MEMORY
Let me reiterate, by optimizing memory usage instead of (or in parallel with) processing time you can get significant performance improvement, and guerrilla patching memory-inefficient bits is the right way to make Ruby and Rails faster I think we're on the right track here. Memory is what we need to optimize. Memory is what wasn't optimized before. And you've just seen what we can get by optimizing memory.
We (Acunote devs) will continue the memory profiling and will blog as soon as we get new results. Our previous blog post briefly summarize our current results, be sure to read it if you haven't done that yet. Meanwhile, don't wait for us, grab ruby-prof, my memory profiler patches and dig by yourself ;) I bet you'll find more big memory eaters and when you do so, please drop us a note. Let's collect our guerrilla patches and then file them upstream.
Have these patches been submitted to the Rails core?
Posted by: Carl Youngblood | February 02, 2008 at 02:01 PM
Never mind. Looked at the previous post and got my answer.
Posted by: Carl Youngblood | February 02, 2008 at 02:07 PM
I really love seeing people optimising the Rails core like this. Fantastic.
Posted by: Alistair Holt | February 03, 2008 at 05:05 AM
i have been trying the new ruby-prof ver 0.6.0
but i could't c the memroy profiling in the log
i followed ur instructions in patching ruby gc and using the line RubyProf.measure_mode = RubyProf::MEMORY
but still unable to run memory profiling in my rails app :(
can u explain more how to do so?
Posted by: khelll | February 04, 2008 at 06:44 AM
I'm guessing ruby can be patched to raise the memory limit at which point its GC gets called?
Great stuff though -- looking forward to seeing some of this in Rails core.
Posted by: Shanti Braford | February 04, 2008 at 10:24 AM
Greetings,
It's great stuff, and certainly valuable, but GC pauses should never impact the user, since I believe you should be handling all page requests with the GC disabled, and only enable (and start) GC after you've disconnected from the user.
I believe FastCGI, for instance, does it this way.
It's definitely nice to get rid of unnecessary allocations, but if you're running Rails with GC disabled during the page request, I believe these suggestions probably won't net you much user-visible speedup.
I'd *love* to hear differently mind you, as I do, in fact, work on a large consumer Rails app. :)
-- Morgan
Posted by: Morgan | February 04, 2008 at 12:05 PM
http://rubyforge.org/pipermail/mongrel-users/2008-February/004997.html maybe related??
Posted by: Justin | February 06, 2008 at 02:41 AM
Usual deployment scheme for rails app is running several mongrels on one or more machines. In this case you can't disable GC. One of the lengthy requests processed in one mongrel can take too much memory and other mongrels will be swapping all the time. So the overall performance will go down. And like I described in the blog post, it's not that uncommon for rails process running in mongrel to take 1 Gig of RAM ;)
Things only get worse if you run on memory-restricted hosting, say VPS with 512M of memory. You basically need GC there.
What you can certainly do is to increase the memory threshold for ruby GC from 8M to 16M or 32M (read http://whytheluckystiff.net/articles/theFullyUpturnedBin.html for more insight). Thus you'll have fewer GC collections but each of them will take a bit longer time. Still, you'll need to manually fine-tune the setting to get the good performance.
Posted by: Alexander Dymo | February 07, 2008 at 11:36 AM
Hi, thanks for this great write up on ruby-prof. I'm following along and have rebuilt ruby with the patch included above. After changing my path and verifying that I'm using the patched version of ruby, I've found that this patched version of ruby can't find my installation of ruby gems:
irb(main):001:0> require 'rubygems'
LoadError: no such file to load -- rubygems
from (irb):1:in `require'
from (irb):1
irb(main):002:0>
Did you run into this issue?
Posted by: Tim Case | March 25, 2008 at 09:07 AM
Please check that you've installed the new ruby into the same location as the old one. In case you installed it into the new location, you'd need to install rubygems package and all your gems again.
Posted by: Alexander Dymo | March 25, 2008 at 09:53 AM
Thanks! That worked! I also found that if I symlinked gems from my new location's bin, it would also work with the new location. Thanks again.
Posted by: Tim Case | March 29, 2008 at 08:54 AM
Correction: My little symlink trick didn't actually work, but the solution you described above did.
Posted by: Tim Case | March 29, 2008 at 11:17 AM
Excellent stuff guys!
@Alex and @Tim:
I've got a patched Ruby in my /usr/local. (thanks to http://hivelogic.com/articles/2008/02/ruby-rails-leopard )
I also ran into the problem outlined by Alex. However I'm not sure I understand the fix of installing the new ruby "in the same location". I'm not too willing to actually overwrite my unpatched Ruby and I guess it's not what you mean ;-)
However just copying the (suspiciously small) /usr/local/bin/ruby executable under another name doesn't work either... I guess it's just a small executable that hardcodes the 'prefix' dir from the Makefile and therefore still loads everything from /usr/local/lib.
So could you explain in a little more details how to unobtrusively install this second version of ruby?
Meanwhile I'm going to try the lazy approach. I will install the same rubygems version as on my normal Ruby install, then make a backup of /usr/local/lib/ruby/gems and copy my current gems directory in its place.
I'll report on whether that works :-)
Posted by: Mathieu Martin | June 18, 2008 at 12:04 PM
Is this article still valid with Rails 2.1.0 ?
Posted by: anon | August 08, 2008 at 09:07 AM