Spawn Background Processes in Rails
Scott Persinger had a blog post about how to do background processing in Rails. I have often wanted a simple way to do this (usually when sending out emails to users), so I decided to make a plugin out of Scott's idea. The plugin is called spawn and you can download it here (unzip it into your vendor/plugins directory).
Usage of the plugin is very simple. Just surround the code you want to run in the background within a 'spawn' block like this:
spawn do
  logger.info("I feel sleepy...")
  sleep 11
  logger.info("Time to wake up!")
end
If you run that in a controller you should find that your page is rendered well before the last log message is written out. That's all there is to it.
You might want to be aware of what's going on when you run a spawn block. Because ActiveRecord::Base keeps only a single connection open to the database, spawn will open a new connection in a forked child process. This seems to work pretty well in my application. The overhead of forking is low compared to starting a new Rails process, and you have access to all of the current variables.
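To make the fork-and-detach idea concrete, here is a minimal sketch in plain Ruby. It is not the plugin's source: `spawn_sketch` and `child_sees_parent_variables` are hypothetical names, and the database reconnection step is only marked with a comment because it needs a Rails app to run.

```ruby
# Hypothetical stand-in for the plugin's spawn method (not its real source).
def spawn_sketch
  pid = fork do
    begin
      # Assumption: in a Rails app the child would open its own database
      # connection here instead of sharing the parent's.
      yield
    ensure
      # exit! leaves the child at once, skipping inherited at_exit handlers.
      exit!(0)
    end
  end
  # Detach so the parent does not have to wait for the child.
  Process.detach(pid)
  pid
end

# Shows that the child can read a variable the parent set before forking.
def child_sees_parent_variables
  greeting = "set in the parent"
  reader, writer = IO.pipe
  spawn_sketch do
    reader.close
    sleep 0.1 # pretend to do slow work
    writer.write("child read: #{greeting}")
    writer.close
  end
  writer.close
  message = reader.read # blocks until the child closes its end of the pipe
  reader.close
  message
end

puts child_sees_parent_variables
```

The pipe is only there so the example can show what the child saw. In a controller the parent would simply carry on rendering the page.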
Let me know if you try this out and how it goes for you.
Update 2007/10/16
The latest version of the spawn plugin can now do forking or threading. It is now hosted on RubyForge, so read more here.
Update 2008/12/28
The latest version of the spawn plugin now works with Rails 2.2.2. Get it at github using "./script/plugin install git://github.com/tra/spawn.git".

30 comments:
Nice. Two questions:
What do I do to make sure that new AR connection closes when I'm done? And to make sure that the newly spawned process shuts down when its task is done, instead of waiting, like a Rails app, for more requests it's never going to get? (Is that all taken care of for me?)
What if I want to spawn a few processes, but I actually _want_ to wait for them to complete before returning a response? Is there a way to do that?
Re: jrochkind
The plugin closes the connections and exits when it's done. You shouldn't have to do anything.
Currently there's no way to wait for spawned processes. Perhaps I could add an option for that... I'll think about that.
If you added that as an option, I'd switch from using Rails native threads to this in a heartbeat. For reasons I explained over on my blog where you commented (http://bibwild.wordpress.com/2007/08/28/threading-in-rails/), I sometimes need to wait for the concurrent process to end.
But doing this using spawned processes instead of threads, I think I wouldn't need to jump through all those hoops I jumped through to get Threads to work in Rails, I suspect it would Just Work.
Here's what I propose. Currently spawn will detach from the child process. How about if I return the PID and give you an option to not detach and you can then do a wait on the child in your code? Something like this:
pid = spawn(:detach => false) do
  something
end
wait(pid)
If you want, email me to discuss further (I just enabled showing email in my profile).
OK, version 0.2 is out and now gives you the ability to wait for the children processes that you spawn. It works like my previous comment; read the README for more details.
Download v0.2 here.
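For readers who want to see the waiting pattern without installing anything, the sketch below imitates it with Ruby's own fork and Process.wait2. The names `spawn_with_wait` and `run_and_wait` are hypothetical and the keyword-argument syntax needs a modern Ruby; the plugin's actual option handling is described in its README, not here.

```ruby
# Hypothetical helper imitating a :detach option; not the plugin's code.
def spawn_with_wait(detach: true)
  pid = fork do
    status = 1
    begin
      yield
      status = 0 # reached only if the block finished without raising
    ensure
      exit!(status)
    end
  end
  Process.detach(pid) if detach
  pid
end

# Starts one child per duration, then waits for every one of them.
def run_and_wait(durations)
  pids = durations.map do |seconds|
    spawn_with_wait(detach: false) { sleep seconds }
  end
  # Process.wait2 blocks until that child exits and returns [pid, status].
  pids.map { |pid| Process.wait2(pid).last.success? }
end

p run_and_wait([0.1, 0.2])
```

Because the children run concurrently, the total wait is roughly the longest single duration rather than the sum.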
I'm not having any luck with this.
Using Postgresql with spawn 0.04.
Receiving this error:
PGError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
: SELECT a.attname, format_type(a.atttypid, a.atttypmod), d.adsrc, a.attnotnull
FROM pg_attribute a LEFT JOIN pg_attrdef d
ON a.attrelid = d.adrelid AND a.attnum = d.adnum
WHERE a.attrelid = 'blogs'::regclass
AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum
Here's the method that's using spawn:
def send_alerts(subject, message)
  subscribers = guests.find(
    :all,
    :conditions => "accepted_at IS NOT NULL AND alert_me"
  )
  subscribers << user unless subscribers.include?(user)
  spawn do
    subscribers.each do |to|
      Mailer.deliver_blog_alert(to, subject, message, self)
    end
  end
end
It hasn't been tested with postgresql yet. Anyone had any success with that?
Great plugin.
However I still experience problems with long running tasks.
After the task completes, I record that in a log table. However, any Ruby statements executed AFTER the main (HTTP) thread has returned do not run, i.e. no log record is inserted. Statements initiated before the main (HTTP) thread returns are fine.
This behaviour continues even if I put an "ensure" block around the code.
I get the same behaviour in both Webrick and Mongrel on a win32 PC.
Does anybody else have the same problem?
nick, i don't suppose you could use the forking? i tend to trust it more since it's a separate process that can't be affected by the parent messing with connections... perhaps some sort of further protections could be built in but right now i don't have much time to do that. Of course it's open-source so contributions to make it better are welcome (2.times{wink} 2.times{nudge}).
Tom,
The plugin is awesome! Here's one suggestion, prompted by another response on my original post about how a forked mongrel process will delete its parent's PID file.
I've added the following little monkeypatch to the child block:
# Monkey-patch Mongrel to not remove its pid file in the child
require 'mongrel'
Mongrel::Configurator.class_eval("def remove_pid_file; puts 'child no-op'; end")
--scott
Scott,
Thanks for your feedback.
The plugin calls exit! in the child process which doesn't call the at_exit handlers. And mongrel calls remove_pid_file in the at_exit block so it should be fine. I tested to make sure and the child process does not currently remove the pid file as expected.
Please do let me know if you see otherwise so I can try to reproduce your configuration and fix it if there's a bug.
~Tom
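The difference between exit and exit! that this reply relies on can be checked with a few lines of plain Ruby. This is only an illustration with a hypothetical helper name (`at_exit_output`); it does not involve Mongrel or the plugin.

```ruby
# Forks a child that registers an at_exit handler and then leaves via either
# exit or exit!. Returns whatever the handler wrote to the pipe.
def at_exit_output(use_bang)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    at_exit { writer.write("handler ran") }
    use_bang ? exit!(0) : exit(0)
  end
  writer.close
  Process.wait(pid)
  output = reader.read
  reader.close
  output
end

puts "exit:  #{at_exit_output(false).inspect}" # prints: exit:  "handler ran"
puts "exit!: #{at_exit_output(true).inspect}"  # prints: exit!: ""
```

With exit! the handler never runs, which is why a handler such as a pid-file cleanup would be skipped in the child.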
Seemed great.. But definitely doesn't work in PostgreSQL 8.3.
Any thoughts on where I can look to see what is causing PostgreSQL to drop the connection?
Thanks
Mike
Just doing some testing and found that if I comment out line 73 in spawn.rb
ActiveRecord::Base.remove_connection
That everything works fine but I'm not clear on what is happening in PostgreSQL. Would you expect database connections to pile up if you continued this way?
Thanks for any help.
Just a quick follow-up in case anyone is interested: for some reason PostgreSQL doesn't like remove_connection being called on it, so in the fork_it() method in spawn.rb I modified the yield to look like this:
begin
  # run the block of code that takes so long
  yield
ensure
  # Modified this to support PostgreSQL.
  ActiveRecord::Base.connection.disconnect!
  ActiveRecord::Base.remove_connection
end
Calling disconnect! before remove_connection seems to alleviate the problem with PostgreSQL blowing up with the "connect closed unexpectedly" error.
Thanks for the info mikee. I will look into adding this (or something like it) when I find some spare time.
How about a maximum number of spawns that one can make? Is there any code to queue background tasks when the maximum number of spawns is already running?
This would be a good feature to have if scale is a concern.
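Nothing in the post says the plugin offers such a cap. Purely as an illustration of the idea, a limit on concurrent children could be sketched in plain Ruby like this, where `run_with_limit` is a hypothetical helper and each job is assumed to be a callable:

```ruby
# Hypothetical helper: runs each job in a forked child, never more than
# `limit` at a time. Returns the number of children that exited cleanly.
def run_with_limit(jobs, limit)
  running = 0
  succeeded = 0
  jobs.each do |job|
    if running >= limit
      # Block until any one child finishes before forking another.
      _, status = Process.wait2
      succeeded += 1 if status.success?
      running -= 1
    end
    fork do
      begin
        job.call
        exit!(0)
      rescue StandardError
        exit!(1)
      end
    end
    running += 1
  end
  # Reap the children still running after the last job was started.
  running.times do
    _, status = Process.wait2
    succeeded += 1 if status.success?
  end
  succeeded
end

jobs = Array.new(5) { proc { sleep 0.05 } }
puts run_with_limit(jobs, 2)
```

Note that this version makes the caller wait for every job, so inside a web request it would trade away the fire-and-forget behaviour; a real queue would more likely live in a separate worker process.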
Hi, I was just wondering: if I use spawn to send e-mail (just as you said) and the mailer does not query the database at all, do I have to worry about ActiveRecord connections? Couldn't I use threads without worries?
I was thinking of monkey-patching ActionMailer to send e-mails in a thread by default. I will probably release it as a plugin too! =)
What do you think?
And thanks! The spawn plugin is helping a lot to accomplish that!
I tried using spawn with mod_rails without success. Any ideas?
This runs within the request response cycle:
spawn do
  code
end
This does not run:
spawn(:method => :thread) do
  code
end
EXCELLENT plugin, clean and simple to use, did exactly what I want. Thanks!
One quick question. If an exception is raised inside the spawn, will I be able to catch it outside of the spawn or see it any other way?
For example, will the Exception Notifier plugin catch exceptions that are raised inside the spawn?
StartBreakingFree> you can catch exceptions inside of the spawn block but your parent process cannot catch exceptions because the exception occurs in a separate (child) process.
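A small plain-Ruby experiment shows the same thing. The helper name `parent_rescue_demo` is hypothetical and the plugin itself is not involved; the child here reports its failure through its exit status, which is one simple way for a parent to notice it.

```ruby
# Forks a child that raises, and records what the parent can observe.
def parent_rescue_demo
  rescued_in_parent = false
  status = nil
  begin
    pid = fork do
      begin
        raise "boom in child"
      rescue StandardError => e
        # Only code inside the child can rescue (and log or report) the error.
        warn "child rescued: #{e.message}"
        exit!(1)
      end
    end
    _, status = Process.wait2(pid)
  rescue StandardError
    # Never reached: the exception was raised in a different process.
    rescued_in_parent = true
  end
  { rescued_in_parent: rescued_in_parent, exit_code: status && status.exitstatus }
end

p parent_rescue_demo
```

The parent's rescue clause is never triggered, so any notification has to be sent from inside the block that runs in the child.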
Yep, I discovered this the hard way today. Looks like I had some exceptions going uncaught and the exception notifier wasn't grabbing them.
It makes sense now, but I didn't think of it up front. Might be worth adding a note in the documentation?
Anyway, thanks for the plugin it has been useful!
Brian
Hi,
Can you please update the Spawn plugin to Rails 2.2?
Connection pooling breaks Spawn
Thanks
I switched to this:
http://github.com/imedo/background/tree/master
It's running fine so far under 2.2.2 for me.
I'll try to upgrade it for 2.2.2 when I have time... of course, contributions are welcome since it's open source.
The background plugin doesn't work for me either. No second process gets created, and it falls back to processing in-process.
Upgrading to 2.2.2 also broke spawning for me.. 2.1.2 works great still..
I created a patch for it. Perhaps the patch can be changed to use conditional code and work with both rails 1.* and 2.2.*
http://rubyforge.org/tracker/index.php?func=detail&aid=23316&group_id=4646&atid=17940
Surendra, your patch didn't work for me -- it doesn't seem like it would work because ActiveRecord still goes through the connection manager separately.
My patch actually resets the connection pools to avoid any shared connection issues. I've submitted a Pull request to Tom as well on github.
Works fine for me now with this fork:
http://github.com/garrytan/spawn/tree/master
I have merged garrytan's changes into github (will backfill on rubyforge next).
Your plugin works great in the production environment. But in the development environment I get an exception: ArgumentError: A copy of Line has been removed from the module tree but is still active!
Line is a model. I call in controller:
spawn do
  Line.addall table
end
Windows Vista, Ruby 1.8.6, Rails 2.1.0. Any suggestions?