Thursday, September 13, 2007

Spawn Background Processes in Rails

Scott Persinger wrote a blog post about how to do background processing in Rails. I have often wanted a simple way to do this (usually when sending out emails to users), so I decided to make a plugin out of Scott's idea. The plugin is called spawn and you can download it here (unzip it in your vendor/plugins directory).

Using the plugin is very simple. Just wrap the code you want to run in the background in a 'spawn' block like this:


spawn do
  logger.info("I feel sleepy...")
  sleep 11
  logger.info("Time to wake up!")
end


If you run that in a controller you should find that your page is rendered well before the last log message is written out. That's all there is to it.

You might want to be aware of what's going on when you run a spawn block. Because ActiveRecord::Base keeps only a single connection open to the database, spawn opens a new connection in a forked child process. This seems to work pretty well in my application. The overhead of forking is low compared to starting a new Rails process, and the child has access to all of the current variables.
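The fork-and-detach idea described above can be sketched in plain Ruby. This is only an illustration, assuming a Unix-like system where fork is available; spawn_sketch is a hypothetical name and this is not the plugin's actual code. The database reconnect is shown only as a comment so the sketch runs without Rails.

```ruby
require "tempfile"

# Hypothetical stand-in for the fork path; not the plugin's real implementation.
def spawn_sketch
  child_pid = fork do
    begin
      # In a Rails app the child would open its own database connection here
      # (assumption: something like ActiveRecord::Base.establish_connection)
      # rather than reuse the parent's socket.
      yield
    ensure
      # exit! leaves the child without running at_exit handlers.
      exit!(0)
    end
  end
  # Process.detach returns a thread that reaps the child when it exits,
  # so the parent never has to block on it.
  Process.detach(child_pid)
end

marker = Tempfile.new("spawn_sketch")
started = Time.now
watcher = spawn_sketch do
  sleep 1
  File.write(marker.path, "done")
end
elapsed = Time.now - started # measured in the parent, before the child finishes
```

The parent reaches the last line without waiting for the child's sleep, which is why a page can render before the final log message appears. Joining the watcher thread is only needed if you later want to know that the child has finished.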

Let me know if you try this out and how it goes for you.

Update 2007/10/16
The latest version of the spawn plugin can now do forking or threading. It is now hosted on RubyForge, so read more here.

Update 2008/12/28
The latest version of the spawn plugin now works with Rails 2.2.2. Get it from GitHub using "./script/plugin install git://github.com/tra/spawn.git".

30 comments:

jrochkind said...

Nice. Two questions:

What do I do to make sure that new AR connection closes when I'm done? And to make sure that the new spawned process shuts down when its task is done, instead of waiting as a Rails app for more requests it's never going to get? (Is that all taken care of for me?)

What if I want to spawn a few processes, but I actually _want_ to wait for them to complete before returning a response? Is there a way to do that?

Tom said...

Re: jrochkind

The plugin closes the connections and exits when it's done. You shouldn't have to do anything.

Currently there's no way to wait for spawned processes. Perhaps I could add an option for that... I'll think about that.

jrochkind said...

If you added that as an option, I'd switch from using Rails native threads to this in a heartbeat. For reasons I explained over on my blog where you commented (http://bibwild.wordpress.com/2007/08/28/threading-in-rails/), I sometimes need to wait for the concurrent process to end.

But doing this using spawned processes instead of threads, I think I wouldn't need to jump through all those hoops I jumped through to get Threads to work in Rails, I suspect it would Just Work.

Tom said...

Here's what I propose. Currently spawn will detach from the child process. How about if I return the PID and give you an option to not detach and you can then do a wait on the child in your code? Something like this:

pid = spawn(:detach => false) do
   something
end
wait(pid)

If you want, email me to discuss further (I just enabled showing email in my profile).

Tom said...

OK, version 0.2 is out and now gives you the ability to wait for the children processes that you spawn. It works like my previous comment; read the README for more details.

Download v0.2 here.
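What "not detaching and then waiting on the child" means at the operating-system level can be sketched in plain Ruby. This is a minimal illustration assuming a Unix-like system; it uses Ruby's built-in fork and Process.wait2 rather than the plugin's own spawn and wait methods, whose exact behaviour is described in the README.

```ruby
# Fork three children that each do a little "work", keeping their PIDs.
pids = [1, 2, 3].map do |n|
  fork do
    sleep 0.1 * n
    exit!(0) # leave the child without running at_exit handlers
  end
end

# Process.wait2 blocks until the given child exits and returns [pid, status].
statuses = pids.map { |pid| Process.wait2(pid).last }

puts "all children succeeded: #{statuses.all?(&:success?)}"
```

Because the children run concurrently, the total wait is roughly the duration of the slowest child rather than the sum of all three.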

Anonymous said...

I'm not having any luck with this.
Using Postgresql with spawn 0.04.
Receiving this error:

PGError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
: SELECT a.attname, format_type(a.atttypid, a.atttypmod), d.adsrc, a.attnotnull
FROM pg_attribute a LEFT JOIN pg_attrdef d
ON a.attrelid = d.adrelid AND a.attnum = d.adnum
WHERE a.attrelid = 'blogs'::regclass
AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum

Here's the method that's using spawn:

def send_alerts(subject, message)
  subscribers = guests.find(
    :all,
    :conditions => "accepted_at IS NOT NULL AND alert_me"
  )
  subscribers << user unless subscribers.include?(user)
  spawn do
    subscribers.each do |to|
      Mailer.deliver_blog_alert(to, subject, message, self)
    end
  end
end

Tom said...

It hasn't been tested with postgresql yet. Anyone had any success with that?

Nick said...

Great plugin.

However I still experience problems with long running tasks.

After completion of the task, I record in a log table that the task is completed. However, any Ruby statements that are executed AFTER the main (http) thread has returned do not run, i.e. no log record is inserted. Statements that are initiated before the main (http) thread returns are fine.

This behaviour continues even if I put an "ensure" block around the code.

I get the same behaviour in both Webrick and Mongrel on a win32 PC.

Does anybody else have the same problem?

Tom said...

nick, i don't suppose you could use the forking? i tend to trust it more since it's a separate process that can't be affected by the parent messing with connections... perhaps some sort of further protections could be built in but right now i don't have much time to do that. Of course it's open-source so contributions to make it better are welcome (2.times{wink} 2.times{nudge}).

Scott Persinger said...

Tom,

The plugin is awesome! Here's one suggestion, prompted by another response on my original post about how a forked mongrel process will delete its parent's PID file.

I've added the following little monkeypatch to the child block:

# Monkey-patch Mongrel to not remove its pid file in the child

require 'mongrel'
Mongrel::Configurator.class_eval("def remove_pid_file; puts 'child no-op'; end")

--scott

Tom said...

Scott,

Thanks for your feedback.

The plugin calls exit! in the child process, which skips the at_exit handlers. Mongrel calls remove_pid_file from an at_exit block, so it should be fine. I tested to make sure, and as expected the child process does not remove the pid file.

Please do let me know if you see otherwise so I can try to reproduce your configuration and fix it if there's a bug.

~Tom
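The difference between exit and exit! mentioned above can be checked with a small plain-Ruby sketch, assuming a Unix-like system. The helper name handler_ran_in_child? is hypothetical, and for simplicity the at_exit handler is registered inside the child instead of being inherited from a parent such as Mongrel.

```ruby
require "tempfile"

# Forks a child that registers an at_exit handler and then leaves with either
# exit! or exit. Returns true if the handler ran (it writes to a marker file).
def handler_ran_in_child?(use_bang)
  marker = Tempfile.new("at_exit_sketch")
  pid = fork do
    at_exit { File.write(marker.path, "handler ran") }
    use_bang ? exit!(0) : exit(0)
  end
  Process.wait(pid)
  File.read(marker.path) == "handler ran"
end

ran_with_exit_bang = handler_ran_in_child?(true)
ran_with_exit      = handler_ran_in_child?(false)

puts "handler ran with exit!: #{ran_with_exit_bang}"
puts "handler ran with exit:  #{ran_with_exit}"
```

A child that leaves with exit! never runs its cleanup hooks, which is the reason given above for the parent's pid file surviving.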

mikee said...

Seemed great.. But definitely doesn't work in PostgreSQL 8.3.

Any thoughts on where I can look to see what is causing PostgreSQL to drop the connection?

Thanks
Mike

mikee said...

Just doing some testing and found that if I comment out line 73 in spawn.rb

ActiveRecord::Base.remove_connection

That everything works fine but I'm not clear on what is happening in PostgreSQL. Would you expect database connections to pile up if you continued this way?

Thanks for any help.

mikee said...

Just a quick follow-up in case anyone is interested: for some reason PostgreSQL doesn't like remove_connection being called on it, so in the fork_it() method in spawn.rb I modified the yield to look like this:

begin
  # run the block of code that takes so long
  yield
ensure
  # modified this to support PostgreSQL
  ActiveRecord::Base.connection.disconnect!
  ActiveRecord::Base.remove_connection
end

Calling disconnect! before remove_connection seems to alleviate the problem with PostgreSQL blowing up with the "connect closed unexpectedly" error.

Tom said...

Thanks for the info mikee. I will look into adding this (or something like it) when I find some spare time.

warrior said...

How about a maximum number of spawns that one can make? Has any code been written to queue the background tasks in case the maximum number of spawns is already in use?

This would be a good feature to have if scale is a concern.

José Valim said...

Hi, I was just wondering: if I use spawn to send e-mail (just as you said) and the mailer does not touch the database at all, do I have to worry about ActiveRecord connections? Couldn't I use threads without worries?

I ask because I was thinking of monkey patching ActionMailer to send e-mails in a thread by default. I will probably release it as a plugin too! =)

What do you think?

And thanks! The spawn plugin is helping a lot to accomplish that!

ahabman said...

I tried using spawn with mod_rails without success. Any ideas?

This runs within the request response cycle:

spawn do
  code
end

This does not run:

spawn(:method => :thread) do
  code
end

StartBreakingFree.com said...

EXCELLENT plugin, clean and simple to use, did exactly what I want. Thanks!

One quick question. If an exception is raised inside the spawn, will I be able to catch it outside of the spawn or see it any other way?

For example, will the Exception Notifier plugin catch exceptions that are raised inside the spawn?

Tom said...

StartBreakingFree> You can catch exceptions inside the spawn block, but your parent process cannot catch them because the exception occurs in a separate (child) process.
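A plain-Ruby sketch of this, assuming a Unix-like system and using fork directly instead of the plugin: the child rescues its own exception and reports it through a pipe and an exit status, while the parent's rescue clause never fires. The pipe-based reporting is just one illustrative choice, not something the plugin provides.

```ruby
reader, writer = IO.pipe
parent_rescued = false

begin
  pid = fork do
    reader.close
    begin
      raise ArgumentError, "boom in child"
    rescue StandardError => e
      # Handle or report the error here, inside the child process.
      writer.puts "#{e.class}: #{e.message}"
      writer.close # flush before exit!, which does not flush buffers
      exit!(1)
    end
    exit!(0)
  end
rescue ArgumentError
  # Never reached: the exception is raised in another process.
  parent_rescued = true
end

writer.close
report = reader.read.chomp # blocks until the child closes its end of the pipe
reader.close
_, status = Process.wait2(pid)

puts "parent rescued: #{parent_rescued}, child reported: #{report}"
```

In a Rails app the rescue inside the block is where you would log the error or call whatever notifier you use, since nothing outside the block will ever see it.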

StartBreakingFree.com said...

Yep, I discovered this the hard way today. Looks like I had some exceptions going uncaught and the exception notifier wasn't grabbing them.

It makes sense now, but I didn't think of it up front. Might be worth adding a note in the documentation?

Anyway, thanks for the plugin it has been useful!
Brian

Anonymous said...

Hi,
Can you please update the Spawn plugin to Rails 2.2?

Connection pooling breaks Spawn

Thanks

Anonymous said...

I switched to this:
http://github.com/imedo/background/tree/master

It's running fine so far under 2.2.2 for me.

Tom said...

I'll try to upgrade it for 2.2.2 when I have time... of course, contributions are welcome since it's open source.

Anonymous said...

The background plugin doesn't work for me either. No second process gets created, and it falls back to processing in-process.

Steve Ehrenberg said...

Upgrading to 2.2.2 also broke spawning for me.. 2.1.2 works great still..

Surendra Singhi said...

I created a patch for it. Perhaps the patch can be changed to use conditional code and work with both rails 1.* and 2.2.*

http://rubyforge.org/tracker/index.php?func=detail&aid=23316&group_id=4646&atid=17940

garrytan said...

Surendra, your patch didn't work for me -- it doesn't seem like it would work because ActiveRecord still goes through the connection manager separately.

My patch actually resets the connection pools to avoid any shared connection issues. I've submitted a Pull request to Tom as well on github.

Works fine for me now with this fork:
http://github.com/garrytan/spawn/tree/master

Tom said...

I have merged garrytan's changes into github (will backfill on rubyforge next).

lsbb said...

Your plugin works great in the production environment. But in the development environment I get an exception: ArgumentError: A copy of Line has been removed from the module tree but is still active!
Line is a model. In the controller I call:

spawn do
  Line.addall table
end

Windows Vista RoR:1.8.6/2.1.0. Any suggestions?