Node.js and Drupal

Elliott Foster | February 7, 2012 | JavaScript, Development, Drupal

Drupal is a great platform, but it can’t do everything. As your site grows, you’ll likely encounter use cases that Drupal can’t or shouldn’t handle. Some examples include interacting with third-party APIs while preserving fast page loads, or performing repeated actions (polling, etc.) that result in site updates. Fortunately, separating these kinds of problems from Drupal and moving them to specialized “sub-stacks” within or near your Drupal stack is easy to do.

Enter Node.js. If you’re unfamiliar with it, Node.js is described as follows:

[…] a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.

- From nodejs.org

The relevant part of this description that I’ll be focusing on is node’s “event-driven, non-blocking” model.

Third-party API communication

It’s becoming more common to integrate Drupal sites with third-party APIs. API integration is a great way of making your site more powerful without having to build a lot of functionality yourself. The problem is that a lot of third-party APIs are slow. A normal Drupal session runs in a single thread and blocks until all code finishes executing. If you are communicating with a third party during a user session, you can end up degrading user experience through slow page loads, confusing error messages, etc.

Handling this problem with Drupal queues is becoming more popular, and for good reason. Queueing API requests takes the transport out of the user session and shouldn’t affect user experience, but it also means your requests have to tolerate some delay (until the next time the queue is processed). We’re starting to see this more frequently on large Drupal sites, and the trade-off between user experience and immediate data propagation isn’t always an easy one to balance with clients.

So in this example, Node.js’s “event-driven, non-blocking” model means that you don’t need to wait for the API to respond before you can respond to Drupal. Node.js accomplishes this using callbacks, which should be familiar if you’ve used jQuery. The program makes a call and continues working; when a response comes back, the callback is executed. When written correctly, you can achieve nearly parallel execution of tasks, something that’s very difficult, and often impossible, to do in Drupal.
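To make the callback idea concrete, here is a minimal sketch. It assumes nothing about any real API: fakeApiCall is a hypothetical stand-in that simulates a slow response with a timer. The point is only the ordering: both calls return immediately, and the callbacks run later as each "response" arrives.

```javascript
// Hypothetical stand-in for a slow API call (not a real client library).
// It invokes `callback` after `delayMs` milliseconds without blocking.
function fakeApiCall(name, delayMs, callback) {
  setTimeout(function onDone() {
    callback(null, name + ' finished');
  }, delayMs);
}

var events = [];

// Start two "requests"; neither call waits for its response.
fakeApiCall('slow request', 50, function onSlow(err, result) {
  events.push(result);
});
fakeApiCall('fast request', 10, function onFast(err, result) {
  events.push(result);
});

// This line runs before either callback, because the calls above
// returned immediately instead of blocking.
events.push('both requests started');
```

Once both timers have fired, events holds the start marker first, then the fast request, then the slow one, even though the slow request was issued first.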

So how about an example of doing this? Consider a case where you want to update a third-party API with user information when the user’s profile is saved. Normally you would wait for the API to respond during the page request, but if you use Node.js as an API relay, the node server will immediately respond that it received the request, allowing the page load to finish quickly. The node server will then relay the request to the API. When a response is received it will call back Drupal with the results. Conceptually, it looks like this:

Moving the API communication out of Drupal allows you to speed up page requests and split off functionality that’s not well suited to a blocking model. There are many places you can typically do this with Drupal, not solely limited to API communication.
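A rough sketch of such a relay might look like the following. The names createRelay, callApi and notifyDrupal are hypothetical and exist only for illustration; how you actually talk to the third-party API and how Drupal receives the result (a REST endpoint, for example) will depend on your site.

```javascript
var http = require('http');

// createRelay, callApi and notifyDrupal are hypothetical names.
// callApi(payload, cb) is assumed to talk to the slow third-party API;
// notifyDrupal(err, result) is assumed to report the outcome to Drupal.
function createRelay(callApi, notifyDrupal) {
  return http.createServer(function onRequest(req, res) {
    var body = '';
    req.setEncoding('utf8');
    req.on('data', function onData(chunk) {
      body += chunk;
    });
    req.on('end', function onEnd() {
      // Acknowledge right away so the Drupal page request can finish.
      res.writeHead(202, { 'Content-Type': 'text/plain' });
      res.end('accepted');

      // Relay in the background; Drupal hears back once the API answers.
      callApi(body, function onApiResponse(err, result) {
        notifyDrupal(err, result);
      });
    });
  });
}
```

You would start it with something like createRelay(callApi, notifyDrupal).listen(8080), where the port is again just an assumption. Drupal posts the profile data to the relay, gets the 202 back almost immediately, and is called back later with the API result.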

Repetitive jobs in Node.js

Another area that’s a good candidate for a Node.js-Drupal marriage is in repetitive jobs that need to be run very frequently. Again, Drupal queues can fulfill some of this need, but if you need to repeat the job very frequently, Drupal-based solutions are not usually ideal due to their high overhead (having to bootstrap Drupal every time you need to repeat the job). It would be better to bootstrap Drupal only when there’s work that actually needs to be done.

An example of this could be polling an external API for data that needs to be added to your site. Thanks to node’s event-driven model, it can do this in relatively few lines of code and without generating much overhead on the server while it waits to repeat:

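var http = require('http');

// Target of the poll: http://localhost/poll
var options = {
  host: 'localhost',
  path: '/poll'
};

var interval = setInterval(
  function pollSomething() {
    http.get(options, function onGet(res) {
      var body = '';
      res.setEncoding('utf8');
      // In current Node.js versions the response must be consumed,
      // otherwise 'end' never fires.
      res.on('data', function onData(chunk) {
        body += chunk;
      });
      res.on('end', function onEnd() {
        // Do something with body!
      });
    }).on('error', function onError(err) {
      // Without this handler a failed request would crash the process.
      console.error('Poll failed: ' + err.message);
    });
  },
  1000
);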

This code repeats a GET request to http://localhost/poll every second. When the response is complete, you can add some logic to determine whether Drupal needs to do any work, then make another call to Drupal with the payload from localhost. Site-specific business logic remains in Drupal; you’re just offloading the grunt work to an engine that’s more efficient at doing it.
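As one possible way to fill in that logic, the sketch below only calls Drupal when the polled payload has changed since the last poll. Everything Drupal-specific here is an assumption: the /node-relay/update path is a made-up endpoint, and comparing raw payload strings is just the simplest check that could work.

```javascript
var http = require('http');

// Hypothetical Drupal endpoint; host and path are assumptions.
var drupalOptions = {
  host: 'localhost',
  path: '/node-relay/update',
  method: 'POST',
  headers: { 'Content-Type': 'application/json' }
};

var lastPayload = null;

// Returns true only when the payload differs from the previous poll.
function needsWork(payload) {
  if (payload === lastPayload) {
    return false;
  }
  lastPayload = payload;
  return true;
}

// Forwards the payload to Drupal, so Drupal is bootstrapped only when
// there is something new to process.
function notifyDrupal(payload) {
  var req = http.request(drupalOptions, function onResponse(res) {
    res.resume(); // Discard the body; only delivery matters here.
  });
  req.on('error', function onError(err) {
    console.error('Could not reach Drupal: ' + err.message);
  });
  req.end(payload);
}

// Intended to be called from the 'end' handler of the polling request.
function handlePollResult(payload) {
  if (needsWork(payload)) {
    notifyDrupal(payload);
  }
}
```

With this in place, a poll that returns the same data a thousand times in a row costs Drupal nothing; only the first poll and each later change trigger a request.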

Proof of concept

To see some real-world examples of the two use cases I’ve described here, check out the following code:

The key as a developer is to remember that Drupal is great at a lot of things — just not everything. If there’s another tool that’s better suited to the job you’re trying to do, use it! Just because Drupal is usually flexible enough for you to get it to do what you want doesn’t mean that you should bend it that way.


Elliott Foster
Elliott Foster is a developer and standard nerd.

Comments

Great article, been looking into using Node.js with Drupal for a while.
Would you need to enable https://drupal.org/project/nodejs before drupal could ‘talk’ to node.js ?
Have you tried out the nodejs module yourself?
I’m going to try and learn a little more about node.js before I try and combine the two, but they do seem very complementary. Let node.js handle the tasks that often make Drupal sites feel sluggish.

You don’t need the nodejs module to communicate with Drupal. I haven’t looked too closely at it, but depending on your use case you can normally get things working with a simple REST interface. If you need something more complex, services integration is worth a look.

Fantastic post - I’ve always thought Node.js was cool technology but not really thought of a real world usage before. Using it as a staging point between Drupal and a third party API is an excellent example, as is using it as an alternative to Drupal Queues.

hi elliott,
thanks for the excellent write-up!
having to integrate a range of nodejs-services we started experimenting with dnode[1] and have a couple of sandboxes on d.o[2] to e.g. integrate a cometd server faye[3] for real-timish (authenticated/authorised) messaging and interesting use-case such as client-side-includes (to complement SSI/ESI). would love to hear what you think about that!

best, fredrik

[1] dnode: https://github.com/substack/dnode and dnode-php: https://github.com/bergie/dnode-php
[2] dnode: http://drupal.org/sandbox/frega/1321342 and dnode_faye: http://drupal.org/sandbox/frega/1357240
[3] faye: http://faye.jcoglan.com/

The dnode stuff looks really interesting and could be a good way of integrating asynchronous APIs or server/client communication more tightly with Drupal.

Chat clients are always a good example of this since their asynchronicity is immediately obvious.

Wonderful article. I’ve been working with the queue system in d7 quite a lot recently, and can definitely see how Drupal can benefit from offloading some processing to node. I would love in the future to see a step-by-step tutorial on how one can code that kind of model. Conceptually, the idea is wonderful, but doing something like calling node operations from PHP is very new to me :)

A couple questions (coming from my perspective having used some traditional queuing services but not node.js):

  1. The soapq_example obviously only works with the node.js implementation. Have you thought about ways to loosen that up? (e.g. provide an implementation of DrupalQueueInterface which sends jobs to node.js — then decouple soapq from the queue service)
  2. Is it correct to assume that the node.js-based background processing does not provide any guarantees, i.e. if node.js is offline or crashes, all pending work is quietly lost? The downstream developer would be responsible for making the design robust.

The soapq module works by overriding the SOAP PHP class. You could theoretically make this extensible and send the jobs wherever you like. I’m not actually using this code on a site right now, it was mainly a proof of concept and something to hack on between sessions at DrupalCamp Austin :)

The level of guarantee will depend on the implementation. If the node server is using a persistent store then ideally it would recover uncompleted jobs in the event of a crash. The node server being unreachable wouldn’t be any different from a third-party API being down, so it would be up to the calling code to tolerate that kind of issue. If the specific use case can tolerate delay, queueing the requests to the node process should be acceptable; if it cannot, an error message should probably be displayed. Again, it would depend on the specific use case and what you were trying to do.