Can Drupal handle high-traffic sites?

In Lynda.com’s Drupal Essential Training video series, I said:

“…you shouldn’t use Drupal if it’s going to be an extremely high-traffic or mission-critical site. If it’s necessary to have this site online to save lives, you should probably use something else or at least have something else as a backup. If you’re going to run something that’s going to have millions and millions and millions of page views, probably Drupal is not the right solution — although I should mention, Drupal does run very popular sites.” –“Choosing Drupal” video, timecode approx. 5:00

This assertion was questioned by Stephanie Pakrul (stephthegeek) of TopNotchThemes in her review this past September. Now it’s being discussed again in this thread on Drupal.org — although I have to say that that particular thread is throwing off more heat than light. :-/

I originally made that statement based on past criticisms of PHP and MySQL… but now am coming to think that my understanding of these bottlenecks might be obsolete. Alexa says that theonion.com — probably the highest-traffic Drupal site at the moment — hovers around the 3,000 mark in terms of most trafficked sites. Pretty impressive! Sears.com, which is solidly in the top 1,000, has only about three times as much traffic. That’s not much of a difference.

So let’s open the discussion here: How scalable is Drupal?

(Error in penultimate paragraph corrected — see comments.)

28 replies on “Can Drupal handle high-traffic sites?”

Sears?
Since when is Sears running on Drupal? Their site doesn’t look like Drupal to me

D’oh!!!
I misread my own comment on TNT’s blog! That’ll teach me to skim. Thanks for catching it.

I’ve corrected the posting. Of course it’s out on Planet Drupal now, so we’ll see who else responds to the misinformation…. 🙁

Drupal runs Wikipedia’s donation system
Wikipedia is a top 10 website in terms of traffic. The donation system is regularly linked off of Wikipedia’s front page. Scaling is really about systems architecture, and Drupal can be used by smart systems architects to run incredibly high-traffic, mission-critical websites.

I doubt that page has traffic in the top 10,000
Hey, Zack. Do you mean the site at http://wikimediafoundation.org/wiki/Donate ? I doubt that it’s even in the top 100,000 trafficked pages on the Web — I’ve used Wikipedia hundreds of times but never seen it before. (I know, I know.) As always, I welcome corrections.

Yes
During donation drives (which they are running right now) the donation system serves up content across every page view for Wikipedia.

e.g. the top banner on:
http://en.wikipedia.org/wiki/Belgium

This system served by Drupal can scale to handle tens of thousands of requests per second. It’s a mission-critical app for the 8th most trafficked website in the world.

Thanks!
Thanks for the clarification! Useful stuff to know.

Can you give any further insight? What involvement does Drupal have in serving this content?

Metrics = win.
O.K., I’m convinced.

I can think of potential holes in the logic — for example, differences between serving a banner and serving a page — but am not knowledgeable enough to make those arguments. If anyone else can, I hope they post here!

(Really, I’m not always this contentious… just want to stress-test your assertions before I create another controversy by repeating them. 😉 )

Sure..
It is a much, much simpler use case to handle donations than to serve WordPress.com users, but it is still a very good example of a highly trafficked, mission-critical app. I would clear any assertions with David at Four Kitchens before you repeat them if you are at all worried.

Misinformation
I sense a quite large misinformation here.

1. I am not a native English speaker but from the linked article I read it as the 20k request/sec site is the wikipedia.org..

2. To show the same donation banner on every Wikipedia page by making a request to a Drupal site sounds absolutely insane to me. I don’t know where you got such an idea, but I would advise forgetting it as quickly as you can..

I don’t think so..
“You also need a meter to show progress on a site that serves 20,000 requests per second.”

.. My understanding is that meter is served by a Drupal site.

Oh I understand what it means
In reality, pages generated for Wikipedia users by Drupal, MediaWiki, or whichever application they use are generally served through an extensive caching system (Squid etc.). So it’s not as if Drupal itself is literally serving up all those reqs per second…

Separation of “generation” and “serving”
He makes an important distinction, and one not always understood by people who aren’t sysadmins: Drupal is responsible for *generating* the pages, not serving them. It’s a distinction I didn’t have in mind when I made that statement in the Lynda.com course; I will from now on.

So perhaps the question should be, “Can Drupal generate pages for the world’s highest-traffic sites?” Barring good counterarguments, the answer appears to be “yes”.
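
To make the generation/serving distinction concrete, here is a minimal Python sketch. It is not how Squid or Wikipedia’s stack actually works; `CachingFrontEnd`, `generate_page`, the URL, and the 60-second lifetime are all hypothetical names and values chosen for illustration. It only shows that a cache sitting in front of an application lets the application generate a page once while the cache answers many requests.

```python
import time


class CachingFrontEnd:
    """Toy stand-in for a reverse-proxy cache (hypothetical, for illustration)."""

    def __init__(self, backend, ttl_seconds=60.0, clock=time.monotonic):
        self.backend = backend          # callable that generates a page
        self.ttl_seconds = ttl_seconds  # assumed cache lifetime
        self.clock = clock
        self.store = {}                 # url -> (expires_at, body)
        self.backend_calls = 0          # how often the application had to work

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]             # "serving": answered from the cache
        self.backend_calls += 1
        body = self.backend(url)        # "generation": the application runs
        self.store[url] = (now + self.ttl_seconds, body)
        return body


def generate_page(url):
    """Stand-in for the CMS building a page (hypothetical)."""
    return f"<html><body>Page for {url}</body></html>"


front = CachingFrontEnd(generate_page)
for _ in range(20000):
    front.get("/donate-banner")

print("requests served: 20000")
print("pages generated:", front.backend_calls)
```

In this toy setup the application behind the cache is asked to generate the page only once, however many requests arrive while the cached copy is still fresh.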

I think you’re right
Yup, that one currently has a higher Alexa rating. Thanks for the tip!

(And hey, they have Rocky cartoons in Finnish!)

It’s my understanding–and
It’s my understanding–and I’ll admit to not being totally up to date on the heat and fire of this debate–that the issue at play isn’t so much that there’s a fatal flaw in Drupal which prevents its scaling, but more that in 95% of use cases hardcore scalability isn’t the chief concern. Which strikes me, at least, as totally the right approach. I’d rather have a system that worked well in 95% of cases and was easy to develop for than one that could theoretically handle several thousand times the traffic my (theoretical) site is handling.

I think the “huge sites present complex architectural problems” conclusion is probably pretty insightful, and certainly holds true for Drupal sites.

I’ve been sort of running with the “content management framework” line of thought, and seeing if we can learn anything from thinking about Drupal’s “competitors” as Django/Twisted and Ruby on Rails, rather than WordPress and Joomla. I don’t have a good answer to that in terms of scalability, but I’m intrigued by the question.

Not totally false either
I would agree that successfully serving a mission-critical website with really high traffic means having a well-managed server/cache architecture more than it means a perfectly tweaked application (Drupal).

However, for the majority of websites out there, served from a single LAMP box, there is something to your original argument: Drupal caching NEEDS to be turned on, or as I learned earlier this year, the backlog of requests may eventually bring down your box.

I would also strongly argue that PHP needs an accelerator like APC installed, but that obviously goes beyond what Drupal can provide out of the box.

Yup, that beats theonion.com…
…with an Alexa rank at about 1,500. I hadn’t realized it was using Drupal. Thanks for the tip!

Alexa is not an accurate measure
Alexa is not a very accurate measure of traffic. It tries to measure “reach” and ranks it. But that does not reflect the actual page views/visits per day.

An example is a Drupal site we built that handles one million page views a day on a single server. Yet, Alexa says it has a rank of 16 million.

It is a site with an audience that is specialized, and that is perhaps why.
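
As a rough sanity check on what “one million page views a day” means for a single server, the figure can be converted to an average request rate. The sketch below is simple arithmetic; the peak multiplier is a made-up assumption, since real traffic is not spread evenly over the day and no peak figure is given here.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_views_per_second(views_per_day):
    """Average rate if the day's page views were spread perfectly evenly."""
    return views_per_day / SECONDS_PER_DAY


avg = average_views_per_second(1_000_000)
print(f"average: {avg:.1f} page views/second")

# Hypothetical peak-to-average ratio, purely an assumption for illustration.
ASSUMED_PEAK_FACTOR = 3
print(f"assumed peak: {avg * ASSUMED_PEAK_FACTOR:.1f} page views/second")
```

One million page views a day works out to an average of about 11.6 per second, which is a very different scale from the 20,000 requests per second quoted earlier in this thread.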

Alexa relies almost 100% on
Alexa relies almost 100% on data collected by their toolbar, which is installed on a relatively small user base. You should use data from Compete.com or another similar service that tracks traffic at the ISP level instead, at least when comparing the kind of data you’re looking at.

I think so far as this discussion goes, Drupal is great for any site that’s not getting dugg or slashdotted. When it comes to sites that carry that kind of traffic on a regular basis it comes down more to hardware and network management.

@Khalid – Which is the site with one million views/day?
I am creating a prestudy to find out if we should use Drupal on a high traffic, high profile site and I would really like to use your example as a benchmark. Could you please tell me the URL of the site you’re referring to? I can’t find it in the article.
Thanks!
/Linus
