Concurrency and Scalability on the WWW June 2nd, 2010
People like to make a lot of noise about the scalability of applications written in PHP or Ruby or other scripting languages. One of the major criticisms I’ve heard of PHP concerns how it deals with concurrency.
Take the following PHP script as an example:
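require_once 'DB.php';
// connect to MySQL (the DSN here is a placeholder)
$conn =& DB::connect ("mysql://user:password@host/database");
if (DB::isError ($conn)) {
    die ("Connection failed: " . $conn->getMessage () . "\n");
}
// read our row out of the database
$result = $conn->query ("SELECT id, count FROM this_is_a_test WHERE id=1");
if (DB::isError ($result)) {
    die ("SELECT failed: " . $result->getMessage () . "\n");
}
if ($row =& $result->fetchRow ()) {
    $count = $row[1];
    echo "ROW WITH ID '1' FOUND. COUNT IS: " . $count . "<br />";
}
else {
    die ("No such row!");
}
$count++;
echo "UPDATING COUNT TO " . $count . "<br />";
// write the incremented value back to the row
$result = $conn->query ("UPDATE this_is_a_test SET count=? WHERE id=1", array($count));
if (DB::isError ($result)) {
    die ("UPDATE failed: " . $result->getMessage () . "\n");
}
echo "DONE";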
Seems pretty straightforward… it might even seem stupid. So simple, in fact, that to the untrained eye there is nothing wrong with this script (other than its inanity). There is something wrong, however, and the problem will only become visible when the system experiences a lot of traffic: the problem has to do with concurrency.
Contrary to the opinions expressed by some, web development – serious web development; the kind which scales up to people spending 500 billion minutes per month on your web site or making 34000 requests per minute – requires an understanding of concurrency; otherwise your web application will be vulnerable to the most common concurrency problems.
To make the case, I’m going to subject the above script to a little concurrency, and the results should speak for themselves.
Note: The update statement I’m using should probably have a SET clause more along the lines of count=count+1, but that would circumvent the problem I’m going to demo, and using a more realistic script would make the demo harder to follow.
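For comparison, here is a minimal sketch of the count=count+1 form the note mentions, written with PDO rather than PEAR DB. Assumptions: an in-memory SQLite database (the pdo_sqlite extension) stands in for MySQL so the sketch runs on its own, and the helper names makeTestDb(), incrementAtomically() and currentCount() are hypothetical. The loop runs sequentially, so it shows the shape of the statement rather than reproducing a race.

```php
<?php
// Sketch of the count=count+1 alternative. Assumption: in-memory SQLite via
// pdo_sqlite stands in for MySQL; the UPDATE statement reads the same in MySQL.

function makeTestDb(): PDO
{
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE this_is_a_test (id INTEGER PRIMARY KEY, count INTEGER NOT NULL)');
    $pdo->exec('INSERT INTO this_is_a_test (id, count) VALUES (1, 0)');
    return $pdo;
}

// The database reads and writes count inside one statement, so there is no
// gap between a SELECT and an UPDATE for another request to read a stale value.
function incrementAtomically(PDO $pdo, int $id): void
{
    $stmt = $pdo->prepare('UPDATE this_is_a_test SET count = count + 1 WHERE id = ?');
    $stmt->execute([$id]);
}

function currentCount(PDO $pdo, int $id): int
{
    $stmt = $pdo->prepare('SELECT count FROM this_is_a_test WHERE id = ?');
    $stmt->execute([$id]);
    return (int) $stmt->fetchColumn();
}

$db = makeTestDb();
for ($i = 0; $i < 400; $i++) {
    incrementAtomically($db, 1);
}
echo currentCount($db, 1), PHP_EOL; // prints 400
```

The catch, as the note says, is that this only works when the new value can be expressed in SQL; once the script has to make decisions based on what it read, some form of locking comes back into play.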
The Experiment: JMeter
Say 20 people each visit the page which runs the above code 20 times. If count starts at 0, it should be 20 × 20 = 400 when the experiment is over… right? The sad reality is that it won’t be, because this script was written without consideration for the concurrent nature of the web.
Don’t believe me? You can run this experiment, using a tool like JMeter.
Step 1: Create a Thread Group
Open JMeter; in the left explorer-style tool pane you should see a ‘Test Plan’ node. Right-click on it, and add a Thread Group. According to the JMeter User Manual…
Thread group elements are the beginning points of any test plan. All controllers and samplers must be under a thread group.
Whatever that means, to run the experiment you need to create a Thread Group.

Step 2: Configure the Thread Group
You need the Thread Group to simulate 20 users visiting the page 20 times, so specify 20 threads and 20 loops per thread:

Step 3: Create an HTTP Request in The Thread Group
Now that you’ve configured how many users there are and how often they’ll do something, you need to specify what that something is. You can do this using a Controller – or, more specifically, a Sampler which sends an HTTP Request:
Samplers tell JMeter to send requests to a server. For example, add an HTTP Request Sampler if you want JMeter to send an HTTP request. You can also customize a request by adding one or more Configuration Elements to a Sampler.

Step 4: Configure the HTTP Request
Once the HTTP Request sampler is created, you still need to specify where the request goes and by what method. At the very least, specify the Server Name and the path to the script.

Step 5: Run
From the ‘Run’ menu, select ‘Start’.

Once the test has finished, poke around in your SQL database to check the count after these 400 visits. Here’s what I found:
SELECT id, count FROM this_is_a_test WHERE id=1
+----+-------+
| id | count |
+----+-------+
|  1 |   299 |
+----+-------+
The Result: Lost Updates
... and here we have a concurrency problem with our “simple” script. When you perform the experiment you will almost certainly get a different value, but the final analysis is this: it is very unlikely that your count will be the expected 20 × 20 = 400.
So what happened?
This is an example of a common concurrency problem: the lost update. The idea is this: two requests are received by the server at the same time. One request’s process reads the row (SELECT…), and the other’s reads it too (SELECT…) before the first has written anything. They have now both read the same value of count, and each increments it independently. Then the first writes to the DB with an UPDATE, and finally the second does the same… overwriting whatever changes request #1 made to the DB.
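To make the interleaving concrete, here is a deterministic sketch of the two requests in plain PHP. It is a simulation, not the real script: a hypothetical $table array stands in for the database row, the function name lostUpdateDemo() is made up for illustration, and the steps are interleaved by hand in the order just described.

```php
<?php
// Deterministic simulation of a lost update. No database is involved:
// $table is a hypothetical stand-in for the row in this_is_a_test.

function lostUpdateDemo(int $start): int
{
    $table = ['id' => 1, 'count' => $start];

    // Request #1 reads the row (SELECT ...).
    $countSeenBy1 = $table['count'];
    // Request #2 reads the same row before #1 has written anything.
    $countSeenBy2 = $table['count'];

    // Request #1 increments its copy and writes it back (UPDATE ...).
    $table['count'] = $countSeenBy1 + 1;
    // Request #2 does the same, overwriting #1's write with an equal value.
    $table['count'] = $countSeenBy2 + 1;

    return $table['count'];
}

echo lostUpdateDemo(0), PHP_EOL; // prints 1, although two increments ran
```

Every time two requests overlap like this, one increment disappears, which is how 400 visits can leave a count of 299.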

This is (apparently) a much more frequent problem when programming in PHP or Perl CGI, as each request is handled by its own process in its own context – every request gets a new instance of the interpreter – while in a Java Servlet, simply using the synchronized keyword circumvents this problem, as all requests are handled in the same context. The frequency of this sort of problem is one of the main reasons technologies such as PHP and Ruby are said not to scale well.
Is that true, though? There is a solution: database locking. It may not be as effective as language constructs like Java’s synchronized keyword, but the shoe fits nonetheless.
A Solution: Lock Tables
The LOCK TABLES command is meant – according to the MySQL Reference Manual – “explicitly for the purpose of cooperating with other sessions”. This means that LOCK TABLES can be used as a means of inter-process coordination in environments which don’t handle all requests in the same context, such as PHP or Ruby CGI.
So the example script can be fixed using LOCK TABLES.
require_once 'DB.php';
// connect to MySQL (the DSN here is a placeholder)
$conn =& DB::connect ("mysql://user:password@host/database");
if (DB::isError ($conn)) {
    die ("Connection failed: " . $conn->getMessage () . "\n");
}
// lock the table: no other session can read or write it until we unlock.
// If the script dies while holding the lock, MySQL releases it when the
// connection closes.
$stmt = "LOCK TABLES this_is_a_test WRITE";
$result =& $conn->query ($stmt);
if (DB::isError ($result)) {
    die ("LOCK TABLES failed: " . $result->getMessage () . "\n");
}
// read our row out of the database
$result = $conn->query ("SELECT id, count FROM this_is_a_test WHERE id=1");
if (DB::isError ($result)) {
    die ("SELECT failed: " . $result->getMessage () . "\n");
}
if ($row =& $result->fetchRow ()) {
    $count = $row[1];
    echo "ROW WITH ID '1' FOUND. COUNT IS: " . $count . "<br />";
}
else {
    die ("No such row!");
}
$count++;
echo "UPDATING COUNT TO " . $count . "<br />";
// write the incremented value back to the row
$result = $conn->query ("UPDATE this_is_a_test SET count=? WHERE id=1", array($count));
if (DB::isError ($result)) {
    die ("UPDATE failed: " . $result->getMessage () . "\n");
}
// release the lock so the next request can proceed
$stmt = "UNLOCK TABLES";
$result =& $conn->query ($stmt);
if (DB::isError ($result)) {
    die ("UNLOCK failed: " . $result->getMessage () . "\n");
}
echo "DONE";
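The same pattern (acquire an exclusive lock, read, write, release) doesn't depend on MySQL. As a hedged sketch, here it is with PHP's flock() on a counter file standing in for LOCK TABLES; the function name incrementWithLock() and the use of a plain file as the counter are assumptions for illustration. One difference worth noting: flock() locks are advisory, so they only protect the counter if every process that touches the file goes through this function.

```php
<?php
// Sketch: a read-modify-write protected by an exclusive lock, using flock()
// on a counter file in place of MySQL's LOCK TABLES. The file-based counter
// is a hypothetical stand-in for the this_is_a_test row.

function incrementWithLock(string $path): int
{
    // 'c+' opens for reading and writing and creates the file if needed,
    // without truncating it.
    $fh = fopen($path, 'c+');
    if ($fh === false) {
        throw new RuntimeException("cannot open $path");
    }
    try {
        // Block until no other process holds the lock (like LOCK TABLES ... WRITE).
        if (!flock($fh, LOCK_EX)) {
            throw new RuntimeException("cannot lock $path");
        }
        // Read (the SELECT step); an empty file counts as 0.
        $count = (int) stream_get_contents($fh);
        $count++;
        // Write (the UPDATE step): empty the file, go back to the start, write.
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, (string) $count);
        fflush($fh);
        return $count;
    } finally {
        // Release the lock (like UNLOCK TABLES) even if something above failed.
        flock($fh, LOCK_UN);
        fclose($fh);
    }
}

$path = tempnam(sys_get_temp_dir(), 'counter'); // empty file, so the count starts at 0
for ($i = 0; $i < 400; $i++) {
    incrementWithLock($path);
}
echo file_get_contents($path), PHP_EOL; // prints 400
unlink($path);
```

The try/finally mirrors the UNLOCK TABLES step at the end of the script above, but guarantees it runs even when an error is thrown partway through.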
The Solution
The real solution, however, is planning. How you deal with concurrency isn’t as important as simply dealing with it at all. If you don’t think about what the site will do when two people just happen to perform the same action at the same time – be it withdrawing money, bidding on an item or just incrementing the number of people who “like” something – your site won’t scale well and will suffer buggy behavior under high traffic. Don’t fool yourself into thinking you can’t both plan and code for these issues in any language, though: in some languages it’s a little easier, but it can always be done.
For YAFGC - htaccess Files May 20th, 2010
This one goes out to Rich and Hilary of Yet Another Fantasy Gamer Comic who are making some changes to their site. In so doing some of their URLs are gonna change, and like good, responsible web citizens they are setting up some redirects from the old URLs to the new ones.
To understand why this makes them awesome, you should read Hypertext Style: Cool URIs don’t change. by Tim ‘I invented the Internet’ Berners-Lee
An .htaccess file is a granular, per-directory Apache configuration file: it applies to the directory it sits in (and that directory’s subdirectories). They are often a pain in the ass to work with and understand… but they’re worth it, as they provide a mechanism by which to implement what most people call “clean URLs” using mod_rewrite. I strongly encourage anyone reading this to look into clean URLs. They’re good for SEO, and they make it easier for someone to navigate back to your page when they don’t have bookmarks available (and can’t find it on Google ‘cause your messy URLs result in poor SEO :P). It’s no coincidence that most web frameworks offer them as a feature.
This article/post is for Rich and Hilary though, so I’ll cut to the point:
Apologies, Rich and Hilary, but I work as an educator, so I’m irrationally compelled to explain this slowly and will – even while trying to get to the point – ramble on at length about the “why” every step of the way.
They want to use mod_rewrite to match part of the requested URL (in the query string) and then redirect to another page (with a different query string… on an entirely different domain!).
I am trying to remap this url:
yafgc.shipsinker.com/index.php?strip_id=abcd
to this url:
yafgc.net/?id=abcd
where abcd = a variable numeric parameter, from 1 – 4 digits.
Most people would try the following, which fails:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule \?strip_id\=(\d{1,4}) http://yafgc.net/?id=$1 [L,R]
</IfModule>
In baby steps, this rewrite rule says that, should the requested URL contain…
\? # a question mark “\?” (escaped with a ‘\’ as ‘?’ is a RegEx quantifier)
strip_id # followed by “strip_id”
\= # an equals character “=”
(
\d{1,4} # between 1 and 4 digits \d
) # store the digits – hence why they are in brackets
... then we rewrite the URL to http://yafgc.net/?id=$1, where $1 is a backreference to the digits which were in brackets. The “L” and “R” flags tell Apache that this should be the Last rule checked and that the browser should be Redirected.
This won’t work because mod_rewrite doesn’t allow you to match the contents of the query string (the stuff after the question mark ‘?’) in a RewriteRule; the rule’s pattern is only tested against the URL path, so the match perpetually fails. Instead, we have to match the query string in a RewriteCond.
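A rough PHP emulation (not Apache itself) of what the failing rule sees: in per-directory context, a RewriteRule pattern is tested against the URL path with the directory prefix removed, while the query string is kept separately in %{QUERY_STRING}. The function names rulePatternSubject() and queryStringOf() are hypothetical, and the sketch assumes the .htaccess file lives in the document root, so the stripped prefix is just the leading slash.

```php
<?php
// Rough emulation of mod_rewrite's matching subjects; Apache is not involved.
// Assumes the .htaccess file sits in the document root, so the per-directory
// prefix that mod_rewrite strips is just the leading slash.

function rulePatternSubject(string $requestUri): string
{
    // RewriteRule sees only the path: "index.php", never "?strip_id=...".
    return ltrim((string) parse_url($requestUri, PHP_URL_PATH), '/');
}

function queryStringOf(string $requestUri): string
{
    // This is what %{QUERY_STRING} holds, without the leading '?'.
    return (string) parse_url($requestUri, PHP_URL_QUERY);
}

$uri = '/index.php?strip_id=1234';
// The failing rule's pattern, with the same meaning as in the .htaccess file.
$failingPattern = '/\?strip_id\=(\d{1,4})/';

var_dump(rulePatternSubject($uri));                              // string(9) "index.php"
var_dump(preg_match($failingPattern, rulePatternSubject($uri))); // int(0)
var_dump(queryStringOf($uri));                                   // string(13) "strip_id=1234"
```

The pattern is fine; it's simply pointed at a string that never contains a question mark.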
The Solution
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{QUERY_STRING} ^strip_id=(\d{1,4})$
RewriteRule .* http://yafgc.net/?id=%1 [L,R]
</IfModule>
Here we match the QUERY_STRING and then use a different kind of backreference. Unlike other RegEx implementations, mod_rewrite provides two kinds of backreferences: references to the RewriteRule and references to the RewriteCond. References to the rule are prefixed with a dollar sign ‘$’, like $1 in the earlier example, and references to the conditions are prefixed with a percent sign ‘%’, like %1 in this example.
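As a rough illustration of the two kinds of backreference, this PHP function emulates what the working rule set does: test the condition's pattern against the query string, then substitute the condition's first capture group for %1 in the rule's target. The function name emulateRedirect() is hypothetical, and this is an emulation of the behavior, not how mod_rewrite is implemented.

```php
<?php
// Emulation of the working rule set; Apache itself is not involved.

function emulateRedirect(string $queryString): ?string
{
    // RewriteCond %{QUERY_STRING} ^strip_id=(\d{1,4})$
    if (preg_match('/^strip_id=(\d{1,4})$/', $queryString, $cond) !== 1) {
        return null; // condition fails, so the rule is skipped and no redirect happens
    }
    // RewriteRule .* http://yafgc.net/?id=%1 [L,R]
    // %1 is the condition's first capture group ($cond[1]);
    // $1 would refer to the rule's own pattern (.*), which has no groups here.
    return str_replace('%1', $cond[1], 'http://yafgc.net/?id=%1');
}

var_dump(emulateRedirect('strip_id=100'));   // string(24) "http://yafgc.net/?id=100"
var_dump(emulateRedirect('strip_id=12345')); // NULL: five digits fail the {1,4} quantifier
```

Because the condition is anchored with ^ and $, a query string carrying any other parameters (say, page=2&strip_id=7) won't be redirected either.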
There is a demo available at http://richard.jp.leguen.ca/yafgc/index.php?strip_id=100
On Using The Web May 15th, 2010
So I’m a Web Dev TA (ok, was) who works with SharePoint. By day I’d work with <table> elements nested inside more <table> elements, 15-20 levels deep. By night I’d vigorously enforce semantic HTML and unobtrusive JS. I find myself operating at both ends of a very passionate holy war.
It’s given me some chances to reflect a bit on each side of the debate, and it’s brought me to the conclusion that the most important thing for a web developer to take away from the argument has nothing to do with tables or layout at all.
In The Classroom: “That’s Not Using The Web”
While TAing SOEN287 “Intro To Web Development” I had a student express discomfort at the idea that HTML should be semantic and that a particular layout – which he found would be incredibly easy with a <table> – would instead require more work and some challenging CSS. When I insisted that HTML was semantic information (at least in the academic environment) he noted that Tabless Web Design was taking the fun out of his web site. The best retort I came up with was that it’s not always fun, but I wish instead I’d told him this little nugget I found on Tim Berners-Lee’s FAQ page:
That’s not using the web.
Ok, I’m taking the quote a little out of context, but I like it anyways.
For Tim Berners-Lee, the World Wide Web is an information medium – he doesn’t concern himself much with the visual design of web pages, and focuses on the information therein.
So why is it so hard to get developers to hear any of this, even students working in an academic environment? Is the ideal even relevant anymore? Did the pragmatists “win” or were they just right all along?
Outside The Classroom
I don’t know if they won, but the reality is that this idealistic crap often won’t put food on the table. The web sites we’re paid to develop have, for some time now, been the spiffy-looking tabular ones, which often include a Flash splash page.
The problem of clients who wanted seemingly tabular layouts is also compounded by (and probably feeds) a decade-old developer mentality which dictates that semantic HTML accompanied by CSS wasn’t worth the effort, or wasn’t the point of web development.
As proof of this mentality I’m going to bring up Stefano Borini. Stefano responded to a post I wrote a few months ago with a wonderful email. Turns out he seems like a pretty cool guy, and he will now forever have that warm special place in my heart reserved for one’s first blog commenter… despite the fact that my site has no comments… and as such is not really a blog.
We exchanged a few emails about the class attribute, semantics and how I really hope I didn’t offend him, and in one of those emails he mentioned the following:
I read it, (the w3c recommendation) but I also tend to forget very easily, and I must be honest, this point wasn’t of my interest when I read it, so I didn’t care
(…)
So now the fact is that most people (I assume) believe what I thought, namely that a class attribute is “just a name to refer in the CSS”. until now, I chose my attributes quite freely, but my post of yesterday was a warning for me and others as well: classes do have semantic meaning, choose your names wisely.
I asked if I could quote him on it, as it raises some interesting points about where web development has been, and makes me wonder why. What happened that made web sites deviate so far from the ideal? What made developers just not care? Is it just that any kind of software development is doomed to be off the mark? Or did something a little more specific get in the way?
Where Web Development Was And Why It Caused CSS To (kind of) Fail
In 1998, when CSS2 was published as a recommendation, the web-scape was a different place from what it is today. A big part of this reality was the browsers people were using: even 6 years later, in 2004, IE6 and older versions of IE held 95% of market share, while FireFox and Safari were new and unpopular. So if you were developing web sites, you were targeting IE6.
IE6, however, does not (and did not) fully or properly support CSS2, and has (or had) no shortage of shortcomings in any CSS2 it did support. Any web developer can tell you this.
My favorite IE bug is how sometimes absolutely positioned elements disappear when you :hover over an anchor. A solid runner up (still an issue in IE7… not IE8) was a CSS rule I wrote which caused – for no explicable reason – a JavaScript error. I’m surprised I ever found a solution (remove and redo the CSS rule) – and that feeling of surprise, which is associated with solving a problem in older versions of IE, is a huge part of what sculpted the web-scape in the early 2000s.
Learning to write proper, semantically-relevant HTML with presentational CSS was almost impossible in those days, and definitely not worth the invested time and effort. You could spend all the time you wanted learning about positioning schemes and the box model, but the research and hard work never would and never could pay off. It was not what Malcolm Gladwell would call Meaningful Work:
Meaningful work is work that is autonomous. Work that is complex, that occupies your mind. And work where there is a relationship between effort and reward – for everything you put in, you get something out…
When creating web pages in the 90s, no matter what you put in, you always got the same thing out: irrational, unexpected behavior and buggy software. At the time, it really was the best idea to give up and use a table, because it was unlikely you would ever have a solid grasp of how the tools (HTML and CSS) interacted. It’s no wonder then that developers like Stefano Borini – who aren’t stupid people, nor bad developers – didn’t take any interest in how CSS worked or what the standard defined; it really was useless knowledge. Not caring was the smartest thing to do, with the highest ROI.
And so it was, for years and years. Until Mozilla FireFox came along and shook things up. In order to keep up with the changing demands and needs of web-users, Microsoft had to release a new version of Internet Explorer… and then another a few years later. These new browsers were more compliant – though they still had their shortcomings. The victory, however, was that now idealistic web dev had the potential to become Meaningful Work and developers could create semantic web pages.
This was only half the problem, however, as CSS still has limitations, and customers still want (need?) visually appealing web pages, so it doesn’t really make any difference… or is that changing too?
Where Web Development Is And Why MySpace Is Failing
There is massive potential and opportunity in steering web development towards the ideal, not only because web browsers are behaving themselves, but because regular, everyday people are beginning to think of the web as an information medium. This is why websites like Hear a Blog (coincidentally my new favorite way to cyberslack) are popping up, and I also believe this contributes to why RSS is booming right now while a similar technology, Channel Definition Format, faded away in the 90s. Back then people weren’t used to the idea of the web as an information medium; they were still hooked on visual concepts.
Now, people are ready to use and perceive the web as an information medium.
We can see the world is ready by observing the social networking sites of the 2000s: Facebook and MySpace. Facebook has remained a monster success while MySpace has faded. Why? Because the intent with Facebook was always to use the web for information. That’s why from the very beginning users couldn’t customize the look and feel of their profiles. Meanwhile, MySpace is truly the GeoCities of the 21st century; its personalized, poorly designed, inaccessible and often flat out ugly pages may have been the rage back “Where Web Development Was”, but now the reality is that…
That’s not using the web.
And people know this now. They may not be able to say it or explain it when you ask, but deep down inside people know that Facebook succeeded and MySpace is failing because of Facebook’s understanding of how to use the web and MySpace’s dated (obsolete?) outlook on the web.
Now would also be a good time to point out it’s no coincidence that Facebook has some sort of accessibility help center and MySpace doesn’t. I’m not saying that Facebook succeeds because they care for accessibility, but rather that there is correlation between their success and their concern for accessibility – both stem from the same understanding of how to use the Web.
What’s Your Point? – Where Web Development Is Going
My point is that web developers of all sorts (be they graphic designers, Asp.NET programmers, Rails developers or PHP scripters) need to stop thinking of the web as a visual medium. My point is not that you shouldn’t use tables – that horse is long dead – but rather that if you find yourself thinking I should just give up and use tables, chances are you’re already thinking about your web app incorrectly, not really using the Web, and stifling your product’s potential. You’re also taking the risk that you’ll be left behind, stuck in the 90s as the web-scape continues to change.
Don’t confuse web development’s past with your career’s future.
I see where the web is as being identical to where movies were in 1927… or more specifically, talkies. The tools, technology and theory for sound film were available for some years but weren’t perfected until 1927, and even once a ‘talkie’ had seen mainstream success, it still took some years for sound to become the standard. The industry players didn’t think it would catch on; they thought it was silly and that theirs was a visual medium. Technological impediments had made them lose sight of the reality that to the consumers, they were an entertainment medium – not a visual medium. Once the transition came, people’s careers were ruined.
Personally, I don’t want to stand around idly while a similar change happens in my field, commenting on how it’s a toy, it’s a scream, it’s vulgar, and doubting if they’ll ever really use it.
