Richard JP Le Guen.ca

Skip to Content
All about Software Development on the WWW
RSS feed

Navigation

Tutoring Courses

Concurrency and Scalability on the WWW June 2nd, 2010

People like to make a lot of noise about the scalability of applications written in PHP or Ruby or other scripting languages. One of the major criticisms I’ve heard of PHP is that of how to deal with concurrency.

Take the following PHP script as an example:

password@hostFROM this_is_a_test WHERE id=1”);
if (DB::isError ($result)) {
        die (“SELECT failed: " . $result->getMessage () . "\n”);
}
if ($row =& $result->fetchRow ()) {
        $count = $row[1];
	echo “ROW WITH ID ‘1’ FOUND. COUNT IS: ".$count.”<br />";
}
else {
        die(“No such row!”);
}
	
$count++;
echo “UPDATING COUNT TO ".$count.”<br />";
// increment the field we’re modifying
$result = $conn->query (“UPDATE this_is_a_test SET count=?  WHERE id=1”, array($count));
if (DB::isError ($result)) {
        die (“UPDATE failed: " . $result->getMessage () . "\n”);
}
	
echo “DONE”;

Seems pretty straightforward… it might even seem stupid. So stupid that to the untrained eye, there is nothing wrong with this script. (other than inanity) There is something wrong, however, and the problem will only become visible when the system experiences a lot of traffic; the problem has to do with concurrency.

Contrary to the opinions expressed by some, web development – serious web development; the kind which scales up to people spending 500 billion minutes per month on your web site or making 34000 requests per minute – requires an understanding of concurrency, else your web application will be vulnerable to your most common concurrency problems.

To make the case, I’m going to subject the above script to a little concurrency, and the results should speak for themselves.

Note: The update statement I’m using should probably have a SET clause more along the lines of count=count+1 but that would circumvent the problem I’m going to demo, and using a more relevant script makes the demo harder to follow.

The Experiment: JMeter

What if we had 20 people visiting the page which runs the above code 20 times each. Therefore, when the experiment is over count should be 20 × 20=400… right? The sad reality is that it won’t be, because this script was written without consideration for the concurrent nature of the web.

Don’t believe me? You can run this experiment, using a tool like JMeter.

Step 1: Create a Thread Group

Open JMeter; in the left explorer-style toolpane you should see a ‘Test Plan’ node. Right click on it, and add a Thread Group. According to the JMeter User Manual

Thread group elements are the beginning points of any test plan. All controllers and samplers must be under a thread group.

Whatever that means, to run the experiment you need to create a Test Group.

Create a Thread Group

Step 2: Configure the Thread Group

You need the Thread Group to simulate 20 users visiting the page 20 times, so specify 20 threads and 20 loops per thread:

Configure the Thread Group

Step 3: Create an HTTP Request in The Thread Group

Now that you’ve configured how many users and how often they’ll do something, you need to specify what that something they do is. You can do this using a Controller – or, more specifically, a Sampler which sends an HTTP Request:

Samplers tell JMeter to send requests to a server. For example, add an HTTP Request Sampler if you want JMeter to send an HTTP request. You can also customize a request by adding one or more Configuration Elements to a Sampler.

Create an HTTP Request in The Thread Group

Step 4: Configure the HTTP Request

Once the HTTP Request sampler is created, you still need to specify where the request goes and by what method. Specify the Server Name, and path to the script – at the very least.

Configure the HTTP Request

Step 5: Run

From the ‘Run’ menu, select ‘Start’.

Run

Once the test has finished, poke around in your SQL database to check the count after these 100 visits. Here’s what I found:

SELECT id, count FROM this_is_a_test WHERE id=1
	
+----+------------------+
| id | count            |
+----+------------------+
| 1  | 299              |

The Result: Lost Updates

... and here we have a concurrency problem with our “simple” script. When you perform the experiment, you will almost certainly get different values, but the final analysis is this: it is very much unlikely that your count will be the expected 20 × 20=400.

So what happened?

This is an example of a common concurrency problem – the lost update. The idea is this: two requests are received by the server at the same time. As one request’s process reads (SELECT…); the other’s then reads (SELECT…) before the first writes. They now have both read the same value of count and increment it independently. Then the first writes to the DB with an UPDATE and finally the second does the same… overwriting whatever changes request #1 made to the DB.

Sequence diagram - lost update

This is (apparently) a much more frequent problem when programming PHP or Perl CGI, as each request is handled by its own process in its own context – every request gets a new instance of the interpreter – while in a Java Servlet, simply using the synchronized keyword circumvents this problem, as all requests are handled in the same context. The frequency of this sort of problem is one of the main reasons for which technologies such as PHP and Ruby are said to not scale well.

Is that true though? There is a solution; database locking, and it may not be as effective as language constructs like Java’s synchronized keyword, but the shoe fits nonetheless.

A Solution: Lock Tables

Using the LOCK TABLES command is meant – according to the MySQL Reference Manual – “explicitly for the purpose of cooperating with other sessions”. This means that LOCK TABLES can be used as a means for inter-process communication in environments which don’t handle all requests in the same context, such as PHP or Ruby CGI.

So the example script can be fixed using LOCK TABLES.

password@hostTABLES this_is_a_test WRITE”;
$result =& $conn->query ($stmt);
if (DB::isError ($result)) {
        die (“LOCK TABLES failed: " . $result->getMessage () . "\n”);
}
	
// read our row out of the database
$result = $conn->query (“SELECT id, count FROM this_is_a_test WHERE id=1”);
if (DB::isError ($result)) {
        die (“SELECT failed: " . $result->getMessage () . "\n”);
}
if ($row =& $result->fetchRow ()) {
        $count = $row[1];
	echo “ROW WITH ID ‘1’ FOUND. COUNT IS: ".$count.”<br />";
}
else {
        die(“No such row!”);
}
	
$count++;
echo “UPDATING COUNT TO ".$count.”<br />";
// increment the field we’re modifying
$result = $conn->query (“UPDATE this_is_a_test SET count=?  WHERE id=1”, array($count));
if (DB::isError ($result)) {
        die (“UPDATE failed: " . $result->getMessage () . "\n”);
}
	
$stmt = “UNLOCK TABLES”;
$result =& $conn->query ($stmt);
if (DB::isError ($result)) {
        die (“UNLOCK failed: " . $result->getMessage () . "\n”);
}
	
echo “DONE”;

The Solution

The real solution, however, is planning. How you deal with concurrency isn’t as important as simply dealing with it at all. If you don’t think about what the site will do when 2 people just happen to perform the same action at the same time – be it withdraw money, bid on an item or just increment the number of people who “like” something – your site won’t scale well and will suffer buggy behavior under high traffic. Don’t fool yourself, though; don’t think you can’t both plan and code to deal with these issues in any language – in some languages it’s a little easier, but it can always be done.


blog comments powered by Disqus
Content © 2008-2010 Richard Jean-Paul Le Guen