Monday, 27 May 2013

Improving Disk I/O in PHP Apps


Multiple Processes and Race Conditions

You probably remember that your computer switches processes (running applications, …) in and out of the CPU. This lets several processes appear to run in parallel on a single CPU core. We call this multitasking, and the same principles apply to computers with more than one CPU core.
This means that your program (PHP script) does not run in one uninterrupted stretch. Some of the program is executed, then it's paused so something else can run, then it continues execution, then it's paused again, and so forth.
In some languages we can tell the executing host to treat a bunch of operations as a single operation - we call that atomicity. PHP doesn't know this concept. It's safe to say that your PHP script can be disrupted at any given time at any given operation.
Unless you've had the chance of working on fairly high-traffic sites, you've probably never seen a race condition in action. A race condition is what we call the situation where two processes running in parallel do contradicting things. For example, Process 1 is writing the file /tmp/file.txt, while Process 2 is trying to delete that same file.
Those processes don't have to run in the same context. While Process 1 could be a PHP script, Process 2 could be a shell script triggered by Cron, or some manual rm /tmp/file.txt via SSH.

File Locking

To prevent these race conditions, we're allowed to lock files. When a file is locked by Process 1 and Process 2 tries to acquire the lock, Process 2 blocks until Process 1 releases the lock. PHP provides this functionality with flock().
flock() has a couple of problems, though. For one, it only works with resources, so you need to have the file opened with fopen() prior to obtaining a lock. Also, flock() will fail on certain file systems like FAT or NFS. On top of that, it seems quite ridiculous to open a file, only to obtain a lock, only to delete the file.
So in real life, where a PHP script does not know which file system is used, flock() won't help.
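For reference, the open-then-lock pattern described above looks roughly like this. It is only a sketch, assuming a file system on which flock() works; the helper name append_with_lock() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: append a line to a file while holding an exclusive lock.
function append_with_lock(string $filepath, string $line): bool
{
    // flock() needs an open handle, so the file has to be opened first.
    $handle = fopen($filepath, 'a');
    if ($handle === false) {
        return false;
    }
    // LOCK_EX blocks until every other lock holder has released the file.
    if (!flock($handle, LOCK_EX)) {
        fclose($handle);
        return false;
    }
    $written = fwrite($handle, $line . PHP_EOL);
    fflush($handle);          // push the data out before giving up the lock
    flock($handle, LOCK_UN);  // release the lock for the next process
    fclose($handle);
    return $written !== false;
}
```

Note that the lock is advisory: it only protects against other processes that also call flock(), not against a manual rm.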

Potential Race Condition

At first glance, the following code is considered to be good code, as we check if a file exists prior to unlinking it. That is because unlink() issues an E_WARNING whenever it can't find the file to unlink:
$filepath = "/tmp/some.file";
if (file_exists($filepath)) {
  unlink($filepath);
}
But we remember that PHP has no notion of atomic operations and a script can be interrupted at any given time:
$filepath = "/tmp/some.file";
if (file_exists($filepath)) {
  // <- potential race condition
  unlink($filepath);
}
Considering the above code to be Process 1, we could encounter the following condition:
*Process 1*: file_exists("/tmp/some.file")
*Process 2*: unlink("/tmp/some.file")
*Process 1*: unlink("/tmp/some.file") -> E_WARNING, file not found!
Between checking if the file existed and actually removing it, another process had the chance to delete the file. Now the unlink() of our script issues an E_WARNING because the file is already gone.
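The interleaving can be reproduced deterministically in a single script by playing both roles. This is only a simulation under that assumption: the "Process 2" step is an ordinary unlink() placed between the check and the delete, and the scratch file and the $warnings array are made up for the demonstration.

```php
<?php
// Collect warnings instead of printing them, so the effect is visible.
$warnings = [];
set_error_handler(function (int $errno, string $errstr) use (&$warnings): bool {
    $warnings[] = $errstr;
    return true; // handled; skip PHP's built-in handler
}, E_WARNING);

$filepath = tempnam(sys_get_temp_dir(), 'race'); // scratch file, created on the spot

if (file_exists($filepath)) {      // Process 1: the check succeeds
    unlink($filepath);             // "Process 2" (simulated): deletes the file first
    $removed = unlink($filepath);  // Process 1: fails and raises an E_WARNING
}

restore_error_handler();
```

After this runs, $removed is false and $warnings holds the single warning raised by the second unlink().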

Mitigating the Race Condition

Fear not, PHP knows the almighty @ silence-operator. Prefixing a function call with @ makes PHP suppress any error messages raised within that call. The following code will prevent any E_WARNING issued due to a race condition (or any other fault, for that matter):
$filepath = "/tmp/some.file";
if (file_exists($filepath)) {
  @unlink($filepath);
}
With that little @ we've opened the door to a slight simplification of our code. Since we're performing the file_exists() to make sure unlink() won't issue any warnings, and @unlink() won't issue any warnings, we can simply drop file_exists():
$filepath = "/tmp/some.file";
@unlink($filepath);
Et voilà, we have successfully mitigated the race condition. And by doing so, we have accidentally reduced the Disk I/O by 50%: one file system call (unlink()) instead of two (file_exists() plus unlink()).
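Because @ hides every failure, not just "file not found", it can still be worth looking at the return value of unlink(). A minimal sketch of that idea follows; remove_file() is a hypothetical helper, and it only pays for an extra stat call when the delete did not succeed.

```php
<?php
// Hypothetical helper: delete a file without a prior file_exists() check.
// Returns true when the file is gone afterwards, whoever removed it.
function remove_file(string $filepath): bool
{
    if (@unlink($filepath)) {
        return true; // we deleted it ourselves
    }
    // unlink() failed: only now spend the extra stat call to find out why.
    clearstatcache(true, $filepath);
    return !file_exists($filepath); // true: already gone, false: a real failure
}
```

A file that another process removed first counts as success here, while a path that still exists after the failed unlink() (a directory, or a file we lack permission for) is reported as a failure.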

Reducing Disk I/O (stats)

Besides the implications on race conditions, ditching file_exists() has the other benefit of reducing stat calls. Whenever you have to touch a disk, imagine your Ferrari-application hitting the brakes. Compared to the CPU, any disk (yes, even an SSD) is a turtle chained to a rock. So the ultimate goal is to avoid touching the file system whenever possible.
Consider the following well-coded program to identify if a file exists and when it was last modified:
$filepath = "/tmp/some.file";
$file_exists = file_exists($filepath);
$file_mtime = null;
if ($file_exists) {
  $file_mtime = filemtime($filepath);
}
Did you know that filemtime() returns false (and issues an E_WARNING) if it can't find the file? So how about reversing things and ditching the file_exists():
$filepath = "/tmp/some.file";
$file_mtime = @filemtime($filepath);
$file_exists = ($file_mtime !== false);

Custom Error Handling

As mentioned initially, ditching file_exists() is a change we made in Smarty 3.1.0. We did numerous tests and benchmarks and came to the conclusion that we'd be stupid not to do it. And at that point I figured nobody would ever notice. That might've been true, had it not been for set_error_handler().
set_error_handler() allows you to register your own custom function for handling errors. It's pretty neat for pushing certain errors to a database or sending mails or something like that. It gives you absolute power over each and every notice or warning issued, even those that would've been masked by error_reporting() or the @ operator.
Apparently some people register custom error handlers to get ALL THE ERRORS. Even the masked ones. Some developers failed to understand hints in the docs, others did it deliberately. Intentions aside, these ill-conceived error handlers break the way we expect PHP to work. All of a sudden errors like "error in 'test.php' on line 2: unlink(/tmp/some.file): No such file or directory (2)" started popping up.
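The hint in the PHP manual boils down to checking error_reporting() inside the handler. A minimal sketch of a handler that does so might look like this; the $logged array is a hypothetical stand-in for a real log, database or mail.

```php
<?php
// Hypothetical logging handler that honours error_reporting() and the @ operator.
$logged = [];
set_error_handler(function (int $errno, string $errstr, string $errfile = '', int $errline = 0) use (&$logged): bool {
    // While @ is active, error_reporting() no longer includes E_WARNING or E_NOTICE,
    // so this bitmask test skips both suppressed and masked errors.
    if (!(error_reporting() & $errno)) {
        return false; // not ours to handle; PHP's own handling keeps it silent
    }
    $logged[] = "$errstr in $errfile on line $errline";
    return true;
});

// Suppressed with @, so nothing ends up in $logged.
@unlink(sys_get_temp_dir() . '/no-such-file-' . uniqid());
```

With this check in place, a suppressed @unlink() on a missing file never reaches the log, while the same call without @ still does.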
In their minds Smarty was misbehaving. After all, its code was raising E_WARNINGs all over the place. They didn't know (and didn't care) about the improvements we'd made. They didn't want to "fix" their error handlers, as they did not see them as broken. So in Smarty 3.1.2 I introduced Smarty::muteExpectedErrors(), a custom error handler that proxies their handlers, filtering out errors Smarty actually expected to happen.
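The proxy idea can be sketched as follows. This is not Smarty's actual implementation: make_muting_handler(), the directory-prefix rule and the /opt/lib/ path in the usage note are assumptions made for illustration.

```php
<?php
// Hypothetical sketch of a proxying error handler, not Smarty's actual code.
// Warnings raised from files under $expectedDirs are swallowed; everything
// else is forwarded to the handler that was registered before.
function make_muting_handler(array $expectedDirs, ?callable $previous): callable
{
    return function (int $errno, string $errstr, string $errfile = '', int $errline = 0) use ($expectedDirs, $previous) {
        if ($errno === E_WARNING) {
            foreach ($expectedDirs as $dir) {
                if (strpos($errfile, $dir) === 0) {
                    return true; // expected warning from library code: swallow it
                }
            }
        }
        if ($previous === null) {
            return false; // no earlier handler: fall through to PHP's built-in handling
        }
        return call_user_func($previous, $errno, $errstr, $errfile, $errline);
    };
}
```

To put it to use, pass the application's existing handler as $previous and register the returned closure with set_error_handler(), for example with ['/opt/lib/'] as the list of directories whose warnings are expected.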
