Simplifying SQL

Submitted by Larry on 22 October 2006 - 9:58pm

Most PHP applications do fundamentally the same thing: Shuffle data from an SQL database to a web page and back again. The details vary with the application, but in general that's what most web apps do. That very quickly runs into the bane of most PHP developers' lives: SQL syntax.

It's not SQL syntax itself that is bad per se. The problem is that it is a string-serialized format, which means you have to take your nice clean data structures and serialize them out into a string that has no semantic meaning to your PHP application. That's boring, dull, and introduces all sorts of places to totally mess up your application with a typo, and that's without even touching on issues of security. And then there are the issues with SQL syntax itself, in particular the way in which INSERT and UPDATE statements, which seem like they should be similar, have no similarity whatsoever. That makes "replace" operations (insert if new or update if not) very tedious to write, particularly if you have a lot of fields.

Fortunately, with a little ingenuity and help from PHP's array handling, we can give ourselves a common syntax for INSERT and UPDATE operations that maintains semantic meaning, and then get DELETE statements free of charge. Let's see how.

Background

First, let's make sure we all know the SQL syntax we're dealing with. We're not writing a full-on query builder for SELECT statements. Those can get quite involved, given how complex SELECT statements can get. INSERT, UPDATE, and DELETE, however, are very simple formats that are easy to automate.

INSERT
INSERT INTO tablename (field1, field2, field3) VALUES (a numeric value, 'A string value', 'another string')
UPDATE
UPDATE tablename SET field1=numeric value, field2='A string value', field3='another string' WHERE field3='Some other string' [AND|OR] field4<some number
DELETE
DELETE FROM tablename WHERE field3='Some \'other\' string' [AND|OR] field4=some number

See why they're so annoying to work with?

  • INSERT statements list all fields first, then their values, which could be far far away from them syntactically, while UPDATE statements are more logical key=value pairs. MySQL supports UPDATE-style syntax for INSERT statements but no other database does, making it not a good solution.
  • INSERT and UPDATE use commas to delineate lists while the WHERE clause uses AND or OR, depending on how you want multiple conditions to stack, and in fact can also have non-equality tests.
  • Unlike PHP lists cannot have a stray comma at the end. That makes building an SQL string the naive way much more complicated with a loop and $is_first or $is_last flag. Ick.
  • String values must be quoted with single quotes. Numeric values, however, must not have quotes. Again, MySQL is more forgiving here and allows all values to be quoted, but standard SQL does not.
  • String values must be properly escaped to avoid breaking the query string, leading to failed queries at best and SQL injection attacks at worst.

That can lead to utterly disgusting code like this:

<?php
mysql_query("UPDATE tablename SET field1=" . (int)$numvalue . ", field2='" . mysql_real_escape_string($string_value) . "', field3='" . mysql_real_escape_string($other_string) . "' WHERE field3='" mysql_real_escape_string($another_string) . "' AND field4&lt;" . (float)$othernum);
?>

And that's the main thing you do in a PHP application!

We can't really change any of the above issues, at least not without joining the board of ANSI and pushing through changes to the SQL spec and waiting about 20 years for vendors to implement the changes. We can, however, automate the fix.

PHP to the rescue

The solution is, in fact, rather simple. PHP is going to do most of the work for us, courtesy of its strongest feature: Associative Arrays. Not just the arrays themselves, but the utilities provided for futzing with them, in particular implode(). As we'll see, PHP gives us nearly all of the tools we need to give ourselves a fully generalized PHP API in just a few lines of code.

The API

Do how do we want to be able to use these tools? Well, any INSERT, UPDATE, or DELETE command is really just a key/value assignment statement. The WHERE portion of UPDATE or DELETE is also a key/value statement unless it's using non-equality comparisons. So we want to be able to work with key/value pairs, that is, associative arrays.

So the tools we're going to build will look like this:

insert($table, $fields)
Inserts a new record into $table, using the associative array $fields to determine which columns get set to which values.
update($table, $fields, $where, $conjunction)
Updates $table, setting fields to new values according to $fields where $where is true.
delete($table, $where, $conjunction)
Deletes records from $table where $where is true.

The important feature here is that $fields has the exact same syntax for both insert() and update(), and $where has the exact same syntax for both update() and delete(). That will allow us to build constructs like this:

<?php
$fields['field1'] = 3;
$fields['field2'] = 'some string';
$fields['field3'] = 'another string';
if ($id == 0) {
  insert('mytable', $fields);
}
else {
  $where['id'] = $id;
  update('mytable', $fields, $where);
}
?>

Now imagine we have 30 fields instead of 3 and you begin to see why this approach is so nice.

INSERT

Let's take the easiest one first. For the purposes of this tutorial we're going to use the PHP MySQL API, but in practice you'd want to use some form of database abstraction layer such as PEAR::DB, PEAR::MDB2, PDO, or something of your own creation.

For insert(). We have an associative array and want to convert that into two separate lists, one of the keys (for the fields) and one for the values. Each list needs to be comma-delimited, and the value list needs to, sometimes, be quoted. Fortunately PHP lets us do most of that almost trivially:

<?php
function insert($table, $fields) {
  $keys = array_keys($fields);
  $values = array_values($fields);

  $key_string = implode(',', $keys);
  $value_string = implode(',', $keys);
  query("INSERT INTO {$table} ({$key_string}) VALUES ({$value_string})");
}

function query($sql) {
  global $dbConnection;
  mysql_query($sql, $dbConnection);
}
?>

Wow, can it really be that simple? We're just breaking out the associative array, something PHP does for us. Then we're serializing each component of it, which PHP does for us. And finally we're dropping that into a template for a query string and executing it. (The query() function is just a little helper, and not really part of this exercise. More robust implementations would have a fancier query().)

Actually it's not quite that simple. We're not quoting any values, which means that string values will break the query. The first idea that comes to mind to fix that is to just quote each value:

<?php
function insert($table, $fields) {
  $keys = array_keys($fields);
  $values = array_values($fields);

  $values = array_map('quote', $values);

  $key_string = implode(',', $keys);
  $value_string = implode(',', $keys);
  query("INSERT INTO {$table} ({$key_string}) VALUES ({$value_string})");
}

function quote($string) {
  return "'$string'";
}
?>

That won't work, however, because we're then also quoting numeric values, something we can only get away with in MySQL. (Why didn't the SQL authors do the sane thing and just make all values quoted? The same reason they make INSERT and UPDATE completely incompatible, probably.) It also does nothing about escaping the actual string, leaving a nice big security hole.

OK, I hear you say, so let's only quote non-numeric values:

<?php
function quote($string) {
  return is_numeric($string) ? $string : "'" . mysql_real_escape_string($string) . "'";
}
?>

This would mostly work, but what about NULL, which is a string-like magic value that should not be quoted? Unfortunately, we're left with no alternative to checking with gettype():

<?php
function insert($table, $fields) {
  $keys = array();
  $params = array();

  foreach ($fields as $key => $value) {
    switch(gettype($value)) {
      case 'integer':
      case 'double':
        $escape = $value;
        $keys[] = $key;
        break;
      case 'string':
        $escape = "'" . mysql_real_escape_string($value) . "'";
        $keys[] = $key;
        break;
      case 'NULL':
        $escape = 'NULL';
        $keys[] = $key;
        break;
      default:
        continue;
    }
    $params[] = $escape;
  }

  $key_string = implode(',', $keys);
  $value_string = implode(',', $params);  // Note we use the processed version here
  query("INSERT INTO {$table} ({$key_string}) VALUES ({$value_string})");
}
?>

Now, with only a little bit of ickiness that we need do only once, we quote and escape string values, don't quote numeric values, and handle NULL cleanly. As an added bonus, we automatically filter out nonsensical values (eg, arrays). Unfortunately, there's one gotcha to keep in mind.

In PHP, '1' is a string. 1 is an int. If you pass in an array $fields['myint'] = '1', it will escape it (which does nothing) and quote it (which will cause trouble on stricter databases). That leaves the onus on the developer to type check values going into the function. Fortunately PHP type checking is easier than SQL type checking, and is something that the developer is doing anyway. You are, aren't you?

UPDATE

So if that was the easy one, what could the hard one be like? Well, since we already solved the escaping question, we can just reuse that technique. However, the SQL syntax is now completely different. The first idea is to simply use an associative implode(). Such utility functions have been written many times, and would build most of the query for us in a single statement. That runs into the same string vs. number question as with insert(), however, so let's not spend time on that. Instead, we'll have to again iterate manually.

<?php
function update($table, $fields, $where, $conjunction='AND') {
  $params = array();
 
  foreach ($fields as $key => $value) {
    switch(gettype($value)) {
      case 'integer':
      case 'double':
        $escape = $value;
        break;
      case 'string':
        $escape = "'" . mysql_real_escape_string($value) . "'";
        break;
      case 'NULL':
        $escape = 'NULL';
        break;
      default:
        continue;
    }
    $params[] = "{$key}={$escape}";
  }

  $value_string = implode(',', $params);
  $where_string = where($where, $conjunction);
  query("UPDATE {$table} SET $value_string {$where_string}");
}
?>

Very similar, but instead of building two separate arrays we build up a single array of key/value pairs (the $params line). Then we just implode() that and all of the commas go in the right place by magic. But what's that where() function? That's us thinking ahead. We're going to have to do the exact same logic for the WHERE clause of the delete() function in a moment, so let's factor it out right now to a separate utility function so that we're ready.

WHERE

So we need a WHERE builder now. WHERE clauses, like SELECT statements, can actually get rather complex. Fortunately in most cases an INSERT/UPDATE/DELETE statement doesn't use its full complexity, so we can make some short cuts. We need to handle equality comparitors as well as other types of comparators, and potentially nested compound comparitors. That is, something like:

<?php
... WHERE (field1='a' OR field2='b') AND (field3 < NOW())
?>

(NOW() being a MySQL function that returns the current timestamp.)

Let's handle the simple case of all equality comparitors first.

<?php
function where($where, $conjunction='AND') {
  $params = array();

  foreach ($where as $key => $value) {
    switch(gettype($value)) {
      case 'integer':
      case 'double':
        $escape = $value;
        break;
      case 'string':
        $escape = "'" . mysql_real_escape_string($value) . "'";
        break;
      case 'NULL':
        $escape = 'NULL';
        break;
      default:
        continue;
    }
    $params[] = "({$key}={$escape})";
  }

  $where_string = implode(" $conjunction ", $params);
  return 'WHERE ' . $where_string;
}
?>

Looks an awful lot like update(), except that instead of imploding with a comma we implode with our WHERE conjunction, either AND or OR. We then return the WHERE clause for the calling function to simply drop into a query string.

But what about non-equality comparitors? While we could introduce a more complex data structure to specify a comparitor, that makes basic tests, which make up the majority of cases, more difficult. We're trying to make the grunt work easier, not handle all cases perfectly. Instead, we take advantage of the fact that no SQL field has a numeric name and PHP auto-assigns a numeric key to a value when none is specified. (That's how numeric arrays work, actually. They're not actually different; they're just associative arrays that happen to have numeric keys.). So let's build up our WHERE clause like so:

<?php
$where['name'] = 'Bob';
$where[] = "createdTime < " . time();
?>

The second value will have a numeric key. That means we can do this:

<?php
function where($where, $conjunction='AND') {
  $params = array();

  foreach ($where as $key => $value) {
    if (is_numeric($key) {
      $params[] = $value;
    }
    else {
      switch(gettype($value)) {
        case 'integer':
        case 'double':
          $escape = $value;
          break;
        case 'string':
          $escape = "'" . mysql_real_escape_string($value) . "'";
          break;
        case 'NULL':
          $escape = 'NULL';
          break;
        default:
          continue;
      }
      $params[] = "({$key}={$escape})";
    }
  }

  $where_string = implode(" $conjunction ", $params);
  return 'WHERE ' . $where_string;
}
?>

As before, though, this does push escaping up a level. We now have to make sure we escape string values in a non-equality WHERE comparison before we put it into the array. We can't do it automatically because we have no way of differentiating the value out of the comparison that we're passing in.

There's one more trick to add here before we're done. Sometimes we don't want all of the tests to be ANDed or ORed. We want an AND of two ORs, or vice versa. How do we build that? Well, each piece of that test can be build by the where() function above, or would be if it didn't have that WHERE keyword in the return. So let's turn it off.

<?php
function where($where, $conjunction='AND', $where_keyword=TRUE) {
  $params = array();

  foreach ($where as $key => $value) {
    if (is_numeric($key)) {
      $params[] = $value;
    }
    else {
      switch(gettype($value)) {
        case 'integer':
        case 'double':
          $escape = $value;
          break;
        case 'string':
          $escape = "'" . mysql_real_escape_string($value) . "'";
          break;
        case 'NULL':
          $escape = 'NULL';
          break;
        default:
          continue;
      }
      $params[] = "({$key}={$escape})";
    }
  }

  $where_string = implode(" $conjunction ", $params);
  return ($where_keyword ? 'WHERE ' : '') . $where_string;
}
?>

Now with a simple boolean toggle we can build a partial WHERE clause, and then pass that partial clause back into the function again to build a full WHERE clause.

<?php
$where1['name'] = 'Bob';
$where1[] = 'age < 10';
$where3[] = where($where1, 'AND', FALSE);
$where2['name'] = 'George';
$where2[] = 'age > 10';
$where3[] = where($where2, 'AND', FALSE);
$where = where($where3, 'OR');
?>

Or in our case:

<?php
update($table, $fields, $where3);
?>

In fact, that works just as well for any WHERE clause, including those in SELECT statements.

DELETE

So do we get one that's easy yet? Please? Fortunately DELETE queries are just a WHERE clause. We've just build a WHERE clause builder, so this one becomes truly easy.

<?php
function delete($table, $where, $conjunction) {
  $where_string = where($where, $conjunction);
  query("DELETE FROM {$table} {$where_string}");
}
?>

What could be easier?

Conclusion

In practice, you'd probably want to give these functions somewhat less generic names. There are other improvements that could be added as well, depending on your database abstraction layer. update() could return the number of affected rows. insert() could handle creation of unique IDs with sequences (since auto_increment, arguably a nicer solution, is not database-agnostic) and return the just-inserted ID. We could even consider an insert_multiple() function that would run multiple insert queries, or even optimize for each database's multiple-insert functionality. (A good database abstraction layer like MDB2 does this already, so it would be easy to do.) I leave such enhancements as exercises for the reader.

Drupal developers needn't exercise, though. Alternate versions of these utilities (designed for Drupal's escaping and casting system) were added to the Drupal helpers module earlier this year. Thanks to Ber Kessels for integrating them after they were rejected as too query-builder-ish for core.

For everyone else, a copy of the final code is available for download under the GPL. Use it in good health. And if you have suggestions for improvement, please share!

Anonymous (not verified)

4 June 2008 - 5:31pm

I found a typo in the where function:

if (is_numeric($key) {

should be

if (is_numeric($key)) {

Took me all day to figure out why I was missing a } - well I wasn't - I was missing a )

:-)