Database anxiety

by Rob Chant on February 7, 2009

If you're new here, you may want to subscribe to my RSS feed. Thanks for visiting!

An early design decision I made with Renaissance, when it moved from being a sloppy bunch of PHP files to a properly structured application (still in PHP) was to do with how the database is structured. I’ve pondered that decision a lot in the years since then, and have several times come close to going back on it and doing something entirely different.

Renaissance is structured around its content. Internally, items of content are instances of the Asset class, or classes which inherit from it. The nifty bit (I think) is that, internally, nearly everything is considered content—everything from obvious stuff such as text, image, links, et cetera, to users, templates, pages and even the content type definitions themselves (called components*).

So, Renaissance is centred around its content items, whatever they may be. This means that the database is centred around content—each content type has its own table. Fair enough. There are also a bunch of ancillary tables, such as the change log, keywords, errors, checkouts and a large bunch of other things.

This is background so far. The decision I made early on was in how the database is configured. And it’s this:

Each and every table in Renaissance can be in its own database.

In other words, there’s a JSON settings file that contains each of the databases to which Renaissance needs to connect. For each database listed, there’s a bunch of information, and a list of tables in that database (you can even alter this on the fly, moving tables from one database to another). The databases can be on completely different servers running completely different software—Renaissance doesn’t care (although I’ve only got round to writing MySQL support so far).

This is completely transparent as far as the coder is concerned. You simply make a database request on a table, and the database system looks up where that table is at the moment.

And the point of all this? Well, if you read this blog, you’ll know that I’m fairly obsessed with making Renaissance as flexible as possible, and, yet again, that’s the philosophy here. It allows you to set things up just how you want. So, for example, you could have your core, important content tables on PostgreSQL and all your logging tables on MySQL (notably, due to Renaissance on-disk caching, the logging tables might be the only ones actually accessed during a view of a fairly static page). You might simply divide things up between two servers for performance reasons or otherwise. Who knows?

But as you probably immediately realised, there’s one big drawback to all of this—all of SQL’s higher functionality, such as JOINs, VIEWs, TRIGGERs, et cetera, largely go out of the window because you can never guarantee that any two tables are even going to be on the same server, let alone the same database.

At the time, I didn’t really care about all that (sounds bad, I know, but it was years and years ago). I didn’t know all that much about what SQL can do, and besides, I thought, Renaissance is largely about simply loading content from the database—there’s very little fancy data manipulation, and what there is is better handled by the PHP anyway (as it’s so content type specific). Moreover, when you’re loading a row from a table, you nearly always want all the row.

The reason I’m thinking about this again is simply that Renaissance is getting slower, due to the fact it’s getting more complex and more robust (more thoughts on this in a different post). I’m shoring things up with a lot of caching, but something has got to give!

So, for example, simply loading a single asset hits at least the following tables:

  • the content’s own table (say, text)
  • the extended options table
  • the changelog table
  • the checkouts table (if we’re editing, not viewing)
  • the errors table
  • the hub table (stores flag and relational links to other content)
  • and possibly, depending on just what we’re doing:
    • publishing
    • locks
    • email alerts
    • keywords
    • notes
    • access schemes
    • et cetera

Anyway, you get the idea. Loading an item of content also requires that we load its content type, causing this cascade over again. Now, I’m sure you can see that a JOIN might be useful here! Needless to say, there are plenty of other cases where advanced SQL functionality would be useful.

So, there are options,

  1. leave everything as is and love my flexibility
  2. enforce that everything lives in a single database (you can always just use replication for performance issues anyway)
  3. do something in between, enforcing that a certain group of tables have to be in the same database, but allowing flexibility on certain others (e.g. still allow a split between content and logging)

All of those are easy to implement at a database system level. The real work would come in with actually taking advantage of what SQL had to offer. That would be pretty major structural work. I realise that the only things that are stopping me from changing things now (I kind of prefer option 3) is the fear of that work, plus an emotional attachment to how things are now.

What do you think? I could really do with some advice!

Update: Having spoken with various people and mulled stuff over, it’s got to be option 3 (well, maybe option 2…) I knew it would happen really!


* Yes, each content type (image, text, user, whatever) is represented by an asset, including content types themselves. It’s very circular. So, here’s how you create an asset:

var $thing=new Asset($id,$type);

/* or, if you know the type, text for example, and want to do something more specific */
var $thing=new Text($id);

So, you can go,

/* to load a text item */
var $my_text=new Text(‘my text’);    //titles are unique, so you can use them as IDs

/*to load a user*/
var $user=new User(‘whatever user’);

/*to load a content type definition, such as text or user*/
var $component=new Component(‘user’);

Nice, huh? Well, I like it, anyway.

No related posts.

{ 0 comments… add one now }

Leave a Comment

You can use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>