I spent some time Friday repairing a ridiculously bad PHP form that could be (and may still be) fodder for four or five blog posts, primarily around security. Somewhere during the refactoring I had to examine an approach to generating a unique ID that was required to track something.
In general, there are three schools of thought on generating a unique ID:
1) Use a GUID
Globally Unique Identifiers. These typically look like this: 835af40d-e275-45fb-beb9-a98cdc0726bd. They were popularized (at least, when I started using them) with Microsoft’s COM platform as unique keys. The problem is, they’re not always unique. The PHP page on them has examples using a bunch of random numbers, which is pretty darned unique, but not guaranteed, and the page I pulled that example GUID from says “the generation algorithm is unique enough that if 1,000,000,000 GUIDs per second were generated for 1 year the probability of a duplicate would be only 50%.” So unique, except for when they aren’t. Which is a hell of an obscure edge case to chase. Plus they look ugly, especially if you want to put them in a URL.
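To put rough numbers on "unique, except for when they aren't", here is a minimal sketch in Python (not the PHP from the form). It assumes random, version 4 style GUIDs, which carry 122 random bits, and uses the standard birthday-bound approximation. The function name collision_probability is made up for illustration.

```python
import math
import uuid

# Version 4 UUIDs carry 122 random bits (6 of the 128 bits are fixed).
UUID4_RANDOM_BITS = 122


def collision_probability(count, space_size):
    """Birthday-bound approximation: chance of at least one duplicate
    when drawing `count` values uniformly from `space_size` options."""
    exponent = count * (count - 1) / (2 * space_size)
    return -math.expm1(-exponent)  # 1 - e^(-exponent), stable for tiny values


if __name__ == "__main__":
    print(uuid.uuid4())      # hyphenated form
    print(uuid.uuid4().hex)  # same kind of value as 32 hex characters, no hyphens
    per_year = 1_000_000_000 * 60 * 60 * 24 * 365
    print(collision_probability(per_year, 2 ** UUID4_RANDOM_BITS))
```

Under these assumptions the one-year figure comes out well under 50%, so the quoted number presumably rests on different assumptions about how the GUIDs are generated. Either way, the probability is small but never zero.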
2) Use a database key
MySQL has auto-increment columns, so every time you add a row, there can be a field with a numeric key that the DB generates for you based on what else is in the table, your increment settings, and so on. The catch here is that it’s not hard to guess the next number in the sequence, so if your ID is something that will trigger a database pull, say on a multi-step application form, you want to think twice about the ramifications of someone being able to guess other object IDs.
3) Some made-up piece of crap
Typically this is based on timestamps, because it’s the easiest thing to work with, but it’s only unique for keys that aren’t generated at the exact same time. In my case, I pulled “magic” timestamp-based random number code from the form I was working with and threw it into a simple PHP script I could call from the command line. Just hitting up and enter (my shell’s setting for repeating the last command) at a decent pace I was able to get the same “unique” ID several times in a row.
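The collision is easy to reproduce. The sketch below is in Python rather than PHP, and timestamp_id is a hypothetical stand-in for the form's "magic" code (which isn't shown here). It assumes the ID is derived from the clock at whole-second resolution.

```python
import time


def timestamp_id(clock=time.time):
    """Hypothetical stand-in for a timestamp-based ID: whole seconds only."""
    return str(int(clock()))


def count_duplicates(ids):
    """Number of IDs that repeat a value already present in the list."""
    return len(ids) - len(set(ids))


if __name__ == "__main__":
    # A tight loop finishes well within a second, so nearly every ID repeats.
    ids = [timestamp_id() for _ in range(1000)]
    print(count_duplicates(ids), "duplicates out of", len(ids))
```

A finer-grained clock makes duplicates rarer, but two requests landing in the same tick still get the same ID.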
The three keys to a unique ID
In my mind, a unique ID should be unique (duh), hard to guess, and reasonably type-able. These factors are all on a sliding scale depending on your needs, and all get more expensive as you get closer to perfection, both on their own and in how they impact the other two factors.
For me, uniqueness is the biggest deal. I don’t want to be chasing duplicate key issues for those one-in-a-billion cases that happen way too often (i.e. more than once) for my liking. More importantly, I don’t want to suspect a duplicate that never happened and get pulled away from identifying the real problem.
Hard to guess directly impacts easy to type. For simple applications, as long as there are more than a thousand possibilities per actual ID, I’m happy, and if I need more than that I’m liable to tie it into an actual authentication system. For public applications, you’re vulnerable to brute force attacks by bots, but that requires a different overall strategy anyway.
For easy to type, sometimes that’s because the ID shows up in the URL, and sometimes it’s simply to help with troubleshooting. I’ve made mistakes looking through logs where a 30+ character key was only one letter different than another and I didn’t realize it. So call it easy to read. Of course, if you make it too short, it’s easy to guess. One way to expand the number space is to use letters as well as digits: encode your numeric IDs into a more compressed (yet reversible) format, like how TinyURL does it.
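A minimal sketch of that kind of reversible encoding, in Python. It uses base 62 (digits, then lowercase, then uppercase letters). The alphabet order and the function names are assumptions for illustration, not TinyURL's actual scheme.

```python
import string

# 0-9, a-z, A-Z: 62 symbols in total.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode(number):
    """Turn a non-negative integer into a short base-62 string."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode(text):
    """Reverse of encode: base-62 string back to the integer."""
    number = 0
    for char in text:
        number = number * BASE + ALPHABET.index(char)
    return number
```

With 62 symbols, any ten-digit number fits in six characters, and decode(encode(n)) gets you the original number back.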
Here’s what I did
My solution isn’t perfect, and it’s not what I would do if I were building a form from scratch, but it solved my problem of making a “unique-r” key. As I mentioned, the code used a timestamp-based system to make an ID, but multiple hits at the same time would cause duplicates. All I did was append a unique sequence number to the end, through the magic of concatenating numbers as if they were strings.
This is a handy facility to have around, I’ve found: create a table in MySQL with only one column that’s an auto-increment primary key. Now, when you insert a row with ID=null and then query the last inserted ID, you get a number that’s pretty much guaranteed (subject to your DB architecture, but if it’s not you’ve got deeper problems) to be unique.
I took that ID, and appended it along with a dot to the original numeric key. The dot was important to differentiate it from an ID that happened to end with that sequence number, and it wasn’t being stored as an integer anyway so I could get away with it.
Oh, I also multiplied the result by a salt factor just to increase the working set a little, but that wasn’t really necessary for my purposes.
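The sequence-table trick can be sketched as follows. This is a Python illustration with an in-memory SQLite database standing in for MySQL (where you would read the value back with LAST_INSERT_ID()), not the PHP from the form. The table name id_sequence and the function names are made up, and the salt step is left out.

```python
import sqlite3
import time


def open_sequence(conn):
    # One-column table whose only job is handing out increasing numbers.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS id_sequence "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT)"
    )


def next_sequence(conn):
    # Insert a row with a NULL id and read back the value the DB generated.
    cursor = conn.execute("INSERT INTO id_sequence (id) VALUES (NULL)")
    conn.commit()
    return cursor.lastrowid


def make_id(conn, base_key):
    # The dot keeps base and sequence apart: without it,
    # "123" + "45" and "1234" + "5" would both read "12345".
    return f"{base_key}.{next_sequence(conn)}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    open_sequence(conn)
    base_key = int(time.time())  # stand-in for the timestamp-based key
    print(make_id(conn, base_key))
    print(make_id(conn, base_key))  # same base key, different suffix
```

Two calls in the same second share the timestamp part but still differ in the suffix, which is the whole point.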
Again, your choices will vary based on your needs, but I’d suggest you ask yourself what the worst-case scenario is for a duplicate or a correct guess, both from a customer-impact and a developer-productivity perspective.
Photo by fazen.