Here’s a fun one: let’s suppose you need to take a user-supplied date from an input field and turn it into a date object:

var date = $('#date').val(); //assuming yyyy-mm-dd for the demo
var year = date.substr(0, 4);
var month = date.substr(5, 2);
var day = date.substr(8, 2);
var newDate = new Date(year, parseInt(month) - 1, day, 0, 0, 0);

Simple enough, right?

Not so fast

That code won’t work in older browsers, and when I say old, yes I mean IE8.

Some of the time, anyway.

Here’s the trick: in older JavaScript engines, parseInt() behaves differently if the string has a leading 0, and the function will assume you’re using a non-base-10 numbering system. If it’s a 0 followed by more digits, it’ll assume octal (base 8), and if it starts with '0x' it’ll assume it’s a hexadecimal number like 0xff.

The octal parsing has been deprecated in ECMAScript 5, but in the meantime, if you try to call parseInt with a string like, say, "08", you’ll actually get 0 back, because octal digits only go up to 7.

And that’s the kicker: Your date parsing code, if written as above, will work for 10 months of the year. Months 01-07 are valid octal numbers that happen to map to 1-7 in decimal. 08 and 09 are invalid and will result in 0, and 10, 11, and 12 don’t start with a 0 so they come back as you’d expect.

This means that depending on when you wrote the code, you might have time bombs lurking (and as I’m writing this in August, guess what I did this morning?)

The fix? Pass the second parameter to parseInt that specifies the radix (aka base): parseInt("08", 10) will work just fine in all browsers.
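To make that concrete, here’s a sketch of the date parsing with the radix made explicit everywhere (parseDateParts is just a name I’m using for the demo):

```javascript
// Parse a "yyyy-mm-dd" string without tripping over legacy octal auto-detection.
function parseDateParts(dateStr) {
  var year = parseInt(dateStr.substr(0, 4), 10);
  var month = parseInt(dateStr.substr(5, 2), 10); // "08" -> 8, even in old engines
  var day = parseInt(dateStr.substr(8, 2), 10);
  return new Date(year, month - 1, day, 0, 0, 0); // Date months are zero-based
}

var d = parseDateParts("2012-08-15");
// d.getMonth() === 7, i.e. August
```

Passing 10 everywhere costs nothing in modern browsers and saves you from the IE8-era time bomb.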

Mobile apps seem different than web apps to me, for some reason.  When I started making websites, I was working (more than) full time making call centre desktop applications, and I got into Perl as a “dip the big toe in” hobby.  Which eventually turned into some freelance work, which turned into too much freelance work, which turned into full time jobs, etc. But the translation from desktop to web paradigms felt pretty straightforward.

Mobile seems different to me, somehow.  Dipping the big toe in doesn’t yield the same productivity returns for me, at least when flipping between web and mobile.  A deeper plunge seems necessary, both in coding and studying what works.

From Warren Ellis’ latest newsletter – it’s about writing comics, but this morning, I’m finding a lot of parallels in software:

You learn to write from reading books (and living your life). Next, you learn how to write comics by pulling them apart and studying their innards to see how they work. This is how you end up as a 24/7 comics writer and also a terrifying shut-in who will eventually go nuts in a very public way and conclude your career as a figure in a newspaper photo captioned FOREST CREATURE SUBDUED BY POLICE TASERS. But I’m serious. You are going to learn how to do this – learn your own way to manage the difference in pacing between eight pages and twenty-two pages and one hundred and twenty pages, learn how to achieve effects in timing and drama and emotional nuance, learn when to talk and when to shut up – by studying the best comics you can find, and tearing them apart and seeing how they do things and then stealing the tools you can use and adapting them into your own style. You are going to want to read broadly. Make yourself read things you wouldn’t ordinarily look at. If superheroes are your favourite, then make yourself read Carla Speed McNeil or Dan Clowes or Marjane Satrapi. If you only read science fiction comics, then force yourself to look at Hugo Pratt and Eddie Campbell and Svetlana Chmakova.

With the recent security breaches in some major websites (in this case, LinkedIn, but I feel pretty safe just going with “recent” and assuming there’ll have been one around the time you read this,) password security is getting a little bit more attention.

OK, I said “a little” – as one developer told me, “LinkedIn’s password leaked so I had to change my password 30 times,” to which I replied, correctly, “no, you changed the same password 30 times.” In the age of kick-ass, multi-platform password management apps, there’s really no reason not to use a different password on every website, and it doesn’t have to be one that’s cleverly based on the name of the site, like “gmail44secret.” I have no idea what my passwords are anymore, and I find that liberating.

And since I’ve got cut and paste on every platform I use my password manager on, there’s no reason not to use longer passwords, like, say, 30 characters.

OK, there is one reason (aside from the fact that on rare occasions I have to type something into a browser that I’m reading off of an iPhone) – not all sites support really long passwords.

Some will actively block you, saying, for instance, that the password has to be between 8 and 12 characters. But others, I’m finding, will just take your really long password and never work.

And frankly, that’s for the best, since it highlights some likely underlying problems. I can’t prove it, but I suspect that some of these sites are taking the password and storing it into a database field that’s been declared with too few characters.

In that case, the password is saved, but only the first n characters. The rest are truncated. Which means the subsequent user validation call won’t work.
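If that theory is right, the failure is easy to sketch. This is a hypothetical illustration (the 16-character column width is made up, not anyone’s actual schema):

```ruby
# Hypothetical: a password column declared too narrow, e.g. VARCHAR(16).
COLUMN_WIDTH = 16

submitted = "correct-horse-battery-staple30"  # a 30-character password
stored = submitted[0, COLUMN_WIDTH]           # silently truncated on save

# The later login attempt compares the full submission to the truncated value:
stored == submitted  # => false, so the "correct" password never works
```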

It should be obvious that storing cleartext passwords is capital B Bad, but it’s ridiculously common. Even with some forms of encryption, the length of what’s stored is dependent on the length of the submitted password, so you’re still vulnerable to truncation.
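One reason proper hashing helps here: a hash digest has a fixed length no matter how long the input is, so a hashed password can’t be silently truncated by a narrow column. A quick sketch using SHA-256 purely to show the length property (for real password storage you’d want a slow, salted scheme like bcrypt):

```ruby
require 'digest'

short_digest = Digest::SHA256.hexdigest("12345")
long_digest  = Digest::SHA256.hexdigest("a" * 30)

short_digest.length  # => 64
long_digest.length   # => 64, same width regardless of password length
```

Size the column for the digest once and password length stops being a storage concern at all.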

Like I said, I can’t prove it, but I’ve found a few sites that will accept, but not honour, my 30 character passwords, and that’s the only theory I can come up with as to why. I wish it weren’t so, but we tend to think everyone does what we do, so if the dev/QA team uses 5 character passwords, it simply won’t occur to them to try really large ones.

Spot the bug: Rails dup vs clone

by Jason on June 18, 2012

OK, the title gives this one away, and the most obvious bug is that there wasn’t a unit test that would have caught this, but here’s a variant of the Ruby on Rails production code, designed to copy invoices from one year to another:

invoice = Invoice.find(params[:invoice_id])
new_invoice = invoice.clone
new_invoice.edition = new_edition
new_invoice.summary = "Keep your ad the same"
new_invoice.amount_paid = 0
new_invoice.created_at =
skus = EditionPrice.for_edition(@edition)
invoice.items.each do |item|
  new_item =
  new_item.description = item.description
  new_item.sku = item.sku
  new_item.amount = item.amount
  new_invoice.items << new_item

This worked great in the Ruby on Rails 2.3.5 days, but after upgrading to 3.2 as part of a Heroku migration, the call would get killed by the 30 second timeout rule. At first I thought there were simply too many ads to clone (there are also image assets in the production code, which would take time to replicate in Amazon S3) but then I reloaded the referring page.

And saw 27,000 new line items in the first invoice.

So here’s the deal: in the Rails 3.1 release notes for ActiveRecord, there’s this little blurb:

  • ActiveRecord::Base#dup and ActiveRecord::Base#clone semantics have changed to closer match normal Ruby dup and clone semantics.
  • Calling ActiveRecord::Base#clone will result in a shallow copy of the record, including copying the frozen state. No callbacks will be called.
  • Calling ActiveRecord::Base#dup will duplicate the record, including calling after initialize hooks. Frozen state will not be copied, and all associations will be cleared. A duped record will return true for new_record?, have a nil id field, and is saveable.

And what will this mean for those of you who lack unit tests to automatically detect breaking changes like this one?  Well, the clone call will still copy the record, but now including the id field.  Which means in the code above, we’re adding line items not to the new invoice, but to the exact same invoice, since they have the same id.  And we’re iterating through that invoice’s line items.  And copying them.  To a list that keeps growing until the app is killed.

So to summarize: if you want to make a quick copy of an ActiveRecord that’s in addition to the one that’s already in the table, use dup, not clone, as of Ruby on Rails 3.1.  And add tests and read release notes…
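The 3.1 change brought ActiveRecord closer to plain Ruby, where clone copies frozen state and dup doesn’t. You can see the core-Ruby semantics without Rails at all:

```ruby
record = "invoice".freeze

record.clone.frozen?  # => true,  clone copies the frozen state
record.dup.frozen?    # => false, dup gives you a fresh, thawed copy
```

Under ActiveRecord 3.1 and later, dup additionally clears the id and reports new_record? as true, which is exactly what you want for "copy this row" code.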

When setting up a website that uses Amazon’s Simple Storage Service (S3) for cloud storage, you’ve got a few access options available to you.  The first is to just use your account’s primary security credentials, which could work if it’s the only thing you’re using the system for, but it’s a pretty big hammer. You’re going to encounter issues down the road if you need to change access for just one user or app, and it’s (not very) surprising how many “oh, just throw that in there” situations can arise where other processes get access out of convenience.  Setting just one password means you need to change it everywhere when you want to revoke access to just one piece of your puzzle.

Another option is to use Amazon’s Identity and Access Management (IAM) to set up a more granular solution that grants access to just the resources you need.  Here’s how I set up a client’s S3 bucket recently:

Log into the Amazon Management Console

There’s probably some cool CLI way to do all of this, but let’s face it: you’re working from a blog post that might be out of date, depending on when updates push out, so having things you can look at generally helps. Plus I’m not immune to typos.  Head over to the management console and log yourself in.

Go to IAM

I’m going to assume you’ve already created the bucket you’re wanting to use for access grants. If not, just go to the S3 service first and make one.  Now, the weird part about all of this is that the permissions on the S3 bucket have nothing to do with our plan.  If you go by them, you’ll have no idea who has access, which seems off to me, but anyway…

Create a group

From IAM you’ll want to create a new group, which leads you through a wizard:

The name of the group is up to you.  For my purposes, I’m granting a single account access to a single bucket, so I just use the bucket name wherever possible.

On the next step, you’ll want the custom group policy, because we’re going to paste it into the form on the next screen:

OK, again, policy name is up to you.  For the policy document, paste the following in and change the “mybucketname” to the name of your actual bucket:

  {
    "Statement": [
      {
        "Action": "s3:*",
        "Effect": "Allow",
        "Resource": [
          "arn:aws:s3:::mybucketname",
          "arn:aws:s3:::mybucketname/*"
        ]
      }
    ]
  }

And boom, you’ve got a group! Next we’ll need to create the user in the group:

Create a user

You can make as many users as you want; in my case, there’s just the one.  Note that I created the policy for the group, not the user, in case that matters to your future plans. The next step is important if you’re making an app that uploads files:

Get the security credentials

This is where you get the AWS ID and secret key that any API calls will need.  It only shows up once!  So I like to leave this open until I’ve saved the info somewhere safe, and preferably backed it up as well.


Of course, you’ll have to do something in your app now to use this newly empowered user, but you’ve now got granular access to a specific bucket in S3 that you can turn off at will without disabling anything else you’ve got in play.

I love using TortoiseSVN for easy source control on Windows, but on my Vista build (still RC) I’ve had a few instances of blue screen during commits, which is bad. In the last case, my repository got a little mangled, and I couldn’t do anything with it without the error “object of the same name is already scheduled for addition” appearing and aborting the operation.

The trick seems to be to do a revert operation on the directory, then an update. You may need to rename the original files/directories that are having a problem, and then diff/merge them back into whatever comes up, depending on what state the repository was in during the crash.

Then you might want to upgrade your OS and TortoiseSVN to the latest versions, and let me know how that works out for you – I see a day not far off where I’ll be like those people still running MacOS 9 when X has been around for years…

(Note: this post was originally written in 2007 on another (now defunct) blog I used to run. Reposting since there are some links to it floating around the web and it kills me to see “YOU DIDN’T HELP THIS PERSON” in my logs. So yes, I’ve upgraded from Vista RC since then.)

I’ve recently started moving some client projects from various Ruby on Rails hosts over to Heroku.  There’ve been a whole lot of lessons along the way, but the biggest has been just getting started: how do you move an existing MySQL database over to PostgreSQL?

(And you probably already know this if you’re reading this, but Heroku is a cloud computing platform that handles a lot of the server infrastructure for you. People use it for managed scalability but also so they can focus on the app and not the boxes that run it, but it uses a different database than what many Rails apps typically run.)

After some research, I decided to go with this basic approach:

  1. Import the production database into my development MySQL environment.
  2. Export the data.
  3. Convert the site to PostgreSQL.
  4. Import the data.
  5. Test.
  6. Export/Import the data to a new Heroku install.

…And that’s what happened.  More or less.  OK, somewhat less.

Most of the hiccups happened in getting the MySQL data into a format that Postgres could use.  For most data sets, it seems that the output produced by mysqldump isn’t going to work with Postgres.  There are a few search and replace tricks you can do, but for me, the stopper was when it came down to my use of boolean fields in Rails, which are ints in MySQL but not in pg (yes, I’ll just type ‘pg’ from this point on.)

(I ran the dump without schema info, and used rake db:schema:load to get data into the initial pg database, which is why the boolean type came up in the first place.  I can’t remember why I started with a schema, but there was a reason – possibly because I used boolean types. Anyway.)

There were a lot of boolean fields, across a lot of rows, so manually editing the dump file wasn’t an option.  Eventually I came across the AR_DBCopy gem by Michael Siebert, which is a quick script to copy data from one db to another, table by table, row by row.  There’s a great breakdown of how to use it over on Daniel’s blog that covers pretty much all you need to know.

Except for this: the gem didn’t work for me.  It’s possible that there were some changes in Rails since the gem was written (it’s three years old,) and I’m not convinced my fixes are “correct” but it’s the kind of tool that you need to have work just a few times, so once I got what I wanted out of it I was fine with that.  You can use my version by cloning and running rake install. So, the sequence, briefly, worked out to this (see Daniel’s post for expanded details)

  1. Import production data into development MySQL box.
  2. Change database.yml to use postgres, and install the 'pg' gem (note that the adapter name in database.yml is postgresql and the gem name is pg; you’ll get weird adapter errors from Rails if you mess that up.)
  3. rake db:create
  4. rake db:schema:load
  5. Edit config/database.yml to add a source and a target.
  6. ar_dbcopy config/database.yml
  7. Run select setval('TABLE_id_seq', (select max(id) + 1 from TABLE)); on all tables with auto_increment keys (replacing TABLE with the table name, of course.)
  8. Test.

Oh, one other complication I hit at the db:schema:load phase – I had some indexes with identical names, which isn’t really a good idea, and pg complained.  I edited my db/schema.rb file to deal with this.  A more correct way would be to edit my migrations, but for this app I’m never planning on running a migration chain from scratch anyway.

The only other trick was actually getting this data into Heroku, which despite being well documented, wasn’t obvious to me.  Here are the commands you’ll need (pasted from the documentation):

PGPASSWORD=mypassword pg_dump -Fc --no-acl --no-owner -h myhost -U myuser mydb > mydb.dump
heroku pgbackups:restore DATABASE ''

Now, where it says “DATABASE”? That’s not a “put your database name here” prompt. You actually type the word DATABASE there.  Yeah, maybe that was obvious to everyone else, and I’m just special, but I thought I’d share in case anyone else thinks like me :)

And that’s pretty much all there is to it!  This app was pretty straightforward, but I’ve got a few others in the pipeline so I’ll update this if I run into any more hiccups, and of course leave your roadblocks (and solutions if you found ‘em) in the comments!

Spot the bug: autolinking URLs in text

by Jason on September 12, 2011

Here’s some PHP code from last year that recently reared its ugly head.  Basically, it was designed to take a line of contact info from a business directory and automatically convert any URLs or email addresses to clickable links by identifying the patterns and wrapping them in the appropriate HTML:

function directory_linkify($contact) {

  $rx_url = "#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))#";

  if (preg_match_all($rx_url, $contact, $matches) > 0) {
    foreach ($matches[0] as $match) {
      $contact = str_replace($match, "<a href = 'http://" . str_replace('http://', '', $match) . "' target = '_blank'>" . str_replace('http://', '', $match) . "</a>", $contact);
    }
  }

  $rx_email = "#[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})#";

  if (preg_match_all($rx_email, $contact, $matches) > 0) {
    foreach ($matches[0] as $match) {
      $contact = str_replace($match, "<a href = 'mailto:" . $match . "'>" . $match . "</a>", $contact);
    }
  }

  return $contact;
}

It worked great until someone added two URLs to the same line, so given this:

Bob Smith, 1-800-888-8888,,

I was seeing this result:

Bob Smith, 1-800-888-8888, <a href="http://"></a>, <a href="http://"></a>/very_important_page.html

WTF? The second URL wasn’t catching the path.  But other tests with full path URLs were passing elsewhere.

At this point, and it doesn’t help that the bug appeared late at night when I should have been going to bed, the undisciplined whack-a-mole debugger in me came out in full force.  Maybe the order of the URLs mattered, maybe the regex I was using was faulty, maybe I needed to end that second URL with something different, maybe, maybe…

Of course, as with most bugs, it was pretty simple once I stepped back and actually looked at what was happening.  Let’s look at that foreach loop and assume the regex is finding URLs just fine, since it’s been working for a year already.

We’ve got two URLs in the $matches array: and So we loop.

In step one, we change to <a href = 'http://'></a>.

And in step two, we change to <a href = 'http://'></a>.

Except we don’t!  Step one replaced every occurrence of, which already changed that text to <a href = 'http://'></a>/very_important_page.html, which means the str_replace call in the second round of the loop never finds anything to replace.

This is a great case where you’ve got a simple process that happens to involve a complicated distraction like a regex. When you’re debugging, it’s always a great idea to step through the function with a pencil and paper, ideally with a second set of eyes, especially for relatively self-contained methods like this (which are the kinds you should strive to write anyway – hmm, maybe I should have split the email and URL detection out into separate methods…)

Anyway, in case you’re curious or need such a thing, here’s the replacement code that links properly. I doubt it’s perfect (I’m doing more PHP lately but this was from a while ago) so let me know if you spot any other bugs or opportunities for improvement!

function directory_linkify($contact) {

  $rx_url = "#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))#";

  if (preg_match_all($rx_url, $contact, $matches) > 0) {
    $index = 0;
    foreach ($matches[0] as $match) {
      $index = strpos($contact, $match, $index);
      $contact = substr_replace($contact, "<a href = 'http://" . str_replace('http://', '', $match) . "' target = '_blank'>" . str_replace('http://', '', $match) . "</a>", $index, strlen($match));
    }
  }

  // and then similar changes for email...

  return $contact;
}

Beware of unused variables

by Jason on September 5, 2011

When you’re making changes to a system, for the love of all that’s programmable, please be thorough.

If you’re changing a function, make sure you’ve cleaned up any variables that aren’t being used anymore (your IDE might be able to spot these for you, in which case there’s not a lot you can use for an excuse.)

If you’re changing a class and it’s internal framework stuff that won’t be used by anyone outside, make sure you remove functions if they’re not being called anymore.

If you’ve refactored global configuration variables or settings, make sure you’ve deleted all trace of them if they’re really gone.

If you don’t do these things, the next person to come along is going to totally miss her estimate on account of not realizing how many land mines need to be stepped around, and each one means more time added to the project, either in a big chunk right away by dealing with what you should have done already (and the testing and QA overhead that goes with it,) or in little increments every time anything around your old work needs to be touched (“what’s this? Oh, I don’t think it’s used, but I’m not sure…”)

Ask me how I know this…

And do all of these things right away.  Because if you wait, even a little bit, you’re going to find yourself in the same position as the new guy.

Ask me how I know that (in a smaller voice, at least in recent memory.)

The “I thought we might need to bring it back” defence doesn’t work in a source control environment. Every line of code you ever submitted is still there in the repository. So why not keep the really embarrassing stuff out of the head?

(And don’t get me started on how documentation lags the code… Yeah, it’s been a fun long weekend refactoring.)

I had a head slapper a while back where a Rails site I was working on wasn’t updating the created_at and updated_at fields properly in any of the tables.

(For those who don’t know, it’s a handy feature in ActiveRecord where these datetime columns get automatically refreshed on creates and saves – well, somewhat handy, until you over-rely on the updated_at field and then other business logic causes it to change when you don’t expect it, but I digress…)

Anyway, at the time I was working on a 3.0 beta rev, so I figured it was just something that was going to sort itself out, and I added some quick before_save and before_create handlers to the record to keep things in sync so I could work on the real problems (there were time and budget constraints and I was the only one who ever looked at those fields,) but I found it was still happening quite some time later on the official 3.0.x branch.

It turns out the problem was that these tables hadn’t been created through a standard Rails migration with a t.timestamps call; they’d been made up through a series of SQL scripts that had various explicit calls to create table.

And in there, the created_at field was defined as datetime not null default '1900-01-01 00:00:00'.


So here’s the lesson: if created_at has a default, Rails/ActiveRecord will use the default, just like it does for any other column in the db.  You want these fields to be nullable without a default, so if this is happening to you, run a migration like this:

change_column :table_name, :created_at, :datetime, :null => true, :default => nil

Then you can get rid of embarrassing workarounds that shouldn’t have been there in the first place, especially since they now reflect a severe lack of understanding of how ActiveRecord works, at least compared to your newfound knowledge :)