Suppose you’re in a situation where you want to save the contents of an XmlDocument to disk (in C#, by the way.) And you end up with something like this:

        public static void SaveFileIncorrectly(string filename, XmlDocument xml)
        {
            using (var file = new FileStream(filename,
                FileMode.OpenOrCreate,
                FileAccess.Write,
                FileShare.None))
            {
                xml.Save(file);
            }
        }

Now the function name might have given something away, but odds are, this is going to work just fine.

At first.

The trouble comes when you want to re-write the document. Oh, it’ll write to the file just fine… some of the time.

Here’s the full demo, with spoilers:

using System;
using System.IO;
using System.Xml;

namespace XmlToFileError
{
    class Program
    {
        static void Main()
        {
            const string longerXml =
                @"<root>
                    <a><man><a><plan><a><canal><panama>
                    </panama></canal></a></plan></a></man></a>
                </root>";
            const string shorterXml =
                "<root><madam><im><adam></adam></im></madam></root>";
            var xml = new XmlDocument();

            xml.LoadXml(longerXml);
            SaveFileIncorrectly("output.xml", xml);

            Console.WriteLine("Longer file saved incorrectly, file contents:\n");
            OutputFileToConsole("output.xml");

            xml.LoadXml(shorterXml);
            SaveFileIncorrectly("output.xml", xml);

            Console.WriteLine("\nShorter file saved incorrectly, file contents:\n");
            OutputFileToConsole("output.xml");

            SaveFileCorrectly("output.xml", xml);
            Console.WriteLine("\nShorter file re-saved correctly, file contents:\n");
            OutputFileToConsole("output.xml");

            Console.WriteLine("Press any key to exit");
            Console.ReadKey();
        }

        public static void SaveFileIncorrectly(string filename, XmlDocument xml)
        {
            using (var file = new FileStream(filename,
                FileMode.OpenOrCreate,
                FileAccess.Write,
                FileShare.None))
            {
                xml.Save(file);
            }
        }

        public static void SaveFileCorrectly(string filename, XmlDocument xml)
        {
            using (var file = new FileStream(filename,
                FileMode.OpenOrCreate,
                FileAccess.Write,
                FileShare.None))
            {
                file.SetLength(0);
                xml.Save(file);
            }
        }

        public static void OutputFileToConsole(string filename)
        {
            using (var myFile = new StreamReader(filename))
            {
                var contents = myFile.ReadToEnd();
                Console.WriteLine(contents);
            }
        }
    }
}

…Which will output something like this:

Longer file saved incorrectly, file contents:

<root>
  <a>
    <man>
      <a>
        <plan>
          <a>
            <canal>
              <panama>
              </panama>
            </canal>
          </a>
        </plan>
      </a>
    </man>
  </a>
</root>

Shorter file saved incorrectly, file contents:

<root>
  <madam>
    <im>
      <adam>
      </adam>
    </im>
  </madam>
</root>
              <panama>
              </panama>
            </canal>
          </a>
        </plan>
      </a>
    </man>
  </a>
</root>

Shorter file re-saved correctly, file contents:

<root>
  <madam>
    <im>
      <adam>
      </adam>
    </im>
  </madam>
</root>
Press any key to exit

The difference, in case you didn’t spot the extra line in SaveFileCorrectly, is that we reset the FileStream’s length to 0 before we write, which truncates the file. If you don’t do that, and your document’s contents are smaller than the file you’re writing to, there’s going to be leftover crap at the end of the file, which something like XmlDocument.Load() is going to hate.

Yeah, yeah, ask me how I know this.

Anyway, important safety tip, kids: don’t cross the streams… er, always reset your FileStreams before writing to them. Works the same for non-XML content too, I’d reckon.
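
For what it’s worth, if you know you always want to replace the whole file, another option is FileMode.Create, which does the truncation for you. A quick sketch (the method name is just for illustration):

        public static void SaveFileWithCreate(string filename, XmlDocument xml)
        {
            // FileMode.Create overwrites an existing file (or creates a new one),
            // so there's no leftover content to clean up first.
            using (var file = new FileStream(filename,
                FileMode.Create,
                FileAccess.Write,
                FileShare.None))
            {
                xml.Save(file);
            }
        }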

And as an aside: why use a FileStream in the first place instead of just xml.Save("output.xml")? In my scenario, I can’t remember the exact error that came up, but it had something to do with permissions, file locks, or some other thing that was preventing the save. Since I went to the trouble of spelling out the FileStream constructor all the way down to FileShare.None, I’d guess that I wanted to make sure the file wouldn’t get read mid-write by another process.

Following the “why did I make longitude a string” debacle of 2014, adding a radius search to my Rails app was ridiculously simple.

The goal was a simple “what locations are within X metres of these coordinates” search.  In the past, I’ve done this with MySQL by basically putting the great circle calculation into the query, which isn’t pretty.

Using PostgreSQL, it’s a lot easier.  I’ll admit, I made the switch because I was deploying to Heroku, but I’m liking psql a lot and this is just one more reason.

First, you’ll need to add two extensions to your database:

CREATE EXTENSION cube;
CREATE EXTENSION earthdistance;

(FYI, I seem to recall seeing that the order matters here, but I didn’t take the time to try them the other way because… this one worked.)
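
If you’d rather not run those by hand in every environment, you can wrap them in a migration. A rough sketch (the class name is mine, and I’m assuming your PostgreSQL version understands IF NOT EXISTS):

class AddEarthdistanceExtensions < ActiveRecord::Migration
  def up
    # cube has to come first; earthdistance is built on top of it
    execute "CREATE EXTENSION IF NOT EXISTS cube;"
    execute "CREATE EXTENSION IF NOT EXISTS earthdistance;"
  end

  def down
    execute "DROP EXTENSION IF EXISTS earthdistance;"
    execute "DROP EXTENSION IF EXISTS cube;"
  end
end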

Then, just to be nice, add an index (this is all assuming, as in my previous post, that you’ve got a locations table with float fields called longitude and latitude; change as needed for your app):

execute "CREATE INDEX geo_location on locations USING gist(ll_to_earth(latitude, longitude));"

Then you can add a scope for your model:

scope :within_radius, lambda {|latitude, longitude, metres| where("earth_box(ll_to_earth(?, ?), ?) @> ll_to_earth(latitude, longitude)", latitude, longitude, metres) }

And that’s basically it.  You can call Location.within_radius(43.6422125, -79.3744256, 5000) and you’ll have a list of locations within 5 km of downtown Toronto.
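
And since it’s just a scope, it chains like anything else. Something along these lines, where the active column is made up for the example:

Location.within_radius(43.6422125, -79.3744256, 5000).where(:active => true).order(:name)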

As an aside: I’m rather pleased with how I named that parameter metres.  I’ve run into this problem numerous times where the units aren’t clear, especially in time/duration functions (milliseconds? seconds? minutes?) so this way even I can figure it out (at a cost of having to decide between metres and meters, but using the American English version when the USA isn’t on the metric system anyway felt a bit off.)

Alternate title: how to lose that “you done goofed” feeling.

So let’s say you made an update to your table to support longitude and latitude, because you knew you’d need them someday but the push right now was just to add the fields so they could start getting populated.

And let’s go further and say it was item 4892 on your plate, and validating numbers is soooo hard, so you just made them strings:

  def change
    add_column :locations, :longitude, :string, :default => '', :null => false
    add_column :locations, :latitude, :string, :default => '', :null => false
  end

…And that’s great, until you actually want to do something with those, like, oh, run calculations on them, because calculations are done with numbers and you’ve got strings, so you can either push your string agenda forward with a lot of expensive casts, or you can admit there’s a problem and you’ve got to change it.

Unfortunately, a straight-up “change_column :locations, :longitude, :float, :default => 0, :null => false” isn’t going to cut it, at least in PostgreSQL (which we’re using precisely because we’re moving to geo stuff, more on that later.)  Those empty strings are going to fail on the conversion, because it’s not obvious that '' == 0, and you’ll get PG::Error: ERROR:  column “latitude” cannot be cast to type double precision.

So you think it’s just a matter of changing the empty strings to zeros ahead of time, like “update locations set longitude='0' where longitude=''”? Close, but nope.  PostgreSQL needs a hint, which means you’ve got to drop down to raw SQL:

execute("ALTER TABLE locations ALTER COLUMN latitude TYPE float USING (latitude::float)")

And that leads to “PG::Error: ERROR:  default for column “latitude” cannot be cast to type double precision” – PostgreSQL won’t apply that USING cast to the column’s old default, so you’ve got to drop the default before changing the type, then set the new one afterwards.

execute("ALTER TABLE web_listings ALTER COLUMN latitude drop default")
execute("ALTER TABLE locations ALTER COLUMN latitude TYPE float USING (latitude::float)")
execute("ALTER TABLE locations ALTER COLUMN latitude set default 0")

And that seems to work.  Now you’ve got floats in your database and can start doing geo math – more on that in the next post.

Here’s a fun one: let’s suppose you need to take a user-supplied date from an input field and turn it into a date object:

var date = $('#date').val(); //assuming yyyy-mm-dd for the demo
var year = date.substr(0, 4);
var month = date.substr(5, 2);
var day = date.substr(8, 2);
var newDate = new Date(year, parseInt(month) - 1, day, 0, 0, 0);

Simple enough, right?

Not so fast

That code won’t work in older browsers, and when I say old, yes I mean IE8.

Some of the time, anyway.

Here’s the trick: older versions of parseInt() work differently if there’s a leading 0, and the function will assume you’re using a non-base-10 numbering system. If it’s a 0 followed by more digits, it’ll assume octal (base 8,) and if it starts with '0x' it’ll assume it’s a hexadecimal number like 0xff.

The octal parsing has been deprecated in ECMAScript 5, but in the meantime, if you try to call parseInt with a number like, say, "08", you’ll actually get 0 back, because octal digits only go up to 7.

And that’s the kicker: Your date parsing code, if written as above, will work for 10 months of the year. Months 01-07 are valid octal numbers that happen to map to 1-7 in decimal. 08 and 09 are invalid and will result in 0, and 10, 11, and 12 don’t start with a 0 so they come back as you’d expect.

This means that depending on when you wrote the code, you might have time bombs lurking (and as I’m writing this in August, guess what I did this morning?)

The fix? Pass the second parameter to parseInt that specifies the radix (aka base): parseInt("08", 10) will work just fine in all browsers.
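
So the snippet from the top, patched up:

var date = $('#date').val(); // still assuming yyyy-mm-dd
var year = parseInt(date.substr(0, 4), 10);
var month = parseInt(date.substr(5, 2), 10); // radix 10, so "08" comes back as 8
var day = parseInt(date.substr(8, 2), 10);
var newDate = new Date(year, month - 1, day, 0, 0, 0);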

Mobile apps seem different than web apps to me, for some reason.  When I started making websites, I was working (more than) full time making call centre desktop applications, and I got into Perl as a “dip the big toe in” hobby.  Which eventually turned into some freelance work, which turned into too much freelance work, which turned into full time jobs, etc. But the translation from desktop to web paradigms felt pretty straightforward.

Mobile seems different to me, somehow.  Dipping the big toe in doesn’t yield the same productivity returns for me, at least when flipping between web and mobile.  A deeper plunge seems necessary, both in coding and studying what works.

From Warren Ellis’ latest newsletter – it’s about writing comics, but this morning, I’m finding a lot of parallels in software:

You learn to write from reading books (and living your life). Next, you learn how to write comics by pulling them apart and studying their innards to see how they work. This is how you end up as a 24/7 comics writer and also a terrifying shut-in who will eventually go nuts in a very public way and conclude your career as a figure in a newspaper photo captioned FOREST CREATURE SUBDUED BY POLICE TASERS. But I’m serious. You are going to learn how to do this – learn your own way to manage the difference in pacing between eight pages and twenty-two pages and one hundred and twenty pages, learn how to achieve effects in timing and drama and emotional nuance, learn when to talk and when to shut up – by studying the best comics you can find, and tearing them apart and seeing how they do things and then stealing the tools you can use and adapting them into your own style. You are going to want to read broadly. Make yourself read things you wouldn’t ordinarily look at. If superheroes are your favourite, then make yourself read Carla Speed McNeil or Dan Clowes or Marjane Satrapi. If you only read science fiction comics, then force yourself to look at Hugo Pratt and Eddie Campbell and Svetlana Chmakova.

With the recent security breaches in some major websites (in this case, LinkedIn, but I feel pretty safe just going with “recent” and assuming there’ll have been one around the time you read this,) password security is getting a little bit more attention.

OK, I said “a little” – as one developer told me, “LinkedIn’s password leaked so I had to change my password 30 times,” to which I replied, correctly, “no, you changed the same password 30 times.” In the age of kick-ass, multi-platform password management apps, there’s really no reason not to use a different password on every website, and it doesn’t have to be one that’s cleverly based on the name of the site, like “gmail44secret.” I have no idea what my passwords are anymore, and I find that liberating.

And since I’ve got cut and paste on every platform I use my password manager on, there’s no reason not to use longer passwords, like, say, 30 characters.

OK, there is one reason (aside from the fact that on rare occasions I have to type something into a browser that I’m reading off of an iPhone) – not all sites support really long passwords.

Some will actively block you, saying, for instance, that the password has to be between 8 and 12 characters. But others, I’m finding, will happily accept your really long password when you set it and then never let you log in with it.

And frankly, that’s for the best, since it highlights some likely underlying problems. I can’t prove it, but I suspect that some of these sites are taking the password and storing it into a database field that’s been declared with too few characters.

In that case, the password is saved, but only the first n characters. The rest are truncated. Which means the subsequent user validation call won’t work.

It should be obvious that storing cleartext passwords is capital B Bad, but it’s ridiculously common. Even with some forms of encryption, the length of what’s stored is dependent on the length of the submitted password, so you’re still vulnerable to truncation.
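
To make that concrete: a proper one-way hash stores the same number of characters no matter how long the password is. A quick illustration with Ruby’s bcrypt gem (just an example, not a claim about what any particular site uses):

require 'bcrypt'

short = BCrypt::Password.create("12345")
long  = BCrypt::Password.create("a" * 30)

puts short.length # => 60
puts long.length  # => 60, same stored length no matter what the user typed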

Like I said, I can’t prove it, but I’ve found a few sites that will accept, but not honour, my 30 character passwords, and that’s the only theory I can come up with as to why. I wish it weren’t so, but we tend to think everyone does what we do, so if the dev/QA team uses 5 character passwords, it simply won’t occur to them to try really large ones.

Spot the bug: Rails dup vs clone


OK, the title gives this one away, and the most obvious bug is that there wasn’t a unit test that would have caught this, but here’s a variant of the Ruby on Rails production code, designed to copy invoices from one year to another:

invoice = Invoice.find(params[:invoice_id])
new_invoice = invoice.clone
new_invoice.edition = new_edition
new_invoice.summary = "Keep your ad the same"
new_invoice.amount_paid = 0
new_invoice.created_at = Time.zone.now
skus = EditionPrice.for_edition(@edition)
invoice.items.each do |item|
  new_item = InvoiceItem.new
  new_item.description = item.description
  new_item.sku = item.sku
  new_item.amount = item.amount
  new_invoice.items << new_item
end

new_invoice.save

This worked great in the Ruby on Rails 2.3.5 days, but after upgrading to 3.2 as part of a Heroku migration, the call would get killed by the 30 second timeout rule. At first I thought there were simply too many ads to clone (there are also image assets in the production code, which would take time to replicate in Amazon S3) but then I reloaded the referring page.

And saw 27,000 new line items in the first invoice.

So here’s the deal: in the Rails 3.1 release notes for ActiveRecord, there’s this little blurb:

  • ActiveRecord::Base#dup and ActiveRecord::Base#clone semantics have changed to closer match normal Ruby dup and clone semantics.
  • Calling ActiveRecord::Base#clone will result in a shallow copy of the record, including copying the frozen state. No callbacks will be called.
  • Calling ActiveRecord::Base#dup will duplicate the record, including calling after initialize hooks. Frozen state will not be copied, and all associations will be cleared. A duped record will return true for new_record?, have a nil id field, and is saveable.

And what does this mean for those of you who lack unit tests to automatically detect breaking changes like this one?  Well, the clone call will still copy the record, but including the id field.  Which means in the code above, we’re adding line items not to the new invoice, but to the exact same invoice, since they share the same id.  And we’re iterating through that invoice’s line items.  And copying them.  To a list that keeps growing until the app is killed.

So to summarize: if you want to make a quick copy of an ActiveRecord that’s in addition to the one that’s already in the table, use dup, not clone, as of Ruby on Rails 3.1.  And add tests and read release notes…
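
Applied to the snippet above, the fix is literally one word (everything else stays the same):

invoice = Invoice.find(params[:invoice_id])
new_invoice = invoice.dup  # dup: nil id, new_record? is true, associations cleared
new_invoice.edition = new_edition
# ...rest of the copy loop as before; new_invoice.save now inserts a genuinely new row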

When setting up a website that uses Amazon’s Simple Storage Service (S3) for cloud storage, you’ve got a few access options available to you.  The first is to just use your account’s own access keys, which could work if it’s the only thing you’re using the system for, but it’s a pretty big hammer. You’re going to encounter issues down the road if you need to change access for just one user or app, and it’s (not very) surprising how many “oh, just throw that in there” situations can arise where other processes get access out of convenience.  Sharing that one set of credentials means you need to change it everywhere when you want to revoke access to just one piece of your puzzle.

Another option is to use Amazon’s Identity and Access Management (IAM) to set up a more granular solution that grants access to just the resources you need.  Here’s how I set up a client’s S3 bucket recently:

Log into the Amazon Management Console

There’s probably some cool CLI way to do all of this, but let’s face it: you’re working from a blog post that might be out of date by the time Amazon pushes out its next round of console updates, so having screens you can look at generally helps. Plus I’m not immune to typos.  Head over to the management console and log yourself in.

Go to IAM

I’m going to assume you’ve already created the bucket you’re wanting to use for access grants. If not, just go to the S3 service first and make one.  Now, the weird part about all of this is that the permissions shown on the S3 bucket itself have nothing to do with our plan.  If you go by them alone, you’ll have no idea who actually has access, which seems off to me, but anyway…

Create a group

From IAM you’ll want to create a new group, which leads you through a wizard:

The name of the group is up to you.  For my purposes, I’m granting a single account access to a single bucket, so I just use the bucket name wherever possible.

On the next step, you’ll want the custom group policy, because we’re going to paste it into the form on the next screen:

OK, again, policy name is up to you.  For the policy document, paste the following in and change the “mybucketname” to the name of your actual bucket:

{
  "Statement": [
    {
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::mybucketname", "arn:aws:s3:::mybucketname/*"
      ]
    }
  ]
}

And boom, you’ve got a group! Next we’ll need to create the user in the group:

Create a user

You can make as many users as you want; in my case, there’s just the one.  Note that I created the policy for the group, not the user, in case that matters to your future plans. The next step is important if you’re making an app that uploads files:

Get the security credentials

This is where you get the access key ID and secret access key that any API calls will need.  The secret only shows up once!  So I like to leave this page open until I’ve saved the info somewhere safe, and preferably backed it up as well.

Done!

Of course, you’ll have to do something in your app now to use this newly empowered user, but you’ve now got granular access to a specific bucket in S3 that you can turn off at will without disabling anything else you’ve got in play.
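
For example, from a Ruby app with the current aws-sdk-s3 gem (the region, key name, and environment variables here are placeholders, and older versions of the gem used a different API), an upload with the new user’s credentials looks something like this:

require 'aws-sdk-s3'

s3 = Aws::S3::Client.new(
  region: 'us-east-1',                              # wherever the bucket lives
  access_key_id: ENV['AWS_ACCESS_KEY_ID'],          # the IAM user's credentials from above
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)

s3.put_object(bucket: 'mybucketname', key: 'uploads/report.csv', body: File.read('report.csv'))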

I love using TortoiseSVN for easy source control on Windows, but on my Vista build (still RC) I’ve had a few blue screens during commits, which is bad. In the last case, my working copy got a little mangled, and I couldn’t do anything with it without the error “object of the same name is already scheduled for addition” appearing and aborting the operation.

The trick seems to be to do a revert operation on the directory, then an update. You may need to rename the original files/directories that are having a problem, and then diff/merge them back into whatever comes up, depending on what state the repository was in during the crash.

Then you might want to upgrade your OS and TortoiseSVN to the latest versions, and let me know how that works out for you – I see a day not far off where I’ll be like those people still running MacOS 9 when X has been around for years…

(Note: this post was originally written in 2007 on another (now defunct) blog I used to run. Reposting since there are some links to it floating around the web and it kills me to see “YOU DIDN’T HELP THIS PERSON” in my logs. So yes, I’ve upgraded from Vista RC since then.)

I’ve recently started moving some client projects from various Ruby on Rails hosts over to Heroku.  There’ve been a whole lot of lessons along the way, but the biggest has been just getting started: how do you move an existing MySQL database over to PostgreSQL?

(And you probably already know this if you’re reading this, but Heroku is a cloud computing platform that handles a lot of the server infrastructure for you. People use it for managed scalability but also so they can focus on the app and not the boxes that run it, but it uses a different database than what many Rails apps typically run.)

After some research, I decided to go with this basic approach:

  1. Import the production database into my development MySQL environment.
  2. Export the data.
  3. Convert the site to PostgreSQL.
  4. Import the data.
  5. Test.
  6. Export/Import the data to a new Heroku install.

…And that’s what happened.  More or less.  OK, somewhat less.

Most of the hiccups happened in getting the MySQL data into a format that Postgres could use.  For most data sets, it seems that the output produced by mysqldump isn’t going to work with Postgres.  There are a few search and replace tricks you can do, but for me, the stopper was when it came down to my use of boolean fields in Rails, which are ints in MySQL but not in pg (yes, I’ll just type ‘pg’ from this point on.)

(I ran the dump without schema info, and used rake db:schema:load to get the schema into the initial pg database, which is why the boolean type came up in the first place.  I can’t remember why I went the schema:load route instead of just including the schema in the dump, but there was a reason – possibly because of those boolean types. Anyway.)

There were a lot of boolean fields, across a lot of rows, so manually editing the dump file wasn’t an option.  Eventually I came across the AR_DBCopy gem by Michael Siebert, which is a quick script to copy data from one db to another, table by table, row by row.  There’s a great breakdown of how to use it over on Daniel’s blog that covers pretty much all you need to know.

Except for this: the gem didn’t work for me.  It’s possible that there were some changes in Rails since the gem was written (it’s three years old,) and I’m not convinced my fixes are “correct”, but it’s the kind of tool you only need to have work a few times, so once I got what I wanted out of it I was fine with that. You can use my version by cloning https://github.com/jasondoucette/ar-dbcopy and running rake install. So, the sequence, briefly, worked out to this (see Daniel’s post for expanded details):

  1. Import production data into development MySQL box.
  2. Change database.yml to use postgres, and install the ‘pg’ gem (note that the adapter name in database.yml is postgresql and the gem name is pg, you’ll get weird adapter errors from Rails if you mess that up.)
  3. rake db:create
  4. rake db:schema:load
  5. Edit config/database.yml to add a source and a target.
  6. ar_dbcopy config/database.yml
  7. Run select setval('TABLE_id_seq', (select max(id) + 1 from TABLE)); on all tables with auto_increment keys (replacing TABLE with the table name, of course; there’s a sketch of this as a loop right after the list.)
  8. Test.
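
Here’s the sequence reset from step 7 as a loop you can paste into a Rails console, instead of doing it table by table (a sketch; it assumes every table has a plain id column and the usual _id_seq sequence name):

conn = ActiveRecord::Base.connection
conn.tables.each do |table|
  next unless conn.column_exists?(table, :id)
  # bump the sequence past the highest imported id so new inserts don't collide
  conn.execute("SELECT setval('#{table}_id_seq', (SELECT COALESCE(MAX(id) + 1, 1) FROM #{table}))")
end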

Oh, one other complication I hit at the db:schema:load phase – I had some indexes with identical names, which isn’t really a good idea, and pg complained.  I edited my db/schema.rb file to deal with this.  A more correct way would be to edit my migrations, but for this app I’m never planning on running a migration chain from scratch anyway.
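
In case it helps, the schema.rb edit is just giving each offending index an explicit, unique :name (the table and column names here are made up):

add_index "locations", ["created_at"], :name => "index_locations_on_created_at"
add_index "invoices",  ["created_at"], :name => "index_invoices_on_created_at"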

The only other trick was actually getting this data into Heroku, which despite being well documented, wasn’t obvious to me.  Here are the commands you’ll need (pasted from the documentation):

PGPASSWORD=mypassword pg_dump -Fc --no-acl --no-owner -h myhost -U myuser mydb > mydb.dump
heroku pgbackups:restore DATABASE 'http://s3.amazonaws.com/.....mydb.dump?authparameters'

Now, where it says “DATABASE”? That’s not a “put your database name here” prompt. You actually type the word DATABASE there.  Yeah, maybe that was obvious to everyone else, and I’m just special, but I thought I’d share in case anyone else thinks like me :)

And that’s pretty much all there is to it!  This app was pretty straightforward, but I’ve got a few others in the pipeline so I’ll update this if I run into any more hiccups, and of course leave your roadblocks (and solutions if you found ‘em) in the comments!