I finally installed autotest for my Rails development – if you haven’t heard about it, it might just be the thing that gets you into TATFT mode: you edit a file, and the relevant tests automatically run.  It’s like continuous integration for your development machine, basically, and I’m amazed at how much time it saves by eliminating seemingly tiny operations like remembering to launch the test runner.

The thing was though, that I wanted a certain setup for my lib folder and accompanying tests.  Before things get to the plugin/gem phase, most of my library code that’s outside of the models and controllers usually sits in lib until I figure out what I want to do with it.

To test these things, I prefer to have a separate test folder instead of using test/unit – for one thing, I usually don’t have ActiveRecord involved in these modules, so I don’t need the fixtures to run.

Getting rake to handle the lib tests is pretty simple, thanks to this tip from Stack Overflow:


namespace :test do
  desc "Test lib source"
  Rake::TestTask.new(:lib) do |t|
    t.libs << "test"
    t.pattern = 'test/lib/**/*_test.rb'
    t.verbose = true
  end
end

So great, now I can run ‘rake test:lib’ and I’m golden, but that doesn’t give me automatic goodness.

For autotest to do the same thing, you need to add some mappings to your .autotest file in the root of the project:

Autotest.add_hook :initialize do |at|
  %w{.git .svn .hg .DS_Store vendor tmp log doc}.each do |exception|
    at.add_exception(exception)
  end

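  # Drop the default lib mapping, which points lib files at test/unit...
  at.remove_mapping(/^lib\/.*\.rb$/)
  # ...and map lib/foo.rb to test/lib/foo_test.rb instead.
  at.add_mapping(%r%^lib/(.*)\.rb$%) do |filename, m|
    ["test/lib/#{m[1]}_test.rb"]
  end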

  at.add_mapping(%r%^test/lib/.*\.rb$%) {|filename, _| filename}

end

The remove_mapping call is the key to making this all work: when autotest runs against a Rails app, mappings are already in place thanks to the autotest-rails gem.  That gem, however, has its own opinions for how lib files get run, and by default it’ll run tests in test/unit, which isn’t what I want.  If you don’t remove that mapping, your mapping won’t check against any .rb files in lib, and you’ll spend a few hours going insane.
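If you want to sanity-check what a mapping like that will do before you let autotest loose on it, you can try the pattern out in plain Ruby. This is only a sketch: lib_tests_for is a made-up helper name, not part of autotest, and it just mimics the lib-to-test/lib translation from the hook above.

```ruby
# Same intent as the lib mapping in the .autotest hook, written as a
# standalone helper (hypothetical name) so it can be tried in irb.
LIB_FILE = %r{^lib/(.*)\.rb$}

def lib_tests_for(path)
  m = LIB_FILE.match(path)
  m ? ["test/lib/#{m[1]}_test.rb"] : []
end

p lib_tests_for('lib/geo/distance.rb')
p lib_tests_for('app/models/location.rb')
```

Nested folders carry over, so lib/geo/distance.rb lands on test/lib/geo/distance_test.rb, and anything outside lib comes back empty.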

Thanks go out to Brandon Keepers for some sample .autotest setups that I was able to work from!

Activating the youtube-g gem in Rails 3

by admin on August 15, 2010 · 2 comments

Gems with underscores always seem to mess with me and this one was no exception.  Here’s what goes into the Gemfile to keep a NameError (uninitialized constant YouTubeG) from ruining your day:

gem 'youtube-g', :git => "git://github.com/jasondoucette/youtube-g.git", :require => 'youtube_g'

Note the use of dashes and underscores – the :require directive is what makes this all work.

(Aside: I forked the code from another repository because I want to make some changes – the original documentation suggests that the Client.video_by method can retrieve video information by YouTube URL, but it’s actually the data URL they’re referring to, i.e. http://gdata.youtube.com/feeds/api/videos/UdJ0E7HbTKc and not http://www.youtube.com/watch?v=UdJ0E7HbTKc, which would be a lot more useful for a client app where they can paste in a URL to add the video to their CMS.  If I don’t end up spending another zillion hours fighting configurations, I might even do that.  Check github later…)
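For what it's worth, the translation itself is small. Here's a rough sketch of how a pasted watch URL could be turned into the data URL. The data_url_for name is hypothetical and this is not code from youtube-g or my fork; it only assumes the two URL shapes shown above.

```ruby
require 'uri'

# Hypothetical helper: turns a pasted watch URL into the gdata URL.
# Returns nil when the URL has no "v" parameter.
def data_url_for(watch_url)
  query = URI.parse(watch_url).query
  return nil if query.nil?

  pair = URI.decode_www_form(query).assoc('v')
  return nil if pair.nil? || pair.last.empty?

  "http://gdata.youtube.com/feeds/api/videos/#{pair.last}"
end

puts data_url_for('http://www.youtube.com/watch?v=UdJ0E7HbTKc')
```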

As I said, I’ve run into this kind of thing before with underscores and other differences between the gem name and the file name, which in the Rails 2.x world would be fixed with the :lib directive, but now we’ve got Bundler and Gemfiles.

Thanks go out to Andre Arko for his synopsis on Bundler that helped me see the differences.

Over the past 3 years or so, Google and, later, Firefox began taking a more active role in the fight against sites distributing malware by warning users before they allow passage to a site that they’ve detected as a possible threat.

I know this because… I had several of those sites.

Not on purpose, of course – it turns out my ad server (I use OpenX to manage banners and other stuff across multiple websites, which makes a lot of things easier) was hacked this week and someone inserted some nastiness into the ad delivery code for several banners.  The original ads still appeared, but other hidden stuff also got loaded that, presumably, posed a threat to some browser/operating system combinations.

The impact

So what happens when you get flagged?  Your search engine rankings don’t change, at least not right away, but there is a “this site may harm your computer” warning that appears under the page title in the search results, and you’re treated to an interstitial warning if you click through anyway.  If you’re using Firefox, it’s even more disruptive, with a big red box showing up before you’re able to proceed:

[Image: Firefox attack site warning]

See the little "ignore this warning" link in the corner? Yeah, neither did your visitors.

Of course, you’ve got a loyal fanbase for your site, and they’re not going to trust some computer warning over you, right?  Yeah… About that:

[Image: attack stats]

Discovery

The affected sites aren’t commerce sites, and the ads don’t see enough traffic for me to obsess over daily stats reports, so in theory this could have gone on for a while before I found out about it.  Thankfully, I’ve got users and Google to help me there.  More on Google in a moment, but it’s really valuable to have a good relationship with your site’s recurring users and have a clear way for them to get in touch with you.  I received multiple emails about the problem, which was useful, since I don’t Google myself unless I’m doing SEO work (not this week) and I don’t use Firefox as often as I use Safari.

As I hinted earlier, you can actually get Google to tell you when this happens, but for that you need to tell them a bit about your sites first:

Google Webmaster Tools

Google has a really helpful tool for this kind of thing (and some other stuff too) called Google Webmaster Tools.  I don’t know where the link is for this, but rather than link to it I’ll just tell you how I always get to it: I Google it.  (Aside: a lot of Google’s services work this way, and it took me a long time to realize that this is probably on purpose, them being a search engine and all.)

You’ll need to register your sites with them, which means uploading a file to your server or setting a DNS record up (pretty much the same system as with Google Analytics,) and once you’ve done that you can see alerts about malware along with a bunch of interesting search engine metrics and crawler reports that deserve a separate post.

More importantly, you can set things up so Google emails you these reports automatically and you don’t have to log into the system every day to make sure you haven’t been hacked.  I hadn’t done this, so again, thanks to my users!

Fixing the holes

Before you try to restore your reputation, it’s vital that you fix the problem that got you into this mess, and that starts with identifying the malware.  The report from Google is actually helpful in narrowing this down: it specifies which pages have the attack code and which URLs that code points to.  In my case, I was able to walk through the rendered page code and find the attack, which happened to be right next to my banner delivery code.
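If eyeballing the rendered source gets tedious, a few lines of Ruby can list every script and iframe source that points somewhere you don't expect. This is only a sketch: TRUSTED_HOSTS and suspicious_sources are names made up for illustration, the host list is a placeholder, and a regex scan like this won't catch anything assembled by obfuscated JavaScript.

```ruby
require 'uri'

# Placeholder allowlist: swap in your own site and ad server hosts.
TRUSTED_HOSTS = %w[www.example.com ads.example.com].freeze

# Returns the src values of script/iframe tags whose host isn't trusted.
def suspicious_sources(html, trusted_hosts = TRUSTED_HOSTS)
  sources = html.scan(/<(?:script|iframe)\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i).flatten
  sources.select do |src|
    begin
      host = URI.parse(src).host
      # Relative URLs have no host: they're served by the page's own site.
      !host.nil? && !trusted_hosts.include?(host.downcase)
    rescue URI::InvalidURIError
      true # can't parse it, so it deserves a manual look
    end
  end
end
```

Feed it the HTML your browser actually received (view source, or a saved copy of the page) rather than your templates, since the injected code only shows up in the rendered output.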

This is a whole other post, because it’s really interesting, but for now, let’s just say I was able to clean things up and plug the hole in the fence that let the attacker through, and then I looked at the page again to ensure that the attack code was no longer showing up.

Clearing your name

Google’s webmaster tools have a thing where you can request a malware review.  It’s basically a button you click, then you sign off that you’ve actually fixed something.

Google’s form claims it takes 24-48 hours to clear the warning, but in my case it was less than 12, which was pretty spiffy.

…And we’re clear – for now

Amazingly, this was the first time I’ve had to deal with this kind of thing myself (I’ve consulted on a few attacks for clients in the past.)  Was it avoidable? In hindsight, yes, but I’ve seen it happen to enough smart people to know that it’s not something I’m going to be terribly embarrassed about.  Security has a cost, and every site has its own resource budget.  For my personal sites, I’m going to spend some time defending my reputation, but I’m also going to spend more of my energy writing secure code for clients – I actually learn a lot from the forensics on these types of things, so while this was a hassle, it’s not like I walked away empty handed.

If nothing else, I’ve got a new justification for secure code when clients just want the quickest work possible: if your site has a security flaw, Google might block you.  Phrases like “SQL injection” might not mean much to the average business owner, but losing 80% or more of your daily traffic, and the accompanying revenue, certainly does.

It might be time to call it a night:

So I’m testing out a location class in Rails (version 3!), and I want to make some comparison methods in the model, so I can do things like if location.empty? or location.match?(elsewhere) and so on, and it’s not working for me, so I crank out the unit tests that, yes, I should have written in the first place, and then…

and then…

I spend the next 20 minutes trying to figure out why I see this in the console:

>> ooo = Location.new()
=> #<Location id: nil, company_id: -1, edition: "", cross_street: "", business_name: "", telephone: "", address: "", city: "", postal_code: "", country: "Canada", created_at: "1900-01-01 00:00:00", updated_at: "1900-01-01 00:00:00">

Why was the country getting pre-populated? I’m using the Rails 3 beta, so my mind jumps to all kinds of weird conclusions. Yes, including how did they know where I live?

Anyway, as many of you already know, the table was defined thusly:

+---------------+--------------+------+-----+---------------------+----------------+
| Field         | Type         | Null | Key | Default             | Extra          |
+---------------+--------------+------+-----+---------------------+----------------+
| id            | int(11)      | NO   | PRI | NULL                | auto_increment |
| company_id    | int(11)      | NO   | MUL | -1                  |                |
| edition       | varchar(255) | NO   |     |                     |                |
| cross_street  | varchar(255) | NO   |     |                     |                |
| business_name | varchar(255) | NO   |     |                     |                |
| telephone     | varchar(255) | NO   |     |                     |                |
| address       | varchar(255) | NO   |     |                     |                |
| city          | varchar(255) | NO   |     |                     |                |
| postal_code   | varchar(255) | NO   |     |                     |                |
| country       | varchar(255) | NO   |     | Canada              |                |
| created_at    | datetime     | NO   |     | 1900-01-01 00:00:00 |                |
| updated_at    | datetime     | NO   |     | 1900-01-01 00:00:00 |                |
+---------------+--------------+------+-----+---------------------+----------------+

…And yes, the ActiveRecord constructicon was simply taking the default value I’d provided.

(As is ever the case in these things, by the time I’d sorted out the test, things worked fine in my view.  I’ve no idea what changed in the process, but anyway, it’s probably the best endorsement for unit testing I can find…)

In praise of automated deployment

by admin on June 17, 2010 · 0 comments

I’ve worked jobs with websites that are manually sent to the server by FTP, and I’ve worked with sites that are pushed up with a single command line call.  Guess which ones I enjoyed more?

Let’s back up a minute.  Websites are, at their core, a collection of files on a web server (yes, often many servers. Work with me here.)  In my experience, there are a few common ways to make updates to a site:

  1. Editing files directly on the server. Here we’re connecting through ssh, telnet, Remote Desktop, tin cans and string, and what have you, and just editing the files live on the server.  If you make a typo the site crashes.  If there’s database work, it’s connecting to your live database, and somehow your odds of, say, forgetting a Where clause in an update or delete statement get astronomically higher in these cases.  Yeah, ask me how I know this…
  2. Editing files locally, and then replacing the server files with an FTP upload. In this case, you’re trying stuff out on a different computer. Ideally you’ve got a development environment set up so server-side code actually executes, and you’ve got a local database that’s either a clone of your production data or filled with test data (yes, that actually matters in some privacy circles,) so you can make mistakes to your heart’s content.  When you’re done, you upload the changed files.  Assuming you don’t forget any, of course…  This is the level where source control becomes a lot more possible, but there’s still a danger of regressing to step 1 from time to time and losing sync with the live code (I’ve worked in places where “grab it from the server before you edit” was a best practice, but we’ve got better options than that, see below…)
  3. Editing files locally, uploading a whole new version of the site, then switching over to that version. Here we gain a little protection against missing a file, combined with the bonus that you can get the whole site uploaded (for larger sites, this can take a while) and then a near-atomic action to flip the site over to the new version (renaming the root folder, repointing the web server’s root folder, etc.)  If you’ve got any folders that have special permissions (like making them writable) you’ll have to remember to change those too, but now we’re getting to a level where you can write the instructions down in a way that a reasonably trained infant could follow.
  4. Automated deployment. In this version, you’re taking what you do in option 3 and automating the whole thing down to a single command or button click or whatever.  The deployment is ideally based on what’s in your source control system so you know it’s a version you can recreate (and more importantly, roll back to) if needed, and any “magic steps” are baked into the scripts so there’s considerably less room for human error.
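To make options 3 and 4 a little more concrete, here's a minimal sketch of the "upload everything, then flip" step in plain Ruby. It assumes a POSIX filesystem and a made-up layout where each upload gets its own folder under releases and the web server's root is a symlink called current. The deploy function and its arguments are hypothetical, and a real tool does a lot more than this (permissions, shared folders, cleanup).

```ruby
require 'fileutils'

# Hypothetical layout: <root>/releases/<name>/ for each upload, plus a
# <root>/current symlink that the web server uses as its document root.
def deploy(root, source_dir, release_name)
  release = File.join(root, 'releases', release_name)
  FileUtils.mkdir_p(File.dirname(release))
  FileUtils.cp_r(source_dir, release)   # the slow part; the live site is untouched

  current  = File.join(root, 'current')
  tmp_link = "#{current}.tmp"
  FileUtils.rm_f(tmp_link)              # clear a leftover from a failed run
  File.symlink(release, tmp_link)
  File.rename(tmp_link, current)        # the flip: one rename swaps old for new
  release
end
```

Since the old release folders stay put, rolling back is the same flip pointed at an earlier folder.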

True confessions: did I say I’ve worked at jobs with each of these practices? A more accurate way to describe it would be that I instituted each of these practices. “It was the style at the time” doesn’t quite cover for my eternal shame.

For automated deployment, Capistrano remains my favourite tool of the moment.  I’ve written my own systems, but the thing of it is, most deployments are pretty much the same, and someone else has likely had that “weird situation” that you think you have to deal with, so a solution likely already exists.

Capistrano isn’t just for Ruby on Rails, by the way.  Here’s someone’s description of a PHP setup, and here are some tips for ASP.NET.

Basically, it boils down to this: as a client, would you rather pay someone an hourly rate to perform an arcane series of commands that go wrong 30% of the time, or have them type “cap deploy” and get on with the stuff that makes you money?  As a programmer, would you rather be on the hook for the downtime caused by a typo or two and endure the stress that comes with each deployment (typically making them further apart and more complicated) or, again, type “cap deploy” and get on with the stuff you can actually feel good about doing that got you into this line of work in the first place?

So the impetus/object lesson for today’s post: I just took on an “emergency rescue” gig to help a CakePHP project that had somewhere in the neighbourhood of 50 to 100K lines of code in it, spanning 50 models, 40 controllers, and so on, and it took about 5 hours to get something working on my development machine.  If it didn’t have a Capistrano script already in place and proven I would have had to either walk away from the job or spend another day or two learning the guts of the app so I could do an appropriate level of post-deployment testing.  There wasn’t any good way to tell if a glitch was just on my machine or if it was in the production version too (duplicating the error would have had noticeable impact on the site’s users in some cases.)

Thankfully, the build script was working and proven, so I could focus on making just the changes I needed to make and have reasonable confidence that they would be the only changes to the production system after they got pushed out.

For some programmers, a complicated deployment process might imbue a certain amount of job security, but I’m hoping that word’ll get around to clients everywhere that it doesn’t have to be this way.  There’s a small up-front cost with getting the process in place, but this pays for itself incredibly fast and lets everybody do the job they’re best at with minimal potential for damage to the website’s users.

(And if you’ve got a project that you’d like automated, I’m available for consultation…)

Decimal numbers in Rails and MySQL

by admin on May 22, 2010 · 2 comments

When you’re dealing with non-whole numbers in your application, there are a couple of considerations for storing them in the database.

The trouble with float

Most languages and databases have great support for floating point numbers, which from a naive perspective, would mean “numbers with a decimal point.” The trick of it is that these representations aren’t exact: storage is finite, fractional digits can go on forever, and plenty of ordinary decimal values (0.1, for one) have no exact binary representation at all. This results in some subtle inaccuracies that you may never see, but if you do, I guarantee they’ll pop up at the worst possible time, and if you’re using floats to store money amounts, there really isn’t a good time for a problem to appear…

(By the way, for a whole lot of detail on floating point numbers I recommend What Every Computer Scientist Should Know About Floating-Point Arithmetic.)

There are many programmers out there who, when assigning data types, will follow a basic “whole numbers are integers, fractions are either floats or doubles” approach (a double is a float with greater precision, but I’ll wager that many programmers will just use one or the other as a matter of consistency without any justification.)

In many languages, there’s another option, which is an abstraction for a decimal representation. This type usually has a smaller range of digits, but will be more accurate within that range, which makes it much more suitable for things like currency calculations.
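You can see the difference in a couple of lines of Ruby. This is just an illustration with made-up variable names, using Ruby's own Float and BigDecimal types:

```ruby
require 'bigdecimal'

float_sum   = 0.1 + 0.2
decimal_sum = BigDecimal('0.1') + BigDecimal('0.2')

puts float_sum == 0.3                  # false: the float sum is slightly off
puts decimal_sum == BigDecimal('0.3')  # true: decimal arithmetic is exact here
```

Note that the BigDecimal values are built from strings. Building them from floats would just carry the inexact value along.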

In a Rails application backed by MySQL, these columns are declared with the :decimal type in migrations:

    create_table :example do |t|
      t.decimal :amount
    end

Within the Rails app (for parsing strings, for example,) this would map to the BigDecimal type, so you can do things like @example.amount = BigDecimal(string_to_parse).

Gotcha.

The missing piece to the puzzle is the precision of the decimal within the database. By default, the code above will create a decimal field in MySQL with 10 digits, all of which are to the left of the decimal point. In other words, assigning 15.37 to your model will show 15.37 within the app, but once you save it, it’ll become 15 both in the data store and when you reload it.
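If you'd like to see the effect without touching a database, rounding a BigDecimal to the column's scale gives a fair approximation of what gets kept. This is a sketch only: stored_value is a hypothetical helper, and it imitates the rounding rather than asking MySQL anything.

```ruby
require 'bigdecimal'

# Hypothetical helper: approximates what a decimal column with the given
# scale (digits after the decimal point) keeps of an amount.
def stored_value(amount, scale)
  BigDecimal(amount.to_s).round(scale)
end

puts stored_value('15.37', 0) == 15                   # true: the cents are gone
puts stored_value('15.37', 2) == BigDecimal('15.37')  # true: kept intact
```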

The way around this is to specify the precision and the scale in your migration. Let’s fix that earlier code – you can do this right in the create_table logic in the first place, but odds are you’ll forget at some point and have to correct it with a followup migration:

    change_column :example, :amount, :decimal, :precision => 16, :scale => 2

In this case, we’ve set the total number of digits (the precision) to 16, with 2 decimal places (the scale.) That’s enough to render a big dollar amount, but you can pick your sizes as per your needs – the MySQL docs give some guidelines to the storage requirements for different combinations.

Also, when picking your precision and scale, be aware that the scale (max decimal places) can’t be greater than the precision (total digits.) That might seem obvious when written here, but without those translations in parentheses, it’s easy to get lazy and make the two values the same, which would leave you with a number that has to be less than 1.0, since all the space will be allocated to the right of the decimal place. Yeah, ask me how I know that…
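A quick way to check a precision/scale pair before it goes into a migration is to work out the largest number it can hold. Here's a small sketch of that arithmetic. The max_decimal_value name is made up, and it only models the digit counts described above, not MySQL itself.

```ruby
require 'bigdecimal'

# Hypothetical helper: the largest value that fits in a decimal column
# with the given precision (total digits) and scale (decimal places).
def max_decimal_value(precision, scale)
  raise ArgumentError, 'scale cannot exceed precision' if scale > precision

  whole    = precision > scale ? '9' * (precision - scale) : '0'
  fraction = scale > 0 ? ".#{'9' * scale}" : ''
  BigDecimal(whole + fraction)
end

puts max_decimal_value(16, 2).to_s('F')  # room for a big dollar amount
puts max_decimal_value(2, 2).to_s('F')   # precision == scale: stuck below 1.0
```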

This is an interesting lesson in evolving standards, and evolving vigilance against bad implementations.  First, the core problem: the dreaded (hey, I dreaded seeing it this morning!) “does not seem to be a valid podcast URL” error in iTunes:

[Image: iTunes podcast error]

In this case, Developer Lives is in fact a dead feed (sadly,) but I fixed the problem with my client’s podcast before thinking to take a screen shot.  Web journalism falls lower on my priorities than fixing bugs, apparently…

The first thing I needed to do was shut off FeedBurner, which was relaying the source feed to iTunes and everywhere else.  This particular site was using some .htaccess directives with mod_rewrite to redirect all requests (other than FeedBurner) to the FeedBurner URL, but that feed is cached heavily and any exploratory changes to the feed structure weren’t going to be seen.

A quick commenting out of a few lines did the trick for that, but if you’re curious about how to use .htaccess so you can control your podcast feed, here’s a good tutorial. Since FeedBurner could have been the culprit (and I have spent a few hours this year troubleshooting some weird problems in that area,) I highly recommend having the means to keep the FeedBurner feed URL semi-private – it’s not secret, but if you always give out your local URL which redirects accordingly then you have the ability to take your feed back whenever you want it.

(Note that for my initial checks I was able to use curl with a “-A FeedBurner” argument to spoof the user agent, but that wasn’t going to work with the external validators I’d be using later.)

Once I had a feed that I’d be able to play with, I was able to take it to the feed validator, which is an essential service for this kind of thing.  Paste in your feed URL, hit the validate button, and it’ll fetch and scan your feed for errors – pretty much the same thing iTunes is doing, but with a whole lot more feedback than that error box we showed above!

(Of course, I could have, and one might argue should have, gone to the validator right away, but I had a hunch about how things were going to work out, and I got the alert before my coffee, so I figure I’m allowed a little fuzziness :) )

And lo and behold, there were a bunch of errors.  Nothing too nasty – a human could certainly read the XML and guess the feed’s intent – but for a computer that validates against a strict (apparently much stricter in a recent release) schema, it’s essentially garbage.

From there, it’s a matter of tweaking the output until validation passes.  Apple’s official iTunes RSS specs are really helpful here. In my client’s feed’s case, they were using an old Joomla plugin from 2007 or so that had already been tweaked a fair bit, but apparently the specification had gotten a little more formal in that time.  I actually don’t know if the previous feed had ever validated or if it was just a case of a developer throwing code at the problem until the shows appeared in iTunes, but there were a number of tags that were improperly formatted.  A half hour or so of messing with the PHP did the trick.

Once everything checked out, all that was left was to re-enable FeedBurner and ping their site to let them know the feed needed an immediate reload.

In the case of today’s bug, iTunes itself was rejecting the feed, but the latest episodes were still available in the iTunes store for individual download, which I thought was pretty interesting – iTunes’ servers are a little more permissive, apparently – at least for now.  Even if your feed doesn’t show the latest episodes in the iTunes store, validating the source is likely to solve 99% of your problems.