Monthly Archives: March 2011

General

Literature of programs

I wonder what programming will be like in a couple thousand years, when, presumably, programming languages and the systems built up from them will have attained a richly layered history of idioms, allusions, allegories, metaphors, genres, fads and past fads… I suppose it won’t take thousands of years, necessarily, since programming languages inherit a lot from human languages, and because software is mixed up in everything humans do and absorbs some of those flavors. But I also don’t think there’s really a literature of programs right now; at least, I haven’t been introduced to it yet, and I’d think I would have been. I’d love to have someone prove me wrong, though.

General

Software development and the study of human aspiration

Though the technical aspects are a source of joy for me in software development, the human aspects are no less fascinating. The practice of making software provides some unique insights into human psychology, because software is pretty much made of pure thought, and because software is intertwined with so many human activities. Software developers often get a very detailed look at what people do, or where they expect to go in their lives, through the ongoing development of a project or the requirements for a new one. That’s pretty cool, when you think about it.

Maybe some day I’ll be able to add to my resume that I have great expertise in understanding and eliciting people’s aspirations.

General

Heuristics and bestiaries

I’ve worked on more than what I consider my fair share of projects where I had to deal with crazy input. What I mean by crazy input is: it fails to meet the basic criteria of the standard to which it’s supposed to conform, or there is not a standard to which it’s supposed to conform, or standard elements are mashed together in unanticipated ways to try to achieve non-standard effects, or some combination of those.

I suppose that if you dig deep, you’ll find that most of the data in the world is crazy in this sense. So despite my distaste for it, I’ll probably never get away from it. Still, I need to whine about it occasionally. But in addition to whining about it, I’ll present a little practical advice on dealing with it.

The first and most obvious reality in dealing with crazy input is that some heuristic methods will have to be used. For example, in some HTML input, I had to deal with this little number, which, in some documents, occurred between every two paragraphs (well, divs, because these documents don’t use p tags much at all):

<div style="margin-top: 6pt; font-size: 1pt">&nbsp;</div>

These aren’t really paragraphs. Even if you call a div a paragraph, they’re still not paragraphs; they’re just there for spacing. So I have a rule that says something like “If a paragraph is preceded by a paragraph that is essentially empty and has a font size less than 5pt, remove the empty paragraph and change the spacing on the current one to get the same spacing effect”.
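A rule like that could be sketched roughly as follows in Python. This is only an illustration, not the code from the project: the `Block` type, the names `is_spacer` and `collapse_spacers`, and the constant `SPACER_MAX_FONT_PT` are all made up for the example, and it assumes that a spacer takes up about its top margin plus its font size in vertical space.

```python
from dataclasses import dataclass, replace
from typing import List

# Hypothetical threshold: an empty block with a font size below this is a spacer.
SPACER_MAX_FONT_PT = 5.0


@dataclass(frozen=True)
class Block:
    """A simplified stand-in for a paragraph-like div (illustrative only)."""
    text: str
    font_size_pt: float = 10.0
    margin_top_pt: float = 0.0


def is_spacer(block: Block) -> bool:
    """True if the block is essentially empty and set in a tiny font."""
    visible = block.text.replace("\u00a0", " ").strip()
    return visible == "" and block.font_size_pt < SPACER_MAX_FONT_PT


def collapse_spacers(blocks: List[Block]) -> List[Block]:
    """Drop spacers that precede a real block and fold their height into
    that block's top margin."""
    result: List[Block] = []
    pending: List[Block] = []  # spacers seen since the last real block
    for block in blocks:
        if is_spacer(block):
            pending.append(block)
            continue
        # Assumption: a spacer occupies roughly margin-top + font size.
        extra = sum(s.margin_top_pt + s.font_size_pt for s in pending)
        pending = []
        if extra:
            block = replace(block, margin_top_pt=block.margin_top_pt + extra)
        result.append(block)
    # Spacers with no following block are not covered by the rule; keep them.
    result.extend(pending)
    return result
```

Given a real paragraph, the 1pt spacer from above, and another real paragraph, `collapse_spacers` returns just the two real paragraphs, with the second one carrying the extra top margin.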

That’s not the best/worst example, but it gets the point across. Heuristics will be necessary. That implies two things: the overall code structure must be adaptable to the addition of heuristics, and I’ll need a bestiary to test the code.

It’s part of the software developer’s mindset to try to neatly partition the entire universe into non-overlapping subsets, then write chunks of code to deal with each partition separately. The introduction of heuristics into such a beautiful scheme will cause some pain. In the beautiful world, I’d have code that says “it’s a paragraph, let’s do the paragraph thing with it”. In a world laden with heuristics, I have code that says “it’s a paragraph, but let’s see if it’s _really_ a paragraph, then we’ll either do the paragraph thing or do some wildly different thing”. I guess it’s not so much the code structure that has to be adaptable as my mindset.

Regardless of the flexibility of my mindset or my code, though, heuristics, by their nature, do not neatly partition the universe. They leave some things out, they overlap, and/or they tangle together in increasingly strange ways. I’ll never remember, when it comes time to add some new code, all the situations that got me to this point or all the ways that things can go wrong.

I’m not yet a full convert to test-driven development, but when dealing with beastly input, I consider a bestiary to be quite necessary. By that I mean a big set of unit tests, with a perfect specimen of each of the beasts I’ve encountered, each named after the ticket in the ticket-tracking system that brought it to me. I had one project in the past where I should have created a bestiary but didn’t, and that project was one of the worst disasters in my professional life. Another one like that and I would have traded my keyboard in for a shovel and started a new career…
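For a rough idea of what a bestiary might look like, here is a small sketch using Python’s standard `unittest` module. Everything in it is hypothetical: the ticket IDs are invented, and `looks_like_spacer` is a deliberately crude regex-based stand-in for whatever heuristic is really under test.

```python
import re
import unittest


def looks_like_spacer(div_html: str) -> bool:
    """Crude illustrative heuristic: an empty div with a font size under 5pt."""
    match = re.search(r"font-size:\s*(\d+(?:\.\d+)?)pt", div_html)
    if match is None:
        return False
    # Strip tags and non-breaking-space entities to see if anything is left.
    inner = re.sub(r"<[^>]+>", "", div_html).replace("&nbsp;", " ").strip()
    return inner == "" and float(match.group(1)) < 5


# One specimen per beast, keyed by the (made-up) ticket that reported it.
BESTIARY = {
    "TICKET-101": ('<div style="margin-top: 6pt; font-size: 1pt">&nbsp;</div>', True),
    "TICKET-102": ('<div style="font-size: 10pt">&nbsp;</div>', False),
    "TICKET-103": ('<div style="font-size: 1pt">Fine print</div>', False),
}


class BestiaryTest(unittest.TestCase):
    def test_every_specimen(self):
        for ticket, (specimen, expected) in BESTIARY.items():
            # subTest reports each failing specimen under its ticket name.
            with self.subTest(ticket=ticket):
                self.assertEqual(looks_like_spacer(specimen), expected)
```

Saved in a module, this runs with `python -m unittest`. Each new beast becomes one more entry in `BESTIARY`, so a later heuristic that breaks an old case fails with the name of the ticket that introduced it.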