Category Archives: General

Catch-all category


Diffing SWFs

In case I’m not the only one trying to do this, I thought I’d write this up.

Given: a build that produces many SWFs; a source tree with lots of dependencies, not all of which are obvious; and a revision control system from which to get a history of your source. You wish to write a script that correctly predicts which SWFs will change between two revisions of the source. You want to test that script by actually comparing the binaries to see if the predictions were correct.
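Once you can reliably tell which SWFs really changed, checking the predictions is a set comparison. Here is a minimal sketch in Python; the function name and the file names are made up for illustration.

```python
def score_prediction(predicted, actual):
    """Compare the SWFs predicted to change against those that really changed."""
    predicted, actual = set(predicted), set(actual)
    return {
        "correct": sorted(predicted & actual),
        # Predicted a change, but the SWF stayed the same.
        "false_alarms": sorted(predicted - actual),
        # The SWF changed, but the script did not predict it.
        "missed": sorted(actual - predicted),
    }


# Hypothetical file names, purely for illustration.
result = score_prediction(
    predicted=["main.swf", "charts.swf"],
    actual=["main.swf", "editor.swf"],
)
print(result)
```

The "missed" bucket is the one that matters most: a missed change means the prediction script would have skipped a rebuild that was actually needed.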

The first thing you’ll note is that doing a binary compare of the SWFs, or a text compare of the swfdumps, does not answer the question. There’s a timestamp, and if debugging is on, there’s a debugger password. But much worse, internal ID numbers and the ordering of classes and chunks of code change semi-randomly. Do two builds of precisely the same source, and unless you write a lot of fancy code, you can’t tell whether the SWFs are really the same.

So, go grab the Flex SDK source and modify it with the following two bits of sed:
sed -i 's/\bHashSet/LinkedHashSet/g' `find . -name '*.java'`
sed -i 's/\bHashMap/LinkedHashMap/g' `find . -name '*.java'`
In other words, replace the Java collections which have non-deterministic iterators with equivalents that have deterministic iterators.

Your new compiler produces pretty deterministic output. You still have to deal with the timestamp and debug password. One way to do that is to run swfdump -abc on the binaries and compare the dumps textually, ignoring those few lines that always change. Another approach is to modify the compiler source to remove the code conditional on configuration.generateDebugTags(), and set the last argument in the ProductInfo constructor (the timestamp) to 0. Then the binaries should be bitwise identical.
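The textual comparison can be sketched in a few lines of Python. The patterns and the sample dump lines below are placeholders, not real swfdump output; substitute whatever the volatile lines look like in your own dumps.

```python
import re

# Placeholder patterns for lines that differ on every build. These are
# assumptions for the sketch; adjust them to match your actual dumps.
VOLATILE = [
    re.compile(r"^timestamp:"),
    re.compile(r"^debug password:"),
]


def stable_lines(dump_text, volatile=VOLATILE):
    """Return the lines of a dump, dropping any that match a volatile pattern."""
    return [
        line
        for line in dump_text.splitlines()
        if not any(p.search(line) for p in volatile)
    ]


def same_swf(dump_a, dump_b, volatile=VOLATILE):
    """True if two dumps are identical once the volatile lines are ignored."""
    return stable_lines(dump_a, volatile) == stable_lines(dump_b, volatile)


# Made-up dump text standing in for two builds of the same source.
build_1 = "timestamp: 111\ndebug password: abc\nclass Foo\nclass Bar"
build_2 = "timestamp: 222\ndebug password: xyz\nclass Foo\nclass Bar"
print(same_swf(build_1, build_2))
```

This only works once the compiler output is deterministic; without the collection swap, the class ordering differs between builds and nearly every line would have to be treated as volatile.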


Literature of programs

I wonder what programming will be like in a couple thousand years, when, presumably, programming languages and the systems built up from them will have attained a richly layered history of idioms, allusions, allegories, metaphors, genres, fads and past fads… I suppose it won’t take thousands of years, necessarily, since programming languages inherit a lot from human languages, and because software is mixed up in everything humans do and absorbs some of those flavors. But I also don’t think there’s really a literature of programs right now; at least, I haven’t been introduced to it yet, and I’d think I would have been. I’d love to have someone prove me wrong, though.


Software development and the study of human aspiration

Though the technical aspects are a source of joy for me in software development, the human aspects are no less fascinating. The practice of making software provides some unique insights into human psychology, because software is pretty much made of pure thought, and because software is intertwined in so many human activities. Software developers often get a very detailed look at what people do or where they expect to go in their lives through the ongoing development of a project or their requirements for a new project. That’s pretty cool, when you think about it.

Maybe some day I’ll be able to add to my resume that I have great expertise in understanding and eliciting people’s aspirations.


Heuristics and bestiaries

I’ve worked on more than what I consider my fair share of projects where I had to deal with crazy input. What I mean by crazy input is: it fails to meet the basic criteria of the standard to which it’s supposed to conform, or there is not a standard to which it’s supposed to conform, or standard elements are mashed together in unanticipated ways to try to achieve non-standard effects, or some combination of those.

I suppose that if you dig deep, you’ll find that most of the data in the world is crazy in this sense. So despite my distaste for it, I’ll probably never get away from it. Still, I need to whine about it occasionally. But in addition to whining about it, I’ll present a little practical advice on dealing with it.

The first and most obvious reality in dealing with crazy input is that some heuristic methods will have to be used. For example, in some HTML input, I had to deal with this little number, which, in some documents, occurred between every two paragraphs (well, divs, because these documents don’t use p tags much at all):
<div style="margin-top: 6pt; font-size: 1pt">&nbsp;</div>

These aren’t really paragraphs. Even if you call a div a paragraph, they’re still not paragraphs; they’re just there for spacing. So I have a rule that says something like “If a paragraph is preceded by a paragraph that is essentially empty and has a font size less than 5, remove the empty paragraph and change the spacing on the current one to get the same spacing effect”.
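A rule like that might be sketched as follows. The Para class is a hypothetical stand-in for whatever document model you have, and the guess that a spacer occupies its top margin plus one line at its font size is a crude assumption, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class Para:
    """Hypothetical minimal paragraph model for this sketch."""
    text: str
    font_size_pt: float = 10.0
    margin_top_pt: float = 0.0


def is_spacer(p, max_font_pt=5.0):
    """Essentially empty (whitespace or non-breaking spaces only) and in tiny type."""
    return p.text.replace("\u00a0", " ").strip() == "" and p.font_size_pt < max_font_pt


def fold_spacers(paras):
    """Drop spacer paragraphs and move their height onto the next real paragraph."""
    out = []
    pending = 0.0  # spacing carried over from removed spacers
    for p in paras:
        if is_spacer(p):
            # Assumption: a spacer takes up roughly its margin plus one line of text.
            pending += p.margin_top_pt + p.font_size_pt
            continue
        out.append(Para(p.text, p.font_size_pt, p.margin_top_pt + pending))
        pending = 0.0
    return out


doc = [
    Para("First."),
    Para("\u00a0", font_size_pt=1, margin_top_pt=6),
    Para("Second."),
]
result = fold_spacers(doc)
print(result)
```

A spacer at the very end of the document has no following paragraph to absorb its height, so this sketch simply drops it.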

That’s not the best/worst example, but it gets the point across. Heuristics will be necessary. That implies two things: the overall code structure must be adaptable to the addition of heuristics, and I’ll need a bestiary to test the code.

It’s part of the software developer’s mindset to try to neatly partition the entire universe into non-overlapping subsets, then write chunks of code to deal with each partition separately. The introduction of heuristics into such a beautiful scheme will cause some pain. In the beautiful world, I’d have code that says “it’s a paragraph, let’s do the paragraph thing with it”. In a world laden with heuristics, I have code that says “it’s a paragraph, but let’s see if it’s _really_ a paragraph, then we’ll either do the paragraph thing or do some wildly different thing”. I guess it’s less that the code structure has to be adaptable than it’s that my mindset has to be adaptable.

Regardless of the flexibility of my mindset or my code, though, heuristics, by their nature, do not neatly partition the universe. They leave some things out, they overlap, and/or they tangle together in increasingly strange ways. I’ll never remember, when it comes time to add some new code, all the situations that got me to this point or all the ways that things can go wrong.

I’m not yet a full convert to test-driven development, but when dealing with beastly input, I consider a bestiary to be quite necessary. A big set of unit tests, with a perfect specimen of each of the beasts I’ve encountered, each named after the ticket in the ticket-tracking system that brought it to me. I had one project in the past where I should have created a bestiary but didn’t, and that project was one of the worst disasters in my professional life. Another one like that and I would have traded my keyboard in for a shovel and started a new career…
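Here is a rough sketch of what a bestiary could look like with Python's unittest. The clean function is a toy stand-in for the real code under test, and the ticket numbers are invented; in practice the specimens would more likely live in files named after their tickets.

```python
import re
import unittest

SPACER = re.compile(r'<div style="[^"]*font-size: 1pt[^"]*">&nbsp;</div>')


def clean(html):
    """Toy stand-in for the code under test: strips 1pt spacer divs."""
    return SPACER.sub("", html)


# One entry per beast: ticket-derived name -> (beastly input, expected output).
# The ticket numbers here are invented for the example.
BESTIARY = {
    "ticket_1234_spacer_div": (
        '<div>a</div><div style="margin-top: 6pt; font-size: 1pt">&nbsp;</div><div>b</div>',
        "<div>a</div><div>b</div>",
    ),
    "ticket_1301_no_spacer": ("<div>a</div>", "<div>a</div>"),
}


class BestiaryTest(unittest.TestCase):
    """Test methods are attached below, one per specimen."""


def _make_test(source, expected):
    def test(self):
        self.assertEqual(clean(source), expected)
    return test


for name, (source, expected) in BESTIARY.items():
    setattr(BestiaryTest, "test_" + name, _make_test(source, expected))
```

Run it with python -m unittest. Because each test carries its ticket name, a failure points straight back to the history of the beast that broke.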


Design Balls

A friend once told me about one of his introductory computer science classes, wherein a fellow student would occasionally stop the professor to ask “Yeah, but, how do we know what computers can do?!?”. The professor didn’t really have a great answer for him, and after a while the student gave up asking and dropped the class.

It’s easy to think that this person just did not have it, whatever it takes to be in computer science. Naturally, he should leave because he had no business in the class in the first place. And in a sense, yeah, that’s precisely right, he didn’t have It, and It is quite necessary. But I wonder whether what It was is something that could be gifted by the professor or another student, or if he could have continued with the lectures and exercises for a while before It began to dawn on him, and things would start to crystallize.

I also wonder whether It was, to quote a phrase that popped up unbidden in my head just a while ago, “design balls”. Because I remember a time when I didn’t have my computer program design balls, when I was basically bluffing it, and now I really have It, and I’m not bluffing. I’ll have to do some serious introspection to determine when the magic transition happened, though I can say that luckily it happened well before I was taking formal courses in computer science, so I didn’t have the great difficulty outlined above.

You can build a textbook definition of the design process by stringing together various phrases:

  • define need
  • derive system outline
  • calculate component parameters
  • analyze expected system response
  • simulate or prototype system
  • verify conformance to the specification
  • ship design artifacts to manufacturer for implementation

blah blah blah, you know what I mean. Those phrases even make sense to someone who is already past a certain point in their design education. But there’s an unspoken, ummm, let’s say, glue, that holds the pieces of the process together. I don’t know if it’s unspoken because it’s impossible to speak of, as my lame analogy suggests, or whether it’s just that sort of silence that develops around certain conceptual vortexes in a field, for whatever reasons.

And for now, I’m going to give up on speaking about this. Not forever, because now that I’m pondering it, I feel like it’s a really important thing, and perhaps there is hope and help for those who want to have It but don’t yet have It, and maybe It is teachable. Hmmmm.



Hmmm, never ran across a debdiff before today. Gonna see if the one posted for this libvirt problem helps me.

Update (which was actually posted simultaneously with the original): yup. Yay, open source is just so sweet to me.


Things undone

Occasionally I feel bad about all the things left undone.

In software development, there are two things we routinely do that occasion such feelings. (Ooops, I’ve been told I’ve been using the word ‘feel’ too much.) We use ticket-tracking systems, and we write TODO comments.

Ticket-tracking systems are a way to keep track of work to be done and the evolution of the product. A ticket is a short, detailed description of an action to be taken to improve the product. They come in different types: bug, improvement, feature, task, etc. The description is action-oriented and specific, and assumes the current product as the context (at least the best tickets are like that; maybe I’ll talk a little more later about less-than-best tickets). Here’s a decent-looking example of a ticket. I could go on for quite a while about the good and the bad of ticket-tracking systems… For now, the salient points are:

  • a ticket represents some work to be done
  • tickets are assigned to someone, to do the work
  • as a developer, you spend time daily looking at the list of tickets assigned to you

TODO comments are a simpler manifestation of a similar idea. When writing code, you might see room for improvement in some particular technical aspect, but you don’t have the time to tackle it right now. So you put a little TODO comment in there explaining what you think might be improved. Here’s a little contrived example:
if n == 0:
    print("none")
elif n > 0:
    print("some")
# TODO: what if n is negative?

In this case, I can see that the set of all numbers is not covered by the two conditions, so I’m led to wonder whether I should be handling the case where n is negative. Now, if I really thought that case was going to arise naturally and soon, I’d figure out how to handle it and write the code. But in this example, I’m thinking: it could happen, but I can’t see any reason it should. Let’s say n is the count of apples in the box from the apple-counting machine. No reason for that to be negative. But, ya know, it could be, somehow. So I note that I feel uneasy about leaving that case uncovered. Later, when browsing through the code or debugging it, I might see that comment, and due to an increased understanding of the world or an abundance of spare time or something, I’ll decide to cover the case. Maybe I learned that the apple-counter counts a zucchini as -1 apples, and somehow a zucchini gets in the box occasionally.
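If that day comes, covering the case could be as small as this. It is only one possible resolution; the function name is made up, and treating a negative count as an error (rather than, say, counting zucchinis) is an assumption.

```python
def describe_count(n):
    """Describe an apple count; a negative count is treated as bad input."""
    if n == 0:
        return "none"
    if n > 0:
        return "some"
    # Assumption: a negative count means something upstream went wrong,
    # so fail loudly instead of guessing what it means.
    raise ValueError("negative apple count: %d" % n)


print(describe_count(0))
print(describe_count(3))
```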

Where tickets and TODOs can get depressing is when you look at your list and realize that there are 30 tickets on it, 10 of which are more than a year old, and 23 of which you’ve been systematically ignoring for a long time. Or you run across a TODO and a scenario flashes through your mind, where the improbable thing happens, and the resulting chain of causality ends with a lawsuit in Taiwan.

It might be easy to say “Well, this simply should never happen! These tasks should be dispatched with great haste and with all available resources!”. Certainly one can imagine a process or organization where that is the rule, and buildups of old cruft never happen. But follow me for now into my world: in the places where I’ve worked, there is no such rule, and the cruft does build up. I’ve left jobs with dozens of tickets and TODOs left in my wake, where they either evaporated when I left or are still lurking around somewhere today.

The rule in these places is more like “Good ideas should be captured immediately and dealt with when there’s time”. That’s a perfectly valid rule to use, but it does tend to lead to a situation where there are lists of lots of good ideas left unattended. If you believe in good ideas and you want to perfect your work, then these lists sometimes look pretty grim, numerically speaking. There are always more good ideas than time to implement them, by a laaarge factor.

So, you learn to live with the backlog.


A DSL menagerie

Someone should compile a menagerie of domain specific languages, to highlight all the different ways people have used DSLs to solve real-world problems…


Digging for answers

Frustration. While it feels like a waste of time and can put my stomach in knots, I think it can help me become a better person, in general, at least.

Lately I’ve been working on adapting this big open source server project for use for a client (sorry about the vagueness, but the stuff I’m working on is proprietary). In the best of all worlds, I’d deploy the client’s app on the server and it would just work. And it won’t surprise you to learn that we don’t live in that world. The world we live in has features like:

  • the server has an old version of the main framework that the client’s app uses, meaning I have to find the uses of the new features and port them back to older code
  • the server project has issued very little documentation. That might make me look for an alternative server, but it happens that there are only two alternatives and this one seems, from various viewpoints, to be the best by far
  • there is some problem

That last point might seem even more vague than the others, and believe me, I wish I could be more specific. But ya see, that’s approximately all the information I can get the server to give me about this particular problem. I try to run a tiny piece of code, and it tells me “Error: 500”. I’ve spent hours digging through the pieces of the server; it has load-balancers and caches and proxies and RPC servers talking to database adapters talking to databases, talking in HTTP and a few other protocols; it has components written in Python, Ruby, Java, and C; it has log files in many different places and with different ways to enable logging in each component; etc.

So far, all this digging has led me to the conclusion that yes, there still is a problem. That’s frustrating. But the process leads me to probe into the server components with various techniques, so I’m learning. Learned about ngrep, tonight, for example. Learned how to use the shells provided with the various databases to try to see what’s going on in them. Learned about the wonders of libvirt. Learned about some new network protocols.

At a higher level, I think these experiences teach me about patience, persistence and investigative techniques. Those abilities come in handy, and it seems I still somehow have less than the maximum amount of each of them.


Sense of accomplishment

I have a difficult relationship with the concept of ‘a sense of accomplishment’. I feel like it’s somewhat necessary to true motivation, but then I feel it’s a character flaw to truly feel such a sense. I feel like I’ve done a lot of good work, but there’s always more I could have done. I feel like the things I’ve done are significant, but if I pointed them out to the average person, they’d be far less than impressed.

One of my clients reached a big product milestone recently. It’s pretty amazing that I’ve been with this product since its inception until this milestone. I felt the need to reflect on what I’ve worked on in the product in that time.

  • code generation for AS3 to Python RPC
  • HTML and RTF paste
  • spell checking
  • highlights and callouts
  • XBRL HTML slicing
  • equation parsing and evaluation
  • slimming Flex module download size
  • HTML import
  • browser issues with keys and mousewheel
  • Google AppEngine/EC2 integration
  • parallelization of translation functions
  • Undisclosed Big-Deal Project
  • PDF export

When I look at each of those bullets, I remember lots of work that I had to do on each, the difficult problems that arose and the solutions to them, the necessity of each function and the contribution toward the overall product. But I also remember the things left undone that could make each function more perfect; the work that others did that I can’t take credit for; the fact that if I pointed out these functions in the app, someone unfamiliar with the job wouldn’t understand the work they represent; and that in at least one of these projects, all of my code is now dead.

So my sense of accomplishment is a complicated and fragile thing. That’s not a problem; actually, when I say it, I feel like that’s a more mature attitude than one that’s more monolithic. Maybe that’s a partial solution to the question I mentioned above of whether it’s a character flaw to feel a sense of accomplishment: maybe it’s only hubris to feel good about the foreground of one’s accomplishments if one doesn’t also understand the inseparable background against which they are viewed.