Category Archives: General

Catch-all category

General

ASZip bug, fix

In case you use the nice little library ASZip to create zip files from Flex, be aware that there’s a bug in the released code v0.2, but that there’s a patch. The bug is described and a patch provided in Issue 1. The same bug causes Excel to not like the files, Issue 2.

Just thought I’d blog about this since I rediscovered and refixed the bug before realizing the issue tracker had a patch. I’m not sure why the patch isn’t checked in; apparently the author forgot about it or was otherwise prevented from working on the library more.

General

regex coverage

I’m slowly incorporating more automated testing into my development workflow these days. But there are some holes in the available development tools that make it hard to make that a reality through all my code.

For example, I have some slightly tricky regular expressions, embedded in XPath expressions. These are in code that tries to heuristically recognize certain constructs within a class of human-generated documents and do special processing on them. Developing a heuristic presents special problems, for me, at least, in that the specification is vague and can only be refined by iterative feedback based on data from the field. I get wary about these things… so the more testing I can do, the better. I also would like to ensure that my tests are covering the code I’m writing. In this case, the regexes and XPath expressions are code just as much as the Java and XSLT are.

I asked on Stack Overflow whether there are tools out there for measuring code coverage for regexes. That’s only a subset of my overall problem, but I thought it was interesting enough to see what might be out there. So far, nobody has pointed me to a tool that specifically does that.

I played around for an hour with an automaton library and a graph library to try to visualize coverage of a DFA implementation of a regex against a set of test strings.

DFA coverage graph

Clearly not great, but the approach might yield something if someone (probably not me) were to refine it. I had some fun with it and felt like I was getting something from the process.

Another approach that I thought of while working with this was to generate exemplars from the regex (since that’s pretty easy), verify by hand that they fit the specification, then use those as a starting point for test cases. That might seem a little backward, but it does a couple things: during the verification step, you’re learning whether your regex really recognizes what you expect it to, and once you’ve got your list of verified test strings, then later changes to the regex might cause useful test failures. If you generate ‘enough’ exemplars, then you know you have test coverage.

How you’d define ‘enough’ is a non-trivial question. In my experiment above, I was looking at which states and transitions of the DFA were covered. Those are at least two levels of coverage you could look for, but it seems that some sort of path coverage would be nice, too. There are a couple projects on the web that generate strings from regexes: Xeger and Rex. They both use a randomized approach, so they aren’t going to guarantee any specific sort of coverage. And indeed, it may be that for a given regex that’s so complicated it needs test-coverage-measurement, the set of inputs you’d need to truly cover it would be crazy-big.

Which leads me to another thought: if you are using XPath or regex that complicated, maybe it’s not worth it. I mean, it’s all cool and clever to make one (220-character long) expression that does something complicated, but if you can’t really feel confident about how well it works by inspection, then it’s probably too dense to be readable and maintainable. On the other hand, the XSLT compiler can probably do a more efficient job of finding the nodes in question than some hand-written Java or something, and if you want to avoid mixing other languages in with your XSLT process… It’s a puzzle.

General

Scanners’n’displays

Just in case you were wondering: laser barcode scanners can’t read off an LCD display, but they can read from an e-ink display (Kindle 3, at least).

(That’s what I predicted.)

General

Concision

I’m not very good at being concise. I wonder how people do it.

General

gnome-instability-daemon

My commitment to bleeding-edgery leads to one weird consequence. When I upgrade Ubuntu to an alpha version (I’m running 10.10 now), there’s about an hour of readjusting configurations and dealing with obsoleted packages, then things get back to pretty stable. Except that until nearly the final release, gnome-settings-daemon crashes on me once or twice a day. I don’t know what’s going on in there that could be so unstable during the development phase of a distribution… it sounds like it’s basically a little client-server database for key-value trees. Thought that problem was already solved.

Anyway, it’s easy enough to <Alt -F2> gnome-set <ret>, and I’ve never lost any work or had an episode of Futurama ruined or anything…

General

1D eye

Barcode scanners get a great algorithmic advantage by having a 1D scanning laser eye. Kinda cool.

General

Serious omission?

I’m reading The Success of Open Source, which seems to be a pretty interesting analysis of the sociopolitics of open source development models. But I just noticed what seems to be a big omission: I can’t find “BDFL”, “van Rossum”, or “Python” in the index. I have a feeling that he’ll still cover the BDFL topic in there in some guise, but even if so, how can I excuse the absence of Python in a book about the SUCCESS of OPEN SOURCE? Geesh :-/.

General

C++ error messages

OK, one thing I don’t miss about C++ is the error messages. Here’s an incredibly obtuse way to say something:

In file included from /usr/include/c++/4.4/bits/stl_algobase.h:67,
                 from /usr/include/c++/4.4/bits/char_traits.h:41,
                 from /usr/include/c++/4.4/ios:41,
                 from /usr/include/c++/4.4/ostream:40,
                 from /usr/include/c++/4.4/iostream:40,
                 from ../src/Flipper.cpp:1:
/usr/include/c++/4.4/bits/stl_iterator_base_types.h: In instantiation of ‘std::iterator_traits > > >’:
/usr/include/CGAL/centroid.h:833:   instantiated from ‘CGAL::CGALi::Dispatch_centroid > >, CGAL::Point_2 > > >’
../src/Flipper.cpp:102:   instantiated from here
/usr/include/c++/4.4/bits/stl_iterator_base_types.h:127: error: no type named ‘iterator_category’ in ‘class CGAL::Point_2 > >’
/usr/include/c++/4.4/bits/stl_iterator_base_types.h:128: error: no type named ‘value_type’ in ‘class CGAL::Point_2 > >’
/usr/include/c++/4.4/bits/stl_iterator_base_types.h:129: error: no type named ‘difference_type’ in ‘class CGAL::Point_2 > >’
/usr/include/c++/4.4/bits/stl_iterator_base_types.h:130: error: no type named ‘pointer’ in ‘class CGAL::Point_2 > >’
/usr/include/c++/4.4/bits/stl_iterator_base_types.h:131: error: no type named ‘reference’ in ‘class CGAL::Point_2 > >’

Translation: you don’t need to include ‘centroid.h’.

General

Offsetting a polyline

Thanks to a question on StackOverflow, I did some playing with Java’s Graphics2D. Below is some code to generate an offset of a polyline, i.e. given a polyline (a string of connected line segments), generate a polygon that represents a thick version of that line.

Clearly, the Java guys took some shortcuts here, because the polygon ends up including a lot of interior segments that you don’t really want, etc. However, filling it using a nonzero winding rule covers a lot of sins. And I can understand why they committed those sins, since it’s kinda hard to do in general. Read the CGAL page about offsetting for an intro. Sometime I’ll have to investigate the shortcut methods in the OpenJDK source; looking at this output, I’m definitely curious.

Unfilled
Unfilled
Filled
Filled

import java.awt.BasicStroke;
import java.awt.Shape;
import java.awt.geom.Path2D;
import java.awt.geom.PathIterator;

public class StrokePath
{
    public static void main(String[] args)
    {
        // set line width to 6, use bevel for line joins
        BasicStroke bs = new BasicStroke(6.0f, BasicStroke.CAP_SQUARE, BasicStroke.JOIN_BEVEL);

        // create a path for the input
        Path2D p = new Path2D.Float();
        p.moveTo(50.0, 50.0);
        p.lineTo(65.0, 100.0);
        p.lineTo(70.0, 60.0);
        p.lineTo(120.0, 65.0);
        p.lineTo(40.0, 200.0);

        // create outline of wide lines by stroking the path with the stroke
        Shape s = bs.createStrokedShape(p);
        // output each of the segments of the stroked path for the output polygon
        PathIterator pi = s.getPathIterator(null);
        while (!pi.isDone())
        {
            pi.next();
            double[] coords = new double[6];
            int type = pi.currentSegment(coords);
            switch (type)
            {
            case PathIterator.SEG_LINETO:
                System.out.println(String.format("SEG_LINETO %f,%f", coords[0], coords[1]));
                break;
            case PathIterator.SEG_CLOSE:
                System.out.println("SEG_CLOSE");
                break;
            case PathIterator.SEG_MOVETO:
                System.out.println(String.format("SEG_MOVETO %f,%f", coords[0], coords[1]));
                break;
            default:
                System.out.println("*** More complicated than LINETO... Maybe should use FlatteningPathIterator? ***");
                break;
            }
        }
    }
}
General

Elicitation

Been playing a little bit on Stack Overflow. Got interested after a few of my web searches ended up there and there were actual answers (if you’ve searched the web for programming questions before, you know the value of the average result on a forum site tends to zero).

Among the other potential positive values of answering questions there, I noticed one today: the process tends to elicit memories of knowledge or talents that I had forgotten that I have. Should be interesting.