Occasionally I get on a little anagram kick. Last time, I wrote a little program that let you input a starting text, then interactively create your anagram text. As you typed the anagram text, it would complain if you used a letter that wasn’t available, or suggest words from the dictionary that were available to you. I made the search pretty fast by putting each word in the dictionary into a sort of radix tree, where both internal nodes and leaves had lists of words that could be formed with the letters used thus far on that path.
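For the curious, here's a rough Python sketch of that kind of tree. It is not the original program, just one way the idea could work, assuming each word is filed under its letters in sorted order so that every node holds the words spelled by exactly the letters on the path to it. The names `Node`, `build_trie`, and `available_words` are made up for illustration.

```python
class Node:
    def __init__(self):
        self.children = {}  # letter -> Node
        self.words = []     # words whose sorted letters spell the path to this node


def build_trie(words):
    """File each word under the sorted sequence of its letters."""
    root = Node()
    for word in words:
        node = root
        for letter in sorted(word):
            node = node.children.setdefault(letter, Node())
        node.words.append(word)
    return root


def available_words(root, letters):
    """Return every stored word that can be spelled from `letters`."""
    pool = sorted(letters)
    found = []

    def walk(node, start):
        # Any word stored here uses only letters already taken from the pool.
        found.extend(node.words)
        tried = set()
        for i in range(start, len(pool)):
            letter = pool[i]
            if letter in tried:
                continue  # a repeated letter leads to the same child node
            tried.add(letter)
            child = node.children.get(letter)
            if child is not None:
                walk(child, i + 1)

    walk(root, 0)
    return sorted(found)


trie = build_trie(["tea", "eat", "at", "a", "tab"])
print(available_words(trie, "aet"))
```

Because both the pool and the paths are in sorted order, the walk only ever moves forward through the remaining letters, which is what keeps the lookup cheap.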
This time around, I wanted a bit more of an automated approach. I haven’t seen any great results yet, but here are some that are at least sorta evocative:
“The Miss Rhode Island pageant” <=> “The time and rags and polishes”
“federal constitution” <=> “failure to discontent”
“that girl with sunbonnet eyes” <=> “but only the greatness within”
“college teaching is almost a” <=> “glance at the seismological”
“a cut over his left eyebrow” <=> “over by the sluice of water”
The approach I took this time was to start with an English corpus (well, a couple, appended, totaling 3M words). For each ‘phrase’ (sequence of consecutive words, really) of between 3 and 6 words, I stick the phrase in a list in a hashmap, where the key is the sorted list of letters used in the phrase. After eating through the whole corpus (which bloats the Python hashmap to about 4GB of RAM), I look for any lists of length longer than one, sort and filter the results to be a little more useful, and spew out the results. Even though the input comes from a nominally grammatical source, there’s still plenty of dumb junk, so it still requires a human to pick out the few good answers, and maybe do some word reordering. And even after that, you end up with what you see above, so, ya know, why bother? Just cuz.
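The bucketing step can be sketched in a few lines of Python. This is a minimal illustration under some assumptions, not the actual script: the tokenizer is a simple regex, and the only filter applied is dropping phrases that are just the same words in a different order (the real sorting and filtering isn't spelled out above). The names `letter_key` and `find_anagram_phrases` are hypothetical.

```python
import re
from collections import defaultdict


def letter_key(phrase):
    """Sorted letters of the phrase, ignoring case, spaces, and punctuation."""
    return "".join(sorted(c for c in phrase.lower() if c.isalpha()))


def find_anagram_phrases(text, min_words=3, max_words=6):
    """Group runs of consecutive words by their letters; keep groups with 2+ phrases."""
    words = re.findall(r"[a-z']+", text.lower())  # assumed tokenizer
    buckets = defaultdict(set)
    for n in range(min_words, max_words + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            buckets[letter_key(phrase)].add(phrase)

    results = []
    for phrases in buckets.values():
        # Assumed filter: treat phrases made of the same words, reordered, as one.
        distinct = {}
        for phrase in sorted(phrases):
            distinct.setdefault(tuple(sorted(phrase.split())), phrase)
        if len(distinct) > 1:
            results.append(sorted(distinct.values()))
    return sorted(results)


sample = "a gentleman met an elegant man"
print(find_anagram_phrases(sample, min_words=2, max_words=2))
```

Every window of every length gets its own entry, so the map grows to several times the size of the corpus, which is where the memory goes.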