The blog version of Give Blood Magazine, est. 1972

Is it me, or is it my vision?

My first memory is of losing my glasses. Had they not been found, folded carefully on the top edge of the sea wall, where would we be today?

Saturday, April 12, 2008

My beautiful bug

This is a great place to announce that, after quite a while, I'm employed again. Yes, and the lucky company is IMMI, Integrated Media Measurement, Inc. I'm really thrilled to be with them.

My beautiful bug actually showed up during my back-and-forth with IMMI, in an e-mail in which I trumpeted "I'm a root-toot-tooting son-of-a-gun..." or words to that effect. The point is that the message opened with a word containing an apostrophe, followed by the single-letter word "a".

In my Gmail Sent messages folder, after dispatching this testimonial, I suddenly noticed this little glitch, and it made me go back and proofread my message. I hate looking stupid to companies I'm promoting myself to. The summary line read: "I'ma root-toot-toot..."

But, huh, it really didn't look like I'd committed this typo. The message itself showed the correct text, with the space snugly positioned between the two words. Hmm.

When in doubt, blame it on AJAX. Between being buffered and buffeted by keyloggers and other active text processing, I've gotten a little jaded: somebody just spirited away that space for some reason and forgot to restore it. It's OK, I know what you mean.

But which program or script, and by what mechanism? It's an interesting exercise. I began by noting that the strings shown in the Gmail message views are "snippets", generated by the application. Gmail help describes them as "(like Google web search!)". Actually, they remind me more of Ask.com's crappily concatenated news abstracts of yore.

I spent a little time constructing some test cases to understand what is really going on. As is often the case, the results were more complicated than you might think. I began with a few more examples using meaningful text, focusing on the use of the apostrophe to indicate a contracted word:

Below is a screenshot of my inbox. The first part of the text is taken from the Subject of the message, where I've pasted a copy of the text. The second part is the snippet generated by Gmail from the message text:



Both of my follow-up tests show normal behavior. But something is going on here; I just know it. It's interesting that the trailing punctuation I provided has been stripped.

Fairly quickly I began to abstract my test cases. There aren't that many contractions or single-letter words. It turned out to be possible to stimulate a lot of incorrect text processing:


Any alpha character after the apostrophe will cause any number of trailing single-character words (single characters surrounded by whitespace) to be concatenated.


The presence of any word token containing more than one character stops further concatenation on the line.


The concatenation behavior seems to be global. It can occur multiple times in the same line.


This appears to confirm that the behavior is case-insensitive. More on this below:


Hmm, but it appears that a change in case between single-letter words inhibits the concatenation.


Very strangely, when the apostrophe is followed by an uppercase character and the following single-letter word is lowercase, no concatenation is performed.


This test appears to tie the case of the character after the apostrophe to the case of the single-letter word that follows. When they match, concatenation occurs; when they differ, it doesn't. It begins to suggest that part of the algorithm is meant to deal with input typed with Caps Lock on.

I won't go on much further. However, another series of tests showed, surprisingly, that this behavior has nothing in particular to do with the apostrophe. I might have guessed that things were happening because the single-quote character needed to be escaped somewhere. But it turns out that almost any non-alphanumeric character can be used to trigger this funky behavior:


Two cases are shown here: one in which a ")" at the end of a word causes the following single-letter words to be concatenated but does not delete the space after it, and one in which a "#" followed by a single character wipes out the space and performs the concatenation.
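Rather than typing probes by hand, one could generate them mechanically and paste each line into a message to yourself. The sketch below is hypothetical: the function name `probe_strings` and the `x…end` template are my own inventions, not anything Gmail defines. For every ASCII punctuation character it produces one probe line per combination of letter case after the separator and case of the trailing single-letter words.

```python
import itertools
import string


def probe_strings(separators: str = string.punctuation) -> list[str]:
    """Build hypothetical probe lines for the snippet glitch.

    Each line has the shape "x<sep><letter> <single> <single> end", so the
    resulting Gmail snippet can be compared against the typed text by eye.
    """
    probes = []
    for sep in separators:
        # Vary the case of the letter right after the separator ("m"/"M")
        # and the case of the trailing single-letter words ("a"/"A").
        for after, single in itertools.product("mM", "aA"):
            probes.append(f"x{sep}{after} {single} {single} end")
    return probes


if __name__ == "__main__":
    for line in probe_strings():
        print(line)
```

With Python's 32 ASCII punctuation characters this yields 128 probes, enough to check the "almost any non-alphanumeric" claim one separator at a time.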

So it makes me wonder. Could we construct a regular expression that matches the observed behavior of Gmail processing as evoked through these test cases? And would it prove anything? Actually, as I mentioned above, it's possible that what I'm seeing is the result of several operations, not a single one. Great stuff, though.
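As a first stab at that question, here is a hypothetical model in Python. It is reconstructed only from the test cases above, not from any knowledge of Gmail's code, and the name `model_snippet` is made up. It covers just the core rule: a non-alphanumeric character followed by a single letter absorbs every following whitespace-separated single letter of the same case, stopping at the first longer word or change of case. It makes no attempt at the ")" case, where the space survives, or at the stripped trailing punctuation.

```python
import re

# Hypothetical model of the snippet glitch, fitted to observed cases only.
# (?<=[^\w\s]) : the letter must follow a non-alphanumeric, non-space char
#                (note that "_" counts as \w in Python, so it is excluded).
# ([a-z])      : the single letter right after that character.
# (?:\s+[a-z](?!\S))+ : one or more whitespace-separated single letters
#                of the same case; (?!\S) rejects longer words.
_LOWER = re.compile(r"(?<=[^\w\s])([a-z])((?:\s+[a-z](?!\S))+)")
_UPPER = re.compile(r"(?<=[^\w\s])([A-Z])((?:\s+[A-Z](?!\S))+)")


def _glue(m: re.Match) -> str:
    # Keep the first letter and append the single letters with spaces removed.
    return m.group(1) + re.sub(r"\s+", "", m.group(2))


def model_snippet(text: str) -> str:
    """Apply the guessed concatenation rule globally, once per letter case."""
    text = _LOWER.sub(_glue, text)
    return _UPPER.sub(_glue, text)


if __name__ == "__main__":
    print(model_snippet("I'm a root-toot-tooting son-of-a-gun"))
```

On the original message this prints "I'ma root-toot-tooting son-of-a-gun", matching the Sent folder. Splitting the rule into two case-specific patterns is what reproduces the case-matching behavior; a regex that fits the data is consistent with a single operation but doesn't show that only one is involved.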

5 comments:

Anonymous said...

you haven't lost your touch! congrats on the new gig!

Anonymous said...

Definitely congratulations! And I'm fascinated that in your world, one would answer a job ad with "I'm a root-toot etc." And it worked!

People always looooove finding Google slipups, too.

Bad Tricoteuse said...

Ooooh, that is a good bug!

Anonymous said...

They're not working you hard enough!

Unknown said...

Congrats on the new job, and keep them all on their toes, Steve.

Just got over a terrible cold/flu - and the cure? Read The Road, by Cormac McCarthy.
