Song of The Day: Bugs - Artist: Pearl Jam

I came across an older Wired article that discusses history’s worst bugs. As I read about the Mariner I space probe and other catastrophic bugs, I asked myself: “What is the worst bug you encountered or introduced during your programming career?” One came to mind immediately for me: the “data VARBINARY truncate bug.” Although it may not compare, fortunately, to some others, it was quite taxing at the time :).

So this was some years ago and the details are somewhat hidden to protect the innocent :), but my team and I were relatively new to Sybase and the VARBINARY field. We thought, “Well, since we may have variable length binary data, let’s be efficient and use the VARBINARY type.”

So everything cruises through development and QA with relative ease; we are heroes; but as we deployed the application–in a limited fashion–we received reports of corrupt data. Uh-oh!

We scoured the code and database to find the culprit. Upon investigation we discovered a corruption rate of about 1% for a particular data field. However, the root cause continued to elude us. As each hour passed, the pressure intensified. What could be the problem?

I really cannot explain it, but it was like the apple falling from a tree for Newton. Not that I’m Newton, but as I thought about the problem, I wondered if the stored binary data in the database could possibly contain zero-valued or null bytes and if the Sybase database could possibly be truncating the VARBINARY field data. I read through the Sybase docs and found a potential answer to part of it: VARBINARY truncates trailing null bytes or zeros.

I discussed the hypothesis with my colleagues and then another engineer verified that the hypothesis was in fact true. The verification, and patch, process entailed padding existing data in the database with zero-valued bytes as the application retrieved the data, which “fixed” the corrupt data within the VARBINARY field. Later, we switched to a fixed-length BINARY type for a permanent fix.

We lived happily ever after, fortunately. Although the root cause is patently obvious today, it caused a little grief way back when. Since then, I’ve always been as careful as possible to understand underlying data types and data interaction between them.

So what about others? Willing to share a hair-raising bug or experience? :)

Note: In a future entry I may review Sybase’s implementation of trailing zeros compared to other RDBMS.

Tags: ,