Hello and good morning, and welcome to another Thursday morning blog post.
I’m not sure how many of these I’ve written, but since I’ve done them nearly every Thursday, even when I was writing fiction on all other weekdays (and excusing the occasional sick day), we can guess that I wrote on the order of fifty such posts a year for about ten years. Thus, there are on the order of five hundred such posts over the years, each one nearly a thousand words long (and some going beyond that). So, overall, the number of words I’ve written in these Thursday blog posts alone is comparable to the number of words in my longest novel (Unanimity…so long I had to publish it as two separate books).
Of course, when we approach it from the point of view of actual information, à la Claude Shannon’s information theory and whatnot, I would have a hard time estimating how much actual information there is in such a post. In the first draft of the preceding paragraph and a half, there were 174 words, which comprise 940-ish characters (counting spaces, which I think one should count, since a space or the lack thereof can matter quite a bit in English).
Now, each character in a typewritten document, not counting “special” characters, can be one of 26 letters (not counting upper and lower case as separate things for my current purposes), ten numerals, and maybe a comparable number of punctuation marks. So, each position in the writing could hold any of roughly 26 plus 10 plus, say, 8 other characters, so 44 possible characters. Rounding up, that’s about six bits per character (2^6 = 64). Rounding down would give five bits (which is only 32 possibilities), so the real figure falls somewhere between 5 and 6 bits (the base-2 logarithm of 44 is about 5.46).
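For anyone who wants to check that arithmetic, here is a minimal Python sketch. It assumes every character is equally likely, which is exactly the simplification made above, and the count of 8 "other" characters is just my rough guess rather than anything measured. The names (bits_per_character and so on) are made up for illustration.

```python
import math

# Alphabet sizes taken from the paragraph above. The 8 "other" characters
# are a rough guess from the post, not a measured count.
LETTERS = 26
DIGITS = 10
OTHER = 8


def bits_per_character(alphabet_size: int) -> float:
    """Bits needed to pick one symbol from an equally likely alphabet."""
    return math.log2(alphabet_size)


alphabet = LETTERS + DIGITS + OTHER
exact_bits = bits_per_character(alphabet)
# A fixed-width code needs a whole number of bits, hence the rounding up.
fixed_width_bits = math.ceil(exact_bits)

print(alphabet, round(exact_bits, 2), fixed_width_bits)  # prints: 44 5.46 6
```

The fractional figure is the theoretical cost per character under that equal-likelihood assumption, while the rounded-up figure is what a simple fixed-width encoding would actually spend.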
Assuming the ratio of characters to words in the average blog post stays fairly consistent, that would be, for a 900-word post: (900/174) x 940, which, rounding here and there*, gives about 810,000 divided by, say, 180. This can be reduced first to 81,000 divided by 18, or 9,000 divided by 2, or 4,500 characters per post. Checking the math on the calculator gets roughly the same amount.
So, 4,500 characters, times five and some fraction bits per character, gives us between 22,500 and 27,000 bits of information per blog post. Let’s say 25,000 bits.
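The back-of-the-envelope figures in the last two paragraphs can be reproduced with a few lines of Python. This is only a sketch under the same assumptions as the text: the 174-word, 940-character sample is taken as typical, and the function name chars_per_post is hypothetical.

```python
# Sample measurements quoted in the post.
WORDS_SAMPLE = 174
CHARS_SAMPLE = 940
WORDS_PER_POST = 900


def chars_per_post(words_per_post: int, words_sample: int, chars_sample: int) -> float:
    """Scale the sample's characters-per-word ratio up to a full post."""
    return words_per_post * chars_sample / words_sample


unrounded_chars = chars_per_post(WORDS_PER_POST, WORDS_SAMPLE, CHARS_SAMPLE)
rounded_chars = 4_500  # the rounded figure used in the text

# Bounds from 5 and 6 bits per character, as in the text.
low_bits = rounded_chars * 5
high_bits = rounded_chars * 6

print(round(unrounded_chars), low_bits, high_bits)  # prints: 4862 22500 27000
```

Without the rounding, the character count comes out a few hundred higher than 4,500, which is close enough for an estimate of this kind.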
But when I look at the file sizes of my blog posts, they are almost all between 17 and 20 K (and 20 K, at 8 bits per byte, is about 160,000 bits).
This mismatch shouldn’t be surprising, because while English is (like most written languages) a “redundant code”, storing a word processor document entails storing more than just the individual characters.
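To put a rough number on that mismatch, here is a small sketch comparing the file sizes with the 25,000-bit estimate. It assumes "K" means 1,000 bytes of 8 bits each (swap in 1024 if your system reports kibibytes), and file_bits is a made-up helper name.

```python
BITS_PER_BYTE = 8
BYTES_PER_K = 1000  # assumption: "K" means 1,000 bytes; use 1024 for kibibytes

ESTIMATED_BITS = 25_000  # the round per-post figure from the estimate above


def file_bits(size_in_k: float) -> int:
    """Convert a file size in K to bits."""
    return int(size_in_k * BYTES_PER_K * BITS_PER_BYTE)


for size_in_k in (17, 20):
    bits = file_bits(size_in_k)
    ratio = bits / ESTIMATED_BITS
    print(size_in_k, bits, round(ratio, 2))
# prints:
# 17 136000 5.44
# 20 160000 6.4
```

So, under these assumptions, the stored file is somewhere around five to six times larger than the raw character estimate.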
Returning to what we mean when we refer to the redundancy of written English, we mean that not every new character gives you as much information as is potentially available. For instance, if one types the letter “q”, what follows will almost always** be a letter “u” in English, and so we would be quite justified, at least in this, in writing the word “quite” as “qite”. But, of course, redundancy in any kind of code is useful for counteracting the problem of lost data in transmission, which was one of the things Claude Shannon was thinking about in founding information theory.
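One simple way to see this redundancy is to measure how unevenly characters are used in a piece of text. The sketch below computes the single-character Shannon entropy of a string and checks how often a "q" is followed by a "u". It is an amateur illustration, not a proper estimate of the entropy of English: it ignores almost all context between characters, the sample sentence is invented, and both function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits_per_char(text: str) -> float:
    """Shannon entropy of the character frequencies, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def fraction_q_followed_by_u(text: str) -> float:
    """Share of the letter q (either case) that is immediately followed by u."""
    lowered = text.lower()
    q_positions = [i for i, ch in enumerate(lowered) if ch == "q"]
    if not q_positions:
        return 0.0
    followed = sum(1 for i in q_positions if lowered[i + 1 : i + 2] == "u")
    return followed / len(q_positions)


# An invented sample sentence, chosen only because it has plenty of q's.
sample = "the quick queen quietly quit the quiz"

print(entropy_bits_per_char(sample))
print(fraction_q_followed_by_u(sample))
```

In this sample every "q" is followed by a "u", so the second function returns 1.0. Run on a longer stretch of ordinary English, the first function should come out below the equal-likelihood figure for the same alphabet, because some characters turn up far more often than others.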
There are surely other ways in which the data in a given blog post is “compressed” during the process of saving, but I don’t know enough about the computer science of word processors to know the specifics of how that’s done off the top of my head. And since, of course, I write these blog posts “off the top of my head” each morning, I’m not going to try to research that subject for now. That would make writing my daily blog much less pleasant, and make the process quite (ha) a bit (ha ha) longer than it would otherwise be.
Now that I’ve thought about it and mentioned it, I’ll probably be on moderate alert for information regarding the process if I should happen to come across it, and if I do, I’ll be more likely to focus on it and add it to my model of reality than I would have otherwise.
And now I am rapidly approaching the 800 word mark for this post, a mark which I will no doubt pass before I have finished writing the first draft of this sentence. And, indeed, I did. So let’s draw this very peculiar post to its close, today.
I’m sure many of you*** are thinking something along the lines of, “Geez, I hope he goes back to just writing about depression and chronic pain and all that shit tomorrow…this post has been really boring.” To those people, I can only apologize. To anyone who shares my idiosyncratic interest in esoteric (but highly amateur in my case) things like information theory and whatnot, well—I hope at least you have enjoyed this.
TTFN
*It’s okay to do this since I’m not trying to be terribly precise, just to get “back of the envelope” numbers for fun, anyway.
**Not in this case, of course, since there is a quotation mark after that last “q”…and this one here, as well. So, the “u” is not a completely redundant character, but it certainly doesn’t give anything like 5 more bits of information.
***If a fraction of my few dozen readers can really be called “many”; I’ll let myself get away with using it as at least a relative term.
