Hello and good morning, and welcome to another Thursday morning blog post.

I’m not sure how many of these I’ve written, but since I’ve done them nearly every Thursday, even when I was writing fiction on all other weekdays (and excusing the occasional sick day), we can guess that I wrote on the order of fifty such posts a year for about ten years. Thus, there are on the order of five hundred such posts over the years, each one nearly a thousand words long (and some going beyond that). So, overall, the number of words I’ve written in these Thursday blog posts alone is comparable to the number of words in my longest novel (Unanimity…so long I had to publish it as two separate books).

Of course, when we approach it from the point of view of actual information, à la Claude Shannon’s information theory and whatnot, I would have a hard time estimating how much actual information there is in such a post. In the first draft of the preceding paragraph and a half, there were 174 words, which comprise 940-ish characters (counting spaces, which I think one should count, since a space or the lack thereof can matter quite a bit in English).

Now, each character in a typewritten document, not counting “special” characters, can be one of 26 letters (not counting upper and lower case as separate things for my current purposes), ten numerals, and maybe a comparable number of punctuation marks. So, each character position in the writing would have a total of roughly 26 plus 10 plus, say, 8 other characters, so 44 possible characters. Rounding up, that’s about six bits per character (2⁶ = 64). Rounding down would give five bits (which is only 32 possibilities), so it’s something between 5 and 6 bits (log₂ 44 is about 5.46).
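For anyone who wants to check the bits-per-character figure, here is a minimal Python sketch. It assumes every character is equally likely, which real English is not, and the names `ALPHABET_SIZE` and `bits_per_symbol` are just illustrative.

```python
import math

# Alphabet assumed in the text: 26 letters + 10 numerals + about 8 punctuation marks.
# The "8" is only the post's rough guess.
ALPHABET_SIZE = 26 + 10 + 8

def bits_per_symbol(n_symbols: int) -> float:
    """Bits needed to pick one of n equally likely symbols."""
    return math.log2(n_symbols)

if __name__ == "__main__":
    exact = bits_per_symbol(ALPHABET_SIZE)
    print(f"{ALPHABET_SIZE} symbols -> {exact:.2f} bits per character")
    print(f"whole bits needed for a fixed-width code: {math.ceil(exact)}")
```

A fixed-width code would need the full six bits, since five bits can only label 32 distinct characters.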

Assuming the ratio of characters to words in the average blog post stays fairly consistent, that would be, for a 900 word post: (900/174) × 940, which rounding here and there* gives about 810,000 divided by, say, 180. This can be reduced first to 81,000 divided by 18, or 9,000 divided by 2, or 4,500 characters per post. Checking the math on the calculator gets roughly the same amount.

So, 4,500 characters, times five and some fraction bits per character, gives us between 22,500 and 27,000 bits of information per blog post. Let’s say 25,000 bits.

But when I look at the storage size of my blog posts, they are almost all between 17 and 20 K (as much as roughly 160,000 bits).
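The back-of-the-envelope arithmetic above can be redone without the rounding. This is only a sketch: the constants are the figures quoted in the post, the function names are made up for illustration, and "K" is assumed to mean 1024 bytes of 8 bits each.

```python
# Figures quoted in the post.
SAMPLE_WORDS = 174      # words in the sample passage
SAMPLE_CHARS = 940      # characters in that passage, spaces included
WORDS_PER_POST = 900    # assumed typical post length

def chars_per_post(words: int = WORDS_PER_POST) -> float:
    """Scale the sample's characters-per-word ratio up to a whole post."""
    return words * SAMPLE_CHARS / SAMPLE_WORDS

def bit_range(n_chars: int, low: int = 5, high: int = 6) -> tuple:
    """Lower and upper bit counts at `low` and `high` bits per character."""
    return n_chars * low, n_chars * high

def kilobytes_to_bits(kb: int) -> int:
    """Assumes 1 K = 1024 bytes and 8 bits per byte."""
    return kb * 1024 * 8

if __name__ == "__main__":
    print(round(chars_per_post()))   # unrounded estimate of characters per post
    print(bit_range(4500))           # using the post's rounded 4,500 characters
    print(kilobytes_to_bits(17), kilobytes_to_bits(20))
```

The unrounded character count comes out a few hundred higher than 4,500, which doesn't change the conclusion: the stored file is several times larger than the raw character information.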

This mismatch shouldn’t be surprising, because while English is (like most written languages) a “redundant code”, storing a word processor document entails storing more than just the individual characters.

Returning to what we mean when we refer to the redundancy of written English, we mean that not every new character gives you as much information as is potentially available. For instance, if one types the letter “q”, what follows will almost always** be a letter “u” in English, and so we would be quite justified, at least in this, in writing the word “quite” as “qite”. But, of course, redundancy in any kind of code is useful for counteracting the problem of lost data in transmission, which was one of the things Claude Shannon was thinking about in founding information theory.
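One rough way to see this redundancy is to measure how unevenly characters are used and what tends to follow a given letter. The sketch below is an illustration only: `SAMPLE` is a made-up sentence, the function names are hypothetical, and single-character frequencies capture just part of the redundancy Shannon had in mind.

```python
import math
from collections import Counter

def char_entropy(text: str) -> float:
    """Empirical entropy in bits per character, from single-character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def followers(text: str, ch: str) -> Counter:
    """Count which characters immediately follow each occurrence of ch."""
    return Counter(b for a, b in zip(text, text[1:]) if a == ch)

# Made-up sample, chosen to contain plenty of q's.
SAMPLE = "the quick queen quite quietly questioned the quality of the quiz"

if __name__ == "__main__":
    distinct = len(set(SAMPLE))
    print(f"uniform code over {distinct} symbols: {math.log2(distinct):.2f} bits/char")
    print(f"measured entropy: {char_entropy(SAMPLE):.2f} bits/char")
    print("what follows 'q':", dict(followers(SAMPLE, "q")))
```

In this sample every "q" is followed by "u", so that "u" carries essentially no new information, which is the point of the "qite" example.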

There are surely other ways in which the data in a given blog post is “compressed” during the process of saving, but I don’t know enough about the computer science of word processors to know the specifics of how that’s done off the top of my head. And since, of course, I write these blog posts “off the top of my head” each morning, I’m not going to try to research that subject for now. That would make writing my daily blog much less pleasant, and make the process quite (ha) a bit (ha ha) longer than it would otherwise be.

Now that I’ve thought about it and mentioned it, I’ll probably be on moderate alert for information regarding the process if I should happen to come across it, and if I do, I’ll be more likely to focus on it and add it to my model of reality than I would have otherwise.

And now I am rapidly approaching the 800 word mark for this post, a mark which I will no doubt pass before I have finished writing the first draft of this sentence. And, indeed, I did. So let’s draw this very peculiar post to its close, today.

I’m sure many of you*** are thinking something along the lines of, “Geez, I hope he goes back to just writing about depression and chronic pain and all that shit tomorrow…this post has been really boring.” To those people, I can only apologize. To anyone who shares my idiosyncratic interest in esoteric (but highly amateur in my case) things like information theory and whatnot, well—I hope at least you have enjoyed this.

TTFN

*It’s okay to do this since I’m not trying to be terribly precise, just to get “back of the envelope” numbers for fun, anyway.

**Not in this case, of course, since there is a quotation mark after that last “q”…and this one here, as well. So, the “u” is not a completely redundant character, but it certainly doesn’t give anything like 5 more bits of information.

***If a fraction of my few dozen readers can really be called “many”; I’ll let myself get away with using it as at least a relative term.

Hello. Good morning.

Aaahhh, doesn’t that feel better? Now I can use my standard Thursday blog post opening phrases, because today is, in fact, Thursday. It’s the 21st of November, the third Thursday of the month, so in the USA you only have seven shopping days until Thanksgiving.

Speaking of Thanksgiving, since next Thursday is that holiday, I probably will not be writing a blog post then. It is one holiday on which our office is always closed. We will be open on so-called Black Friday, but I can’t guarantee that I’ll write a post on that day.

Of course, in principle, I cannot guarantee that I’ll write anything at all ever again after this post. I may not even survive to post this entry*‒I am in the back seat of a Lyft, on the highway (I-95) of the East Coast of the US, so goodness knows there’s a non-zero chance of a fatal accident. I would even wish for one, but I know such a thing would involve harm and possibly death to other, more innocent, people.

Also, of course, wishes don’t actually directly affect reality‒thank goodness. Imagine if even one percent of wishes came true as wished. The world would be thoroughgoing chaos…and not in a good way. I tend to say of wishes that “If wishes were horses, then we’d all be hip deep in horse shit,” but it would be even more terrifying if wishes worked.

The “if‒then” character of the wishes saying (my version or the more SFW one that involves beggars riding) often makes me think of lines of computer code in some generic programming language, like:

If wishes==horses then execute beggars.ride

Or maybe

If wishes==horses then horseshit_level = “hip deep”

I wonder what that would look like in machine language. Or, I wonder what it would look like in straight binary. Really, though, I know part of the answer to the latter piece of wondering: it would look, to the naked eye, like a random string of ones and zeros, perhaps the tally of some very long record of flipping a coin and marking heads as 1 and tails as 0 (or vice versa).
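As a small illustration of the "straight binary" part, here is a Python sketch that prints the bits of the first pseudo-code line. Note the assumption: this is just the UTF-8 encoding of the text itself, not compiled machine instructions, which would depend on the compiler and the processor. The helper name `to_bits` is made up.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated groups of eight bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))

if __name__ == "__main__":
    line = "If wishes==horses then execute beggars.ride"
    print(to_bits(line))
```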

Actually, of course, given a binary-based computer language, one can literally generate every possible computer program just by flipping an ever-increasing number of coins. Or, to be honest, one can do it just by counting in binary: 0, 1, 10, 11, 100, 101, 110, 111, 1000…

This is why, if memory serves, computer science people and information theory people say that every program can literally be assigned (and described by) a number. You could express that number in base ten if you wanted, to make it a bit more compact and familiar to the typical human. Or, if you want to be more efficient and make conversion easier, you can use hexadecimal. This is easier because a base-sixteen number system is more directly and easily converted to and from binary, since 16 is a power of 2 (2 to the 4th).
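Both ideas, counting in binary and giving a program a number, fit in a few lines of Python. This is a sketch under an assumed convention (the program's UTF-8 bytes read as one big base-256 integer); `binary_strings` and `program_number` are illustrative names, not anything standard.

```python
from itertools import count, islice

def binary_strings():
    """Yield 0, 1, 10, 11, 100, ... by counting upward in binary."""
    for n in count():
        yield format(n, "b")

def program_number(source: str) -> int:
    """Read the UTF-8 bytes of source as a single big-endian integer."""
    return int.from_bytes(source.encode("utf-8"), "big")

if __name__ == "__main__":
    print(list(islice(binary_strings(), 9)))
    n = program_number("If wishes==horses then execute beggars.ride")
    print(n)        # the same program as a base-ten number
    print(hex(n))   # hexadecimal: each hex digit stands for exactly four bits
```

One wrinkle: plain counting never produces strings with leading zeros, so an enumeration that wants those as well has to add a convention, such as prefixing every string with a 1.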

Even the human genome, or any genome in fact, could be fairly readily expressed in binary. The DNA code is a 4 character language, so it wouldn’t take too much work to make it binary, however you wanted to code it. Then, each person’s genome would have a single, unique number. That’s kind of interesting.

It would be a bit unwieldy as an ID number, of course. The human genome is roughly 3 billion nucleotides long, which means it would be roughly 6 billion binary digits (AKA bits). And since every ten bits corresponds to roughly three decimal digits (2^10 is 1024, which is very close to 10^3, aka 1000), 6 billion bits should come to roughly 1.8 billion decimal digits, making it a number much, much larger than the famously large googol**.
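Here is a hedged Python sketch of that estimate. The two-bit code in `CODE` is one arbitrary assignment among many (any one-to-one mapping would do), and the function names are made up for illustration.

```python
import math

NUCLEOTIDES = 3_000_000_000   # rough human genome length, from the post
BITS_PER_BASE = 2             # four letters (A, C, G, T), and log2(4) = 2

# Hypothetical encoding; any one-to-one assignment of 2-bit codes works.
CODE = {"A": "00", "C": "01", "G": "10", "T": "11"}

def encode(seq: str) -> str:
    """Turn a DNA string into a string of bits, two bits per base."""
    return "".join(CODE[base] for base in seq)

def max_decimal_digits(bits: int) -> int:
    """Most decimal digits a number with `bits` binary digits can have."""
    return math.floor(bits * math.log10(2)) + 1

if __name__ == "__main__":
    toy = encode("GATTACA")
    print(toy, "->", int(toy, 2))   # a toy "genome" as a single number
    total_bits = NUCLEOTIDES * BITS_PER_BASE
    print(f"{total_bits:,} bits -> about {max_decimal_digits(total_bits):,} decimal digits")
```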

It’s a big number. This should give you at least some idea of just how unique each individual life form is at a fundamental level. There are so many possible genomes that the expected time until the final heat death of the universe is unlikely to be long enough to have a randomly created duplication within the accessible cosmos.

Of course, within an infinite space‒which is the most probable truth about our universe as far as we can tell‒one will not only have every possible version that can exist, but will have infinite copies of every possible version. Infinity makes things weird; I love it.

Of course, just as with the making of computer programs by simply counting in binary, the vast majority of genomes would not code for any lifeform in any kind of cellular environment, using any given kind of transcription code you might want (the one on Earth, found in essentially all creatures, uses three base pairs to code for a given amino acid in a protein, but that’s not all that DNA does). Similarly, most of the counted up programs would not run on any given computer language platform, because they would not code for any coherent and consistent set of instructions.

But even so, you would still, eventually, get every possible working program, or every possible life form in any given biological system if you could just keep counting.

On related matters, there are things like the halting problem and so on, but we won’t get into that today, interesting though it may be (and is).

It’s quite fascinating, when one is dealing with information theory (and computer science) how quickly one encounters numbers so vast that they dwarf everything within the actual universe.

Mind you, the maximum possible information‒related to the entropy‒carried within any bounded 3-D region is constrained by the surface area (in square Planck lengths) of a black hole with that size event horizon. For our observable universe, roughly 93 billion light years across, I think that’s something like 10 to the 124th bits, or at least it’s that many Planck areas. That’s quite a bit*** smaller than the number of possible genomes, though I have a sinking feeling that I’m underestimating the number.
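A rough check of that order of magnitude, as a Python sketch. The assumptions are mine: a sphere about 93 billion light years in diameter (a few billion either way does not change the exponent much), the Bekenstein-Hawking formula S = A / (4 × Planck length squared) in natural units, and rounded physical constants. The function names are illustrative.

```python
import math

LIGHT_YEAR_M = 9.4607e15         # metres in one light year (rounded)
PLANCK_LENGTH_M = 1.616255e-35   # metres (rounded)

def planck_areas_on_sphere(diameter_ly: float) -> float:
    """Surface area of a sphere of the given diameter, in square Planck lengths."""
    radius_m = diameter_ly * LIGHT_YEAR_M / 2
    area_m2 = 4 * math.pi * radius_m ** 2
    return area_m2 / PLANCK_LENGTH_M ** 2

def holographic_bits(diameter_ly: float) -> float:
    """Entropy bound A / (4 l_p^2) in nats, converted to bits by dividing by ln 2."""
    return planck_areas_on_sphere(diameter_ly) / 4 / math.log(2)

if __name__ == "__main__":
    d = 93e9  # assumed diameter in light years
    print(f"Planck areas: about 10^{math.log10(planck_areas_on_sphere(d)):.1f}")
    print(f"bits:         about 10^{math.log10(holographic_bits(d)):.1f}")
```

Either way the exponent is in the low hundreds, which is nothing next to a number with well over a billion digits.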

And information, at least when instantiated, has “mass” in a sense, and the upper limit of the amount of information in a region of spacetime is delineated by the Bekenstein entropy description. So there’s only so many binary strings you can generate before you turn everything into a black hole.

Something like all that, anyway.

I may have been imprecise in some of what I said, but when you’re dealing with very large numbers, precision is only theoretically interesting. For instance, we**** have found Pi to far more than the number of digits needed to calculate the circumference of the visible universe down to the Planck length. It would require only about 40 digits of Pi to get that circumference to within the size of a hydrogen atom, and those are only about 10^25 Planck lengths across, so we wouldn’t expect to need much more than 65 digits of Pi to get that precise, but let’s be generous and use 100 digits.
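The digit counts can be sanity-checked with a short Python sketch. It assumes the error in a circumference computed as Pi times the diameter is at most the diameter times 10^-n when Pi is cut off after n decimal places, and it uses rounded sizes that are my assumptions: a visible universe about 8.8 × 10^26 metres across and a hydrogen atom about 10^-10 metres. The function name is made up.

```python
import math

UNIVERSE_DIAMETER_M = 8.8e26     # assumed: about 93 billion light years
HYDROGEN_ATOM_M = 1e-10          # assumed rough atomic diameter
PLANCK_LENGTH_M = 1.616255e-35   # rounded

def pi_digits_needed(diameter_m: float, resolution_m: float) -> int:
    """Decimal places of Pi so the circumference error stays below resolution_m."""
    return math.ceil(math.log10(diameter_m / resolution_m))

if __name__ == "__main__":
    print("to one hydrogen atom: ", pi_digits_needed(UNIVERSE_DIAMETER_M, HYDROGEN_ATOM_M))
    print("to one Planck length:", pi_digits_needed(UNIVERSE_DIAMETER_M, PLANCK_LENGTH_M))
```

Both results land a little under the figures of about 40 and 65 quoted above, so 100 digits really is generous.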

How many digits of Pi have actually been “discovered” by mathematicians? Over 105 trillion digits. Talk about angels dancing on the heads of pins! It’s literally physically impossible, according to the laws of quantum mechanics, even to test whether that number precisely defines the ratio of any given circle’s circumference to its diameter by measuring it. One cannot, in principle, measure finely enough.

Still, it just goes to show that mathematics is vastly larger in scope than any instantiated, superficial reality. Information is deeper than one might think…so to speak. But, then, so are minds themselves, vastly deeper.

As Idris/the TARDIS asked in Doctor Who, Series 6, episode 4, “Are all people like this? So much bigger on the inside?” Yes, Idris, I suspect they are, even those people we don’t like and feel the urge to denigrate.

That’s enough for today, I think. I’ve achieved nothing, really, other than write a Thursday blog post, but then again, that’s all I meant to do. I hope you have a day among the better half of all the vast number of possible days available to you.

TTFN

*If you’re reading this, though, I clearly did survive. I have mixed feelings about that.

**How much larger? Soooo much larger that if you subtracted a googol of something from 10^1,800,000,000 of something, you would not change it to any extent measurable even by the most precise instruments humans have ever created. And a googol is already something like 10 to the 19th times as large as the total estimated number of protons and neutrons in the accessible universe.

***No pun intended.

****Actually, I had nothing to do with it; it’s just the sort of “royal we”***** kind of thing everyone uses when discussing the accomplishments of humanity as a whole.

*****Not to be confused with royal wee. That’s the sort of weird, niche thing one might find for sale in mason jars on the dark web. Be careful if you’re into such things. I wouldn’t buy it unless you’re sure of the source, so to speak.

Robert Elessar

If I could write the beauty of your eyes and in fresh numbers number all your blogs

If I could write the beauty of your blogs, and in fresh numbers number all your graces…
