Saturday, 13 April 2013

Gibberish


What is Gibberish?

The gibberish (nonsense text) presented here is generated by a very simple computer program. Given some sample text, say Shakespeare, as input, the computer generates output which is random, but which has the same statistical distribution of characters or combinations of characters. (A character may be a letter, a digit, a space, a punctuation mark, etc.)
In level 1 gibberish, the output has the same distribution of single characters as the input. For example, the probability of seeing a character like "e" or "z" or "." will be approximately the same in the output as in the input. In level 2 gibberish, the output has the same distribution of character pairs as the input. For example, the probability of seeing a pair like "th" or "te" or "t." will be approximately the same in the output as in the input. In general, in level n gibberish, the output has the same distribution of groups of n characters (n-tuples) as the input. (The algorithm is a letter-based Markov text generator. Level n gibberish is a Markov chain of order n-1.)
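The idea above can be sketched in a few lines of Python. This is only a minimal illustration, not the program behind this page: the names `build_table` and `gibberish`, the `seed` parameter, and the choice to restart from the opening of the sample when a context has no recorded follower are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_table(text, level):
    """Map each context of (level - 1) characters to the characters that follow it.

    Followers are stored with repeats, so picking one uniformly at random
    reproduces the frequencies seen in the sample text.
    """
    k = level - 1
    table = defaultdict(list)
    for i in range(len(text) - k):
        table[text[i:i + k]].append(text[i + k])
    return table


def gibberish(text, level, length, seed=None):
    """Generate `length` characters of level-`level` gibberish from `text`."""
    if level < 1 or len(text) < level:
        raise ValueError("sample text must contain at least `level` characters")
    rng = random.Random(seed)
    k = level - 1
    table = build_table(text, level)
    out = text[:k]  # assumption: begin with the opening context of the sample
    while len(out) < length:
        context = out[len(out) - k:]  # last k characters ("" when k == 0)
        followers = table.get(context)
        if not followers:
            # Dead end: this context appears only at the very end of the sample.
            # Assumption: restart from the opening context and carry on.
            out += text[:k]
            continue
        out += rng.choice(followers)
    return out[:length]


if __name__ == "__main__":
    sample = "to be or not to be that is the question"
    for level in (1, 2, 3):
        print(level, repr(gibberish(sample, level, 40, seed=1)))
```

At level 1 the context is empty, so every character is drawn from the whole sample; at level n each new character depends only on the previous n-1 characters, which is what makes this a Markov chain of order n-1.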
It is amazing how well this simple algorithm works, even at very low levels. For example, at level 2, you can easily recognize different languages. At level 3 you can recognize the styles of different authors.
For even more fun, the gibberish generator can easily blend two different languages or two different authors. If the input is simply the text from author A followed by the text from author B, the output will be a smooth blend of the two.
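One way to see why the blend comes out smooth: in the combined text, any context that occurs in both inputs can be followed by characters from either one, so the output can drift from A to B (and back) at those points. The sketch below is only an illustration under assumptions: the function names, the two tiny sample strings, and the single space used to join them are all made up for the example.

```python
from collections import defaultdict


def follower_sets(text, level):
    """Map each context of (level - 1) characters to the set of characters seen after it."""
    k = level - 1
    table = defaultdict(set)
    for i in range(len(text) - k):
        table[text[i:i + k]].add(text[i + k])
    return table


def switch_points(text_a, text_b, level):
    """Contexts that occur in both texts, where the output can cross from one to the other."""
    a = follower_sets(text_a, level)
    b = follower_sets(text_b, level)
    return sorted(set(a) & set(b))


if __name__ == "__main__":
    author_a = "the cat sat on the mat"       # hypothetical stand-in for author A
    author_b = "then they thanked them"       # hypothetical stand-in for author B
    level = 3
    blended = follower_sets(author_a + " " + author_b, level)
    for context in switch_points(author_a, author_b, level):
        print(repr(context), "->", sorted(blended[context]))
```

The more contexts the two inputs share, the more often the output can switch sources, which is why closely related texts tend to blend more freely than unrelated ones at the same level.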
To generate your own gibberish, go to Gibberish Generator.
To see some samples, go to Gibberish Samples.
A final thought: Is the human brain simply a level 100 gibberish generator?
References: The program Mark V. Shaney (a pun on "Markov chain") by Bruce Ellis, Rob Pike, and Don P. Mitchell, publicized in the June 1989 Scientific American "Computer Recreations" column "A potpourri of programmed prose and prosody" by A. K. Dewdney.
