Morse code exercise

Morse code as a study of coding and optimization
See http://www.learnmorsecode.com/. The codes can be drawn as a tree diagram with 0 (dot) to the left of 1 (dash).

Assuming we don't mess with the length of the spaces between letters and between words (which would be a bad idea if human operators are ever involved), Morse code is isomorphic to assigning a binary string to each letter and numeral. International Morse Code covers other characters too, but let's keep it simple and ignore those. The usual mapping is dot = 0 and dash = 1, so 'E' (a single dot) is 0, 'F' (dot dot dash dot) is 0010, etc. Leading zeros count: 0 and 00 are different codes.

Morse presumably assigned the codes so as to reduce the time needed to send typical messages. A nice little exercise is to measure the time efficiency of his code against other possible codes, and for different documents. For purposes of the experiment, let us ignore everything but letters, numerals and whitespace. We understand that some documents have many numerals, and that numerals have unseemly long codes.
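To make the dot = 0, dash = 1 mapping concrete, here is a minimal Python sketch. The table holds the International Morse codes for letters and numerals; the names `MORSE` and `to_binary` are made up for this illustration, and "-" stands for a dash (the Lex table on this page writes it as "_").

```python
# International Morse Code for letters and numerals ("." = dot, "-" = dash).
MORSE = {
    "A": ".-",    "B": "-...",  "C": "-.-.",  "D": "-..",   "E": ".",
    "F": "..-.",  "G": "--.",   "H": "....",  "I": "..",    "J": ".---",
    "K": "-.-",   "L": ".-..",  "M": "--",    "N": "-.",    "O": "---",
    "P": ".--.",  "Q": "--.-",  "R": ".-.",   "S": "...",   "T": "-",
    "U": "..-",   "V": "...-",  "W": ".--",   "X": "-..-",  "Y": "-.--",
    "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}


def to_binary(code):
    """Rewrite a dot/dash string as a binary string (dot = 0, dash = 1)."""
    return code.replace(".", "0").replace("-", "1")


print(to_binary(MORSE["E"]))  # 0
print(to_binary(MORSE["F"]))  # 0010
```

The result is kept as a string, not an integer, so that leading zeros survive.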

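One simple alternative is to hand out the shortest codes in alphabetical ("Lexical") order (call it Lex code). The table below lists it for A through Z, first in dots and dashes, then in binary.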

How much worse is it? Sounds like a good job for a computer. Write a program which reads from stdin, ignores anything which is not a letter, digit or space, and tallies up the sending time for Morse code, Lex code, and at least one more code of your own design. Show the totals (in units of one dot length) and the relative efficiency of the various codes. Remember that spaces take time. Can you beat Morse code? Can you design a code which is as slow as possible?

Of course you will want to test your program against some short strings that you can work out by hand. Eventually, run your program against at LEAST two long text sources. One should be Lincoln's Gettysburg Address. Choose something else that's at least that big. The US Constitution? Tolstoy's War & Peace? The IRS code? Your output/report should identify the source(s).
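One possible shape for the tally, as a hedged Python sketch. The exercise does not fix the timing numbers, so this assumes the usual International Morse convention: a dash lasts 3 dots, the gap between marks inside a symbol is 1, between letters 3, between words 7. The function names and the two small partial tables are invented for the example; they are just big enough to check "SOS" by hand.

```python
# Assumed timing in dot units (not specified by the exercise itself).
DOT, DASH = 1, 3
ELEMENT_GAP = 1   # between dots/dashes inside one symbol
LETTER_GAP = 3    # between symbols inside a word
WORD_GAP = 7      # between words


def symbol_time(code):
    """Time to send one symbol's dots and dashes, including inner gaps."""
    marks = sum(DOT if mark == "." else DASH for mark in code)
    return marks + ELEMENT_GAP * (len(code) - 1)


def message_time(text, table):
    """Total time for text under table (a dict: symbol -> dot/dash string)."""
    total = 0
    words_sent = 0
    for word in text.upper().split():
        symbols = [ch for ch in word if ch in table]  # ignore everything else
        if not symbols:
            continue
        if words_sent:
            total += WORD_GAP
        total += sum(symbol_time(table[ch]) for ch in symbols)
        total += LETTER_GAP * (len(symbols) - 1)
        words_sent += 1
    return total


def compare(text, tables, baseline):
    """Map each code name to (time, time relative to the baseline code)."""
    base = message_time(text, tables[baseline])
    result = {}
    for name, table in tables.items():
        t = message_time(text, table)
        result[name] = (t, t / base if base else float("nan"))
    return result


# Partial tables, only for a hand-checkable test string.
MORSE_PART = {"S": "...", "O": "---", "E": "."}
LEX_PART = {"S": ".-..", "O": "....", "E": "-."}

print(message_time("SOS", MORSE_PART))  # 27
print(message_time("SOS", LEX_PART))    # 31
```

For a real run, `sys.stdin.read()` would supply the text and the tables would cover all letters and numerals. Punctuation is dropped, not treated as a space, so "e!e" counts as one two-letter word.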
 * A || B || C || D || E || F || G || H || I || J || K || L || M || N || O || P || Q || R || S || T || U || V || W || X || Y || Z ||
 * . || _ || .. || ._ || _. || _ _ || ... || .._ || ._. || ._ _ || _.. || _._ || _ _. || _ _ _ || .... || ..._ || .._. || .._ _ || ._.. || ._._ || ._ _. || ._ _ _ || _... || _.._ || _._. || _._ _ ||
 * 0 || 1 || 00 || 01 || 10 || 11 || 000 || 001 || 010 || 011 || 100 || 101 || 110 || 111 || 0000 || 0001 || 0010 || 0011 || 0100 || 0101 || 0110 || 0111 || 1000 || 1001 || 1010 || 1011 ||
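The Lex column above need not be typed in by hand. A small Python sketch that generates it: list the dot/dash strings shortest first, dot before dash, and deal them out to the symbols in order. The names `shortlex_codes` and `lex_code` are made up here. The table covers only A to Z; how to extend Lex code to numerals is left open by the exercise, so passing a longer symbol string is just one possible choice.

```python
from itertools import count, product
import string


def shortlex_codes(n):
    """Return the first n dot/dash strings, shortest first, dot before dash."""
    codes = []
    if n <= 0:
        return codes
    for length in count(1):
        for marks in product(".-", repeat=length):
            codes.append("".join(marks))
            if len(codes) == n:
                return codes


def lex_code(symbols=string.ascii_uppercase):
    """Assign the shortest codes to symbols in the order given."""
    return dict(zip(symbols, shortlex_codes(len(symbols))))


table = lex_code()
print(table["A"], table["N"], table["Z"])  # . --- -.--
```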

Make your program DATA driven, not a huge nested IF statement nor a SWITCH statement. Does the fact that you intend to play with altered codes affect your decision of how to represent each code? Is there such a thing as TOO simple?
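As one illustration of why the representation matters: if each code is plain data (say, a dict from symbol to dot/dash string), an altered code is just another dict, and new ones can be built mechanically. The Python sketch below reshuffles the codes of an existing table so that the symbols most frequent in a sample text get the quickest (or, with `slowest=True`, the slowest) codes. The function names are hypothetical, the timing (dash = 3 dots, 1 dot between marks) is an assumed convention, and this is only one approach, not the intended answer to the exercise.

```python
from collections import Counter


def code_time(code):
    """Dot units for one symbol: dot = 1, dash = 3, 1 between marks (assumed)."""
    return sum(1 if mark == "." else 3 for mark in code) + len(code) - 1


def reassign(pool, sample, slowest=False):
    """Re-pair the symbols of pool with its codes, ranked by use in sample.

    pool is a dict: symbol -> dot/dash string. The most frequent symbol gets
    the quickest code, or the slowest one when slowest=True.
    """
    counts = Counter(ch for ch in sample.upper() if ch in pool)
    by_frequency = sorted(pool, key=lambda symbol: -counts[symbol])
    by_time = sorted(pool.values(), key=code_time, reverse=slowest)
    return dict(zip(by_frequency, by_time))


pool = {"A": "---", "B": ".", "C": "-"}
print(reassign(pool, "AAAB c"))
# {'A': '.', 'B': '-', 'C': '---'}
print(reassign(pool, "AAAB c", slowest=True))
# {'A': '---', 'B': '-', 'C': '.'}
```

A representation that stores only each code's length would be too simple for this: a dash costs more than a dot, so two codes of equal length can take different times to send.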

Sample Code Comparison Output

FWIW: the Squeak code used to create the above sample output: It is a quick hack, not very pretty. -Rik

Comments on Morse Exercise