Notes+9.30

Students: be sure to fill in details that you think are worth remembering.

No duplicate birthdays? How likely is that? Rik did a little Smalltalk in a workspace: Duplicate Birthdays
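This is not Rik's Smalltalk, just a rough Python sketch of the same question. It assumes 365 equally likely birthdays, ignores leap years, and treats the class size as a parameter, since the notes do not record it. The function name is made up for illustration.

```python
def prob_no_duplicate_birthdays(people: int, days: int = 365) -> float:
    """Chance that `people` independent, uniformly random birthdays are all different."""
    prob = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        prob *= (days - k) / days
    return prob


if __name__ == "__main__":
    for size in (10, 23, 30):
        print(size, round(prob_no_duplicate_birthdays(size), 3))
```

With 23 people the chance of no duplicates already drops just below one half, which is why "no duplicates" in a class-sized group is less likely than intuition suggests.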

Today we peeked ahead to sorting algorithms by doing some kinesthetic activity outdoors.

Insertion Sort
Each student took a bottle cap which had an inside number (and an outside number for later use). Starting with one person beginning a line, students were added one at a time. I said: ONE AT A TIME. A new person walked along the line comparing their number until they found someone whose number is >= their own: then they squeezed in front of that person, forcing everyone further down the line to take one step further away.
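The line-up can be sketched in Python roughly as follows. The list plays the role of the line and each newcomer walks from the front, as we did outside. The function name is hypothetical.

```python
def insertion_sort_lineup(cap_numbers):
    """Build a sorted line by adding one person at a time."""
    line = []
    for newcomer in cap_numbers:
        position = 0
        # Walk along the line until we meet someone whose number is >= ours.
        while position < len(line) and line[position] < newcomer:
            position += 1
        # Squeeze in front of that person; everyone further down shifts one step.
        line.insert(position, newcomer)
    return line


if __name__ == "__main__":
    print(insertion_sort_lineup([5, 2, 9, 2, 7]))
```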

This algorithm is O(n^2), that is, on the order of n^2. When person #M arrives, they compare against some of the people in line, and everyone behind the insertion point has to move one space. So something happens with each of the (M-1) people already there.

Σ i (for i from 1 to n) = 1 + 2 + 3 + ... + n = n(n+1)/2. There's the n^2 term.

Note that intelligent people really want to speed up the process and have several people going along the line simultaneously. That's parallel processing, which is a great way to speed things up if you can be sure to synchronize processes. (Where could it go wrong?)

Bubble Sort
Having put people in sorted order, we could tolerate doing more than one pass of bubble sort, which would have put people in exactly the opposite order (worst-case data for bubble sort). This algorithm is O(n^2).
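To see why reversed data is the worst case, here is a minimal bubble sort sketch that also counts passes and swaps. The counters are an addition for illustration, not something we tracked outside.

```python
def bubble_sort(values):
    """Sort a copy of `values`; return (sorted_list, passes, swaps)."""
    items = list(values)
    passes = 0
    swaps = 0
    swapped = True
    while swapped:
        swapped = False
        passes += 1
        for i in range(len(items) - 1):
            # Neighbors out of order trade places.
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
                swaps += 1
    return items, passes, swaps


if __name__ == "__main__":
    print(bubble_sort([5, 4, 3, 2, 1]))  # reversed: every pair must be swapped
    print(bubble_sort([1, 2, 3, 4, 5]))  # already sorted: one pass, no swaps
```

On reversed input of n items every one of the n(n-1)/2 pairs gets swapped, which is where the n^2 comes from.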

Hash Table
The numbers on the outside of the caps simulate a hash index. With a hash table of size 20, some of the slots had a bunch of people in them: today #13 was loaded with about 10 kids. When two people map to the same slot, it is called a "conflict" (textbooks more often say "collision"). Finding a given person means going to the (computed?) hash index and possibly traversing the small collection of people found there to find the right one.

Using a bigger table usually spreads things out. Re-hashing everyone into a table of size 40 reduced our worst-case bin. Unless we have prior knowledge that no two people would have the same key, no size of table will guarantee zero conflicts. So we HAVE to have a strategy for dealing with conflicts.

One way is to allow each slot to contain a collection of items with that index.
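A minimal sketch of the "collection per slot" idea, assuming integer keys and using `key % size` as a stand-in hash function. The class and method names are hypothetical.

```python
class ChainedHashTable:
    """Each slot holds a small list of (key, value) pairs that share an index."""

    def __init__(self, size=20):
        self.size = size
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        # Assumed hash function for illustration: integer key modulo table size.
        return key % self.size

    def put(self, key, value):
        bucket = self.slots[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # replace the value for an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        # Go to the hash index, then traverse the small collection found there.
        for existing_key, value in self.slots[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


if __name__ == "__main__":
    table = ChainedHashTable(20)
    for key in (13, 33, 53, 7):
        table.put(key, f"person{key}")
    print(len(table.slots[13]), table.get(33))
```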

Another way we discussed is to stick to a flat table, but use different slots when there is a conflict. A naive (and badly performing) strategy is to just look further up the line until an empty slot is found. The trouble with that is that dense blocks form. Any new item which "belongs" anywhere in the stretch of filled slots will fill the next available slot, making the stretch even longer.
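The dense-block problem shows up quickly in a small sketch. Here each item is represented only by its home index, and wrapping around at the end of the table is an assumption on my part.

```python
def linear_probe_insert(table, home_index):
    """Place an item at its home slot, or at the next free slot further along."""
    size = len(table)
    for step in range(size):
        index = (home_index + step) % size  # wrap around at the end (assumed)
        if table[index] is None:
            # Store the home index so we can see who ended up where.
            table[index] = home_index
            return index
    raise RuntimeError("table is full")


if __name__ == "__main__":
    table = [None] * 20
    landed = [linear_probe_insert(table, home) for home in (13, 13, 13, 14, 15)]
    print(landed)
```

The items that belong at 14 and 15 get pushed to the end of the block started by the 13s, so the filled stretch keeps growing.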

An improvement is to jump some distance greater than 1. Suppose we just took multiples of the original hash index? Think what would happen with our bunch of #13s. The first one would get slot 13. The next one would find 13 full, so he'd get #26 (=6 mod 20). The next 13 would find 13 full, 26 full, and so would take 39 (=19 mod 20). Everyone who hashed to 13 would next try moving to the SAME next place.
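The arithmetic for the #13s can be checked with a few lines. The function name is made up for this note.

```python
def multiples_probe_sequence(hash_index, table_size, attempts):
    """Slots tried when each retry uses the next multiple of the original hash index."""
    return [(hash_index * k) % table_size for k in range(1, attempts + 1)]


if __name__ == "__main__":
    # Everyone who hashes to 13 gets exactly this list, in this order.
    print(multiples_probe_sequence(13, 20, 3))
```

Because the sequence depends only on the original index, every one of the #13s follows the same path and they keep colliding with each other.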

A better idea is to compute a second hash number which is independent of the first number. Then, even if a whole bunch of people map to #13, their next choices would be different distances away.

Havvy correctly figured out that the size of a hash table should be a prime number so that regardless of the size of the jumps (as long as the jump is not a multiple of the table size), an item that keeps running into full slots would eventually visit every possible index.
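Both points (a second hash for the jump size, and a prime table size) can be tried out with a short sketch. The `second_hash` formula below is a common textbook choice, used here as an assumption; it is not something we settled on in class.

```python
def second_hash(key, table_size):
    """Hypothetical second hash: a jump size between 1 and table_size - 1."""
    return 1 + key % (table_size - 1)


def probe_sequence(start, step, table_size):
    """Slots visited by repeatedly jumping `step` from `start`, wrapping around."""
    return [(start + k * step) % table_size for k in range(table_size)]


def slots_reached(start, step, table_size):
    """How many distinct slots the probe sequence ever visits."""
    return len(set(probe_sequence(start, step, table_size)))


if __name__ == "__main__":
    # Size 20 with a jump of 4: the probes cycle through only a few slots.
    print(slots_reached(13, 4, 20))
    # Size 23 (prime): any jump from 1 to 22 reaches every slot.
    print(all(slots_reached(13, step, 23) == 23 for step in range(1, 23)))
    # Keys 13 and 36 share slot 13 in a table of 23 but get different jumps.
    print(second_hash(13, 23), second_hash(36, 23))
```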

Hash tables work much much better when there is plenty of room. When they get close to full, they degenerate to performing like a linked list.

If a hash table gets too full, consider allocating a much bigger table and re-hashing (don't you hate overloading of terms?) every item into the new table.

Consider what to do when an item is removed from the hash table. You cannot just set the spot to "empty". What if some other item hashed to that index: they found it full and so they went bouncing along until they found an open slot. Now what if we looked for that other item, got to this slot and found it empty? We would wrongly conclude that the item we were looking for is not IN the table: it should be here, but it is not here, so it must be new. Instead we have to install a special place-marker. If an item is being inserted, it can go here, but if you are looking for an existing item, better look further as it could have collided with the thing that is now gone.
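Here is a minimal sketch of the place-marker idea. To keep it short it assumes integer keys, `key % size` as the hash, and the naive "next slot along" probing; all names are hypothetical.

```python
EMPTY = object()    # slot has never been used
DELETED = object()  # the special place-marker left behind by a removal


class ProbingTable:
    def __init__(self, size=7):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        size = len(self.slots)
        start = key % size
        for step in range(size):
            yield (start + step) % size

    def contains(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return False  # a truly empty slot ends the search
            if slot is not DELETED and slot == key:
                return True
            # DELETED or some other key: keep looking further along
        return False

    def insert(self, key):
        if self.contains(key):
            return
        for index in self._probe(key):
            if self.slots[index] is EMPTY or self.slots[index] is DELETED:
                self.slots[index] = key  # a place-marker slot can be reused
                return
        raise RuntimeError("table is full")

    def remove(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return
            if slot is not DELETED and slot == key:
                self.slots[index] = DELETED
                return


if __name__ == "__main__":
    table = ProbingTable(7)
    table.insert(3)
    table.insert(10)           # also hashes to slot 3, so it lands in slot 4
    table.remove(3)
    print(table.contains(10))  # still found, thanks to the place-marker
```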

Building a well-tuned hash table of n items is O(n). Each search is O(1) on average.

Merge Sort
Classic divide-and-conquer. Split the group about in half. Recursively split groups about in half. When the group is small enough, sort them in a straightforward way. E.g., when there are two people, compare and possibly swap. Each pass divides groups about in half, which means there are lg n passes required. On the return, join pairs of groups into bigger groups. Each level of merging takes O(n): there are not usually as many compares as there are people. Since there are O(lg n) levels, the whole sort takes O(n * lg n).
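The split-then-join description can be sketched as follows, using "two people: compare and possibly swap" as the small case. Function names are made up for this note.

```python
def merge(left, right):
    """Join two already-sorted groups into one sorted group."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Whoever has the smaller number at the front of their group goes next.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One group has run out; the rest of the other follows with no more compares.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(people):
    """Return a new sorted list using split-in-half, then merge."""
    if len(people) <= 1:
        return list(people)
    if len(people) == 2:
        a, b = people
        return [a, b] if a <= b else [b, a]  # compare and possibly swap
    middle = len(people) // 2
    return merge(merge_sort(people[:middle]), merge_sort(people[middle:]))


if __name__ == "__main__":
    print(merge_sort([8, 3, 5, 1, 9, 2, 7]))
```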

**Visual Sorts**
I (Ryan R.) mentioned a visual aid for several sorting algorithms to a couple of my fellow students today, and wanted to post a [|link] to the web page hosting the videos. The videos show folk dancers that go beyond our practice of handing out numbers and give a good visual on bubble, shell, insert, select, merge*, and quick* sort. And yes, bubble takes forever. *Check the bottom of the linked page for links to these dances that were added later.