In the last section, the efficient routine for finding data in an array was based on the premise that the data being searched was already sorted. Indeed, computers are used so extensively to process data collections that in many installations, a great deal of their time is spent maintaining that data in sorted order in the first place. It turns out that methods of searching a sorted list have much in common with methods of achieving that sorted condition.
In order to concentrate on sorting abstractions without having to be concerned with the type of data being sorted, most of the code presented in the rest of this chapter will be designed to sort only a single kind of data, namely arrays of cardinals. Only minor modifications in a few places are necessary to use the code presented in the next few sections for other kinds of data. The sorts presented here can be made generic enough to sort any kind of data, but that step requires some programming abstractions that will not be taken until chapter twelve, so the topic will be revisited in chapter thirteen with that aspect in view.
The basic premise behind sorting an array is that its elements start out in some (presumably) random order and need to be arranged from lowest to highest. If the number of items to be sorted is small, a human reader may be able to tell at a glance what the correct order ought to be. If there are a large number of items, a more systematic approach is required. To do this, it is necessary to think about what it means for an array to be sorted. It is easy to see that the list
1, 5, 6, 19, 23, 45, 67, 98, 124, 401
is sorted, whereas the list
4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345
is not. The property that makes the second one "not sorted" is that there are adjacent elements that are out of order. The first item is greater than the second instead of less, and likewise the third is greater than the fourth and so on.
Once this observation is made, it is not very hard to devise a sort that proceeds by examining adjacent elements to see if they are in order, and swapping them if they are not.
This most elementary sorting method proceeds by scanning through the elements a pair at a time, and swapping any adjacent pairs it finds to be out of order. Thus, for the list in the example above, the first two items are swapped, then the (new) second item is compared to the third (and not swapped,) the third is compared to the fourth, and so on to the end. The list would be altered as follows (comparisons are emphasized.)
4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345 1, 4, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345 1, 4, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345 1, 4, 34, 90, 100, 45, 23, 82, 11, 0, 600, 345 1, 4, 34, 90, 100, 45, 23, 82, 11, 0, 600, 345 1, 4, 34, 90, 45, 100, 23, 82, 11, 0, 600, 345 1, 4, 34, 90, 45, 23, 100, 82, 11, 0, 600, 345 1, 4, 34, 90, 45, 23, 82, 100, 11, 0, 600, 345 1, 4, 34, 90, 45, 23, 82, 11, 100, 0, 600, 345 1, 4, 34, 90, 45, 23, 82, 11, 0, 100, 600, 345 1, 4, 34, 90, 45, 23, 82, 11, 0, 100, 600, 345 1, 4, 34, 90, 45, 23, 82, 11, 0, 100, 345, 600
Unfortunately, the list is not yet sorted, as there are still several places where adjacent items are out of order. The number 0, for instance, which should be first, is in the ninth slot. Notice, however, that the largest item worked its way to the top position, and indeed, this algorithm will always force this to happen. Thus, if at this point the same strategy is continued, it is only the first n-1 items that need to be scanned. On the second pass, the second largest item will move to its correct position, and on the third pass (stopping at item n-3) the third largest will be in place. It is this gradual percolation, or bubbling of the larger items to the top end that gives this sorting technique its name.
There are two ways in which the sort can terminate with everything in the right order. It could complete by reaching the n-1st pass and placing the second smallest item in its correct position. Alternately, it could find on some earlier pass that nothing needs to be swapped. That is, all adjacent pairs are already in the correct order. In this case, there is no need to go on to subsequent passes, for the sort is complete already. If the list started in sorted order, this would happen on the very first pass. If it started in reverse order, it would not happen until the last one.
The code that follows embodies one way of expressing this algorithm. It requires a separate swap procedure, which is included here for the sake of completeness, and for a reminder of how swapping is done. Here, the type cardinal is used, but other types could also be employed. Note that the array may not be full, and unused items at the end are not to be included in the sort, so a cardinal parameter must be passed to inform the bubble sort how many items are active. This procedure does no error checking on the passed parameter; it is a precondition that the number passed be valid. Also note that the array being sorted is numbered [1 .. n], [0 .. n - 1], or in general [a .. a + (n - 1)] exterior to the bubble sort procedure, but only [0 .. n - 1] inside it. The procedure will therefore be informed where to begin sorting and where to stop , both as an offset from the first index. These will be passed in parameters lBound and uBound respectively, and there will be the least confusion, of course, if the arrays are numbered by the caller as [0 .. last].
PROCEDURE Swap (VAR first, second : CARDINAL); (* Exchange values in first and second. *) VAR temp : CARDINAL; BEGIN temp := first; first := second; second := temp; END Swap; PROCEDURE BubbleSort (VAR source : ARRAY OF CARDINAL; lBound, uBound : CARDINAL); (* sorts beginning at item # lBound and ending at item #uBound, as numbered inside the procedure [0..HIGH], not as on the outside. *) (* Pre: uBound <= HIGH (source) Post : source is bubble sorted *) VAR switch, count : CARDINAL; BEGIN IF uBound > lBound (* otherwise, there is nothing to do *) THEN REPEAT (* start a new pass here *) switch := 0; (* reset number of swaps for this pass *) FOR count := lBound TO (uBound - 1) (* scan only as far as necessary *) DO IF source [count] > source [count + 1] (* compare adjacent pairs *) THEN Swap (source [count], source [count + 1]); INC (switch); END; (* if *) END; (* for *) DEC (uBound); (* for the next pass, examine one less item *) UNTIL (switch = 0) OR (uBound = 0); (* till no pairs swapped, or at first *) END; (* if *) END BubbleSort;
When this sort was tested, statements were included to print the array after each comparison, and to print the number of swaps on each pass. The original data is given first, and the items that are compared are emphasized.
234 77 0 113 404 94 900 113 15 300 13 135 Pass number 1 77 234 0 113 404 94 900 113 15 300 13 135 77 0 234 113 404 94 900 113 15 300 13 135 77 0 113 234 404 94 900 113 15 300 13 135 77 0 113 234 404 94 900 113 15 300 13 135 77 0 113 234 94 404 900 113 15 300 13 135 77 0 113 234 94 404 900 113 15 300 13 135 77 0 113 234 94 404 113 900 15 300 13 135 77 0 113 234 94 404 113 15 900 300 13 135 77 0 113 234 94 404 113 15 300 900 13 135 77 0 113 234 94 404 113 15 300 13 900 135 77 0 113 234 94 404 113 15 300 13 135 900 Swaps on this pass = 9 Pass number 2 0 77 113 234 94 404 113 15 300 13 135 900 0 77 113 234 94 404 113 15 300 13 135 900 0 77 113 234 94 404 113 15 300 13 135 900 0 77 113 94 234 404 113 15 300 13 135 900 0 77 113 94 234 404 113 15 300 13 135 900 0 77 113 94 234 113 404 15 300 13 135 900 0 77 113 94 234 113 15 404 300 13 135 900 0 77 113 94 234 113 15 300 404 13 135 900 0 77 113 94 234 113 15 300 13 404 135 900 0 77 113 94 234 113 15 300 13 135 404 900 Swaps on this pass = 7 Pass number 3 0 77 113 94 234 113 15 300 13 135 404 900 0 77 113 94 234 113 15 300 13 135 404 900 0 77 94 113 234 113 15 300 13 135 404 900 0 77 94 113 234 113 15 300 13 135 404 900 0 77 94 113 113 234 15 300 13 135 404 900 0 77 94 113 113 15 234 300 13 135 404 900 0 77 94 113 113 15 234 300 13 135 404 900 0 77 94 113 113 15 234 13 300 135 404 900 0 77 94 113 113 15 234 13 135 300 404 900 Swaps on this pass = 5 Pass number 4 0 77 94 113 113 15 234 13 135 300 404 900 0 77 94 113 113 15 234 13 135 300 404 900 0 77 94 113 113 15 234 13 135 300 404 900 0 77 94 113 113 15 234 13 135 300 404 900 0 77 94 113 15 113 234 13 135 300 404 900 0 77 94 113 15 113 234 13 135 300 404 900 0 77 94 113 15 113 13 234 135 300 404 900 0 77 94 113 15 113 13 135 234 300 404 900 Swaps on this pass = 3 Pass number 5 0 77 94 113 15 113 13 135 234 300 404 900 0 77 94 113 15 113 13 135 234 300 404 900 0 77 94 113 15 113 13 135 234 300 404 900 0 77 94 15 113 113 13 135 234 300 404 900 0 77 94 15 113 113 13 135 234 300 404 900 0 77 94 15 113 13 113 135 234 300 404 900 0 77 94 15 113 13 113 135 234 300 404 900 Swaps on this pass = 2 Pass number 6 0 77 94 15 113 13 113 135 234 300 404 900 0 77 94 15 113 13 113 135 234 300 404 900 0 77 15 94 113 13 113 135 234 300 404 900 0 77 15 94 113 13 113 135 234 300 404 900 0 77 15 94 13 113 113 135 234 300 404 900 0 77 15 94 13 113 113 135 234 300 404 900 Swaps on this pass = 2 Pass number 7 0 77 15 94 13 113 113 135 234 300 404 900 0 15 77 94 13 113 113 135 234 300 404 900 0 15 77 94 13 113 113 135 234 300 404 900 0 15 77 13 94 113 113 135 234 300 404 900 0 15 77 13 94 113 113 135 234 300 404 900 Swaps on this pass = 2 Pass number 8 0 15 77 13 94 113 113 135 234 300 404 900 0 15 77 13 94 113 113 135 234 300 404 900 0 15 13 77 94 113 113 135 234 300 404 900 0 15 13 77 94 113 113 135 234 300 404 900 Swaps on this pass = 1 Pass number 9 0 15 13 77 94 113 113 135 234 300 404 900 0 13 15 77 94 113 113 135 234 300 404 900 0 13 15 77 94 113 113 135 234 300 404 900 Swaps on this pass = 1 Pass number 10 0 13 15 77 94 113 113 135 234 300 404 900 0 13 15 77 94 113 113 135 234 300 404 900 Swaps on this pass = 0
It is not difficult to see that some additional efficiency can be obtained for the bubble sort. It uses many swaps to get the largest item into its correct position on each pass, and some of these are wasted. If the scan is modified so that it simply finds the largest item in the range being scanned and no interchanges are done until the scan is finished, all the intermediate swaps can be eliminated. Then, when the pass is complete, the largest item can be swapped into the top position for that pass. For instance, starting with the list
4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345
one would find the number 600 to be largest on the first pass (as with the bubble sort,) but do only the swap from its present position to the last spot for the pass. Successive passes would produce the lists:
4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 345, 600 4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 345, 600 4, 1, 90, 34, 0, 45, 23, 82, 11, 100, 345, 600 4, 1, 11, 34, 0, 45, 23, 82, 90, 100, 345, 600 4, 1, 11, 34, 0, 45, 23, 82, 90, 100, 345, 600 4, 1, 11, 34, 0, 23, 45, 82, 90, 100, 345, 600 4, 1, 11, 23, 0, 34, 45, 82, 90, 100, 345, 600 4, 1, 11, 0, 23, 34, 45, 82, 90, 100, 345, 600 4, 1, 0, 11, 23, 34, 45, 82, 90, 100, 345, 600 0, 1, 4, 11, 23, 34, 45, 82, 90, 100, 345, 600 0, 1, 4, 11, 23, 34, 45, 82, 90, 100, 345, 600
In each pass, the number successfully placed is underlined. Here is the code to express these ideas:
PROCEDURE SelectSort (VAR source : ARRAY OF CARDINAL; lBound, uBound : CARDINAL); (* Pre: uBound <= HIGH (source) and uBound > 0 Post : source is selection sorted *) (* sorts beginning at item # lBound and ending at item #uBound, as numbered inside the procedure [0..HIGH], not as on the outside. *) VAR count, iLarge : CARDINAL; BEGIN IF uBound > lBound (* otherwise, there is nothing to do *) THEN REPEAT (* start a new pass here *) iLarge := uBound; (* assume last index is of the largest one *) FOR count := uBound TO lBound BY -1 (* scan from current end of list *) DO IF source [count] > source [iLarge] (* look for anything larger *) THEN iLarge := count; END; (* if *) END; (* for *) IF iLarge # uBound (* if equal, its already in right spot *) THEN Swap (source [uBound], source [iLarge]); END; (* if *) DEC (uBound); (* for next pass examine one less item *) UNTIL uBound = lBound; END; (* if *) END SelectSort;
Notice that a small change was made in the structure of the sort. The scan for the largest on each pass is conducted starting at the last index rather than at the low end. This is done to maximize the efficiency of the sort in cases where the list is already sorted. The index iLarge would in such cases hold the position of the largest item right from the start of the scan; no changes of index would take place, and there would be no swap. Of course, if the list were reverse sorted to begin with, this choice would reduce efficiency, but that possibility is less likely than that it may be sorted prior to beginning.
This procedure was run with the same test data as the bubble sort and with suitable print statements to monitor the progress of the sort. It produced the output below, with the original data first. The item correctly placed on each pass is emphasized.
234 77 0 113 404 94 900 113 15 300 13 135 Pass number 1 234 77 0 113 404 94 135 113 15 300 13 900 Pass number 2 234 77 0 113 13 94 135 113 15 300 404 900 Pass number 3 234 77 0 113 13 94 135 113 15 300 404 900 Pass number 4 15 77 0 113 13 94 135 113 234 300 404 900 Pass number 5 15 77 0 113 13 94 113 135 234 300 404 900 Pass number 6 15 77 0 113 13 94 113 135 234 300 404 900 Pass number 7 15 77 0 94 13 113 113 135 234 300 404 900 Pass number 8 15 77 0 13 94 113 113 135 234 300 404 900 Pass number 9 15 13 0 77 94 113 113 135 234 300 404 900 Pass number 10 0 13 15 77 94 113 113 135 234 300 404 900 Pass number 11 0 13 15 77 94 113 113 135 234 300 404 900