CSC 148 H1, Winter 2013

Lab 9 (Mar. 19/20)

This document contains the instructions for lab number 9 in CSC 148 H1. To earn your lab mark, you must actively participate in the lab.
We mark you in order to ensure a serious attempt at learning, NOT to make careful critical judgments on the results of your work.

General rules

We will use the same general rules as for the first lab (including pair programming). See the instructions at the beginning of Lab 1 to refresh your memory.

Overview

This lab has you compare several searching and sorting algorithms. You will use the module cProfile to investigate how the various functions spend their execution time. Some algorithms are so fast that the output of the cProfile module does not provide enough significant digits to tell the time! In those cases, we rely on the timeit module to estimate the total time taken.

Do not be surprised if sometimes the output of the profiler is not what you expect! Think about all the various factors that affect the profiler. Always think about better ways of performing analysis of code performance. Feel free to tweak the code to try to come up with more sensible results.

Warning! In order for the cProfile module to work correctly under Wing, you must execute your code in DEBUG mode — in other words, click on the "Debug" button instead of the "Run" button. This is due to some unknown compatibility issue...


Searching

When you are done, show your work to your TA and switch roles.


Sorting

Selection sort

Selection sort works by repeatedly selecting the smallest remaining item and putting it where it belongs.

When you profile selection sort, you'll discover that a lot of time is spent calling len. Fix this by introducing a temporary variable in the main while loop and re-run the profiling. You'll notice that len is still being called a lot; find out where and use the same trick to avoid calling len so much. How much time did you just save?

Insertion sort

Insertion sort works by repeatedly inserting the next item where it belongs in the sorted items at the front of the list. There are two versions: one manually moves items using a loop, and the other relies on Python's del. Why do you think Python's list code is so much faster? Of selection sort and insertion sort, which is faster? Why do you think this is?

Bubblesort

Bubblesort works by repeatedly scanning the entire list and swapping items that are out of order. One consequence of bubblesort is that, on the first scan, the largest item will end up at the end of the list no matter where that item was before the first scan. Given what we've learned from timing selection and insertion sort, how do you think bubblesort will perform?

There are two versions. The second one has a check to see whether any items have been swapped on the last scan, and if not it stops early (in that case, no items were out of order). How much of a difference does it make to exit early? Is it noticeable? Once you've done the bubblesort timing, figure out which is faster and why.

Mergesort

Mergesort is different: it splits the list in half, sorts the two halves, and then merges the two sorted halves. There are two versions: the first one uses a helper function _mergesort_1 that returns a new sorted list (and thus only replaces the items in the original list once, when the helper function exits), and the second one uses a helper function _mergesort_2 that sorts the list between two indices and continually updates the original list. Which do think is faster, and why?

Quicksort

Quicksort works by partitioning the list into two halves: those items less than the first item, and those greater than the first item. For example, if the list is [5, 1, 7, 3, 9, 12], then the helper function _partition will rearrange the list into this: [3, 1, 5, 9, 12, 7] — notice that the 5 is now in the right place. Then the left and right sections are sorted using quicksort. How fast is this? Is quicksort faster on nearly-sorted lists or on random data? Why?

There are two versions of quicksort. The second one uses indices to sort the list in-place, without making copies of each sublist. How much difference does this make?

list.sort

How much of a difference does it make to use Python's built-in sort? Why do you think it is so much faster?

Nearly-sorted data, reversed data

The results hold for randomized input. Investigate whether there is an input that leads to fast or slow times: try both nearly-sorted input and reverse-sorted input. (See the comments in test_sort.py to find out how to generate these other kinds of inputs.)

When you are done, show your work to your TA. Then, please stick around to help other students in your lab section!