This homework is due December 10 at 15.15. It can be
delivered to Stefan or Jonas personally at their offices or put in
their mail slot at the department. It can also be given to Stefan in
connection with one of the lectures. Solutions handed in late
are not accepted and will not be graded.
Some forms of collaboration are allowed and required. The size of a
group of collaboration should be two people.
The group should hand in
only one solution and for each problem it should be clearly
marked which of the members have contributed.
The homework will be returned on Thurday and Friday December 13-14.
Please book a time by writing your names on the list
outside of Stefan's office.
All members of the group must be present when the homework is returned.
I do not consider the set of problems below easy. Thus a
performance of getting half the total score on this set
is at least equivalent to the grade 4 on this subset of
the course. Credit may be given for partially solved
1. Suffix array implementation (100p)
A suffix array is a simple data structure that facilitates searching
in a text. It consists of a sorted array of pointers.
There is one pointer for each position in the text. The pointer array
is sorted according to the suffixes to which the pointers refer.
For example, the text "abba0" (0 is a special end of string character) has
the suffix array [3, 0, 2, 1], where the pointers refer
to the suffixes "a0", abba0", "ba0", and "bba0", respectively.
You should give a clear description of the algorithms and an estimate of
the asymptotic worst-case performance for each of the four subtasks.
- Build a suffix array for a text t. (40p)
- Find the number of substrings of t that match a given
- Find the length of the longest repeated substring in t. (20p)
- Find the number of substrings of t that match a string which
may contain one or more '*'-symbols. A '*' matches any string. (30p)
This is a programming exercise and the same quality standards
as in Homework 1 apply. You may assume that the text is stored on
file and consists of ASCII characters (excluding '0' and '*'). The program
should contain a simple user interface for the queries and the time
to execute each command should be reported to the user.
The text contains at most 1,000,000 characters. Your goal is to build
the suffix array within one minute and the searches should take no more
than 10 seconds. You do not know what the text files and queries will
2. Fast sorting (100p)
In the lectures you've seen how to sort n word-sized integers
on a unit-cost RAM in O(n log log n) time.
In this exercise you will study some special cases where it's possible
to find easier algorithms or better time bounds.
- If there are many elements - e.g. if n is close to 2^w,
where w is the word length - you may sort in linear time.
Give an algorithm and discuss how large n needs to be. (25p)
- If the elements are small you may also sort in linear time.
Give an algorithm and discuss how small the elements need to be. (25p)
- You may sort faster if there are many repeated elements. How many
distinct elements can you handle and how fast can you sort? (25p)
- The algorithm discussed in class uses large amounts of memory.
How much? By doing the radix sorting phase in more than two steps
you may reduce the memory requirements while increasing the running time.
Explain how to do this and give formulas describing the time-space