Avalg homework 2 fall 2000
This homework is due Monday December 4 at 15.00.
It can be delivered to Stefan personally at his office or put in
his mail slot at the department. It can also be given to Stefan in
connection with one of the lectures. Solutions handed in late
are not accepted and will not be graded.
Some forms of collaboration are allowed. The size of a group
of collaboration is limited to three. I do not consider the set of problems below easy. Thus a performance of getting half the total score on this set is at least equivalent to the grade 4 on this subset of the course. Credit may be given for partially solved problems.
- Build a suffix array. (40p)
- Find the number of occurrences of a particular string. (10p)
- Find the length of the longest repeated substring. (20p)
- Find the number of occurrences of a string which may contain one or more *-symbols. A * matches any string. (20p)
You should give a clear description of the algorithms and an estimate of
the asymptotic worst-case performance for each of the four subtasks.
This is a programming exercise and the same quality standards as in Homework 1 apply. You may assume that the text is stored on file and consists of ASCII characters (excluding 0 and '*'). The program should contain a simple user interface for the queries and the time to execute each command should be reported to the user. The text contains at most 1,000,000 characters. Your goal is to build the suffix array within one minute and the searches should take no more than 10 seconds. You do not know what the text files and queries will look like.
- If there are many elements - e.g. if
*n*is close to 2^*w*, where*w*is the word length - you may sort in linear time. Give an algorithm and discuss how large*n*needs to be. (20p) - If the elements are small you may also sort in linear time. Give an algorithm and discuss how small the elements need to be. (20p)
- You may sort faster if there are many repeated elements. How many distinct elements can you handle and how fast can you sort? (20p)
- The algorithm discussed in class uses large amounts of memory. How much? By doing the radix sorting phase in more than two steps you may reduce the memory requirements while increasing the running time. Explain how to do this and give formulas describing the time-space tradeoff. (20p)
Stefan Nilsson 2000-11-24 |