2D1320, Exam in Applied Computer Science (TilDa) Saturday March 7, 1998 8 am 1 pm Maximum points exam + bonus = 50 + 5. Grading: 25 points gives a grade of 3, 35 gives 4, and 45 gives 5. The results will be posted no later than March 21 on Nada's bulletin board. Help Tools: An algorithm book. 1. Knuth automaton Datalogy's father, Donald Knuth, is 11011 years old (59 that is, but he prefers to use binary numbers). To keep his masterful piece of art The art of computer programming up to date, on his birthday he searched for all occurrences of 111010 (58) and change them to his new age. How would his search automaton for 111010 look like? Sketch it with solid forward arrows and dashed backward arrows and give the next vector which defines the automaton! Which search method would probably be the fastest one? Describe it very briefly for the above case! 2. Order in the queue Person records have been entered in a queue, ordered by the last name. Persons with the same last name have been sub-ordered by increasing personal numbers, but that was a mistake. Instead, the sub-order should have been by increasing age. Describe in detail an algorithm to rearrange the queue in this way. For your help, you have got a stack. Both the stack and the queue are abstract. 3. Recursively one-legging A binary tree can have some records that are not leaves but they have either the left or the right pointer set to NIL. After the statement n := Onelegs(p), n should be the number of one-legged nodes in the tree which p is pointing at. Describe in words or in any programming language a recursive idea for Onelegs. 4. Hash molecules A database of approximately 10,000 molecules should be entered into a hash vector to achieve superfast searching on the molecular formula. But how will the hash function look like? Many formulas can be written in several ways and therefore, you only want the number of atoms of each type, not the order, to influence the hash value. Suggest such a hash function and a suitable size for the hash vector. If each record is 200 bytes, namely 60 bytes for the formula, 40 bytes for the name, and 100 bytes for numerical data, how much memory is required with your suggestion? A pointer takes 4 bytes. 5. Bucket problem At some point you are standing at a lake with 3 buckets with 15, 21, and 35 liters but you want to measure 17 liters with as few pourings as possible. Then you wish you had a program that solves this and similar problems. Below, we can see an example of how it would behave for a simpler example. Study the example in detail and then give an algorithm that solves the problem that gives as few pourings as possible. How many buckets? 2 How much do they hold? 2 5 How many liters do you want? 1 4 pourings are needed: 0 0 0 5 2 3 0 3 2 1 How many buckets? 2 How much do they hold? 2 4 How many liters do you want? 1 Solution does not exist! You do not have to write program code but you should be able to explain the algorithm in detail and describe the data structures and the division into modules. 6. Syntax for suspicious people DU ANAR INTE ATT JAG LÅTSAS ATT JAG INTE ANAR ATT DU LÅTSAS (You do not suspect that I pretend that I do not suspect that you pretend) Write a context-free grammar for sentences of this type. Use, for instance, the symbols , , , . Only words from the above example should occur but even simpler sentences like DU LÅTSAS should follow the syntax. Observe that INTE is placed different in the main clause and the to clause. Describe in words how a program can examine if sentences are following your syntax. What procedures are needed? 7. The seekers Felipe and Viggo both have their own database of the course students. Felipe is using a balanced binary tree and Viggo has a hash vector with separate chaining. Since the program was written for a compiler course with 20 students, the hash vector only has the size of 29. For this course, Viggo's searches only took ¼ of the time of Felipe's search. Why? This year's Tilda course has 300 students. Now, suddenly, Viggo's searches take a longer time than Felipe's. Explain the phenomena. 8. Abstract buckets How are you measuring 17 L of water from a lake that can hold 15, 21, and 35 liters? That tricky problem you have probably already solved. But, a question in the context that you probably have not already thought of is that the program probably needs an abstract bucket data type. How would an abstract bucket data type look like and how can it be implemented? ---------- 2D1320, Applied Computer Science (TilDa) Exam Solutions Saturday March 7, 1998 8 am 1 pm 1. Knuth automaton i next[i] 1 0 2 0 3 0 4 3 5 0 6 2 Boyer-Moore in comparison, examines every 6th character. As long as it is not a 1 or a 0, then you continue. If it is a 1 or a 0, you are examining the surroundings carefully before continuing. 2. Order in the queue · Put a dumb record with the name Pnxyr in the end of the queue. · Take a record from the queue, examine the name and push it on to the stack. Then repeat: · Take a record from the queue. · If there are different names, you are popping the whole stack into the queue. · If it is the dumb record, the loop is interrupted otherwise push the record onto the stack. 3. Recursively one-legging IF p=NIL THEN RETURN 0 ELSIF p.left=NIL and p.right=NIL THEN RETURN 0 ELSIF p.left=NIL THEN RETURN 1+Onelegs(p.right) ELSIF p.right=NIL THEN RETURN 1+Onelegs(p.left) ELSE RETURN Onelegs(p.left) + Onelegs(p.right) END; 4. Hash molecules Let p be a prime number close to 15,000 and let ai be the number of atoms with the atom number i, where 1 i 105. A suitable hash function is given by: h := 13827 FOR i := 1 TO 105 DO h := (128 * h + a[i]) MOD p; END; The hash vector takes 15000*4 bytes, the hash record 10000*44 bytes (name and data pointer), data record 10000*200 bytes. A total of 2.5 megabytes. 5. Bucket problem Breadth-first search with a queue is best. A position is a vector with bucket contents and a parent pointer. A child is created by pouring from bucket i to bucket j. To prevent dumb children, you can code each solution position as a whole number, for instance, with 262144h1+4096h2+64h3+h4, where hi is the contents in bucket i. For each position you are creating, you store the code in a dumb children binary tree or a dumb children hash vector. When a solution occurs, you print the chain recursively to ensure the correct order. The modules queue and bintree are needed and in the main module, you can find the procedures MakeChildren, WriteChain, and a record type including a bucket vector and a parent pointer. 6. Syntax for suspicious people ::= | INTE ::= JAG | DU ::= ANAR | LÅTSAS ::= | ATT ::= ATT INTE The procedure ReadSentence calls ReadSubject, ReadVerb, peeking and possibly eats up INTE and then calls and so on according to the syntax. 7. The seekers Searching in a hash vector with 50% empty space demands, on average, a little more than one comparison. In a binary tree with 20 records, one search requires about 5 comparisons. That is why Viggo's search took about ¼ of Felipe's search time. With 300 students, the collision lists will be a length of a little longer than 10 and a failed search will take about 10 comparisons. In a binary tree with 300 records, a failed search only takes 8 comparisons. 8. Abstract buckets An abstract Bucket.T can be an object type with the following interface: INTERFACE Bucket (* The main program can have *) PROCEDURE Make(size:INTEGER):T; (* VAR h1:=Bucket.Make(17); *) TYPE T<:AbstractBucket; AbstractBucket = OBJECT (* Implementation of T *) METHODS (* here are, of course, the fields *) contents(): INTEGER; (* size: INTEGER and *) spaceLeft():INTEGER; (* content: INTEGER := 0 *) empty():BOOLEAN; full():BOOLEAN; add(amount:INTEGER); (* IF h1.spaceLeft() VÅRDA -> VÅRD -> VÅR -> ÅR -> Å You will now sketch in a computer program that looks for the longest word in SAOL with this characteristic. a) You don't have to write a program code but you should describe the algorithm in detail and you should describe the data structure procedures and the division into modules. The module Stava, which you got from Viggo, includes a procedure SAOL(word: TEXT): BOOLEAN that decides if a word is in the word list. Your program should use the procedure, not the word list file. b) Viggo's procedure does not look in any world list, instead it calculates 14 different hash indices and looks in a boolean vector if they are all TRUE. The vector has the length size=3999971. How do you think the hash functions look like, in principle? 5. Diamond sorting A database of the worlds' greatest diamonds (about 100,000 different) includes, among other things, the weight in carets, and the latest selling price in dollars. Now the records are sorted by the dollar price, but since the British crown jewels rarely change owners, their selling price are not completely relevant. As a diamond datalogist, you have the assignment to resort the records by the weight and done as fast as possible. What sorting method are you suggesting^Å a) ^Å if the computer's RAM memory can hold a vector with all of the records? b) ^Å if the computer is an ABC80 which can hold 100 records in RAM and is missing a disk drive but has two cassette recorders? 6. Chemical reaction syntax A syntax for molecular formulas has occurred in this course. Now we are extending it to a syntax for chemical reaction formulas of the type: Cu + 2 HCl -> CuCl2 + H2 Write a BNF syntax which defines the symbols , , , and which uses our earlier known symbols and (a whole number greater than 1). Describe briefly how your program for molecular formula check can be extended to check that a whole chemical reaction formula is following the syntax.. 7. Cheap standard selection Tilda and Totte each wrote one sorting procedure. Tilda chose an excellent merge sort while Totte selected a standard selection sort. When they made a test run with a thousand records, Totte's program was as fast as Tilda's since he had a super computer but with 10,000 records, Tilda won. With how much? 8. Abstract personal number What data type is a personal number? Sometimes it thought of as TEXT, sometimes as an INTEGER, sometimes as an ARRAY[1..10] OF INTEGER. Explain why none of these are ideal. Explain the benefits of an abstract data type instead, name some important functions and describe how they can be implements in their own module in a programming language like Modula-3. ---------- 2D1320, Applied Computer Science (TilDa) Exam Solutions Saturday March 9, 1998 1400-1900 1. Personal number automaton next[1..11] = 0 1 1 0 1 3 2 0 1 1 4 2. Save the rainforest! Use pre-order. 3. Balance art PROCEDURE ReadTree(n: INTEGER): Pointer RAISES {} = VAR m := n DIV 2; p := NEW(Pointer); (* create an empty record for the root *) BEGIN IF n=0 THEN RETURN 0 END; (* build the left tree from the first half of the file *) p.left := ReadTree(m); p.word := IO.GetLine(); (* put the middle word into the root record *) (* build the right tree from the other half of the file *) p.right := ReadTree(n-m-1); RETURN p; END ReadTree; 4. Longer spring days The problem tree consists of word records including parent pointers. The empty word is the first ancestor. You do a breadth-first search with the help of a queue, until the queue is empty. The last word then gives the solution with the help of the parent pointer chain. The children are created by adding one letter to the beginning or end of the word. All hash functions are calculated in a loop of the type: h := h * 128 + the words' jth letter) MOD prime with different primes for different functions. It is suitable to enter the loop with a great h value, otherwise short words will get 14 similar hash values. 5. Diamond sorting Since the order is not going to change that much, the insertion sort is best. But if you can only store 100 records at a time, it is best to use a heap. One traversal is then enough, assuming that no record is more than 100 positions wrong. 6. Chemical reaction syntax ::= | ::= | + ::= -> ReadReaction() calls ReadSum(), reads an arrow and calls ReadSum() again. ReadSum() calls ReadTerm(), checks if the next character is a plus, if so, reads it and calls itself. ReadTerm() checks if the next character is a digit and if so, calls ReadNum(), and whether it is a number or not, ReadTerm() calls ReadMol(). 7. Cheap standard selection When N is growing from 1000 to 10,000, N2/Nlog2N approximately grows by a factor of 7. That is, Tilda's sorting took 1/7th the time of Totte's. 8. Abstract personal number An INTEGER can not be of any size, a ten digit personal number can be too large. An ARRAY[1..10] OF INTEGER takes up a lot of extra memory space (40 bytes). Using a TEXT makes it inconvenient to extract, for instance, the year. If you are using an abstract data type, you can, at a later point, choose the best implementation by only changing the module pnr.m3. It becomes even more abstract with an object: INTERFACE Pnr; (* Pnr.i3 *) EXPECTION WrongCheckdigit; TYPE T <: AbstractPnr; AbstractPnr = OBJECT METHODS male(): BOOLEAN; female(): BOOLEAN; year(): INTEGER; month(): INTEGER; day(): INTEGER; equal(pnr:T): BOOLEAN; totext(): TEXT; fromtext(word: TEXT) RAISES {WrongCheckdigit}; END; END Pnr. ---------- Fake exam in Applied Computer Science, 2D1320 Help tools: Sedgewick or another algorithm book. Grade limits: 25-45 gives a 3, 35-44 gives a 4, 45-55 gives a 5. There are a total of 50 points in the exam and 5 bonus points. 1. Sort a queue (5 pts) In an abstract queue, the personal records are randomly ordered. Describe in words or in programming code how you, from the main program, can order the queue after the field pnr (personal number). Last in the queue exists a made-up record with the pnr = 999999999 (the tenth digit in the personal number is not used). 2. Water guesses (6 pts) There is an ongoing guessing competition about how many millimeters of water will fill the water festivals' rain counter 1996. The guess (a floating number) and the guesser's name and address are written into a giant file. When the correct answer is known, you must traverse all of the hundreds of thousands of guesses and find the thousand best guesses. They are all winning an extremely cool festival umbrella. Describe the best algorithm and tell what data structures that can be of use. 3. New personal number law (8 pts) New rules for the use of personal numbers makes it necessary to remove the last four digits from the personal number field in many registers. As a sorting and searching key, you will primarily use the birth date and within the same birth date, sort them in alphabetical order by their name. Current registers that are sorted by personal number must be resorted to, for instance, make 6705091152 Eriksson, Kimmo end up before 6705091053 Eriksson, Laban. Suggest a suitable sorting method if the register is an array with about 10,000 records of 100 bytes each. Suggest a suitable method if the register is a binary tree sorted according to the old method and a new binary tree should be created with the new order. 4. The fikon/fig language (9 pts) A BNF grammar for the notation includes the syntax rule ::= . Your task is to write syntax rules for and . The goal is to make the firstpart include all characters up to and including the first vowel in a word. You can assume that the notation's , , and are defined. Then describe in words how the grammatics can be used to write a compiler that translates ordinary text like Talar du detta språk to Filartakon fidukon fittadekon fikspråkon? 5. Abstract money (6 pts) The one who programs economical applications ends up having a problem: what data type is money? It is not satisfying with a integer when you are supposed to calculate 7.15 (the price for a school lunch) and not with a real when you only have 12 digit numbers like 67503824199 (the critical losses for Nordbanken). The best is to use an abstract data type, kronor. Define such a type by suggesting some important procedures. Also suggest an implementation and how to divide it into modules. 6. Synthesizing (9 pts) In a processing chemical database there exists records looking like the following: C6H10O5 -> HNOC17H32O16(COOH)2 0.93 ^Åprocess description^Å The process synthesizes the second substance from the first and the exchange (efficiency) is 93%. Invent an algorithm which searches for the most efficient process chain to synthesize a given substance from another given substance. Also give the data structures and suitable module divisions. 7. Hashing of substances (7 pts) For extremely fast search in the above mentioned database with about 10,000 records, you want to use hashing with the first formula as the key (that is C6H10O5 in the example above). Suggest a hash function and the size of the hash vector. ---------- Fake exam in Applied Computer Science, 2D1320 Solutions 1. Sort a queue (5 pts) Bubble sort until no change is made! It can be done with queue calls. REPEAT changed := FALSE; (* Flag that shows is something has changed *) x := Queue.Get(); y := Queue.Get(); (* The first couple out *) REPEAT IF n.pnr < y.pnr THEN Queue.Put(x); x := y; ELSE Queue.Put(y); changed := TRUE; END; y := Queue.Get(); UNTIL y.pnr=999999999; Queue.Put(x); Queue.Put(y); (* The last couple in *) UNTIL NOT changed; 2. Water guesses (6 pts) Use a heap with room for 1000 records. Read a thousand records from the file and sort the worst guesses on top of better ones. At the top of the heap now, the worst of these thousand guesses is placed and those are the ones that you are comparing to to probably exchange with the next record from the file. The algorithm can be given as: REPEAT Read.Guess from the file IF the guess is better than heap[1] THEN heap[1] = guess Downheap(1) (* fix the heap *) END UNTIL the file ends 3. New personal number law (8 pts) A suitable sorting method for an array is insertion sort. Since the vector is almost sorted already, the method becomes very fast. There probably does not exist a clever way to use the old binary tree to take advantage of the fact that the old binary tree is almost sorted. You will have to sort in the old records, one after one, into the new tree, but you are not allowed to traverse the old tree in-order since the new tree becomes catastrophically unbalanced. Pre- or post-order works fine. 4. The fikon/fig language (9 pts) ::= ::= ::= ::= ::= A program that reads the text according to the grammar and compiles it into the fikon language can look like this: MODULE Main; IMPORT IO, Mio, Fmt; EXCEPTION Error; CONST vowel= SET OF CHAR{'a', 'e', 'i', 'o', 'u', 'y', 'å', 'ä', 'ö'}; delimiters = SET OF CHAR{' ', '.', ';', '!', '?', ',', ':', '\n'); PROCEDURE ReadFirstPart(): TEXT RAISES {Error} = BEGIN IF Mio.NextChar() IN vowels THEN RETURN Fmt.Char(IO.GetChar()); ELSIF Mio.NextChar() IN delimiters THEN RAISE Error; ELSE RETURN Fmt.Char(IO.GetChar()) & ReadFirstPart() END; END ReadFirstPart; PROCEDURE ReadSecondPart(): TEXT = BEGIN IF Mio.NextChar() IN delimiters THEN RETURN ""; ELSE RETURN Fmt.Char(IO.GetChar()) & ReadSecondPart() END; END ReadSecondPart; PROCEDURE ReadWord(): TEXT = VAR firstpart: TEXT; BEGIN TRY firstpart := ReadFirstPart(); RETURN "fi" & ReadSecondPart() & firstpart & "kon"; EXCEPT | Error => RETURN "***"; END; END ReadWord; PROCEDURE ReadText(): TEXT = BEGIN IF Mio.NextChar() IN delimiters THEN Mio.PutChar(IO.GetChar()); ELSE IO.Put(ReadWord()) END; IF IO.EOF() THEN RETURN ELSE ReadText() END; END ReadText; BEGIN IO.Put("Translation to the fikon language: "); ReadText(); END Main. 5. Abstract money (6 pts) The data type should be able to have 12 digits and 2 decimal places. Important procedures are WriteMoney(x), ReadMoney(x), AddMoney(x, y), and so on. It can be implemented as: TYPE Money = RECORD million: INTEGER; ore: INTEGER; (* maximum 8 digits *) END and the above data type can be placed in Money.i3 while the procedures are implemented in Money.m3. It becomes the most abstract if you use an opaque object type. TYPE T<: AbstractMoney; AbstractMoney = OBJECT METHODS write(); read(); add(money:T):T; END; 6. Synthesizing (9 pts) The database reads records of the type NEW(Process) where: TYPE Process = REF RECORD substance1: TEXT; substance2: TEXT; exchange: REAL; END; and is hashed with substance1 as the key. The problem tree is built from the data type: TYPE Position = REF RECORD substance: TEXT; mass: REAL; parent: Position; END; and the procedure MakeChildren(pos: Positions) is searching for pos.substance in the hash vector and create all the children with the mass exchange * pos.mass. The children are entered into a heap (or priority queue) where the top position has the greatest mass and will "give birth" first. You will end up with a best-first-traversal of the problem tree. If several positions have the same substance, only the one with the greatest mass is of interest. The others are dumb children and can be thrown away. To achieve this, the best positions are stored in to another hash vector. Note especially the record for the substance you are trying to synthesize when the top record in the heap has a lower mass than that substance, the algorithm is done. All the steps can then be recreated by using the parent pointers. 7. Hashing of substances (7 pts) The hash vector should have about a 50% error (or be 50% empty). Thus, a suitable size can be the prime number 14999. A good hash function is calculated in a loop with: h := 129 * h + ORD(kth character in the chemical formula) MOD 14999.