Computer Science CSC263H
October 9, 2024
Homework Assignment #4
Due: October 23, 2024, by 11:00 am
• You must submit your assignment through the Crowdmark system. You will receive by email an invitation through which you can submit your work. If you haven’t used Crowdmark before, give yourself plenty of time to figure it out!
• You must submit a separate PDF document with for each question of the assignment.
• To work with one or two partners, you and your partner(s) must form a group on Crowdmark (one submission only per group). We allow groups of up to three students. Submissions by groups of more than three students will not be graded.
• The PDF file that you submit for each question must be typeset (not handwritten) and clearly legible. To this end, we encourage you to learn and use the LATEX typesetting system, which is designed to produce high-quality documents that contain mathematical notation. You can use other typesetting systems if you prefer, but handwritten documents are not accepted.
• If this assignment is submitted by a group of two or three students, for each assignment question the PDF file that you submit should contain:
1. The name(s) of the student(s) who wrote the solution to this question, and
2. The name(s) of the student(s) who read this solution to verify its clarity and correctness.
• By virtue of submitting this assignment you (and your partners, if you have any) acknowledge that you are aware of the homework collaboration policy for this course, as stated here .
• For any question, you may use data structures and algorithms previously described in class, or in prerequisites of this course, without describing them. You may also use any result that we covered in class (in lectures or tutorials) by referring to it.
• Unless we explicitly state otherwise, you should justify your answers. Your paper will be marked based on the correctness and efficiency of your answers, and the clarity, precision, and conciseness of your presentation.
• The total length of your pdf submission should be no more than 4 pages long in a 11pt font.
|
Question 1. (20 marks)
The following randomized algorithm takes as input an array A[1 .. n] of n distinct integers, and an integer x, and tries to find an index i, 1 ≤ i ≤ n, such that A[i] = x. It uses a “random coin c” that, when flipped, gives “Head” with probability p (a fixed real number in [0, 1]).
Random-Search(A,x)
1 while true
2 pick i uniformly at random from {1,... n}
3 if A[i] = x
4 return i
5 else
6 flip coin c (* this gives Head with probability p *)
7 if get Head
8 then return “Not Found”
In the following questions, you must justify your answers.
a. Suppose that we execute this procedure with A and x such that no element of A equals x:
1. What is the probability that exactly k iterations of the while loop are executed?
2. What is the expected number of iterations made by the while loop?
b. Now suppose that we execute this procedure with A and x such that exactly one element of A equals x. What is the probability that exactly k iterations of the while loop are executed?
Question 2. (20 marks)
Let c1 , c2 , . . . , cn be coins that, when flipped, give “Head” with probability p1 , p2 , . . . , pn, respectively. Assume that p1 = 1 (so flipping c1 always yields “Head”) while p2 , p3 , . . . , pn are real numbers in [0, 1].
Consider the following randomized algorithm:
1 for k = 1 to n
2 flip coin ck (* this gives Head with probability pk *)
3 if get Head
4 then x := k
5 return x
a. Let x be the value returned by the above algorithm; clearly 1 ≤ x ≤ n. For each 1 ≤ i ≤ n, determine the probability Pr[x = i] as a function of the given probabilities p2 , p3 , . . . , pn. Explain your computation in a precise, mathematical way.
What is Pr[x = i] if p2 = p3 = ... = pn = 1/2?
b. We want that the value x returned by the above randomized algorithm to be such that Pr[x = i] = 1/n for every 1 ≤ i ≤ n, i.e., we want x to be a uniform. sample from 1, . . . ,n.
Determine the values for p2 , p3 , . . . , pn that achieve this goal, and justify your answer. In other words, prove that with the values of p2 , p3 , . . . , pn that you give, the probability of x = i is indeed 1/n (for each 1 ≤ i ≤ n), as wanted.
Question 3. (20 marks)
A large number n of dinosaur bones are discovered at an archeological site, and each bone is labeled by a distinct integer in {1, 2,..., n}. Researchers examine some pairs of these bones to try to determine the number of different species of dinosaurs the bones come from. From this examination, they create a list L that contain a total of m items: each item in L is of the form. S(i,j), which means the bone labelled i and the bone labelled j are from the same specie, or of the form. D(k, l), which means that the bone labelled k and the bone labelled l are from different species. But, unfortunately, researchers may make mistakes: for example, they could create a list L = ..., S(3, 2),..., D(5, 2),..., S(5, 3), . . . which must be incorrect (do you see why?).
Design an efficient algorithm for the following task:
• The algorithm’s input is a list L of m items as described above, where m ≥ n. (Recall that n is the total number of bones that were discovered.)
• If the list L is incorrect (as explained above)
then the algorithm outputs Error Found
else the algorithm outputs an integer k, where k is the maximum possible number of different ˙ species the bones are from, according to the input list L.
The worst-case running time of your algorithm must be asymptotically better than O(mn) .
(a) Explain how your algorithm works in clear and concise English. (b) Give the algorithm’s pseudo-code.
(c) Analyse the algorithm’s worst-case time complexity. Hint: Use a data structure that we learned in class.
[The questions below will not be corrected/graded. They are given here as interesting problems that use material that you learned in class.]
Question 4. (0 marks) Assume you have a biased coin, which, when flipped, falls on Heads with probabil- ityp, where 0 < p < 1, and on Tails with probability 1 −p. However, you do not know p. How can you use the coin to simulate an unbiased coin? Formally, you have access to a procedure FlipBiasedCoin(), which returns either 1 or 0 at random. FlipBiasedCoin() returns 1 with probability p, and 0 with probability 1 −p, but you do not know p. Design an algorithm that, without making any other random choices except calling FlipBiasedCoin(), returns 1 with probability 1/2 and 0 with probability 1/2. Your algorithm should not use p. You can assume that each time you call FlipBiasedCoin(), the value it returns is independent of all other calls to it.
a. Describe the algorithm in clear and concise English, and prove that it outputs 0 with probability 1/2 and 1 with probability 1/2.
b. Analyze the expected running time of your algorithm. Note that while the algorithm itself does not use p, the expected running time should be expressed in terms of p.
Question 5. (0 marks) Consider the forest implementation of the disjoint-sets abstract data type, with an initial forest of n distinct elements (each one in a one-node tree). Let σ be any sequence of k UNIONs followed by k′ FINDs; so all UNIONs occur before the FINDs. Prove that the algorithm using Path Compression only (it does not use the Weighted-Union rule) executes σ in O(k + k′ ) time, i.e., in time proportional to the length of σ, in the worst-case.
Do not make assumptions on k or k′ (for example, do not assume that k = n − 1 or that k′ ≤ k). As we did in class, assume that the parameters of each UNION are two set representatives, i.e., two tree roots (so there are no FINDs “inside” each UNION).
Hint: Note that if a vertex becomes a child of a root during the execution of one of the FINDs (because of Path Compression), then it remains a child of this root during all the subsequent FINDs. Use this to compute the “cost” of executing all the k′ FINDs.
Question 6. (0 marks)
We are given as input the set of vertices V = {1,..., n} and a sequence of edges e1 , . . . , em, where for each 1 ≤ i ≤ m, ei = (j, k) for some j, k ∈ V , j k. For each 0 ≤ i ≤ m, define the undirected graph Gi = (V, Ei) where E0 = ∅, and for i ≥ 1, Ei = {e1 ,..., ei}. Design an algorithm that outputs an array C[0..m], with C[i] equal to the number of nodes of the largest connected component of Gi (note that C[0] = 1). Your algorithm should run in time O(n + mlog* n).
Describe the algorithm in clear and concise English. Explain why it is correct and why it runs in the required time complexity.
Hint: See “An application of disjoint-set data structures” in pages 562-564 of CLRS (Chapter 21).