[Note: this webpage last modified Friday, 04-Feb-2011 19:44:51 EST]
Here I am collecting pointers/advice on mistakes that you have made that you should try to avoid in the future. I will put new comments here after each homework and exam.
Here I will keep a list of things I think you should already know before starting this course. If you do not remember these well, then you should review them. This list essentially corresponds to the most important topics from data structures (CS 202/258), discrete math (CS 303 / Math 320), algorithms (CS 458/558), and the first course in theory of computation (CS 420/520).
Algorithms. You should know the basic idea of how to solve the
following problems (including being able to write correct
pseudocode and analyze the running time and correctness of the algorithm):
sorting (e.g., insertion sort, merge-sort, heap-sort), graph search and shortest paths
(e.g., BFS, DFS), spanning trees (e.g., Prim's or Kruskal's algorithm),
and binary search.
Data structures. You should be familiar with the properties of the
following data structures:
heaps, queues, and stacks.
You should be familiar with and be able to give DFA's, NFA's, regular expressions, and context-free grammars for problems.
You should be able to prove a problem is undecidable by reducing the halting problem to it.
You should be able to prove the Omega(n log n) lower bound for sorting using comparisons.
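As a refresher, the comparison lower bound comes from a counting argument; here is a sketch:

```latex
% Any comparison sort must distinguish all $n!$ orderings of the input.
% Its decision tree therefore has at least $n!$ leaves, so its height $h$
% (the worst-case number of comparisons) satisfies
\[
  2^h \ge n!
  \quad\Longrightarrow\quad
  h \;\ge\; \log_2(n!) \;\ge\; \frac{n}{2}\log_2\frac{n}{2} \;=\; \Omega(n \log n).
\]
```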
Proofs. You should be familiar with proofs by induction. Note that giving examples is NOT a proof. You should also be able to argue correctness of some basic algorithms (like those mentioned in the previous bullet).
Big-O and running time. You should be able to give big-O running times of algorithms, whether they be poly-time algorithms, exponential time algorithms, etc.
Probability theory. We will review the basic rules/theorems of probability, but we will do this quickly, and it will be up to you to refresh/learn some of this material.
Miscellaneous other math stuff. If we need some linear algebra, basic number theory, etc. then we will quickly review what we need. But we will not spend a lot of time on the "math-ness" of the math we need to use. We will just see how we can make use of it to say something about algorithms and computation.
Many students do not have a lot of background giving proofs (especially in the context of computer science) when they begin this course. But, you will have plenty of experience by the time you are done with this course. In the meantime, here are some pointers on problems to avoid that I have seen so far.
Proof by example. Giving small examples and showing that they work is nice for getting some intuition and understanding what is going on. But this is not a proof. If I ask you to prove an algorithm is correct, for example Dijkstra's shortest path algorithm, you need to prove it is correct for all graphs - not just for a few small graphs. If I ask you to prove that the number of binary strings with n bits is equal to 2^n, you need to prove this for all n, not just for a few small values of n.
Proof by induction. Also, for a proof by induction, it is not good enough to show that a few small cases work. It is probably best if you review proof by induction, because there has been some confusion about this.
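For instance, a proper induction proof of the claim mentioned above (that there are 2^n binary strings of length n) looks like this:

```latex
% Claim: the number of binary strings of length $n$ is $2^n$.
% Base case: $n = 0$. There is exactly one string of length $0$
% (the empty string), and $2^0 = 1$.
% Inductive step: assume there are $2^k$ binary strings of length $k$.
% Every string of length $k+1$ is a string of length $k$ followed by
% either a $0$ or a $1$, and each extension gives a distinct string, so
\[
  \#\{\text{strings of length } k+1\} \;=\; 2 \cdot 2^k \;=\; 2^{k+1}.
\]
% By induction, the claim holds for all $n \ge 0$.
```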
Asymptotic big-O notation. If I ask you to prove some big-O estimate using the definition of big-O, then you need to carefully apply the definition of big-O. Remember that f(n) = O(g(n)) if there is a constant c such that f(n) ≤ c*g(n) for sufficiently large n, so for all n ≥ n0 for some constant n0. It can be that f(n) ≥ c*g(n) for small values of n, but eventually f(n) is less than c*g(n).
In the calculations, some people assumed that they could "factor" like this: f(c*n) = c*f(n). But this is not true; for example, if f(n) = n^2, then f(2n) = 4n^2, not 2n^2.
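Here is a worked example of applying the definition directly (the numbers are my own, not from any assignment):

```latex
% Show $3n^2 + 10n = O(n^2)$ directly from the definition.
% For $n \ge 10$ we have $10n \le n^2$, so
\[
  3n^2 + 10n \;\le\; 3n^2 + n^2 \;=\; 4n^2
  \qquad \text{for all } n \ge n_0 = 10,
\]
% i.e., the definition is satisfied with $c = 4$ and $n_0 = 10$.
% The ``factoring'' mistake, in contrast: for $f(n) = n^2$,
\[
  f(2n) \;=\; (2n)^2 \;=\; 4n^2 \;\ne\; 2 f(n) \;=\; 2n^2.
\]
```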
Running time of algorithms. We always measure the running time of algorithms as a function of the input length - the number of bits in the input - which we normally denote by n. If the input to a problem is a number N, then the input length n is the number of bits needed to represent N, which is about log(N). So if you calculate that the running time of the algorithm is N^2, then as a function of the input length this is 2^{2n}, since N is about 2^n.
Do not give extraneous information in your solutions. If I ask for the running time of some particular algorithm A, do not tell me what the running time of some other algorithm is that is not pertinent to the problem. For example, do not just copy things out of books or notes from your previous courses just to show me that you know something related to the problem. Just answer the question being asked as best you can.
After having graded the homework problems, here are some pointers and notes to keep in mind in the future. I am also leaving here clarifications I made before the assignment was due.
Showing that something works for a few small examples is not a proof. You must prove the thing is true ALWAYS. For example, problems 3a and 3b. Looking at examples is fine and good for getting intuition, but then you need to prove the claim holds in general.
Do not just copy information out of a book or from notes that is not relevant to the problem. For example, for problem 1, many people wrote down what they know about big-O - the running time of insertion sort, for example. That is not relevant to the problem. Just answer the question, and if you can't figure it out say whatever you can ABOUT THE PROBLEM.
You cannot "factor out of a function". Some people assumed that if f is a function and c a constant, then f(c*n) = c*f(n). This is only true for linear functions like f(n) = n and does not hold in general (for example, it does not hold for f(n) = n^2).
Many of you did not know what a proof by induction is. Proof by induction is not just looking at small examples. You should review proof by induction.
When asked to give an estimate of the running time of the algorithm, look at the algorithm we are talking about and not some other algorithm. Just trace through pseudocode or the description of the algorithm and explain how much time each step takes and therefore how much time it takes overall.
When looking at the running time of algorithms, we always measure the running time in terms of the bit-length of the input, which we denote as n. So if you are dealing with an input that is a number x, the bit length n is equal to about log(x). So an algorithm that runs in poly(x) time runs in exponential time in the input length n=log(x).
The definition of big-O is that f(n) = O(g(n)) if "f is less than a constant factor times g for sufficiently large n". f may be larger than g for small values of n, but eventually it is smaller than g times a constant.
Some students seem to be unfamiliar with big-O algorithm runtime analysis, proof by induction, and other material I thought you might already know. Here is a link to some lecture notes from another course that review some of this material: http://www.eng.unt.edu/ian/books/free/lnoa.pdf. You can also look at the Wikipedia pages for analysis of algorithms, big-O notation, and time complexity. Here are some more notes on big-O, including a link to a PDF at the end that has a big-O proof that uses induction: http://www.cs.utk.edu/~plank/plank/classes/cs140/Notes/BigO/. And here are more lecture notes that discuss big-O and have a proof by induction: http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic3/. These were all from a quick Google search. I suggest searching for things like "big O algorithm analysis", "big O problems", "proof by induction examples", etc.
Some students asked about problem 1b of homework 1. Problem 1b speaks of "exponentially bounded" functions. In this course, "exponentially bounded" means less than 2^{n^c} for some constant c.
Note that there are functions that are /not/ exponentially bounded - they are "bigger than just exponential". In particular, something of the form 2^{2^n} is /not/ less than 2^{n^c} for any constant c. Something of the form 2^{2^n} we will call "doubly exponential".
This use of the term "exponentially bounded" may be different than what you have used before, but this is what it means in the field of theory of computing, and this is the meaning we use in this course.
After having graded the homework problems, here are some pointers and notes to keep in mind in the future.
NP verifier. An NP verifier is a deterministic TM (a standard algorithm) that runs in polynomial time. It does not loop over all possible proofs. It just takes a potential proof/certificate/witness as an additional input and checks if that one "verifies" that the input is a "yes". NP is about efficient verification, so an NP verifier better be efficient (polynomial time)!
Giving a verifier for an NP problem is just giving an algorithm. And you should be used to giving algorithms for things. Now you are just giving an algorithm that takes some extra "help" as the second input - the proof/certificate/witness.
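As a concrete sketch of what such an algorithm looks like (using the Clique problem as an illustration, not necessarily one from the homework):

```python
# A sketch of an NP verifier for the Clique problem. It runs in polynomial
# time: it never loops over all possible witnesses, it only checks the ONE
# witness it is handed as a second input.

def verify_clique(edges, num_vertices, k, witness):
    """Return True iff `witness` certifies that the graph has a k-clique."""
    w = list(witness)
    # The witness must name k distinct, valid vertices.
    if len(set(w)) != k or any(not (0 <= v < num_vertices) for v in w):
        return False
    # Every pair of witness vertices must be joined by an edge.
    return all((u, v) in edges or (v, u) in edges
               for i, u in enumerate(w) for v in w[i + 1:])

# A triangle on vertices 0, 1, 2 plus an extra vertex 3.
edges = {(0, 1), (1, 2), (0, 2), (2, 3)}
print(verify_clique(edges, 4, 3, {0, 1, 2}))  # True: a valid certificate
print(verify_clique(edges, 4, 3, {0, 1, 3}))  # False: 0-3 is not an edge
```

Note that the verifier accepts or rejects the particular input and witness it was given; it never tries to enumerate certificates.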
Writing proofs and homework solutions. You should write proofs "at the level of your classmates". That means one of your classmates who has not solved the problem should be able to understand your explanation. You should write in complete sentences with everything being very clearly explained. A well-written proof will give a brief overview of the "main idea" of how the algorithm or proof works and then "fills in the details" to more formally prove the claim.
Problem is in NP. A computational problem is in NP if there is an efficient verifier that "solves" the problem for ALL of the infinitely many inputs. Similarly, a problem is in P if an efficient algorithm solves the problem for all of the infinitely many inputs. If you give some small examples and describe why they are either "yes" or "no" instances, you have not solved the problem; you have only looked at a few examples. An NP verifier or P algorithm must solve the problem for all possible inputs.
People still had problems applying the definition of big-O correctly. Look at the model solutions.
In problem 2a, I stated that "smaller than any polynomial" means smaller than n^d for any constant d > 0. So the answer needs to be smaller than, for example, n^{0.0001}. For that problem, I stated that "polynomial" even includes exponents that are less than one. Pay careful attention to the precise statements of the homework questions.
Checking that a number is composite/prime. If you have a loop to check all possible factors, if you find a factor inside the loop then you know the number is composite. But if in a single iteration of the loop you have not found a factor, you don't know yet. You cannot conclude the number is prime until you have verified that all possible factors are not factors, so you cannot return "prime" until AFTER the loop is done.
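A minimal sketch of the point above, with the returns in the right places:

```python
# Trial division: "composite" may be returned inside the loop (a factor was
# found), but "prime" only AFTER the loop has ruled out every candidate.

def classify(N):
    """Return 'composite' or 'prime' for an integer N >= 2 by trial division."""
    d = 2
    while d * d <= N:
        if N % d == 0:
            return "composite"   # a factor was found: we can stop early
        d += 1                   # this d is not a factor, but we know nothing yet
    return "prime"               # only now have all candidate factors been ruled out

print(classify(91))   # composite (7 * 13)
print(classify(97))   # prime
```

Also keep in mind the earlier remark about input length: this loop can run about sqrt(N) iterations, which is exponential in the bit length of N.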
The definition of two graphs G1 and G2 being isomorphic was given on the homework assignment. It is NOT true that G1 and G2 have to be isomorphic just because each vertex in G1 has the same degree as a corresponding vertex in G2. You can come up with a counterexample with 4 vertices and 4 edges.
After having graded the homework problems, here are some pointers and notes to keep in mind in the future.
To prove a problem is solvable in P, you need to prove/explain why the algorithm is correct - that when it answers no, this is the correct answer, and that when it answers yes, it is the correct answer. You cannot just assume that the reader reads your pseudocode or algorithm description and understands why it gives the correct answer.
When giving a reduction from one NP-complete problem to another problem to show that the second problem is NP-hard, you need to give a poly-time reduction. So you need to explain why the reduction is computable in polynomial time.
To show a problem is NP-complete, you need to show it is NP-hard and also explain why it is in NP - give a poly-time verifier and what the witness/proof/certificate would be.
For problem 2a, most of you did not use the bounded halting problem, but rather used the "machine/encoding/universal machine" problem, essentially copying the proof of that which we did in class. Those who did this mostly wrote it up nicely, so that is good at least.
For problem 2b and 2c, most of you used the same problem from 2a rather than talking about the unbounded halting problem. The unbounded halting problem is different. For 2c many people said that the halting problem cannot be either solved or verified in poly-time. Saying it cannot be verified in poly-time means it is not an NP problem, but saying it cannot be solved in poly-time just means it is not in P.
For the third problem, some people were confused, but there was a typo in the original assignment passed out. So for those people, I did not count the problem as part of the score - I gave a "0/0".
After having graded the homework problems, here are some pointers and notes to keep in mind in the future.
Typing the answers made a difference. Overall, your explanations were clearer than when you handwrote things. The writing style and explaining things clearly still need a lot of work for most of you, but you are improving.
In terms of typing things in Word, if you don't want to go through the trouble of inserting symbols for union, intersection, etc. you can just type out the words union, intersection, etc. Either way is fine with me.
Problem 1 issues...
Showing that the union is in NP does not imply the intersection is in NP, and neither does showing the intersection in NP imply the union is.
You cannot just assume the languages L1 and L2 are NP-complete. They are just some NP languages, they may or may not be NP-complete.
To give a verifier for the union or intersection, you run verifiers for the two languages, but you have to use separate certificates/witnesses to those verifiers. It is not correct to say that the same certificate/witness should work for both verifiers. Think about why.
For the complement, most of you concluded that the complement is not in NP. This conclusion is not justified. It may well be that the complement is not in NP, but all you have shown is that one particular verifier does not work for the complement; you have not shown that no verifier works. If you had shown this, you would have solved a major open question in the theory of computing.
For the union, the verifier you construct will say "yes" if either verifier accepts. It says "no" if both verifiers reject. For the intersection, the verifier says yes if both verifiers accept, and says no if either rejects.
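The construction above can be sketched in a few lines; the verifiers V1 and V2 and the witness convention here are my own illustration, not a particular homework solution:

```python
# Verifiers for the union and intersection of two NP languages L1 and L2,
# built from verifiers V1 and V2. The certificate is a PAIR (w1, w2):
# as noted above, the two verifiers need their own separate witnesses.

def verify_union(V1, V2, x, w1, w2):
    # x is in L1 union L2 if either verifier accepts with its own witness.
    return V1(x, w1) or V2(x, w2)

def verify_intersection(V1, V2, x, w1, w2):
    # x is in L1 intersect L2 only if both verifiers accept.
    return V1(x, w1) and V2(x, w2)

# Toy languages: L1 = strings containing an even digit, L2 = strings
# containing a '7'; a witness is an index pointing at the relevant digit.
V1 = lambda x, i: i < len(x) and x[i] in "02468"
V2 = lambda x, i: i < len(x) and x[i] == "7"

print(verify_union(V1, V2, "137", 0, 2))         # True: '7' at index 2
print(verify_intersection(V1, V2, "137", 0, 2))  # False: no even digit
print(verify_intersection(V1, V2, "47", 0, 1))   # True
```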
Many of you are being very imprecise in the way you use terms. This is not good. Examples...
Some of you said things like P = n^{O(1)}. No, P is a class of problems, not a function and not a polynomial.
For problem 2a, many of you gave pseudocode for a verifier and had something like "if G1 is not isomorphic to any subgraph of G' then Graph Reconstruction is not in NP". No, in this case the verifier should reject. If everything checks out, the verifier should accept. Saying the Graph Reconstruction problem is in NP is simply something that is true or false (it is true). This is not what your verifier should be outputting - it should be outputting yes or no for the particular input it was given.
P is not a polynomial, and NP is not a polynomial. They are classes of problems. So you say things like "sorting is in P" or "Clique is in NP". You do not say things like NP = n^{O(1)}.
For the NP-completeness proofs, I was lenient on the grading given the short notice and lack of direction given by me. I gave everyone 20/20 this time. Next time, I'll be more clear about what the expectations are. That said, the groups that did Independent Set and Hamiltonian Cycle both did a nice job of explaining their problems clearly, showing they are NP-complete. The other two groups had some slight issues in conveying why the reduction worked.
All of the groups had one problem in common. None of you explained that the reduction is computable in polynomial time. Typically, this is very easy to see, but still worth mentioning.
Global comments. I graded fairly easily again, so just because you got a good score on a question (even a perfect score) does not mean you had no mistakes. Always look at the model solutions to see if there is anything you missed.
Collaboration and copying off of websites/books. YOU ARE REQUIRED to list your collaborators. Many of you did not do so. From now on, I will either take points off for this, or give you a 0. YOU ALSO need to cite any sources you have used. If you are trying to copy off of a website or book, you better cite the source. If you do not understand what you are copying, then don't turn that in: you will get a 0 for the problem at the very least. I MUCH PREFER you to try and figure the problems out on your own rather than trying to look them up.
Problem 1. Some people did not understand the question. It was not asking about the halting problem, and it was not asking to determine if a given machine runs in exponential time. The problem was just saying, if we just look at the execution of an algorithm while it uses time t, does it accept or reject.
There were a number of issues on this and other problems with "using terms appropriately". Here are a few.
You do not say "a problem is O(n^2)" (or any other time bound). You say it can be solved in time O(n^2).
You do not say M is in L (machine M is in the language L). You can say the machine M solves or computes the language L. You can say a particular string x is in L (meaning it should be a "yes" answer).
EXP is not a machine or a language. It is a class/set of problems that each can be solved in at most exponential time.
"In exponential time" does not mean exactly or at least exponential time. It means at most. So solving a problem in polynomial time is also solving it in exponential time, but not the other way around.
Simulate, not stimulate.
Problem 2. Most people had the basic idea for this, but many of you did not carry it out correctly. Many of the problems were using terms wrong, see the last bullet point.
A few people cited a generalization of the statement to other time bounds. This clearly indicates to me that you copied the answer from somewhere. I much prefer you figure things out on your own. If you do use an outside source, YOU HAD BETTER CITE IT!!! Not doing so is considered a violation of the academic integrity policy.
Problem 3. Same issue on this problem about copying from online lecture notes. Most of you who did so did not seem to really understand what you were writing. Then you should not be turning it in as your work if you do not understand it. Do not do this!
Problem 4. A few of you did not get the point that the proof was wrong. Some people thought that they were supposed to point out a flaw in the assumption P=NP. If you could do that, then you would have proved P!=NP, and you would be famous. That is not what I was asking. See the model solutions.
Global comments. Terminology: Turing machine, not Turning machine.
Listing collaborators - good job on doing that, keep it up! Though of course, there was the issue of not citing online sources...
Problem 1. Many of you just copied the answer for the standard birthday problem, where we look at the probability that at least 2 people share a common birthday. Others saw there was a difference but did not see how complicated calculating the probability correctly really is: you took into account only the probability that exactly two people have the same birthday and all the rest different, and you did not take into account the possibility of multiple pairs sharing birthdays.
Problem 2. Many people did not have reasonable values for Pr(I) and Pr(D), or for Pr(I|D). You should use some common sense and make sure your probabilities seem reasonable. All probabilities are values between 0 and 1. If you have a probability greater than 1, you probably have a problem somewhere. Many people had values for Pr(I|D) that were higher than Pr(I), which would mean you are more likely to be innocent if you have a DNA match from the crime scene than if we had never taken the test. That does not make sense.
Pr(A|B) does not mean Pr(A)/Pr(B). It means the probability of A if you already know that B is true, it is read "probability of A given B". It is defined as Pr(A and B)/Pr(B).
To use the law of total probability, it is a weighted sum, not just a sum. So it is not the case that Pr(D) = Pr(guilty) + Pr(false positive). Instead, Pr(D) = Pr(guilty)*Pr(D | guilty) + Pr(innocent)*Pr(D | innocent).
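A numeric sketch of the weighted sum, with made-up numbers (these are illustrative, not the homework's values):

```python
# Law of total probability and Bayes' rule with hypothetical numbers.
p_innocent = 0.99                 # Pr(I)
p_guilty = 1 - p_innocent         # Pr(G)
p_match_given_guilty = 0.999      # Pr(D | G)
p_match_given_innocent = 0.001    # Pr(D | I), the false-positive rate

# The weighted sum, as above:
p_match = (p_guilty * p_match_given_guilty
           + p_innocent * p_match_given_innocent)

# Bayes' rule then gives Pr(I | D) = Pr(I) * Pr(D | I) / Pr(D).
p_innocent_given_match = p_innocent * p_match_given_innocent / p_match

print(p_match)
print(p_innocent_given_match)  # well below Pr(I), as common sense demands
```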
For problem 2, there were a number of solutions that were reasonable but not exactly correct. If I said your solution was not correct, you should convince yourself why it is incorrect so you don't make the same mistakes again. Reasoning with probabilities can be subtle...
Problem 3. We have discussed the issue of plagiarism already. Some of you claimed you thought it was okay to copy since you understood and tried to put the answers in your own words. That is not correct. If you are using a source you need to cite it. If you do not, you are claiming that you came up with the answer on your own, and that would not be correct.
Global comments. Remember to list who you collaborated with (talked about the problems with). This is helpful for me in grading. If you did not talk to anyone about the problems, then say "no collaborators".
Problem 2. Almost no-one got this problem, and most people did not really seem to know what was going on. You can think of a particular example. The primality test given in class and the textbook used as a blackbox/oracle a randomized algorithm for computing square roots mod a prime. The proof showed that if we had correct answers always to the square roots problem, we would correctly determine primality with high probability. But what happens if we plug in a randomized algorithm for the square roots problem that makes mistakes? This problem shows that we can still compute primality correctly with high probability.
People were also confused by the difference between BPP^BPP and randomized reductions. We did not talk about randomized reductions in class. There are various ways you could define them, many of which do NOT correspond to BPP^BPP. BPP^BPP means the base machine gets to ask many questions of the BPP oracle, not just one.
Problem 3. Most people got this one right, but a few people solved the problem of "what is the probability of exactly or at most t heads anywhere in the sequence" rather than "t heads in a row". If you want to know, the probability that there are exactly t heads at any point in the sequence (not necessarily next to each other) is (2t choose t)/2^{2t}, and if you apply Stirling's formula to the factorials, this is Theta(1/sqrt(t)). The probability that there are at least t heads anywhere in the sequence is exactly 1/2.
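You can check the Stirling estimate numerically (a quick sanity check, not part of any solution): the exact value (2t choose t)/2^{2t} approaches 1/sqrt(pi*t) as t grows.

```python
# Compare the exact probability of exactly t heads in 2t fair flips,
# C(2t, t) / 2^(2t), with the Stirling approximation 1 / sqrt(pi * t).
from math import comb, pi, sqrt

for t in (10, 100, 1000):
    exact = comb(2 * t, t) / 4**t
    approx = 1 / sqrt(pi * t)
    print(t, exact, approx)   # the two columns converge as t grows
```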
Problem 4. The answers for this problem were mostly good. But you need to provide the reference (book title and chapter, or webpage link if using a webpage) so I can look at your source if I want to. If you are going to present to the class, I want you to focus on the probability analysis.
Global comments. It seems some of you spent a good deal of time writing down what you knew, but did not spend enough time trying to figure out the problems. I wanted you to have to think a little bit rather than just saying what you already knew from memory. This has been a problem on the homeworks as well. Spend more time trying to figure out the problems, and less time simply writing down whatever you know even though it does not answer the question. I just want you to answer the question as best you can. I don't want you writing down a solution to another problem - that is not what I am asking.
Problem 1. Many of you tried to do a reduction from the halting problem by saying something like "if M(x) halts then go into an infinite loop, if it does not halt then halt and accept". The reduction is supposed to be computable, so we cannot just check if M(x) halts - that is undecidable!
Many of you were confused about the distinction between halting and accepting. When a machine halts, it either accepts or rejects. So halting is not the same as accepting.
There was also confusion about "accepting all inputs of length n". This would mean that for all x such that |x|=n, the machine accepts. Some of you said things like "accept the length n input" or "accept the all strings length n input". These are not correct - there is not just one string that has length n (there are 2^n of these), and there is no "all strings" input.
One person simply repeated verbatim the proof from the sample exam. But this was a different question, so you did not answer the question on the exam. Do not just repeat a related proof, answer the question being asked.
Problem 2. A few people had everything mostly right but neglected to mention that if z = f^{-1}(y) is given as the witness, we also need to verify that the correct inverse was given, namely that f(z) = y - which we can do because f is poly-time computable.
Many of you said that we should simply compute f^{-1}(y) and check that the i-th bit is equal to b. But we do not know if f^{-1}(y) is computable in polynomial time! For example, if f is multiplication and f^{-1} is factoring - then we can compute f fast, but we do not know a way to compute the inverse fast.
Problem 3. Most of you had a correct reduction but did not say why it worked - why "yes" instances are mapped to "yes" instances, and "no" instances are mapped to "no" instances. Of course, doing this is very important!
A few people said something about reductions in general but not about the problem actually being asked. I was generous and gave you a few points for this, but you did not answer the question.
Problem 4. Many of you simply said to check in the graph if there is an edge between s and t. This is not what the question was asking. It was asking if there is a path from s to t - a sequence of edges that leads from s to t. You could have written up pseudocode for BFS/DFS.
A few people had what appeared to be a correct algorithm, but I was not certain from what you wrote down.
Also a few people missed the point that you need to repeatedly check neighbors of neighbors of neighbors, etc. Not just checking the neighbors of s and the neighbors of t.
A few people wrote up an algorithm that is correct but takes exponential time. Something like - look at all neighbors of s, all neighbors of those, etc. Just written up that way, it is checking all possible paths from s - and there are in general an exponential number of those. BFS/DFS are much more efficient by keeping track of the vertices we have already seen, so we do not repeat work we have already done.
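Here is a minimal BFS sketch of the kind of pseudocode that would have worked; the adjacency-list format is my own choice:

```python
# BFS reachability: is there a path from s to t? The `seen` set is exactly
# what keeps this polynomial time - without it, exploring neighbors of
# neighbors amounts to checking exponentially many paths.
from collections import deque

def reachable(adj, s, t):
    """adj maps each vertex to a list of its neighbors."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, []):
            if v not in seen:        # never re-expand a vertex we have seen
                seen.add(v)
                queue.append(v)
    return False

adj = {0: [1], 1: [2], 2: [], 3: [0]}
print(reachable(adj, 0, 2))  # True: the path 0 -> 1 -> 2
print(reachable(adj, 2, 0))  # False: no edges leave 2
```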