Project Ideas for CS 670.

You can work in pairs or alone, your choice. Select from the following, or if you have your own idea to propose, talk to me about it.

  * Advice on finding more information.
    + Search scholar.google.com
    + Search for your search terms together with "lecture notes"
    + Find the conferences related to your topic on http://en.wikipedia.org/wiki/List_of_computer_science_conferences and look through recent conferences for related articles

(1) Factoring challenge. Work on factoring algorithms to see what the largest number is that can be factored using our computers. Most likely this means an MPI implementation. A highly optimized OpenMP implementation would also be interesting, as would a highly optimized GPU/OpenCL implementation.
  * Resources:
    + Integer factoring algorithms on Wikipedia http://en.wikipedia.org/wiki/Integer_factorization
    + Factoring Large Integers using Parallel Quadratic Sieve ftp://ftp.nada.kth.se/Theory/Joel-Brynielsson/qs.pdf

(2) Distributed graphics. Use the department's computer systems to render a short CGI video, with as much detail as possible. Those who took graphics in the fall have an idea of what this would entail - probably MPI with each process using OpenGL and/or OpenCL. The basic algorithm is ray tracing, with plenty of potential optimizations.
  * Resources:
    + Ray tracing on Wikipedia http://en.wikipedia.org/wiki/Ray_tracing_(graphics)
    + Graphics course from the fall http://cs.indstate.edu/~jkinne/cs440-f2012/
      textbook: http://www.cs.cornell.edu/~srm/fcg3/

(3) Latest research. Look through some papers from the most recent parallel processing conferences, choose one that looks interesting/doable, and see about confirming the results of the paper and/or trying some variations.
  * First step - look through paper titles. Choose between the following conferences: PODC, ICDCS, SPAA, PPoPP. Search for DBLP and one of those. So for example, search for DBLP PODC.
    The first result or so will be http://www.informatik.uni-trier.de/~ley/db/conf/podc/
    + On that page, click Contents for a given year to see the list of papers. For a paper that is interesting, search on Google for the title to see if it is available online (e.g., from the author's website).
    + Look through a few papers that way - read the introduction to get an idea of what the paper is about.
    + ADVICE: figure out which conference you think you want to look through, and check with me. I'll look through it as well and let you know if that is the right conference to look at.
  * Second step - once you pick a paper that looks interesting, you need to understand what the authors are doing and then try to duplicate their results.

(4) Parallel programming in Python. Become an expert in parallel programming in Python. Start by mastering the content in the book we're using. From there, experiment by implementing some of the algorithms we have done in class in Python and comparing the performance.

(5) Parallel programming in R. Similar to the last, but in R instead of Python.

(5b) Or do the same but in Java.

(6) Other algorithm challenges. Similar to the factoring challenge, but for some other problem. Some other problems to consider:
    + TSP
    + Linear programming
    + Integer programming
    + Scheduling/packing
    + Graph isomorphism
    + Sequence alignment

(7) Data analysis/tracking. Develop a small piece of software to run on all of the computer systems that tracks CPU and disk usage, and a front-end to view and analyze the data. For example, how much are the systems utilized? How often are users logged in? Which systems get used the most and the least? Which users are using the most disk space? ...
  * You could get started on this by just developing a script that runs on one system.
  * You would also set up an appointment to talk to Steve Baker about his suggestions for what tools to use. Some data will already be available on the systems in log files.
  * You would also search online to see what other people are using.
  * Once you get something running, we could put it on all the systems, let it run for a week or so, and then you would need to analyze the data.
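
As a possible starting point for the single-system script mentioned in (7), here is a minimal sketch in Python. It is only one way to begin, not a required design. It assumes a Unix-like system (os.getloadavg is not available on Windows), and the field names and the function names sample and append_sample are made up for illustration. It records load average and disk usage only; it does not track logged-in users or per-user disk space.

```python
import csv
import io
import os
import shutil
import socket
import time


def sample(path="/"):
    """Take one snapshot of load average and disk usage for this host."""
    load1, _load5, _load15 = os.getloadavg()  # Unix only
    disk = shutil.disk_usage(path)
    return {
        "time": int(time.time()),          # seconds since the epoch
        "host": socket.gethostname(),
        "load1": load1,                    # 1-minute load average
        "cpus": os.cpu_count(),            # to put the load in context
        "disk_path": path,
        "disk_used_frac": disk.used / disk.total,
    }


def append_sample(out, row):
    """Write one row as CSV to a seekable text stream.

    A header line is written first if the stream is still empty.
    """
    writer = csv.DictWriter(out, fieldnames=list(row))
    if out.tell() == 0:
        writer.writeheader()
    writer.writerow(row)


if __name__ == "__main__":
    # Print a single snapshot; a real run would append to a log file instead.
    print(sample())
```

To collect data over time, one option is to open a log file in append mode, call append_sample(f, sample()) on it, and schedule the script (for example with cron) every few minutes. The resulting CSV files from each system could then be gathered and analyzed together.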