Assignments

[Note: this webpage last modified Sunday, 09-Dec-2012 14:29:26 EST]

Homework assignments will be posted to this website. Each homework assignment will list the date the assignment is due. Late assignments will not receive any credit; I will grade them just so you know how you did.

Here is a rough idea of my plans for some of the assignments.

Some potential data we might use in our examples/assignments/projects.

My best guess at the "projects" we'll be doing.

Homework Assignments

The homework assignments are posted in the in class code. hw1.txt contains the first homework assignment, hw2.php is the second homework assignments, etc. I will announce in class and/or by email when a new assignment has been posted. Note that if you are logged into the CS or one of the x** machines, you can copy the in class code from the directory ~jkinne/public_html/cs457-f2012/code/.

Software Setup

All assignments must compile and run on the CS server and x** machines in room A-015. Each assignment template file has instructions for how the program will be run/evaluated.

Logging into CS: If you want to keep your files on the CS server and login remotely to do your programming, you need an SSH client. A free client for Windows is Putty. A remote sftp client is WinSCP. If you have Linux, Unix, or Mac OS X then ssh and sftp access are already included - just open a terminal and type "ssh username@cs.indstate.edu" or "sftp username@cs.indstate.edu".

Emacs If you are programming by logging into the CS server, you have some choice about what text editor to use. Those installed on the CS server include: vi, pico, nano, emacs. I use emacs because it does auto-indenting and other things. If you decide to use emacs, you'll need to learn the shortcuts for things. The ones I use most are the following (search online for others).

Windows: You can also run programs on Windows if you will work mostly from your personal computer. But when you turn the program in, it must and run on the cs and x** systems.

Project

Final Exam

Exam format: ditto other exams.

Exam topics:

Types of questions.

Second Exam

Exam format: same as first.

Exam topics: Most of what we've done since the first exame. In particular...

Types of questions.

Sample exam.

  1. T/F. In the worst-case, a lookup into an extendable hash table takes O(1) time.

  2. T/F. Assume there is a B+ index for an attribute in a table that is stored in a file that uses slotted pages. The pointers from the B+ tree leaves into the data file point directly to the part of the page where the record is stored.

  3. T/F. Because having indices can make queries faster, it is best for performance to make sure the database keeps indices on all attributes in all tables.

  4. Calculate. Suppose your hard drive has the following properties: 4KB blocks, 4ms/seek, .001 ms transfer/block. What is the time to do the following:

    • Lookup in B+ tree with depth 5 (depth includes root but does not include leaves).
    • Insert in B+ tree with depth 5.
    • Block nested-loop join with two tables that each have 1 million rows, stored 512 records/block.
  5. Run the algorithm. BST. Draw what a BST (with items >= an item's key going to the right) would look like after:
    (i) insert 5, insert 10, insert 7, insert 0, insert 4, insert -1, insert 10, insert 15, insert 9, insert 6,
    (ii) delete 10,
    (iii) delete 5

  6. Run the algorithm. B+. Consider the B+ tree The internal nodes must have at least 2 pointers. The leaves must have at least 3. Draw what the B+ tree looks like after:
    (i) insert 20,
    (ii) insert 26,
    (iii) delete 35.

  7. Run the algorithm - Indexed Nested-Loop Join for the query "SELECT * from student NATURAL JOIN takes". The student table is

    ID	name	dept_name	tot_cred
    00128	Zhang	Comp. Sci.	102
    12345	Shankar	Comp. Sci.	32
    19991	Brandt	History	80
    23121	Chavez	Finance	110
    44553	Peltier	Physics	56
    45678	Levy	Physics	46
    
    The takes table is
    ID	course_id	sec_id	semester	year	grade
    00128	CS-101	1	Fall	2009	A
    00128	CS-347	1	Fall	2009	A-
    12345	CS-101	1	Fall	2009	C
    12345	CS-190	2	Spring	2009	A
    98765	CS-315	1	Spring	2010	B
    98988	BIO-101	1	Summer	2009	A
    
    Assume the data for each table is stored in a file in the order shown, with 3 records per block.

    To answer the question, first give what the result of the query will be.

    Next, how many disk reads are executed in evaluating the join, and what are they?

  8. Query processing plan. Suppose the following are queries that will regularly be run on the university database:
    SELECT * FROM student WHERE name = "__something__"
    SELECT * FROM instructor NATURAL JOIN teaches WHERE dept_name = "__something__"

    Decide on a storage format for the tables involved and any indices you will keep on any of the attributes. Describe what you have decided.

    Decide on how the queries would be evaluated, and describe the basic idea.

    Give a formula for how long each query would take in terms of the following variables: ts = time for seek, tb = time to read one block, assume 512 records/block for any files used, S = # records in student table, I = # records in instructor table, T = # records in teaches table, assume 20 instructors in each department, assume student's names are unique.

First Exam

Exam format: on paper, in class, no computer/calculators. You can have one sheet of paper with anything you want on it (front and back, hand written or printed off the computer).

Exam topics: Most of what we've done so far. In particular...

Exam, types of questions: For T/F, you need to give a sentence or two explaining why your answer is correct. For "Give SQL query", you are asked to give an SQL query (or a few if more than one is needed) to accomplish a task. For "What does SQL query do", you are given an SQL query and need to describe the result set. For analyze, you are asked a question requiring a bit more thought.

Sample Exam.

  1. T/F. There can be only one KEY in a database table/relation.

  2. T/F. It is not a good idea to create too many database views because storing the extra data would take up valuable disk space.

  3. T/F. It is possible that an inner join and an outer join could return the same result set for a given table.

  4. Give SQL query. Consider the following insurance database.

     person(driver_id PK, name, address)
     car(license PK, model, year)
     accident(report_number PK, date, location)
     owns(driver_id PK, license FK)
     participated(report_number FK, license FK, driver_id PK, damage_amount)
    

    Give a SQL queries to find (1) the total number of people who owned cars that were involved in accidents in each year from 1980 to 2012, (2) the average number of such people per year in the years in that range.

  5. Give SQL query. Given an SQL query to remove all accidents with damage_amounts less than $50. Give an SQL query to find the names of all people involved in accidents with a damage_amount about $10,000.

  6. What does SQL query do. Sample data would be given, and you'd need to explain what it does and give the resulting table.

    SELECT model, year, SUM(damage_amount) FROM car, participated
    WHERE year > 2000
    GROUP BY model, year
    HAVING SUM(damage_amount) > 1000
    ORDER BY damage_amount DESC
    
  7. What does SQL query do.

    SET @d = '1950-01-01';
    DELETE FROM participated WHERE 
    report_number IN 
    (SELECT report_number FROM accident WHERE date < @d);
    DELETE FROM accident WHERE date < @d;
    
  8. Analyze. For the query in problem 6, give pseudocode for how you would implement this query (e.g., with for loops). Give a big-O estimate of the running time of your pseudocode, letting n = total # rows in all tables. Assume each table is stored in a binary search tree with the key value being the primary key (or first listed foreign key, if there is no primary key for that table).

  9. Analayze. Give an E-R diagram for the following banking database.

    branch(branch_name PK, branch_city, assets)
    customer(customer_name PK, customer_street, customer_city)
    loan(loan_number PK, branch_name, amount)
    borrower(customer_name FK, loan_number FK)
    account(account_number PK, branch_name, balance)
    depositor(customer_name FK, account_number FK)
    
    Note: you should also be able to take an E-R diagram and produce SQL statements to create a schema based off the diagram.