IC312: Data Structures (FA16)


Home Policy Calendar Units Assignments Resources

Project 3: Research a Data Structure

Overview

In this project you will research a data structure not covered directly in class. You are requires to provide a report of that data structure, make a short presentation, and potentially implement the data structure. You may work in groups of 2 or 3, but 3 person groups will have a higher expectation of workload, including providing a working implementation.

Due Dates (READ THIS!)

This project is due in parts:

  1. Friday, Nov 17th : Groups and topics must be decided and cover sheet submitted in hard copy to your instructor. The cover sheet consists of all group member names, a group name, and the topic.
  2. Wednesday, Nov 29th: In-class work day and first Drafts of your written report are due via ic312-submit.
  3. Friday, Dec 1st: Meetings to review your first draft and discuss presentation.
  4. Sunday, Dec 3rd: Your written paper, presentation slides, and implementation (if necessary) must be submitted electronically by the usual time on this date.
  5. Mon. Dec 4th and Wed. Dec 6th: In class presentations

Grading

The report will be given a letter grade based on clarity, completeness, and correctness. Aside from "being right" and "sufficiently answering the questions," grammar/spelling/style absolutely counts. Even computer people have to write.

The implementation will earn a letter grade according to whether it works, how well it works, and how interesting/impressive is the demonstration.

In addition, I will set up a way for you to submit comments to the presenters. The usefulness and depth of your comments will be graded on a 1-5 scale.

  • 60%: Written report (if you didn't do an implementation)
  • 40%: Written report (if you did an implementation)
  • 20%: Implementation (only required for 3-person groups)
  • 35%: Presentation
  • 5%: Quality of your comments

All group members will receive the same grade on the written report, presentation, and implementation (if any).

For the written report, you will submit a graded first draft that will count towards the total grade.

  • 80%: The final written report
  • 20%: The first daft of the report

The first draft and the final report will be graded on the following rubric:

  • 20%: Motivation: What motivated the need for this data structure? Who invented it? Why was it developed?
  • 30%: Design: What are the design elements of the data structure? How is it related to other data structures? How does each of the methods work? If I want to use this data structure, is there an implementation or do I need to implement it myself?
  • 30%: Performance: What are the efficiencies of the data structure? What does the data structure do well and poorly? What kind of analysis do we need to understand the performance of the data structure?
  • 20%: Applications: What applications are the data structure used in? When should we consider using it?

The presentation will be graded using this rubric, which both the instructor and your peers will fill out. For the presentation, you will be graded by the instructor and your peers:

  • 60%: instructor
  • 40%: peers

Submission and Starter Code

There is no starter code for this project. If you are providing an implementation, you may use java.util library fully. You are not requires to use the ic312 package, nor should you!.

You should submit the following items via ic312-submit in the project/03 directory:

  • report.pdf : a pdf of your report
  • pres.pdf : a pdf of your slide deck for presentation
  • impl/ : your implementation goes in this directory
  • README : file describing which data structure you worked on and instructions for running your impl if possible.

Description

We've looked at a lot of data structures. Each data structure did some things well, and other things poorly. Of course, we've barely scratched the surface of things we could have covered in this class. There are data structures to optimize for things other than operations, like memory, or cache hits, or network activity. There are data structures specifically for points in large-dimensional spaces, or Strings, or other things. There are researchers who spend their whole careers thinking about how new data structures can be used to solve problems better, faster, simpler, or easier. In this project, let's learn about some of these!

This is a project to be performed in groups of two or three. You will research a new data structure or concept, become an expert on it, write up what you know, maybe write some code, and perform an oral report. There is a good chance you will know more about your data structure than I will!

Suggested Topics

Below are some suggestions for your project. If you know about some other data structure and want to work on that instead, then you should run it by Dr. Roche. (It'll probably be fine.) Each class section may only have one group work on a given data structure or concept.

  • Beap: insert, remove, and find in \(O(\sqrt{n})\)
  • Skip Lists: linked lists with expected \(O(\log(n))\) lookup due to randomization.
  • Coalesced hashing: combination of separate chaining and open addressing to avoid speed problems of open addressing without wasted memory of separate chaining
  • Merkle tree: used to verify that data being received is uncorrupted and unchanged. Applications in security or unreliable data transmission
  • Rope: like a string, but can do modifications much easier, especially if the string is very long.
  • Self-organizing list: a list that reorders itself according to how its elements are used. You should focus on the Move-To-Front (MTF) ordering method.
  • Treap: combination of tree and heap
  • Splay Trees: faster lookup for frequently-accessed elements
  • Red-Black Trees: equivalent to 2-3-4 Trees
  • Cuckoo hashing: collision handling that allows for \(O(1)\) worst-case lookup
  • VLists: linked arrays for faster expected access time than a standard linked list.
  • PATRICIA Tries: space-optimized string storage
  • Suffix tree: a search tree that lets you quickly search for whether a string occurs anywhere in a large text document
  • Multidimensional range trees: a map where the keys are points in multi-dimensional space
  • Binomial heap: insert in constant amortized time, merging of heaps in \(O(\log n)\) time
  • B-Tree: minimizes hard disk accesses when data is too large to be held in RAM
  • K-D Trees: used for storing points in k dimensions
  • R-tree: automatically determines bounding rectangles to store nearby points on a map in a logical way.
  • Bloom filter: quickly returns a probably-correct answer to the question "Does this piece of data appear in this set?"
  • Fibonacci heap: insert, decrease key (like for Dijkstra's algorithm), and merge in constant amortized time
  • ORAMs: A set of data structure that enables oblivious access to data by protecting the write pattern to an observer.
  • Block Chains: A cryptography data structure that enables a verifiable ledger as used in digital currency.

The report

Your report must be IN PDF FORMAT. Word and OpenOffice have lots of crazy versioning and formatting issues that make them awful formats for document dispersal. Computer people have given up on convincing non-computer people of this, but at least you can know better.

Your report should address the questions below. Some questions might not make sense or might not have a good answer for your data structure; then don't answer them. When I say "should address the questions below", I mean that through the course of reading your report I should understand that question. It does not mean that I want a list of questions/answers; you need to write paragraphs with a nice flow and good organization and all those good things.

We're not going to give you a recommended length. If it's clear, answers all the questions, and is short, that's fine. If it's long, that's fine too. Diagrams and pictures are great and likely very helpful; they will be looked upon with favor. Ideas should be clearly communicated, and diagrams should be original in content (ie, don't just reproduce a diagram on wikipedia). The end of the document should include a references section to show me where you learned this material. After reading the report, I need to understand these concepts as well as you do.

  1. Where did this data structure come from? Who thought it up, when, and why?
  2. Is this related to any other data structures we learned about? How does it compare?
  3. How is the data structure constructed, and what methods does it perform?
  4. What applications is this data structure well suited for? Give me a few original examples of real-life problems that can be solved, in part, with your data structure.
  5. How do each of its methods work? What is their runtime? If your data structure isn't focused at optimizing runtime, what is it trying to optimize? If it is space, talk about how it optimizes space. If it's something else, talk about how this data structure optimizes the thing it is trying to optimize.
  6. What does this data structure do well? What does it do poorly? Under what circumstances would you use it, and under what circumstances would you use something else?
  7. If I wanted to use this data structure, is it easy to find an implementation of it? Is it in Java's standard libraries, or those of other programming languages? How hard does it seem to be to implement?

The Presentation

Your time slot will be 5 minutes per person, meaning 2-person groups get 10 minutes and 3-person groups get 15. But you must allow at least 2 minutes for questions from your classmates, and a minute or so to setup the next talk, so really should prepare 7-minute presentations for 2-person groups and 13-minute presentations for 3-person groups.

This is an absurdly short amount of time, which allows for only an introduction. To do this well, you will have to be streamlined and practiced. All group members should speak. You will be cut off when you run out of time, and your presentation will be graded based on what was presented.

You need to be sure to sign up for a date when all group members will be present in class. No matter what date you have signed up for, your visual aids are due to me by the same deadline as a report, to ensure that you've got everything prepared before Thanksgiving.

Upon being asked how long it took to prepare a speech: "It depends. If I am to speak ten minutes, I need a week for preparation; if fifteen minutes, three days; if an hour, I am ready now." – Woodrow Wilson (likely apocryphal, but the sentiment is true)

The Implementation

3-person groups are required to submit a Java implementation using your data structure. 2-person groups may choose to do this as well, which will mean the written report counts for a smaller portion of your grade (although the expectations will be the same).

There is a great deal of flexibility on how your implementation can work. What I really need is some kind of demonstration that I can run, that compiles in Java with no special requirements. You need to include a README file that explains which file contains your main method that I'm going to run, as well as any other info I need to know about what your demonstration is doing and any resources you used to work on it.

Some data structures are harder to implement than others, so there is flexibility in how good your implementation needs to be. In some extreme cases, it might be okay to find an open-source implementation online and just make a demonstration program that uses that. It would be a good idea to email your instructor and clear the idea of what you're going to do to make sure it's sufficient.

If your demonstration program does not work, you should not expect to earn a good grade for this part.