IC312: Data Structures (FA17)


Home Policy Calendar Units Assignments Resources

HW 10: Oracle of Bacon

Overview

This is a programming homework assignment. It is due in class, in hard-copy.

Due Date

This homework is due Mon. 27 Nov. via ic312-submit.

Grading

This is a larger than normal HW assignment, as such it graded out of 100 HW points and 33 Project Points. That is, it counts for a homework plus a bit towards your project score.

Collaboration and Academic Integrity

As the points in this homework also count towards your project grade, you are expected to use the project collaboration policy. Students may informally discuss projects but may not collaborate.

Submission and Starter Code

Submission must occur via ic312-submit. You will be provided the following files:

  • Graph.java : The Graph data structure which you must complete
  • OracleOfBacon.java : The main method for the graph to do Bacon searches
  • movie_cast-20000.list : A plain text, tab separated file of 20,000 movies and their cast members
  • movie_cast-1000000.list : A plain text, tab separated file of 100,000 movies and their cast members

Additionally, YOU SHOULD NOT USE THE ic312 PACKAGE. Instead, you may make FULL use of java.util.

Description

In this homework, you will complete the Graph data structure such that it can be used in a Oracle-of-Bacon program. The Oracle-of-Bacon is described as the path between an actor to Kevin Bacon via the set of movies that each actor has appeared in, eventually reaching Kevin Bacon.

For example, when complete, you should be able to discover the following paths to Kevin Bacon.

aviv@saddleback: sol $ java OracleOfBacon 
Reading in data ...

Search a name (^D to exit): bogart

Choose from the following
(0) BOGACHYOV, VYACHESLAV
(1) BOGAERT, NASHLA
(2) BOGAERT, STEPHEN
(3) BOGARD, DELIA
(4) BOGARDE, DIRK
(5) BOGART, KEITH (I)
(6) BOGART, JESSICA (I)
(7) BOGART, HUMPHREY
(8) BOGART, DOMINIC (I)
(9) BOGART, ANDREA
Choice: 7
Finding Shortest Path

BOGART, HUMPHREY
 | 
 | SIROCCO (1951)
 V 
BROCCO, PETER
 | 
 | HOMEBODIES (1974)
 V 
TOBEY, KENNETH
 | 
 | HERO AT LARGE (1980)
 V 
BACON, KEVIN (I)

Search a name (^D to exit): washington

Choose from the following
(0) WASHBOURNE, MONA
(1) WASHBURN JR., BRYANT
(2) WASHBURN, ALICE
(3) WASHBURN, BRYANT
(4) WASHBURN, RICK (I)
(5) WASHINGTON, HANNAH (I)
(6) WASHINGTON, DENZEL
(7) WASHINGTON, DARREN DUPREE
(8) WASHINGTON, BLUE
(9) WASHINGTON, AARON (I)
Choice: 6
Finding Shortest Path

WASHINGTON, DENZEL
 | 
 | INSIDE MAN (2006)
 V 
PINA, LIONEL
 | 
 | HERO AT LARGE (1980)
 V 
BACON, KEVIN (I)

Search a name (^D to exit):

Implementing your Graph

In the Graph.java, you'll need to complete the Graph ADT methods as described in class. You must complete your graph as an adjacency list. The order of the adjacency list should match the order of the data as read fro the input file.

As we are no longer using the ic312 package, you may notice that the use of some of the data structures has changed slightly. For example, some of the Graph methods return Collections, which is a more generic List like interface. You should spend some time referencing the Java documentation or your instructor if you have questions.

Multiple Edges Between Actors/Actresses

For some actor/actress pairs, there may be multiple edges connecting them, e.g., multiple movies in which the two have appeared. This is fine, and your adjacency list should store all appropriate edges. When exploring the neighbors of a vertex, you should consider the order of the edges with respect to the order they appear in the input file. *It is important to have consistency here, as your shortest path routine's accuracy will depend on having the right order in your adjacency list.*

For example, for movie M N O and actor A B C, if the input file dictates the following addEdge() calls:

addEdge(A,B,M)
addEdge(A,C,N)
addEdge(A,B,N)
addEdge(C,B,N)
addEdge(C,B,O)

Then you should store the following adjacency lists:

A: (B,M) (C,N) (B,N)
B: (A,M) (A,N) (C, N) (C, O)
C: (B,N), (B,O)

So if you were to call neighbors(B), the return list would be in the following order: [A,A,C,O], and yes, it's ok to have repeats in this list. And if you were to call B.getEdge(A), it should return M as it is the first appearance in the adjacency list.

As an implementation suggestion, you may find it easiest to store two lists in your vertex object, one to store a list of vertex neighbors and one to store a list of weights where the order of the lists matches the neighbor to its weight. If you use this implementation, than your addEdge() should be O(1) and your neighbors() method should also be O(1). But, your getEdge() may be O(n). This is a fine tradeoff for this problem as you will call neighbors() and addEdge() much more frequently than you'll call getEdge().

Shortest Path Search

A large part of this assignment will be implementing a Bread First Search of the graph. This will require you to complete the shortestPath method in the Graph.java file. This method returns a LinkedList, where if we were to treat that list as a stack, successive pop()'s from the list will reveal the path from v1 to v2. So, you should push onto this list in reverse order of the path, from v2 to v1.

The Movie Files and Command Line Arguments

The movie_cast-20000.list and movie_cast-100000.list are tab separated text file. Each line of the file represents a movie, the first item on the line, followed by a list of cast members.

You should note that there are a limited number of movies in these files to help facilitate your testing. You may find that some obvious paths don't exist in the shorter 20,000 movie file, for example, there is no path between Kevin Bacon and Brad Pitt. However, in the larger movie file list of 100,000 movies, such a path does exist.

$ java OracleOfBacon movie_cast-100000.list 
Reading in data ...

Search a name (^D to exit): pitt

Choose from the following
(0) PITSILOS, NOTIS
(1) PITSIOS, KOSTAS
(2) PITSIS, TED
(3) PITSKHELAURI, GEORGIY
(4) PITSOULI, PENELOPE
(5) PITT, JULIANNA (II)
(6) PITT, JORDAN
(7) PITT, INGRID
(8) PITT, CHRIS (I)
(9) PITT, BRAD
Choice: 9
Finding Shortest Path

PITT, BRAD
| 
| BEYOND ALL BOUNDARIES (2009)
V 
BACON, KEVIN (I)

Note that to use a different movie list, you specify it as a command line argument. Clearly, it may take longer to load the larger movie file, so only use it once you're confident in the performance of your graph implementation. You should also note that you may have slightly different paths between actors when using the larger data file, but they should get strictly shorter as the 20,000 movies are subset of the 100,000.