HW 10: Oracle of Bacon
Overview
This is a programming homework assignment. It is due in class, in hard-copy.
Due Date
This homework is due Mon. 27 Nov. via ic312-submit
.
Grading
This is a larger than normal HW assignment, as such it graded out of 100 HW points and 33 Project Points. That is, it counts for a homework plus a bit towards your project score.
Collaboration and Academic Integrity
As the points in this homework also count towards your project grade, you are expected to use the project collaboration policy. Students may informally discuss projects but may not collaborate.
Submission and Starter Code
Submission must occur via ic312-submit. You will be provided the following files:
Graph.java
: The Graph data structure which you must completeOracleOfBacon.java
: The main method for the graph to do Bacon searchesmovie_cast-20000.list
: A plain text, tab separated file of 20,000 movies and their cast membersmovie_cast-1000000.list
: A plain text, tab separated file of 100,000 movies and their cast members
Additionally, YOU SHOULD NOT USE THE ic312
PACKAGE. Instead,
you may make FULL use of java.util
.
Description
In this homework, you will complete the Graph data structure such that it can be used in a Oracle-of-Bacon program. The Oracle-of-Bacon is described as the path between an actor to Kevin Bacon via the set of movies that each actor has appeared in, eventually reaching Kevin Bacon.
For example, when complete, you should be able to discover the following paths to Kevin Bacon.
aviv@saddleback: sol $ java OracleOfBacon Reading in data ... Search a name (^D to exit): bogart Choose from the following (0) BOGACHYOV, VYACHESLAV (1) BOGAERT, NASHLA (2) BOGAERT, STEPHEN (3) BOGARD, DELIA (4) BOGARDE, DIRK (5) BOGART, KEITH (I) (6) BOGART, JESSICA (I) (7) BOGART, HUMPHREY (8) BOGART, DOMINIC (I) (9) BOGART, ANDREA Choice: 7 Finding Shortest Path BOGART, HUMPHREY | | SIROCCO (1951) V BROCCO, PETER | | HOMEBODIES (1974) V TOBEY, KENNETH | | HERO AT LARGE (1980) V BACON, KEVIN (I) Search a name (^D to exit): washington Choose from the following (0) WASHBOURNE, MONA (1) WASHBURN JR., BRYANT (2) WASHBURN, ALICE (3) WASHBURN, BRYANT (4) WASHBURN, RICK (I) (5) WASHINGTON, HANNAH (I) (6) WASHINGTON, DENZEL (7) WASHINGTON, DARREN DUPREE (8) WASHINGTON, BLUE (9) WASHINGTON, AARON (I) Choice: 6 Finding Shortest Path WASHINGTON, DENZEL | | INSIDE MAN (2006) V PINA, LIONEL | | HERO AT LARGE (1980) V BACON, KEVIN (I) Search a name (^D to exit):
Implementing your Graph
In the Graph.java
, you'll need to complete the Graph ADT methods
as described in class. You must complete your graph as an
adjacency list. The order of the adjacency list should match the
order of the data as read fro the input file.
As we are no longer using the ic312
package, you may notice that
the use of some of the data structures has changed slightly. For
example, some of the Graph methods return Collections
, which is a
more generic List like interface. You should spend some time
referencing the Java documentation or your instructor if you have
questions.
Multiple Edges Between Actors/Actresses
For some actor/actress pairs, there may be multiple edges connecting them, e.g., multiple movies in which the two have appeared. This is fine, and your adjacency list should store all appropriate edges. When exploring the neighbors of a vertex, you should consider the order of the edges with respect to the order they appear in the input file. *It is important to have consistency here, as your shortest path routine's accuracy will depend on having the right order in your adjacency list.*
For example, for movie M N O and actor A B C, if the input file dictates the following addEdge() calls:
addEdge(A,B,M) addEdge(A,C,N) addEdge(A,B,N) addEdge(C,B,N) addEdge(C,B,O)
Then you should store the following adjacency lists:
A: (B,M) (C,N) (B,N) B: (A,M) (A,N) (C, N) (C, O) C: (B,N), (B,O)
So if you were to call neighbors(B), the return list would be in the following order: [A,A,C,O], and yes, it's ok to have repeats in this list. And if you were to call B.getEdge(A), it should return M as it is the first appearance in the adjacency list.
As an implementation suggestion, you may find it easiest to store two lists in your vertex object, one to store a list of vertex neighbors and one to store a list of weights where the order of the lists matches the neighbor to its weight. If you use this implementation, than your addEdge() should be O(1) and your neighbors() method should also be O(1). But, your getEdge() may be O(n). This is a fine tradeoff for this problem as you will call neighbors() and addEdge() much more frequently than you'll call getEdge().
Shortest Path Search
A large part of this assignment will be implementing a Bread First
Search of the graph. This will require you to complete the
shortestPath
method in the Graph.java file. This method returns a
LinkedList, where if we were to treat that list as a stack,
successive pop()
's from the list will reveal the path from v1
to v2
. So, you should push onto this list in reverse order of the
path, from v2
to v1
.
The Movie Files and Command Line Arguments
The movie_cast-20000.list
and movie_cast-100000.list
are tab
separated text file. Each line of the file represents a movie, the
first item on the line, followed by a list of cast members.
You should note that there are a limited number of movies in these files to help facilitate your testing. You may find that some obvious paths don't exist in the shorter 20,000 movie file, for example, there is no path between Kevin Bacon and Brad Pitt. However, in the larger movie file list of 100,000 movies, such a path does exist.
$ java OracleOfBacon movie_cast-100000.list Reading in data ... Search a name (^D to exit): pitt Choose from the following (0) PITSILOS, NOTIS (1) PITSIOS, KOSTAS (2) PITSIS, TED (3) PITSKHELAURI, GEORGIY (4) PITSOULI, PENELOPE (5) PITT, JULIANNA (II) (6) PITT, JORDAN (7) PITT, INGRID (8) PITT, CHRIS (I) (9) PITT, BRAD Choice: 9 Finding Shortest Path PITT, BRAD | | BEYOND ALL BOUNDARIES (2009) V BACON, KEVIN (I)
Note that to use a different movie list, you specify it as a command line argument. Clearly, it may take longer to load the larger movie file, so only use it once you're confident in the performance of your graph implementation. You should also note that you may have slightly different paths between actors when using the larger data file, but they should get strictly shorter as the 20,000 movies are subset of the 100,000.