CS 310 HW 4: TripleStore Database
- Due: 11:59pm Friday 12/9/2016
- Approximately 8.75% of total grade
- Submit to Blackboard
- There is no code distribution for this assignment (notes)
- There are no milestones for this project
- Final Test Cases: Available by Wed 11-30-2016
CHANGELOG:
- Tue Nov 29 17:21:58 EST 2016
- Some of the examples of query() results indicated that an ArrayList is returned. This should just be a general List which may be either an ArrayList or LinkedList.
- In response to @805, a section outlining how wild fields compare has been added.
Table of Contents
1 Overview and Goals
1.1 TripleStore Database
Databases pervade the computing world as they provide an efficient way to store information that can be accessed and manipulated in a flexible fashion. Most database systems provide a way to create a table which has many entries (usually written as the rows) and many columns (the different fields of each row). Databases may contain multiple tables of varying sizes. This assignment centers around implementing a very simple database in Java called a triplestore, so named because it has a single table with many rows but only ever three columns. For the purpose of this assignment, these columns are named Entity, Relation, and Property. Two examples of triplestores are shown in tabular format below. The first triplestore is small and somewhat trivial while the second has immediate commercial applications.
1.2 TripleStore Examples
Comical Example
# | entity | relation | property |
---|---|---|---|
1 | Willie | ISA | human |
2 | Alf | ISA | alien |
5 | Alf | EATS | cat |
3 | Lynn | ISA | human |
4 | Lucky | ISA | cat |
6 | Lynn | EATS | veggies |
7 | Lucky | LIKESTO | purr |
8 | Lucky | EATS | catfood |
9 | Alf | EATS | veggies |
Commercial Example
# | entity | relation | property |
---|---|---|---|
1 | 1 | ISA | boots |
2 | 1 | AVAILABLE | 4 |
3 | 1 | COSTS | 32.50 |
4 | 1 | SIZE | 10 |
5 | 2 | ISA | boots |
6 | 2 | AVAILABLE | 6 |
7 | 2 | COSTS | 32.50 |
8 | 2 | SIZE | 9 |
9 | 3 | ISA | shirt |
10 | 3 | COLOR | red |
11 | 3 | COSTS | 11.99 |
12 | 3 | SIZE | XL |
13 | 3 | AVAILABLE | 2 |
14 | 3 | MATERIAL | cotton |
15 | 3 | DESIGNER | Swankums |
16 | 4 | ISA | watch |
17 | 4 | ISA | accessory |
18 | 4 | AVAILABLE | 1 |
19 | 4 | COSTS | 302.29 |
20 | 4 | DESIGNER | PricePlusPlus |
21 | 4 | MATERIAL | gold |
22 | 5 | ISA | hat |
23 | 5 | ISA | accessory |
24 | 5 | MATERIAL | wool |
25 | 5 | COSTS | $29.99 |
26 | 5 | COLOR | gray |
27 | 5 | AVAILABLE | 0 |
28 | 5 | ISA | warm clothing |
29 | 5 | DESIGNER | Swankums |
1.3 TripleStore Demo
The quickest way to get an immediate sense of how a TripleStore
looks and feels is to examine the following demo done in DrJava's
interaction pane. jGrasp provides the same functionality for
experimentation.
In the demo, most results of assignments or object returns are shown on the following line by NOT putting a semicolon at the end of the line. This is the default behavior in DrJava while in jGrasp you must type a variable name on its own line to see its printed representation. Comments are also inserted to add some guidance to the demo.
Welcome to DrJava.
> TripleStore t = new TripleStore();
> t
> // Empty TripleStore
> // --- ADD ---
> boolean b = t.add("Willie","ISA","human");
> b
true
> // Successful add returns true
> t.add("Alf","ISA","alien");
> t
Alf ISA alien
Willie ISA human
> // Two Records in the TripleStore
> b = t.add("Willie","ISA","human");
> b
false
> // Duplicates are not allowed
> t
Alf ISA alien
Willie ISA human
> // Still only two Records in the TripleStore
> t.add("Lynn","ISA","human");
> t.add("Lucky","ISA","cat");
> t.add("Alf","EATS","cat");
> t.add("Lynn","EATS","veggies");
> t.add("Lucky","LIKESTO","purr");
> t.add("Lucky","EATS","catfood");
> t.add("Alf","EATS","veggies");
> t
Alf EATS cat
Alf EATS veggies
Alf ISA alien
Lucky EATS catfood
Lucky ISA cat
Lucky LIKESTO purr
Lynn EATS veggies
Lynn ISA human
Willie ISA human
> // Variety of things in the TripleStore
> // --- QUERY ---
> import java.util.*;
> List<Record> results;
> results = t.query("Alf","ISA","alien");
> results
[ Alf ISA alien]
> // query returns a List
> // Records match exactly, 1 result
> t.getWild()
"*"
> results = t.query("Alf","ISA","*")
[ Alf ISA alien]
> // query with a wild card matched 1 record
> results = t.query("Alf","EATS","*")
[ Alf EATS cat, Alf EATS veggies]
> // query with a wild card matched 2 records
> results = t.query("Alf","*","*")
[ Alf EATS cat, Alf EATS veggies, Alf ISA alien]
> // query with several wild cards matched 3 records
> t.query("*","ISA","human")
[ Lynn ISA human, Willie ISA human]
> t.query("*","*","human")
[ Lynn ISA human, Willie ISA human]
> t.query("*","*","cat")
[ Alf EATS cat, Lucky ISA cat]
> t.query("*","ISA","*")
[ Alf ISA alien, Lucky ISA cat, Lynn ISA human, Willie ISA human]
> // wildcards can appear in any of entity,relation,property
> // --- REMOVE ---
> int nrm = t.remove("Alf","EATS","veggies")
1
> // successful removal, changes database
> t
Alf EATS cat
Alf ISA alien
Lucky EATS catfood
Lucky ISA cat
Lucky LIKESTO purr
Lynn EATS veggies
Lynn ISA human
Willie ISA human
> nrm = t.remove("Alf","EATS","fruit")
0
> // unsuccessful remove
> nrm = t.remove("Alf","EATS","veggies")
0
> // record no longer exists
> nrm = t.remove("Alf","*","*")
2
> t
Lucky EATS catfood
Lucky ISA cat
Lucky LIKESTO purr
Lynn EATS veggies
Lynn ISA human
Willie ISA human
> t.remove("*","*","*")
6
> t
> // wild cards can remove many records
> // --- WILDCARDS ---
> t.getWild()
"*"
> // Can add wild card strings as records
> t.add("Alf","ISA","*")
true
> t
Alf ISA *
> t.add("Alf","ISA","whateva")
true
> t.add("Alf","ISA","alien")
true
> t
Alf ISA *
Alf ISA alien
Alf ISA whateva
> t.query("Alf","ISA","*")
[Alf ISA * , Alf ISA alien , Alf ISA whateva ]
> t.setWild("whateva")
> // The * is no longer the wild card, matches only equal strings
> t.query("Alf","ISA","*")
[Alf ISA * ]
> // String "whateva" is now the wild card, matches anything
> t.query("Alf","ISA","whateva")
[Alf ISA * , Alf ISA alien , Alf ISA whateva ]
> t.setWild("whateva")
> t.getWild()
"whateva"
> t.query("Alf","ISA",t.getWild())
[Alf ISA * , Alf ISA alien , Alf ISA whateva ]
1.4 Basic operations
A TripleStore is like many other data structures in that it provides add, remove, and find functionality. Similar to the binary search trees discussed in class, no duplication is allowed in a TripleStore so add() will not add a record that is already present. Slightly more general than other data structures is that the find functionality, named query(), may return multiple results in a collection and the remove functionality may remove more than one record from the triplestore. The detailed functionality of each method is described in the implementation section on TripleStore.
1.5 Records
The records stored in the TripleStore will be objects of class Record. At its core, a Record is just a way to cart around the String objects entity, relation, and property in a triplet as shown in the tables above. Records are immutable in that, once they are created, they cannot change, in much the same way that when a String is created, it cannot be changed. This sacrifices some space and time efficiency for much simpler reasoning about data structures involving Record. The fields of a Record are accessible by methods named after the fields.
public String entity();   // Who
public String relation(); // How
public String property(); // What
In addition to these, every Record has a unique identifier. Whenever a Record is created, a new unique ID number is assigned to it, which is accessible through the id() method.
public int id(); // Must be unique
1.6 Fast Access and Wildcards
Each triplestore keeps track of a wild card string which allows flexible queries and removals that can handle multiple matching records. The default wild card is * (star) but it may be changed to any string using methods described below. A query involving a wildcard may match more than one record. Providing fast location of all Records that match a query involving wildcards is the primary focus of this assignment.
There are several options for arranging Records in memory, but providing a reasonably efficient combination of add, query, and remove operations, on the order of \(O(\log N)\) per record, suggests the use of balanced binary search trees.
In addition to this, query() calls which use wild cards may return multiple records. The sorted nature of binary search trees will come in handy here. For example, consider a binary tree which sorts each stored Record starting with the String entity field, breaking ties by examining relation and breaking further ties with property. An example is shown below which does this in jGrasp and uses a pre-built red-black tree in the Java library. Note the color of nodes is shown accurately as red or black.
Figure 1: Red-Black Tree with Entity-Relation-Property Sorting
If t.query("Alf","*","*") is executed, an effective strategy is to find the "least" node involving Alf and begin an in-order traversal from that point. This would lead to the sequence of Records
Alf EATS cat; Alf EATS veggies; Alf ISA alien; Lucky EATS catfood; ...
which contains relevant records to the query. During the traversal, if each record is checked to determine if it matches the query, the first three clearly match while the fourth involving Lucky does not match. Since the tree is sorted, there can be no more Alf records and the method can stop the traversal and return the three records found that match the query.
This process involves the following steps.
1. Find the smallest matching record
2. Start an in-order traversal at that record
3. For each record in the traversal
   - If the record matches the query, add it to the growing collection
   - Otherwise there can be no more matching records so stop and return
Step 1 can be done in \(O(\log N)\) time as one simply needs to search our tree for the "spot" where the query would belong. Step 2 is \(O(1)\) if the tree has parent pointers or some facility for supporting in-order traversals of the tree efficiently (which is the case for production quality tree implementations). Finally, Step 3 visits each matching node plus one additional node that indicates there will be no other matches. If there are \(K\) matches, Step 3 takes \(O(K)\) time. Thus, the total complexity is \(O(K + \log N)\).
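The steps above can be sketched with java.util.TreeSet on plain strings. This is only an illustration under simplifying assumptions: records are modeled as space-joined strings rather than Record objects, and the names RangeScanSketch and withPrefix are hypothetical, not part of the required API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class RangeScanSketch {
    // Collect every element starting with prefix, stopping at the first miss.
    static List<String> withPrefix(TreeSet<String> tree, String prefix) {
        List<String> out = new ArrayList<>();
        // tailSet(prefix) is a sorted view of all elements >= prefix
        for (String s : tree.tailSet(prefix)) {
            if (!s.startsWith(prefix)) {
                break; // sorted order: no later element can match
            }
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> tree = new TreeSet<>(Arrays.asList(
            "Alf EATS cat", "Alf EATS veggies", "Alf ISA alien",
            "Lucky EATS catfood", "Willie ISA human"));
        System.out.println(withPrefix(tree, "Alf "));
    }
}
```

The loop visits each match plus one extra element, which corresponds to the \(O(K)\) part of the bound, while locating the first element of the tailSet() view corresponds to the \(O(\log N)\) part.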
Unfortunately, the above tree will not help us with a query such as t.query("*","ISA","*") as the tree is not sorted appropriately: the matching ISA records are spread throughout the tree. A simple solution to this is to use additional trees storing the same data but sorted differently. For example, the below tree stores the same data but sorted on relation, then property, then entity.
Figure 2: Red-Black Tree with Relation-Property-Entity Sorting
The records matching t.query("*","ISA","*") are all stored linked together starting with the root.
A data structure that facilitates fast lookup into a database is usually referred to as an index. When a user calls t.query() it is up to the TripleStore class to analyze the query to determine the most efficient means to find matching records, usually by selecting an index if one is available. All queries to the triplestore should be efficient: regardless of the query, query() should meet the target complexity of \(O(K + \log N)\). This will require several indexes.
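One possible way to analyze a query is to pick the sort order whose leading fields are all fixed, so that the matching records sit next to each other in that tree. The sketch below assumes three indexes sorted in the ERP, RPE, and PER orders described later in this document; the class name IndexChoiceSketch, the method chooseIndex, and the use of plain strings to name the indexes are hypothetical choices made for illustration only.

```java
class IndexChoiceSketch {
    // Return the name of an index in which all matches are contiguous.
    // Each flag is true when that field of the query is wild.
    static String chooseIndex(boolean eWild, boolean rWild, boolean pWild) {
        if (eWild && rWild && pWild) {
            return "ERP"; // every record matches, so any order works
        }
        if (!eWild && (!rWild || pWild)) {
            return "ERP"; // patterns E**, ER*, ERP
        }
        if (eWild && !rWild) {
            return "RPE"; // patterns *R*, *RP
        }
        return "PER";     // patterns **P, E*P
    }

    public static void main(String[] args) {
        // Query ("*","ISA","*"): only the relation is fixed
        System.out.println(chooseIndex(true, false, true));
    }
}
```

Each of the eight wild/fixed patterns lands on an order in which the fixed fields come first, which is what lets a range scan stop at the first non-matching record.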
The trade-off of this fast lookup is that when records are added or removed, all indexes must be updated. The cost of add will be \(O(\log N)\) as it deals with only a single record. The cost of remove is \(O(K \times \log N)\) as multiple records may need to be removed. There is also a space trade-off as the stored records require more data structures to facilitate fast lookup which means more memory is used.
2 Class Architecture
2.1 Project Files
The following files are relevant to the HW. When submitting, place them in your HW Directory, zip that directory and submit. Always submit a zip file: not tar.gz, not bzip, not 7zip, just vanilla zip files. For additional instructions see Setup and Submission.
File | State | Notes |
---|---|---|
Record.java | Create | Triples stored in the TripleStore database. May subclass to support queries. |
TripleStore.java | Create | Database with 3 columns. Stores records via add() and allows query() and remove(). |
junit-c310.jar | Testing | JUnit library for testing. Copy over from previous projects. |
ID.txt | Create | Identifying information |
2.2 Built-in Classes of Interest
It is strongly recommended that you not try to implement balanced trees yourself. Aside from the obvious difficulty, this is not an analogous activity to what programming in the wild is like. Instead, consider the use of the following classes
- java.util.TreeSet - Implemented using a red-black tree under the hood, this is an incredibly useful class to become familiar with. While TreeSet is fairly complex, it provides nearly all the functionality required for this assignment without even the need to extend it. Pay particular attention to methods which provide a view of the tree as a SortedSet such as tailSet() which essentially give a "view" of a subset of the tree; an iterator can be obtained from this subset which will traverse the tree in order. The standard add and remove methods are also present for the TreeSet.
- java.util.Comparator - An interface that allows one to create objects which compare other objects. Any binary search tree will need a means of comparing objects and if the same objects are to be sorted in multiple ways, Comparators are the standard way to do this. Note that TreeSet has a constructor which takes a Comparator as an argument: any insertions or lookups use the given comparator as a way to navigate the sorted tree.
Mastering these two classes will involve reading their documentation carefully and experimenting in interactive loops or with your own compiled code. This is typically the case when using other people's code and is what developers in the wild do far more than actually writing their own code. It is worth the effort to make the project easier and to develop reading/understanding skills so that you don't re-invent the tree.
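As a starting point for that experimentation, here is a small sketch of a TreeSet built with a Comparator and then viewed through tailSet(). It sorts plain strings by length and is not tied to Record; the names TreeSetComparatorSketch, BY_LENGTH, and build are hypothetical.

```java
import java.util.Comparator;
import java.util.TreeSet;

class TreeSetComparatorSketch {
    // Order strings by length first, then alphabetically to break ties.
    static final Comparator<String> BY_LENGTH = (a, b) -> {
        int c = Integer.compare(a.length(), b.length());
        return c != 0 ? c : a.compareTo(b);
    };

    static TreeSet<String> build(Comparator<String> order, String... words) {
        TreeSet<String> set = new TreeSet<>(order); // comparator drives the sort
        for (String w : words) {
            set.add(w);
        }
        return set;
    }

    public static void main(String[] args) {
        TreeSet<String> set = build(BY_LENGTH, "pear", "fig", "apple", "kiwi");
        System.out.println(set);                 // [fig, kiwi, pear, apple]
        System.out.println(set.tailSet("kiwi")); // [kiwi, pear, apple]
    }
}
```

The same elements would iterate in a different order under a different comparator, which is the idea behind keeping several trees over the same data.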
Note: You should not need to examine the source code for TreeSet to complete the assignment but the curious and ambitious wizard may find it enlightening.
2.3 Constraints
The only code you may use for this project falls into two categories
- Classes in the Java standard library such as TreeSet
- Classes and code you write yourself
There may be some triplestore java implementations out there but you are not to use them for this project; such action would constitute an honor code violation.
2.4 Record.java
// Immutable. Stores 3 strings referred to as entity, relation, and
// property. Each Record has a unique integer ID which is set on
// creation. All records are made through the factory method
// Record.makeRecord(e,r,p). Records which have some fields wild are
// created using Record.makeQuery(wild,e,r,p)
public class Record{

  // Return the next ID that will be assigned to a Record on a call to
  // makeRecord() or makeQuery()
  public static int nextId();

  // Return a stringy representation of the record. Each string should
  // be RIGHT justified in a field of 8 characters with whitespace
  // padding the left. Java's String.format() is useful for padding
  // on the left.
  public String toString();

  // Return true if this Record matches the parameter record r and
  // false otherwise. Two records match if all their fields match.
  // Two fields match if the fields are identical or at least one of
  // the fields is wild.
  public boolean matches(Record r);

  // Return this record's ID
  public int id();

  // Accessor methods to access the 3 main fields of the record:
  // entity, relation, and property.
  public String entity();
  public String relation();
  public String property();

  // Returns true/false based on whether the three fields are
  // fixed or wild.
  public boolean entityWild();
  public boolean relationWild();
  public boolean propertyWild();

  // Factory method to create a Record. No public constructor is
  // required.
  public static Record makeRecord(String entity, String relation, String property);

  // Create a record that has some fields wild. Any field that is
  // equal to the first argument wild will be a wild card
  public static Record makeQuery(String wild, String entity, String relation, String property);

  // Comparators that compare Records based on different orderings of
  // their fields. The names of the Comparators correspond to the
  // order in which they compare fields: ERPCompare compares Entity
  // (E), then Relation (R), then Property (P). Likewise for
  // RPECompare and PERCompare.
  public static final Comparator<Record> ERPCompare;
  public static final Comparator<Record> RPECompare;
  public static final Comparator<Record> PERCompare;
}
2.5 TripleStore.java
// Three-column database that supports query, add, and remove in
// logarithmic time.
public class TripleStore{

  // Create an empty TripleStore. Initializes storage trees
  public TripleStore();

  // Access the current wild card string for this TripleStore which
  // may be used to match multiple records during a query() or
  // remove() call
  public String getWild();

  // Set the current wild card string for this TripleStore
  public void setWild(String w);

  // Ensure that a record is present in the TripleStore by adding it
  // if necessary. Returns true if the addition is made, false if the
  // Record was not added because it was a duplicate of an existing
  // entry. A Record with any fields may be added to the TripleStore
  // including a Record with fields that are equal to the
  // TripleStore's current wild card. Throws an
  // IllegalArgumentException if any argument is null.
  //
  // TARGET COMPLEXITY: O(log N)
  // N: number of records in the TripleStore
  public boolean add(String entity, String relation, String property);

  // Return a List of the Records that match the given query. If no
  // Records match, the returned list should be empty. If a String
  // matching the TripleStore's current wild card is used for one of
  // the fields of the query, multiple Records may be returned in the
  // match. An appropriate tree must be selected and searched
  // correctly in order to meet the target complexity. Throws an
  // IllegalArgumentException if any argument is null.
  //
  // TARGET COMPLEXITY: O(K + log N)
  // K: the number of matching records
  // N: the number of records in the triplestore.
  public List<Record> query(String entity, String relation, String property);

  // Remove elements from the TripleStore that match the parameter
  // query. If no Records match, no Records are removed. Any of the
  // fields given may be the TripleStore's current wild card which may
  // lead to multiple Records being matched and removed. Return the
  // number of records that are removed from the TripleStore. Throws
  // an IllegalArgumentException if any argument is null.
  //
  // TARGET COMPLEXITY: O(K * log N)
  // K: the number of matching records
  // N: the number of records in the triplestore.
  public int remove(String e, String r, String p);

  // Produce a String representation of the TripleStore. Each Record
  // is formatted with its toString() method on its own line. Records
  // must be shown sorted by Entity, Relation, Property in the
  // returned String.
  //
  // TARGET COMPLEXITY: O(N)
  // N: the number of records stored in the TripleStore
  public String toString();
}
3 Record Class Implementation
public class Record
Databases contain "rows" and in the case of TripleStore, each row comprises three fields which will be housed in instances of the Record class. It is an immutable class which carts around three strings. The class provides facilities for unique creation and creation of records which act as queries that can match other records. It also provides a set of comparators for the arrangement of Records in various ways in the TripleStore.
3.1 Creation and Basic Functionality
Fields and accessors
public int id();          // Must be unique
public String entity();   // Who
public String relation(); // How
public String property(); // What
public static Record makeRecord(String e, String r, String p)
Notes
- You are free to specify your own constructors but testing code always uses the factory method Record.makeRecord(..) which should return a record.
- If any of the arguments e,r,p are null, throw an IllegalArgumentException with an informative message.
- The ID number of a Record is accessible only through the id() method and never changes after creation.
- The id() method must return an integer unique to every record but is not set by the user. A class level (static) private field is usually used for this kind of behavior. Example:

> Record r1 = Record.makeRecord("Alf","ISA","alien");
> Record r2 = Record.makeRecord("Alf","ISA","alien");
> r1.id()
45
> r2.id()
46
> r1.id() == r2.id()
false
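The class-level counter mentioned above might look like the following minimal sketch. The class IdSketch is a hypothetical stand-in rather than the required Record class, and starting the counter at 0 is an assumption, since the assignment does not fix a starting value.

```java
class IdSketch {
    // Shared by all instances; holds the ID the next object will receive.
    private static int nextId = 0; // starting value is an assumption

    private final int id; // final: never changes after creation

    IdSketch() {
        this.id = nextId++; // take the current value, then advance the counter
    }

    int id() {
        return id;
    }

    static int nextId() {
        return nextId;
    }

    public static void main(String[] args) {
        IdSketch a = new IdSketch();
        IdSketch b = new IdSketch();
        System.out.println(a.id() == b.id()); // false
    }
}
```

Because the counter is static, every object created through any path that runs the constructor receives a distinct number.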
3.2 Record.toString()
Note: the demos may not display records in exactly the right format. This section contains the expected format.
public String toString()
A specific format is required for Record.toString().
- Each field is separated by 1 space.
- entity appears first and is right justified in a field of 8 characters
- relation appears second and is right justified in a field of 8 characters
- property appears third and is right justified in a field of 8 characters
- Use of String.format() with a format specifier such as "%8s " or something similar is encouraged to create the formatted record string
- If any field exceeds 8 characters, take no special action: this may lead to ugly printing of tables of records but fixing this problem is beyond the scope of the project.
- id does not appear in toString()
You will likely find the function String.format() useful for constructing string representations.
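As a quick illustration of right justification with String.format(), consider the sketch below. The class PadSketch and method pad3 are hypothetical names; the format string follows the rules listed above, with each field in 8 characters followed by one space.

```java
class PadSketch {
    // Right-justify each string in 8 characters, each followed by one space.
    static String pad3(String a, String b, String c) {
        return String.format("%8s %8s %8s ", a, b, c);
    }

    public static void main(String[] args) {
        // Quotes are printed only to make the padding visible
        System.out.println("\"" + pad3("Alf", "ISA", "alien") + "\"");
        System.out.println("\"" + pad3("12345678", "12345678", "12345678") + "\"");
    }
}
```

Strings longer than 8 characters are not truncated by %8s, which agrees with the rule that no special action is taken for wide fields.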
Several records formatted as strings are below.
> Record r;
> r = Record.makeRecord("A","B","C");
> r
       A        B        C 
> r.toString()
"       A        B        C "
> r = Record.makeRecord("Alf","ISA","alien");
> r
     Alf      ISA    alien 
> r.toString()
"     Alf      ISA    alien "
> r = Record.makeRecord("12345678","12345678","12345678");
> r
12345678 12345678 12345678 
> r.toString()
"12345678 12345678 12345678 "
// Take no special action when the fields are longer than 8 characters
> r = Record.makeRecord("1234567890","123456789","123456789012");
> r
1234567890 123456789 123456789012 
> r.toString()
"1234567890 123456789 123456789012 "
3.3 Records with Wild Cards
public static Record makeQuery(String wild, String e, String r, String p)
This factory method returns a special kind of record, perhaps a subclass of Record, which is able to match other records. If any of e,r,p equal the first argument wild, those fields should be marked as "wild" and can match any other string in Record.matches().
public boolean entityWild()
public boolean relationWild()
public boolean propertyWild()
Each record has several simple accessors which determine if its fields are wild. A field is wild only if it exactly matches the wild card string passed to makeQuery() according to the String.equals() method.
// Records created with makeRecord(..) never have wild fields
> Record notAquery = Record.makeRecord("Alf","*","*");
> notAquery.entityWild()
false
> notAquery.relationWild()
false
> notAquery.propertyWild()
false
// Queries created with makeQuery(..) have wild fields where they
// match the first argument string
> Record query = Record.makeQuery("*","Alf","*","*");
> query
Alf * *
> query.entityWild()
false
> query.relationWild()
true
> query.propertyWild()
true
// Any string can denote wild records
> Record query2 = Record.makeQuery("wild","X","wild","*");
> query2
X wild *
> query2.entityWild()
false
> query2.relationWild()
true
> query2.propertyWild()
false
// despite being *, third field is not wild as the word 'wild' was
// chosen to denote wild fields in construction of query2
3.4 Record.matches()
public boolean matches(Record r)
- Determine if two Records match; return true if they do and false otherwise.
- Matching occurs when all the fields entity,relation,property match
- Fields match if either one or both is wild or both fields are exactly the same according to String.equals()
- Examples
> Record r1 = Record.makeRecord("Alf","ISA","alien");
> Record r2 = Record.makeRecord("Alf","ISA","alien");
> r1.matches(r2)
true
> Record r3 = Record.makeRecord("Alf","EATS","cat");
> r1.matches(r3)
false
> Record r4 = Record.makeQuery("*","Alf","ISA","*")
Alf ISA *
> r1.matches(r4)
true
>
> Record r5 = Record.makeQuery("*","Alf","*","*");
> r1.matches(r5)
true
> r4
Alf ISA *
> r5
Alf * *
> r4.matches(r5)
true
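The field-level rule above can be written down directly. The sketch below is one possible shape for it: the class FieldMatchSketch and its methods are hypothetical, and passing wildness as separate boolean flags is an assumption made to keep the example free of any Record class.

```java
class FieldMatchSketch {
    // Two fields match if either is wild or the strings are equal.
    static boolean fieldMatches(String a, boolean aWild, String b, boolean bWild) {
        return aWild || bWild || a.equals(b);
    }

    // Two triples match only if all three pairs of fields match.
    // wa[i] and wb[i] say whether field i of each triple is wild.
    static boolean matches(String[] a, boolean[] wa, String[] b, boolean[] wb) {
        for (int i = 0; i < 3; i++) {
            if (!fieldMatches(a[i], wa[i], b[i], wb[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] rec = {"Alf", "ISA", "alien"};
        String[] qry = {"Alf", "*", "*"};
        boolean[] none = {false, false, false};
        boolean[] tail = {false, true, true};
        System.out.println(matches(rec, none, qry, tail)); // true
    }
}
```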
3.5 Record Comparators
public static final Comparator<Record> ERPCompare
public static final Comparator<Record> RPECompare
public static final Comparator<Record> PERCompare
The purpose of these three objects is to allow Records to be arranged in different ways in a binary search tree. Each implements the Comparator<Record> interface so that they have the following method.
public int compare(Record r1, Record r2)
Each of the comparators compares fields of Records in a different order to determine which Record is sorted first. The orders are
ERPCompare: entity relation property
RPECompare: relation property entity
PERCompare: property entity relation
Comparison starts with the first field listed. If they are equal, the next field listed is compared to break the tie, and if they are equal the final field is considered. The records are equal only if all of entity,relation,property match exactly.
For example given the records
Record abc = Record.makeRecord("a","b","c");
Record bca = Record.makeRecord("b","c","a");
Record cba = Record.makeRecord("c","b","a");
Record caa = Record.makeRecord("c","a","a");
The following results apply
Comparison | entity | relation | property | ERPCompare | RPECompare | PERCompare |
---|---|---|---|---|---|---|
compare(abc,bca) | abc < bca | abc < bca | abc > bca | < 0 | < 0 | > 0 |
compare(abc,cba) | abc < cba | abc = cba | abc > cba | < 0 | > 0 | > 0 |
compare(cba,caa) | cba = caa | cba > caa | cba = caa | > 0 | > 0 | > 0 |
It is very important that these comparators work correctly as they determine how trees which use them will be sorted. Make sure to test thoroughly.
Some additional examples are given below.
> Record abc = Record.makeRecord("a","b","c");
> Record abb = Record.makeRecord("a","b","b");
> Record bca = Record.makeRecord("b","c","a");
> Record cba = Record.makeRecord("c","b","a");
> Record.ERPCompare.compare(abc,abb)
1
> Record.ERPCompare.compare(abc,abc)
0
> Record.ERPCompare.compare(abc,abb)
1
> Record.RPECompare.compare(abc,abc)
0
> Record.RPECompare.compare(abc,abb)
1
> Record.RPECompare.compare(abc,bca)
-1
> Record.PERCompare.compare(abc,abc)
0
> Record.PERCompare.compare(abc,abb)
1
> Record.PERCompare.compare(abc,bca)
2
> Record.ERPCompare.compare(abc,cba)
-2
// +2 and -2 are not errors.
// Results need not be restricted to -1,0,+1
// so long as they abide by <0, ==0, >0
// for the proper cases. String.compareTo
// may be directly used in the comparators
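The tie-breaking scheme can be sketched as follows for records without wild fields. The nested class Triple is a hypothetical stand-in for Record, and the helper inOrder is an illustrative name; wild card handling is deliberately left out of this sketch.

```java
import java.util.Comparator;

class CompareSketch {
    // Hypothetical stand-in for Record with just the three fields.
    static class Triple {
        final String e, r, p;
        Triple(String e, String r, String p) {
            this.e = e;
            this.r = r;
            this.p = p;
        }
    }

    // Compare three pairs of strings in order; the first difference decides.
    static int inOrder(String a1, String b1, String a2, String b2,
                       String a3, String b3) {
        int c = a1.compareTo(b1);
        if (c != 0) return c;
        c = a2.compareTo(b2);
        if (c != 0) return c;
        return a3.compareTo(b3);
    }

    static final Comparator<Triple> ERP =
        (x, y) -> inOrder(x.e, y.e, x.r, y.r, x.p, y.p);
    static final Comparator<Triple> RPE =
        (x, y) -> inOrder(x.r, y.r, x.p, y.p, x.e, y.e);
    static final Comparator<Triple> PER =
        (x, y) -> inOrder(x.p, y.p, x.e, y.e, x.r, y.r);

    public static void main(String[] args) {
        Triple abc = new Triple("a", "b", "c");
        Triple cba = new Triple("c", "b", "a");
        System.out.println(ERP.compare(abc, cba)); // -2
    }
}
```

Only the order in which the field pairs are passed differs between the three comparators, so a mistake in one ordering is easy to spot by reading the three definitions side by side.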
3.6 Wildcards and Comparisons
When implementing query, one typically wants to find the "first" record that matches a query with a wildcard. This can be done by iterating through the tree from the beginning but such an approach would not meet the target complexity.
Instead, production trees like TreeSet provide facilities to obtain sorted views of the tree starting at different elements, for instance starting at the first stored element bigger than a given element. One can use this facility along with properly implemented comparators to accomplish the fast query for this project.
If a comparator treats wildcards as less than any other string then asking a properly sorted tree for the first entry bigger than the query will yield the first matching Record if one exists. One can then start an in-order traversal to find any other matching Records for the query results. This means the comparators need to be aware of the wildcard in Record and use it during comparisons.
Consider the tree sorted on entity,relation,property and the call t.query("Lucky","*","*").
Figure 3: Red-Black Tree with Entity-Relation-Property Sorting
If the wildcards in ("Lucky","*","*") are considered less than the strings in the relation and property fields, and the existing records involving Lucky are
Lucky | EATS | catfood |
Lucky | ISA | cat |
Lucky | LIKESTO | purr |
then the first record "bigger" than the query according to the tree diagram will be Lucky EATS catfood and an in-order traversal from there will find all Lucky records. The query Lucky * * would fit "in between" Alf ISA alien and Lucky EATS catfood in the tree above if it were allowed to be added (which it is not).
The following examples illustrate the intention for wildcards in comparisons.
> Record r1 = Record.makeRecord("Lucky","ISA","cat")
> r1
Lucky ISA cat
> Record r2 = Record.makeRecord("Lucky","LIKESTO","purr")
> r2
Lucky LIKESTO purr
> Record q = Record.makeQuery("*","Lucky","*","*")
> q
Lucky * *
> Record.ERPCompare.compare(r1, r2)
-3
> Record.ERPCompare.compare(q, r1)
-1
> Record.ERPCompare.compare(q, r2)
-1
> Record.PERCompare.compare(r1, r2)
-13
> Record.PERCompare.compare(q, r1)
-1
> Record.PERCompare.compare(q, r2)
-1
> Record r3 = Record.makeRecord("Alf","ISA","alien")
> r3
Alf ISA alien
> Record.ERPCompare.compare(q, r3)
11
> Record r4 = Record.makeRecord("Lucky","EATS","catfood")
> r4
Lucky EATS catfood
> Record.ERPCompare.compare(q, r4)
-1
Though the situation should never arise in its use with TripleStore, two wild fields are considered equal to one another regardless of what their underlying string might be.
// All wild fields
> Record q = Record.makeQuery("*","*","*","*")
> Record v = Record.makeQuery("*","*","*","*")
> Record.ERPCompare.compare(q,v)
0
// Same wilds except for cat
> v = Record.makeQuery("*","*","*","cat")
> Record.ERPCompare.compare(q,v)
-1
> Record.ERPCompare.compare(v,q)
1
// Two records with wild fields but different wild strings are
// equal to one another
> q = Record.makeQuery("*","*","*","*")
> v = Record.makeQuery("wild","wild","wild","wild")
> Record.ERPCompare.compare(v,q)
0
> q = Record.makeQuery("*","*","*","cat")
> v = Record.makeQuery("wild","wild","wild","cat")
> Record.ERPCompare.compare(v,q)
0
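The comparison rules illustrated above reduce to a small per-field helper. The sketch below is one way to express them; the class WildCompareSketch and method compareField are hypothetical, and wildness is passed as boolean flags only to keep the example independent of Record.

```java
class WildCompareSketch {
    // Compare one field of two records:
    //   wild vs wild  -> equal, whatever the underlying strings are
    //   wild vs fixed -> wild is smaller
    //   fixed vs fixed -> ordinary String.compareTo
    static int compareField(String a, boolean aWild, String b, boolean bWild) {
        if (aWild && bWild) {
            return 0;
        }
        if (aWild) {
            return -1;
        }
        if (bWild) {
            return 1;
        }
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        System.out.println(compareField("*", true, "ISA", false));      // -1
        System.out.println(compareField("*", true, "wild", true));      // 0
        System.out.println(compareField("Lucky", false, "Alf", false)); // 11
    }
}
```

A full comparator would apply this helper to the three fields in its own order (for example entity, relation, property) and return the first nonzero result.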
Note: The Comparator interface specifies in addition to the compare() method that Comparators should have an equals(Object o) method. All classes inherit equals() from Object and for this project, there is no need to override the default equals() implementation, only that an implementation of compare(r1,r2) is provided.
4 TripleStore Class Implementation
public class TripleStore
Triplestores are databases that have three columns (each row has three fields). Our implementation will enforce that each record in the database is unique. The operations supported are insertion, lookup/query, and removal based on a query. This implementation will support logarithmic time operations for all three operations. To achieve this performance, the database must store its Records in three separate balanced binary search trees. A good choice for the BST is java.util.TreeSet which is implemented as a Red-Black tree. TreeSet provides a variety of useful methods that you should review via its javadocs.
4.1 TripleStore Constructor and Basic methods
public TripleStore()
Create a TripleStore. Initialize any trees you will use in the constructor. Make sure to pass in relevant arguments to the tree constructors such as Comparators.
In addition, specify a few simple methods to get and set the wild card for a triplestore.
public String getWild()
public void setWild(String w)
- These methods allow the current wild card to be inspected and changed.
- The default wild card is required to be the asterisk string *
- The following example illustrates the independence of wild cards between separate triplestore instances.
TripleStore t1 = new TripleStore();
t1.add("Willie","ISA","human");
TripleStore t2 = new TripleStore();
t2.add("Willie","ISA","human");
t2.setWild("@");
t1.query("Willie","*","*"); // [ Willie ISA human]
t2.query("Willie","@","@"); // [ Willie ISA human]
t1.query("Willie","@","@"); // []
t2.query("Willie","*","*"); // []
4.2 TripleStore.toString()
// TARGET COMPLEXITY: O(N)
// N: the number of records stored in the TripleStore
Create a string representation of the TripleStore suitable for printing. This should be done by building a String containing the results of each Record.toString() which is currently stored and separating these with a newline. Proper implementation of Record.toString() is essential for TripleStore.toString().
Notes
- Records MUST appear in sorted order based on a correct implementation of ERPComparator. See Record Comparators for details on this order. Warning: Some of the demo displays may not show records exactly in this ordering, as the demos were generated using a prototype, but it is a requirement for your implementation.
- Conversion to a string should not repeatedly call query but instead directly walk through the underlying data, likely via a traversal of one of the index trees.
- To meet the given complexity bound, you will need to find an efficient way to concatenate Strings. The typical use of + is NOT efficient and will result in \(O(N^2)\) performance. Your textbook contains information on efficiently building Strings up.
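One common way to build a long String efficiently is java.lang.StringBuilder. The sketch below is only an illustration: the class and method names are made up, plain Strings stand in for the result of Record.toString(), and it assumes that every line (including the last) is followed by a newline, which may not match the format the test cases expect.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: names here are not part of the HW spec.
class JoinLinesSketch {
    // Append each line once to a single buffer instead of
    // creating a new String on every + concatenation.
    static String joinLines(Iterable<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n'); // assumption: newline after every line
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Alf ISA alien", "Willie ISA human");
        System.out.print(joinLines(lines));
    }
}
```

Repeated use of + copies the whole accumulated String on each step, which is where the quadratic cost comes from. Appending to a StringBuilder takes amortized constant time per character.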
4.3 TripleStore.add()
// Target Complexity: O(log N)
// N: number of records in the TripleStore
Add a single record into the TripleStore.
- Use the three argument Strings to create a valid Record
- It is not an error to add a Record with the current wild card
- If the TripleStore already contains an identical record, do not add the new duplicate information and return false
- Otherwise, add the Record, ensure that any and all index trees are updated, and return true.
Examples from 1.3
> TripleStore t = new TripleStore();
> t
> // Empty TripleStore
> // --- ADD ---
> boolean b = t.add("Willie","ISA","human");
> b
true
> // Successful add returns true
> t.add("Alf","ISA","alien");
> t
Alf ISA alien
Willie ISA human
> // Two Records in the TripleStore
> b = t.add("Willie","ISA","human");
> b
false
> // Duplicates are not allowed
> t
Alf ISA alien
Willie ISA human
> // Still only two Records in the TripleStore
> t.add("Lynn","ISA","human");
> t.add("Lucky","ISA","cat");
> t.add("Alf","EATS","cat");
> t.add("Lynn","EATS","veggies");
> t.add("Lucky","LIKESTO","purr");
> t.add("Lucky","EATS","catfood");
> t.add("Alf","EATS","veggies");
> t
Alf EATS cat
Alf EATS veggies
Alf ISA alien
Lucky EATS catfood
Lucky ISA cat
Lucky LIKESTO purr
Lynn EATS veggies
Lynn ISA human
Willie ISA human
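If you use java.util.TreeSet, its add() method already reports whether the element was new, which lines up with the true/false behavior shown in the demo. The sketch below uses a TreeSet of Strings and a made-up method name purely to show that behavior. It is not the required implementation.

```java
import java.util.TreeSet;

// Illustration only: a TreeSet of Strings stands in for an index tree.
class DuplicateAddSketch {
    // Add the same item twice and report what add() returned each time.
    static boolean[] addTwice(String item) {
        TreeSet<String> set = new TreeSet<>();
        boolean first = set.add(item);   // item was absent, so the set changes
        boolean second = set.add(item);  // item already present, set unchanged
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] results = addTwice("Willie ISA human");
        System.out.println(results[0]); // true
        System.out.println(results[1]); // false
    }
}
```

Each call to add() on a TreeSet costs O(log N), so adding to a fixed number of trees stays within the O(log N) target.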
4.4 TripleStore.query()
// TARGET COMPLEXITY: O(K + log N)
// K: the number of matching records
// N: the number of records in the triplestore.
public List<Record> query(String entity, String relation, String property)
Return a List of the records that match the given query parameters. Any of the arguments entity,relation,property may be the wild card for the Triplestore.
- To meet the complexity bound, you will need to identify which index to use for the initial search into the tree.
- Use methods of your tree (probably TreeSet) to get a view of the tree starting at a certain key and which allow the tree to be traversed in order. This will allow you to meet the target complexity.
- You'll likely need to use Record.makeQuery(..) to create a record with the wild fields matching the wild card associated with the triplestore.
Examples from TripleStore Demo
> // --- QUERY ---
> import java.util.*;
> List<Record> results;
> results = t.query("Alf","ISA","alien");
> results
[ Alf ISA alien]
> // query returns a List
> // Records match exactly, 1 result
> t.getWild()
"*"
> results = t.query("Alf","ISA","*")
[ Alf ISA alien]
> // query with a wild card matched 1 record
> results = t.query("Alf","EATS","*")
[ Alf EATS cat, Alf EATS veggies]
> // query with a wild card matched 2 records
> results = t.query("Alf","*","*")
[ Alf EATS cat, Alf EATS veggies, Alf ISA alien]
> // query with several wild cards matched 3 records
> t.query("*","ISA","human")
[ Lynn ISA human, Willie ISA human]
> t.query("*","*","human")
[ Lynn ISA human, Willie ISA human]
> t.query("*","*","cat")
[ Alf EATS cat, Lucky ISA cat]
> t.query("*","ISA","*")
[ Alf ISA alien, Lucky ISA cat, Lynn ISA human, Willie ISA human]
> // wildcards can appear in any of entity,relation,property
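The idea of "get a view starting at a certain key, then traverse in order" can be sketched with TreeSet.tailSet(). To keep the example self-contained it uses Strings and prefix matching in place of Records and wild fields. The class name, method name, and the prefix trick are assumptions made for illustration, not the assignment's design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustration only: string prefixes stand in for queries with wild fields.
class RangeScanSketch {
    // Collect every element that starts with prefix, in sorted order.
    static List<String> withPrefix(TreeSet<String> set, String prefix) {
        List<String> out = new ArrayList<>();
        // tailSet(prefix) is a view of the elements >= prefix; nothing is copied.
        for (String s : set.tailSet(prefix)) {
            // Matches are contiguous in sorted order, so stop at the first miss.
            if (!s.startsWith(prefix)) {
                break;
            }
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> set = new TreeSet<>(Arrays.asList(
            "Alf EATS cat", "Alf EATS veggies", "Alf ISA alien", "Lucky ISA cat"));
        System.out.println(withPrefix(set, "Alf EATS"));
    }
}
```

Finding the start of the view costs O(log N), and the loop then visits only the K matches plus one extra element. This is the same shape as the O(K + log N) target. Choosing the right index matters because the matching records are only contiguous in a tree whose sort order puts the non-wild fields first.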
4.5 TripleStore.remove()
// TARGET COMPLEXITY: O(K * log N)
// K: the number of matching records
// N: the number of records in the triplestore.
public int remove(String e, String r, String p)
Note: remove() does not need to be as fast as query(): compare their target complexities carefully
- query(): \(O(K + \log{N})\)
- remove(): \(O(K \times \log{N})\)
Remove records from the TripleStore. Any of the arguments entity,relation,property may be wild. All records that match the parameters will be removed.
Return the number of elements removed, which may be 0 if no records matched the parameters.
Examples from TripleStore Demo
> // --- REMOVE ---
> t
Alf EATS cat
Alf EATS veggies
Alf ISA alien
Lucky EATS catfood
Lucky ISA cat
Lucky LIKESTO purr
Lynn EATS veggies
Lynn ISA human
Willie ISA human
> int nrm = t.remove("Alf","EATS","veggies")
1
> // successful removal, changes database
> t
Alf EATS cat
Alf ISA alien
Lucky EATS catfood
Lucky ISA cat
Lucky LIKESTO purr
Lynn EATS veggies
Lynn ISA human
Willie ISA human
> nrm = t.remove("Alf","EATS","fruit")
0
> // unsuccessful remove
> nrm = t.remove("Alf","EATS","veggies")
0
> // record no longer exists
> nrm = t.remove("Alf","*","*")
2
> t
Lucky EATS catfood
Lucky ISA cat
Lucky LIKESTO purr
Lynn EATS veggies
Lynn ISA human
Willie ISA human
> t.remove("*","*","*")
6
> t
> // wild cards can remove many records
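One way to see where the K * log N bound comes from is to collect the matches first and then remove each one from every tree. The sketch below assumes the matches have already been gathered into a separate List (for instance by a query) and uses Strings in place of Records. All names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustration only: Strings stand in for Records, names are made up.
class RemoveFromAllSketch {
    // Remove every match from every index; return how many matches were
    // actually present.
    static int removeAll(List<String> matches, List<TreeSet<String>> indexes) {
        int removed = 0;
        for (String m : matches) {               // K iterations
            boolean wasPresent = false;
            for (TreeSet<String> index : indexes) {
                // Non-short-circuit OR: remove() runs on every index, O(log N) each
                wasPresent |= index.remove(m);
            }
            if (wasPresent) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<TreeSet<String>> indexes = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            indexes.add(new TreeSet<>(Arrays.asList("a", "b", "c")));
        }
        System.out.println(removeAll(Arrays.asList("a", "b", "z"), indexes));
    }
}
```

Collecting the matches into their own List before removing is a deliberate choice. Removing from a TreeSet while iterating over that same set or one of its views (other than through the iterator's own remove()) generally triggers a ConcurrentModificationException.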
5 Grading
Grading for this HW will be divided into three distinct parts:
- Part of your grade will be based on passing some automated test cases by an early "milestone" deadline. See the top of the HW specification for
- Part of your grade will be based on passing all automated test cases by the final deadline
- Part of your grade will be based on a manual inspection of your code and analysis documents by the teaching staff to determine quality and efficiency.
5.1 Final Automated Tests (50%)
- JUnit test cases will be provided to detect errors in your code. These will be run by a grader on submitted HW after the final deadline.
- Tests may not be available on initial release of the HW but will be posted at a later time.
- Tests may be expanded as the HW deadline approaches.
- It is your responsibility to get and use the freshest set of tests available.
- Tests will be provided in source form so that you will know what tests are doing and where you are failing.
- It is up to you to run the tests to determine whether you are passing or not. If your code fails to compile against the tests, little credit will be garnered for this section.
- Most of the credit will be divided evenly among the tests; e.g. 50% / 25 tests = 2% per test. However, the teaching staff reserves the right to adjust the weight of test cases after the fact if deemed necessary.
- Test cases are typically run from the command line using the following invocation which you should verify works as expected on your own code.
UNIX Command line instructions
  Compile
  > javac -cp .:junit-cs310.jar *.java
  Run tests
  > java -cp .:junit-cs310.jar SomeTests
WINDOWS Command line instructions: replace colon with semicolon
  Compile
  > javac -cp .;junit-cs310.jar *.java
  Run tests
  > java -cp .;junit-cs310.jar SomeTests
5.2 Final Manual Inspection (50%)
- Graders will manually inspect your code and analysis documents looking for a specific set of features after the final deadline.
- Most of the time the requirements for credit will be posted along with the assignment, though these may be revised as the HW deadline approaches.
- Credit will be awarded for good coding style which includes
- Good indentation and curly brace placement
- Comments describing private internal fields
- Comments describing a complex section of code and invariants which must be maintained for classes
- Use of internal private methods to decompose the problem beyond what is required in the spec
- Some credit will be awarded for clearly adhering to the target
complexity bounds specified in certain methods. If the specification
lists the target complexity of a method as O(N) but your
implementation is actually O(N log N), credit will be deducted. If
the implementation complexity is too difficult to determine due to
poor coding style, credit will likely be deducted.
All TARGET COMPLEXITIES are worst-case run-times.
- Some credit will be awarded for turning in any analysis documents that are required by the HW specification. These typically involve analyzing how fast a method should run or how much memory a method requires and are reported in a text document submitted with your code.
6 Final Manual Inspection Criteria
6.1 (10%) Record Design
- The design of the Record class is documented via comments on class fields.
- It is clear how the notion of wild cards is supported.
- It is clear that any string can be used to create a query with wild fields and that there are no external dependencies of wildness on other classes such as TripleStore.
- There is adequate documentation of the matches(..) method so that it is easy to understand how it operates.
6.2 (10%) Record Comparator Design and Use
- The static fields housing the comparators are present and usable
- It is clear how each comparator implements compare(..) differently to sort Records in alternative orders. Comments and clear code are present that show how the different sorting orders are achieved.
- When creating trees that store Records in different orders, TripleStore adheres to the public interface of Record laid out in the HW specification. TripleStore should not use any classes that are internal to Record except via public fields mentioned in the HW specification.
6.3 (20%) TripleStore Method Runtime Complexities
// Target Complexity: O(log N)
// N: number of records in the TripleStore
public boolean add(String entity, String relation, String property)
- It is clear that no duplication of records occurs during add.
- Data is added to all internal trees to support fast queries later.
- The overall method clearly meets the target runtime complexity.
// TARGET COMPLEXITY: O(K + log N)
// K: the number of matching records
// N: the number of records in the triplestore.
public List<Record> query(String entity, String relation, String property)
- Code that does tree selection is present and clear. This code decides which tree among ERP, RPE, or PER will efficiently answer a query with the given wild card pattern and selects it.
- Methods of TreeSet are used effectively to meet the target runtime complexity for query().
- Iterators are employed to avoid repeatedly searching through the tree and instead perform an in-order traversal to visit appropriate elements.
- The overall method clearly meets the target runtime complexity.
// TARGET COMPLEXITY: O(K * log N)
// K: the number of matching records
// N: the number of records in the triplestore.
public int remove(String e, String r, String p)
- The records that must be removed are determined efficiently, potentially by using other methods of TripleStore.
- Data is removed from all internal trees to support fast queries later.
- The overall method clearly meets the target runtime complexity.
// TARGET COMPLEXITY: O(N)
// N: the number of records stored in the TripleStore
public String toString()
- Make sure to use an efficient means to construct the string representation of the TripleStore.
6.4 (5%) Coding Style and Readability
This is a larger project. It will require discipline and effort to keep track of how all the pieces fit together. Commenting your own code to keep track of the purpose of fields and methods will pay much higher dividends in this project than was previously the case. Your code will be inspected for clarity in the following categories.
- Code Cleanliness (1%): Indent and {bracket} code uniformly throughout the program to improve readability.
- Class Documentation (1%): Each class has an initial comment indicating its intended purpose, how it works, and how it relates or uses other classes in the project.
- Field Documentation (1%): Each field of a class is documented to indicate what piece of data is tracked and how it will be used, regardless of visibility.
- Method documentation (1%): Each method has a short description indicating its intended purpose and how it gets its job done. (These are also needed for your own helper methods you choose to add).
- Proper Use of Generics (1%): The point of using Java generics is to get good compile-time checks and avoid runtime casting. Effective use of generics will mean very few runtime casts will be needed. To that end, runtime casts will be penalized unless there is no other way around them, such as in the implementation of ArrayList. Unless otherwise noted, all classes required for HW do not require runtime casts.
6.5 (5%) Correct Project Setup
Correctly setting up the directory structure for each project greatly eases the task of grading lots of projects. Graders get cranky when strange directory structures or missing files are around and you do not want cranky graders. The following will be checked for setup on this project.
- The Setup instructions were followed closely
- The Project Directory is named according to the specification
- There is an Identity Text File present with the required information in it
- Code can be compiled and tests run from the command line
7 Setup and Submission
7.1 HW Directory
There is no code distribution for this assignment.
Create a directory named masonid-hw4 where masonid is your mason ID. My mason ID is ckauffm2 so I would create the directory ckauffm2-hw4.
This is your HW directory. Everything concerning your assignment will go in this directory.
7.2 ID.txt
Create a text file in your HW directory called ID.txt which has identifying information in it. My ID.txt looks like this:
Chris Kauffman
ckauffm2
G001234567
It contains my full name, my mason ID, and G#. The presence of a correct ID.txt helps immensely when grading lots of assignments.
7.3 Penalties
Make sure to
- Set up your HW directory correctly
- Include an ID.txt
- Indent your code and make comments
Failure to do so may be penalized by a 5% deduction.
7.4 Submission: Blackboard
Do not e-mail the professor or TAs your code.
Create a ZIP file of your HW directory and submit it to the course blackboard page. Do not submit multiple files manually through blackboard as this makes it hard to unpack large numbers of assignments. Learn how to create a zip and submit only that file.
On Blackboard
- Click on the Assignments section
- Click on the HW4 link
- Scroll down to "Attach a File"
- Click "Browse My Computer"
- Select your ZIP file
You can resubmit to blackboard as many times as you like up to the deadline.