Last Updated: 2016-11-29 Tue 17:34

CS 310 HW 4: TripleStore Database

CHANGELOG:

Tue Nov 29 17:21:58 EST 2016
Some of the examples of query() results indicated that an ArrayList is returned. This should just be a general List which may be either an ArrayList or LinkedList.

In response to @805, a section outlining how wild fields compare has been added.


1 Overview and Goals

1.1 TripleStore Database

Databases pervade the computing world as they provide an efficient way to store information that can be accessed and manipulated in a flexible fashion. Most database systems provide a way to create a table which has many entries (usually written as the rows) and many columns (the different fields of each row). Databases may contain multiple different tables with varying table sizes. This assignment centers around implementing a very simple database in Java called a triplestore, so named because it has a single table with many rows but only ever three columns. For the purpose of this assignment, these columns are named Entity, Relation, and Property. Two examples of triplestores are shown in tabular format below. The first triplestore is small and somewhat trivial while the second has immediate commercial applications.

1.2 TripleStore Examples

Comical Example

  entity relation property
1 Willie ISA human
2 Alf ISA alien
5 Alf EATS cat
3 Lynn ISA human
4 Lucky ISA cat
6 Lynn EATS veggies
7 Lucky LIKESTO purr
8 Lucky EATS catfood
9 Alf EATS veggies

Commercial Example

  entity relation property
1 1 ISA boots
2 1 AVAILABLE 4
3 1 COSTS 32.50
4 1 SIZE 10
5 2 ISA boots
6 2 AVAILABLE 6
7 2 COSTS 32.50
8 2 SIZE 9
9 3 ISA shirt
10 3 COLOR red
11 3 COSTS 11.99
12 3 SIZE XL
13 3 AVAILABLE 2
14 3 MATERIAL cotton
15 3 DESIGNER Swankums
16 4 ISA watch
17 4 ISA accessory
18 4 AVAILABLE 1
19 4 COSTS 302.29
20 4 DESIGNER PricePlusPlus
21 4 MATERIAL gold
22 5 ISA hat
23 5 ISA accessory
24 5 MATERIAL wool
25 5 COSTS $29.99
26 5 COLOR gray
27 5 AVAILABLE 0
28 5 ISA warm clothing
29 5 DESIGNER Swankums

1.3 TripleStore Demo

The quickest way to get an immediate sense of how a TripleStore looks and feels is to examine the following demo done in DrJava's interaction pane. jGrasp provides the same functionality for experimentation.

In the demo, most results of assignments or object returns are shown on the following line by NOT putting a semicolon at the end of the line. This is the default behavior in DrJava while in jGrasp you must type a variable name on its own line to see its printed representation. Comments are also inserted to add some guidance to the demo.

Welcome to DrJava.
> TripleStore t = new TripleStore();
> t

> // Empty TripleStore

> // --- ADD --- 
> boolean b = t.add("Willie","ISA","human");
> b
true
> // Successful add returns true

> t.add("Alf","ISA","alien");
> t
     Alf      ISA      alien
  Willie      ISA      human
> // Two Records in the TripleStore

> b = t.add("Willie","ISA","human");
> b
false
> // Duplicates are not allowed
> t
     Alf      ISA      alien
  Willie      ISA      human
> // Still only two Records in the TripleStore

> t.add("Lynn","ISA","human");
> t.add("Lucky","ISA","cat");
> t.add("Alf","EATS","cat");
> t.add("Lynn","EATS","veggies");
> t.add("Lucky","LIKESTO","purr");
> t.add("Lucky","EATS","catfood");
> t.add("Alf","EATS","veggies");
> t
     Alf     EATS        cat
     Alf     EATS    veggies
     Alf      ISA      alien
   Lucky     EATS    catfood
   Lucky      ISA        cat
   Lucky  LIKESTO       purr
    Lynn     EATS    veggies
    Lynn      ISA      human
  Willie      ISA      human
> // Variety of things in the TripleStore

> // --- QUERY ---
> import java.util.*;
> List<Record> results;
> results = t.query("Alf","ISA","alien");
> results
[     Alf      ISA      alien]
> // query returns a List
> // Records match exactly, 1 result

> t.getWild()
"*"
> results = t.query("Alf","ISA","*")
[     Alf      ISA      alien]
> // query with a wild card matched 1 record
> results = t.query("Alf","EATS","*")
[     Alf     EATS        cat,      Alf     EATS    veggies]
> // query with a wild card matched 2 records
> results = t.query("Alf","*","*")
[     Alf     EATS        cat,      Alf     EATS    veggies,      Alf      ISA      alien]
> // query with several wild cards matched 3 records

> t.query("*","ISA","human")
[    Lynn      ISA      human,   Willie      ISA      human]
> t.query("*","*","human")
[    Lynn      ISA      human,   Willie      ISA      human]
> t.query("*","*","cat")
[     Alf     EATS        cat,    Lucky      ISA        cat]
> t.query("*","ISA","*")
[     Alf      ISA      alien,    Lucky      ISA        cat,     Lynn      ISA      human,   Willie      ISA      human]
> // wildcards can appear in any of entity,relation,property

> // --- REMOVE ---
> int nrm = t.remove("Alf","EATS","veggies")
1
> // successful removal, changes database
> t
     Alf     EATS        cat
     Alf      ISA      alien
   Lucky     EATS    catfood
   Lucky      ISA        cat
   Lucky  LIKESTO       purr
    Lynn     EATS    veggies
    Lynn      ISA      human
  Willie      ISA      human

> nrm = t.remove("Alf","EATS","fruit")
0
> // unsuccessful remove
> nrm = t.remove("Alf","EATS","veggies")
0
> // record no longer exists

> nrm = t.remove("Alf","*","*")
2
> t
   Lucky     EATS    catfood
   Lucky      ISA        cat
   Lucky  LIKESTO       purr
    Lynn     EATS    veggies
    Lynn      ISA      human
  Willie      ISA      human

> t.remove("*","*","*")
6
> t

> // wild cards can remove many records

> // --- WILDCARDS ---
> t.getWild()
"*"
> // Can add wild card strings as records
> t.add("Alf","ISA","*");
> t
Alf      ISA      *        
> t.add("Alf","ISA","whateva")
true
> t.add("Alf","ISA","alien")
true
> t
Alf      ISA      *        
Alf      ISA      alien    
Alf      ISA      whateva  
> t.query("Alf","ISA","*")
[Alf      ISA      *        , Alf      ISA      alien    , Alf      ISA      whateva  ]

> t.setWild("whateva")
> // The * is no longer the wild card, matches only equal strings
> t.query("Alf","ISA","*")
[Alf      ISA      *        ]

> // String "whateva" is now the wild card, matches anything
> t.query("Alf","ISA","whateva")
[Alf      ISA      *        , Alf      ISA      alien    , Alf      ISA      whateva  ]
> t.setWild("whateva")
> t.getWild()
"whateva"
> t.query("Alf","ISA",t.getWild())
[Alf      ISA      *        , Alf      ISA      alien    , Alf      ISA      whateva  ]

1.4 Basic operations

A TripleStore is like many other data structures in that it provides add, remove, and find functionality. Similar to the binary search trees discussed in class, no duplication is allowed in a TripleStore so add() will not add a record that is already present. Unlike many other data structures, the find functionality, named query(), may return multiple results in a collection and the remove functionality may remove more than one record from the triplestore. The detailed functionality of each method is described in the section on the TripleStore class implementation.

1.5 Records

The records stored in the TripleStore will be objects of class Record. At its core, a Record is just a way to cart around the String objects entity, relation, property in a triplet as shown in the tables above. Records are immutable in that, once they are created, they cannot change, in much the same way that when a String is created, it cannot be changed. This sacrifices some space and time efficiency for much simpler reasoning about data structures involving Record. The fields of a Record are accessible by methods named after the fields.

public String entity();         // Who
public String relation();       // How
public String property();       // What

In addition to these, every Record has a unique identifier: whenever a Record is created, a new unique ID number is assigned to it, which is accessible through the id() method.

public int id();		// Must be unique

1.6 Fast Access and Wildcards

Each triplestore keeps track of a wild card string which allows flexible queries and removals that can handle multiple matching records. The default wild card is * (star) but it may be changed to any string using methods described below. A query involving a wildcard may match more than one record. Providing fast location of all Records that match a query involving wildcards is the primary focus of this assignment.

There are several options for arranging Records in memory, but providing a reasonably efficient combination of add, query, and remove operations (on the order of \(O(\log N)\)) suggests the use of balanced binary search trees.

In addition to this, query() calls which use wild cards may return multiple records. The sorted nature of binary search trees will come in handy here. For example, consider a binary tree which sorts each stored Record starting with the String entity field, breaking ties by examining relation and breaking further ties with property. An example is shown below which does this in jGrasp and uses a pre-built red-black tree in the Java library. Note the color of nodes is shown accurately as red or black.

erp-rbtree.png

Figure 1: Red-Black Tree with Entity-Relation-Property Sorting

If t.query("Alf","*","*") is executed, an effective strategy is to find the "least" node involving Alf and begin an in-order traversal from that point. This would lead to the sequence of Records

Alf EATS cat; Alf EATS veggies; Alf ISA alien; Lucky EATS catfood; ...

which contains relevant records to the query. During the traversal, if each record is checked to determine if it matches the query, the first three clearly match while the fourth involving Lucky does not match. Since the tree is sorted, there can be no more Alf records and the method can stop the traversal and return the three records found that match the query.

This process involves the following steps.

  1. Find the smallest matching record
  2. Start an in-order traversal at that record
  3. For each record in the traversal
    • If the record matches the query, add it to the growing collection
    • Otherwise there can be no more matching records so stop and return

Step 1 can be done in \(O(\log N)\) time as one simply needs to search our tree for the "spot" where the query would belong. Step 2 is \(O(1)\) if the tree has parent pointers or some facility for supporting in-order traversals of the tree efficiently (which is the case for production quality tree implementations). Finally, Step 3 visits each matching node plus one additional node that indicates there will be no other matches. If there are \(K\) matches, Step 3 takes \(O(K)\) time. Thus, the total complexity is \(O(K + \log N)\).
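
The steps above can be sketched with java.util.TreeSet. This is only an illustration under stated assumptions, not the required design: the nested Triple class is a hypothetical stand-in for Record, a null field plays the role of a wild card, and the sketch only handles queries whose fixed fields come first in the tree's sort order (such as Alf * * in an entity-relation-property tree).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class PrefixQuerySketch {
  // Hypothetical stand-in for Record: a null field acts as a wild card here.
  static class Triple {
    final String e, r, p;

    Triple(String e, String r, String p) {
      this.e = e;
      this.r = r;
      this.p = p;
    }

    // Fields match when either one is wild (null) or both are equal.
    private static boolean fieldMatches(String a, String b) {
      return a == null || b == null || a.equals(b);
    }

    boolean matches(Triple o) {
      return fieldMatches(e, o.e) && fieldMatches(r, o.r) && fieldMatches(p, o.p);
    }

    @Override
    public String toString() {
      return e + " " + r + " " + p;
    }
  }

  // Wild (null) fields sort before every real string.
  static final Comparator<String> FIELD =
      Comparator.nullsFirst(Comparator.<String>naturalOrder());

  // Entity first, then relation, then property.
  static final Comparator<Triple> ERP =
      Comparator.comparing((Triple t) -> t.e, FIELD)
          .thenComparing(t -> t.r, FIELD)
          .thenComparing(t -> t.p, FIELD);

  static List<Triple> query(TreeSet<Triple> index, Triple q) {
    List<Triple> results = new ArrayList<>();
    // Steps 1 and 2: tailSet() is a sorted view starting at the query's position.
    for (Triple t : index.tailSet(q, true)) {
      // Step 3: collect matches, stop at the first record that does not match.
      if (!t.matches(q)) {
        break;
      }
      results.add(t);
    }
    return results;
  }

  static TreeSet<Triple> sampleIndex() {
    TreeSet<Triple> index = new TreeSet<>(ERP);
    index.add(new Triple("Willie", "ISA", "human"));
    index.add(new Triple("Alf", "ISA", "alien"));
    index.add(new Triple("Alf", "EATS", "cat"));
    index.add(new Triple("Alf", "EATS", "veggies"));
    index.add(new Triple("Lucky", "EATS", "catfood"));
    index.add(new Triple("Lucky", "ISA", "cat"));
    return index;
  }

  public static void main(String[] args) {
    // Prints [Alf EATS cat, Alf EATS veggies, Alf ISA alien]
    System.out.println(query(sampleIndex(), new Triple("Alf", null, null)));
  }
}
```

The loop touches the K matching records plus at most one more, which is where the \(O(K)\) term comes from; locating the start of the view accounts for the \(O(\log N)\) term.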

Unfortunately, the above tree will not help us with a query such as t.query("*","ISA","*") as the tree is not sorted appropriately: the matching ISA records are spread throughout the tree. A simple solution to this is to use additional trees storing the same data but sorted differently. For example, the below tree stores the same data but sorted on relation, then property, then entity.

rpe-rbtree.png

Figure 2: Red-Black Tree with Relation-Property-Entity Sorting

The records matching t.query("*","ISA","*") are all stored linked together starting with the root.

A data structure that facilitates fast lookup into a database is usually referred to as an index. When a user calls t.query() it is up to the TripleStore class to analyze the query to determine the most efficient means to find matching records, usually by selecting an index if one is available. All queries to a triplestore should be efficient: regardless of the query, query() should meet the \(O(K + \log N)\) complexity target. This will require several indexes.

The trade-off of this fast lookup is that when records are added or removed, all indexes must be updated. The cost of add will be \(O(\log N)\) as it deals with only a single record. The cost of remove is \(O(K \times \log N)\) as multiple records may need to be removed. There is also a space trade-off as the stored records require more data structures to facilitate fast lookup which means more memory is used.
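
The update cost described above can be illustrated with a small sketch that keeps three sorted indexes in sync. Everything here is an assumption made for illustration: the class name MultiIndexSketch, the use of String[3] rows in place of Record objects, and the helper order() are hypothetical and not part of the required API.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sketch: rows are String[3] arrays {entity, relation, property}.
class MultiIndexSketch {
  // Build a comparator that examines fields in the order a, b, c.
  static Comparator<String[]> order(int a, int b, int c) {
    return Comparator.comparing((String[] t) -> t[a])
        .thenComparing(t -> t[b])
        .thenComparing(t -> t[c]);
  }

  final TreeSet<String[]> erp = new TreeSet<>(order(0, 1, 2));
  final TreeSet<String[]> rpe = new TreeSet<>(order(1, 2, 0));
  final TreeSet<String[]> per = new TreeSet<>(order(2, 0, 1));

  // O(log N): one insertion per index; the first tree detects duplicates.
  boolean add(String e, String r, String p) {
    String[] row = {e, r, p};
    if (!erp.add(row)) {
      return false;
    }
    rpe.add(row);
    per.add(row);
    return true;
  }

  // O(log N): removing one exact row must touch every index.
  boolean removeExact(String e, String r, String p) {
    String[] row = {e, r, p};
    if (!erp.remove(row)) {
      return false;
    }
    rpe.remove(row);
    per.remove(row);
    return true;
  }

  public static void main(String[] args) {
    MultiIndexSketch db = new MultiIndexSketch();
    System.out.println(db.add("Alf", "ISA", "alien"));         // true
    System.out.println(db.add("Alf", "ISA", "alien"));         // false, duplicate
    System.out.println(db.removeExact("Alf", "ISA", "alien")); // true
  }
}
```

Removing K matching records amounts to repeating removeExact() K times, which is one way to see the \(O(K \times \log N)\) bound.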

2 Class Architecture

2.1 Project Files

The following files are relevant to the HW. When submitting, place them in your HW Directory, zip that directory and submit. Always submit a zip file not tar.gz, not bzip, not 7zip, just vanilla zip files. For additional instructions see Setup and Submission.

File              State    Notes
Record.java       Create   Triples stored in the TripleStore database. May subclass to support queries.
TripleStore.java  Create   Database with 3 columns. Stores records via add() and allows query() and remove().
junit-c310.jar    Testing  JUnit library for testing. Copy over from previous projects.
ID.txt            Create   Identifying information

2.2 Built-in Classes of Interest

It is strongly recommended that you not try to implement balanced trees yourself. Aside from the obvious difficulty, this is not an analogous activity to what programming in the wild is like. Instead, consider the use of the following classes

java.util.TreeSet
Implemented using a red-black tree under the hood, this is an incredibly useful class to become familiar with. While TreeSet is fairly complex, it provides nearly all the functionality required for this assignment without even the need to extend it. Pay particular attention to methods which provide a view of the tree as a SortedSet such as tailSet() which essentially give a "view" of a subset of the tree; an iterator can be obtained from this subset which will traverse the tree in order. The standard add and remove methods are also present for the TreeSet.
java.util.Comparator
An interface that allows one to create objects which compare other objects. Any binary search tree will need a means of comparing objects and if the same objects are to be sorted in multiple ways, Comparators are the standard way to do this. Note that TreeSet has a constructor which takes a Comparator as an argument: any insertions or lookups use the given comparator as a way to navigate the sorted tree.

Mastering these two classes will involve reading their documentation carefully and experimenting in interactive loops or with your own compiled code. This is typically the case when using other people's code and is what developers in the wild do far more than actually writing their own code. It is worth the effort to make the project easier and to develop reading/understanding skills so that you don't re-invent the tree.
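
As a starting point for that experimentation, here is a minimal sketch of the two features mentioned above: a TreeSet constructed with a Comparator, and the tailSet() view. The class name TreeSetViewSketch and the helper firstAtOrAfter() are made up for this example; String.CASE_INSENSITIVE_ORDER is a Comparator<String> supplied by the standard library.

```java
import java.util.SortedSet;
import java.util.TreeSet;

class TreeSetViewSketch {
  static TreeSet<String> sample() {
    // The Comparator given to the constructor controls the sort order;
    // here names are ordered ignoring case.
    TreeSet<String> names = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    names.add("Willie");
    names.add("alf");
    names.add("Lynn");
    names.add("lucky");
    return names;
  }

  // First stored element at or after key, or null if there is none.
  static String firstAtOrAfter(TreeSet<String> set, String key) {
    SortedSet<String> tail = set.tailSet(key); // a view of the tree, not a copy
    return tail.isEmpty() ? null : tail.first();
  }

  public static void main(String[] args) {
    TreeSet<String> names = sample();
    System.out.println(names);                      // [alf, lucky, Lynn, Willie]
    System.out.println(firstAtOrAfter(names, "M")); // Willie
  }
}
```

Note that the key passed to tailSet() does not need to be stored in the set; the comparator only needs to be able to place it relative to the stored elements.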

Note: You should not need to examine the source code for TreeSet to complete the assignment but the curious and ambitious wizard may find it enlightening.

2.3 Constraints

The only code you may use for this project falls into two categories

  1. Classes in the Java standard library such as TreeSet
  2. Classes and code you write yourself

There may be some triplestore java implementations out there but you are not to use them for this project; such action would constitute an honor code violation.

2.4 Record.java

// Immutable.  Stores 3 strings referred to as entity, relation, and
// property. Each Record has a unique integer ID which is set on
// creation.  All records are made through the factory method
// Record.makeRecord(e,r,p).  Records which have some fields wild are
// created using Record.makeQuery(wild,e,r,p)
public class Record{

  // Return the next ID that will be assigned to a Record on a call to
  // makeRecord() or makeQuery()
  public static int nextId();

  // Return a stringy representation of the record. Each string should
  // be RIGHT justified in a field of 8 characters with whitespace
  // padding the left.  Java's String.format() is useful for padding
  // on the left.
  public String toString();

  // Return true if this Record matches the parameter record r and
  // false otherwise. Two records match if all their fields match.
  // Two fields match if the fields are identical or at least one of
  // the fields is wild.
  public boolean matches(Record r);

  // Return this record's ID
  public int id() ; 

  // Accessor methods to access the 3 main fields of the record:
  // entity, relation, and property.
  public String entity(); 

  public String relation(); 

  public String property(); 

  // Returns true/false based on whether the three fields are
  // fixed or wild.
  public boolean entityWild(); 

  public boolean relationWild(); 

  public boolean propertyWild(); 

  // Factory method to create a Record. No public constructor is
  // required.
  public static Record makeRecord(String entity, String relation, String property);

  // Create a record that has some fields wild. Any field that is
  // equal to the first argument wild will be a wild card
  public static Record makeQuery(String wild, String entity, String relation, String property);

  // Comparators that compare Records based on different orderings of
  // their fields. The names of the Comparators correspond to the
  // order in which they compare fields: ERPCompare compares Entity
  // (E), then Relation (R), then Property (P). Likewise for
  // RPECompare and PERCompare.
  public static final Comparator<Record> ERPCompare;

  public static final Comparator<Record> RPECompare;

  public static final Comparator<Record> PERCompare;

}

2.5 TripleStore.java

// Three-column database that supports query, add, and remove in
// logarithmic time.
public class TripleStore{

  // Create an empty TripleStore. Initializes storage trees
  public TripleStore();

  // Access the current wild card string for this TripleStore which
  // may be used to match multiple records during a query() or
  // remove() call
  public String getWild();

  // Set the current wild card string for this TripleStore
  public void setWild(String w);

  // Ensure that a record is present in the TripleStore by adding it
  // if necessary.  Returns true if the addition is made, false if the
  // Record was not added because it was a duplicate of an existing
  // entry. A Record with any fields may be added to the TripleStore
  // including a Record with fields that are equal to the
  // TripleStore's current wild card. Throws an
  // IllegalArgumentException if any argument is null.
  // 
  // Target Complexity: O(log N)
  // N: number of records in the TripleStore
  public boolean add(String entity, String relation, String property);

  // Return a List of the Records that match the given query. If no
  // Records match, the returned list should be empty. If a String
  // matching the TripleStore's current wild card is used for one of
  // the fields of the query, multiple Records may be returned in the
  // match.  An appropriate tree must be selected and searched
  // correctly in order to meet the target complexity. Throws an
  // IllegalArgumentException if any argument is null.
  // 
  // TARGET COMPLEXITY: O(K + log N) 
  // K: the number of matching records 
  // N: the number of records in the triplestore.
  public List<Record> query(String entity, String relation, String property);

  // Remove elements from the TripleStore that match the parameter
  // query. If no Records match, no Records are removed.  Any of the
  // fields given may be the TripleStore's current wild card which may
  // lead to multiple Records being matched and removed. Return the
  // number of records that are removed from the TripleStore. Throws
  // an IllegalArgumentException if any argument is null.
  // 
  // TARGET COMPLEXITY: O(K * log N)
  // K: the number of matching records 
  // N: the number of records in the triplestore.
  public int remove(String e, String r, String p);

  // Produce a String representation of the TripleStore. Each Record
  // is formatted with its toString() method on its own line. Records
  // must be shown sorted by Entity, Relation, Property in the
  // returned String. 
  // 
  // TARGET COMPLEXITY: O(N)
  // N: the number of records stored in the TripleStore
  public String toString();

}

3 Record Class Implementation

public class Record

Databases contain "rows" and in the case of TripleStore, each row comprises three fields which will be housed in instances of the Record class. It is an immutable class which carts around three strings. The class provides facilities for unique creation and creation of records which act as queries that can match other records. It also provides a set of comparators for the arrangement of Records in various ways in the TripleStore.

3.1 Creation and Basic Functionality

Fields and accessors

public int id();              // Must be unique
public String entity();	      // Who
public String relation();     // How
public String property();     // What

Factory method

public static Record makeRecord(String e, String r, String p)

Notes

  • You are free to specify your own constructors but testing code always uses the factory method Record.makeRecord(..) which should return a record.
  • If any of the arguments e,r,p are null, throw an IllegalArgumentException with an informative message.
  • The ID number of a Record is accessible only through the id() method and never changes after creation.
  • The id() method must return an integer unique to every record but is not set by the user. A class level (static) private field is usually used for this kind of behavior. Example:
    > Record r1 = Record.makeRecord("Alf","ISA","alien");
    > Record r2 = Record.makeRecord("Alf","ISA","alien");
    > r1.id()
    45
    > r2.id()
    46
    > r1.id() == r2.id()
    false
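
The class-level counter mentioned in the last bullet might look like the following. This is a hedged sketch only: IdCounterSketch and make() are hypothetical names standing in for Record and its factory method, and starting the count at 0 is an assumption; the assignment only requires that IDs be unique.

```java
class IdCounterSketch {
  private static int nextId = 0; // class-level counter shared by all instances
  private final int id;          // fixed at creation, never changes

  // Private constructor: instances are only made through the factory method.
  private IdCounterSketch() {
    this.id = nextId++;
  }

  static IdCounterSketch make() {
    return new IdCounterSketch();
  }

  // The ID that the next created instance will receive.
  static int nextId() {
    return nextId;
  }

  int id() {
    return id;
  }
}
```

Because the counter is static, two instances made from identical arguments still receive different IDs, as in the r1/r2 example above.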
    

3.2 Record.toString()

Note: the demos may not display records in exactly the right format. This section contains the expected format.

public String toString()

A specific format is required for Record.toString().

  • Each field is followed by 1 space, so fields are separated by a single space and the string ends with a trailing space (see the examples below).
  • entity appears first and is right justified in a field of 8 characters
  • relation appears second and is right justified in a field of 8 characters
  • property appears third and is right justified in a field of 8 characters
  • Use of String.format("%8s ") or something similar is encouraged to create the formatted record string
  • id does not appear in toString()
  • If any field exceeds 8 characters, take no special action: this may lead to ugly printing of tables of records but fixing this problem is beyond the scope of the project.

You will likely find the function String.format() useful for building string representations.
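
A small sketch of the formatting, assuming the three fields are available as plain strings; the class name RecordFormatSketch and the static helper format() are illustrative only and not part of the required API.

```java
class RecordFormatSketch {
  // %8s right-justifies its argument in a field of at least 8 characters;
  // longer strings are printed in full without truncation.
  static String format(String entity, String relation, String property) {
    return String.format("%8s %8s %8s ", entity, relation, property);
  }

  public static void main(String[] args) {
    // Brackets make the leading and trailing spaces visible.
    System.out.println("[" + format("Alf", "ISA", "alien") + "]");
  }
}
```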

Several records formatted as strings are below.

> Record r;
> r = Record.makeRecord("A","B","C");
> r
       A        B        C 
> r.toString()
"       A        B        C "
> r = Record.makeRecord("Alf","ISA","alien");
> r
     Alf      ISA    alien 
> r.toString()
"     Alf      ISA    alien "
> r = Record.makeRecord("12345678","12345678","12345678");
> r
12345678 12345678 12345678 
> r.toString()
"12345678 12345678 12345678 "

// Take no special action when the fields are longer than 8 characters
> r = Record.makeRecord("1234567890","123456789","123456789012");
> r
1234567890 123456789 123456789012 
> r.toString()
"1234567890 123456789 123456789012 "

3.3 Records with Wild Cards

public static Record makeQuery(String wild, String e, String r, String p)

This factory method returns a special kind of record, perhaps a subclass of Record which is able to match other records. If any of e,r,p equal the first argument wild, those fields should be marked as "wild" and can match any other string in Record.matches().

public boolean entityWild()
public boolean relationWild()
public boolean propertyWild()

Each record has several simple accessors which determine if its fields are wild. A field is wild only if it exactly matches the wild card string passed to makeQuery() according to the String.equals() method.

// Records created with makeRecord(..) never have wild fields
> Record notAquery = Record.makeRecord("Alf","*","*");
> notAquery.entityWild()
false
> notAquery.relationWild()
false
> notAquery.propertyWild()
false

// Queries created with makeQuery(..) have wild fields where they
// match the first argument string
> Record query = Record.makeQuery("*","Alf","*","*");
> query
     Alf        *        * 
> query.entityWild()
false
> query.relationWild()
true
> query.propertyWild()
true

// Any string can be chosen to denote wild fields
> Record query2 = Record.makeQuery("wild","X","wild","*");
> query2
       X     wild        * 
> query2.entityWild()
false
> query2.relationWild()
true
> query2.propertyWild()  
false
// despite being *, third field is not wild as the word 'wild' was
// chosen to denote wild fields in construction of query2

3.4 Record.matches()

public boolean matches(Record r)
  • Determine if two Records match; return true if they do and false otherwise.
  • Matching occurs when all the fields entity,relation,property match
  • Fields match if either one or both is wild or both fields are exactly the same according to String.equals()
  • Examples
> Record r1 = Record.makeRecord("Alf","ISA","alien");
> Record r2 = Record.makeRecord("Alf","ISA","alien");
> r1.matches(r2)
true
> Record r3 = Record.makeRecord("Alf","EATS","cat");
> r1.matches(r3)
false
> Record r4 = Record.makeQuery("*","Alf","ISA","*")
     Alf      ISA          *
> r1.matches(r4)
true
> 
> Record r5 = Record.makeQuery("*","Alf","*","*");
> r1.matches(r5)
true
> r4
     Alf      ISA          *
> r5
     Alf        *          *
> r4.matches(r5)
true
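
One possible way to organize the matching rule is sketched below. It is not the required design: MatchSketch, record(), and query() are hypothetical stand-ins for Record, makeRecord(), and makeQuery(), and storing per-field wild flags in a boolean array is just one of several reasonable representations (a subclass is another).

```java
class MatchSketch {
  final String[] f;     // entity, relation, property
  final boolean[] wild; // wild flag for each field

  private MatchSketch(String[] f, boolean[] wild) {
    this.f = f;
    this.wild = wild;
  }

  // Plain record: no field is ever wild, whatever its text.
  static MatchSketch record(String e, String r, String p) {
    return new MatchSketch(new String[] {e, r, p}, new boolean[3]);
  }

  // Query: a field is wild exactly when it equals the wild card string w.
  static MatchSketch query(String w, String e, String r, String p) {
    String[] f = {e, r, p};
    boolean[] wild = new boolean[3];
    for (int i = 0; i < 3; i++) {
      wild[i] = f[i].equals(w);
    }
    return new MatchSketch(f, wild);
  }

  // All three fields must match; a field matches if either side is wild
  // or both strings are equal.
  boolean matches(MatchSketch o) {
    for (int i = 0; i < 3; i++) {
      if (!(wild[i] || o.wild[i] || f[i].equals(o.f[i]))) {
        return false;
      }
    }
    return true;
  }
}
```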

3.5 Record Comparators

public static final Comparator<Record> ERPCompare
public static final Comparator<Record> RPECompare
public static final Comparator<Record> PERCompare

The purpose of these three objects is to allow Records to be arranged in different ways in a binary search tree. Each implements the Comparator<Record> interface so that it has the following method.

public int compare(Record r1, Record r2)

Each of the comparators compares fields of Records in a different order to determine which Record is sorted first. The orders are

  • ERPCompare: entity relation property
  • RPECompare: relation property entity
  • PERCompare: property entity relation

Comparison starts with the first field listed. If they are equal, the next field listed is compared to break the tie, and if they are equal the final field is considered. The records are equal only if all of entity,relation,property match exactly.

For example given the records

Record abc = Record.makeRecord("a","b","c");
Record bca = Record.makeRecord("b","c","a");
Record cba = Record.makeRecord("c","b","a");
Record caa = Record.makeRecord("c","a","a");

The following results apply

Comparison        entity     relation   property   ERPCompare  RPECompare  PERCompare
compare(abc,bca)  abc < bca  abc < bca  abc > bca  < 0         < 0         > 0
compare(abc,cba)  abc < cba  abc = cba  abc > cba  < 0         > 0         > 0
compare(cba,caa)  cba = caa  cba > caa  cba = caa  > 0         > 0         > 0

It is very important that these comparators work correctly as they determine how trees which use them will be sorted. Make sure to test thoroughly.

Some additional examples are given below.

> Record abc = Record.makeRecord("a","b","c");
> Record abb = Record.makeRecord("a","b","b");
> Record bca = Record.makeRecord("b","c","a");
> Record cba = Record.makeRecord("c","b","a");
> Record.ERPCompare.compare(abc,abb)
1
> Record.ERPCompare.compare(abc,abc)
0
> Record.ERPCompare.compare(abc,abb)
1
> Record.RPECompare.compare(abc,abc)
0
> Record.RPECompare.compare(abc,abb)
1
> Record.RPECompare.compare(abc,bca)
-1
> Record.PERCompare.compare(abc,abc)
0
> Record.PERCompare.compare(abc,abb)
1
> Record.PERCompare.compare(abc,bca)
2
> Record.ERPCompare.compare(abc,cba)
-2
// +2 and -2 are not errors.
// Results need not be restricted to -1,0,+1 
// so long as they abide by <0, ==0, >0
// for the proper cases.  String.compareTo
// may be directly used in the comparators
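
The tie-breaking scheme can be sketched without any wild card handling as follows. The names OrderingSketch and byFields() are hypothetical, and String[3] rows {entity, relation, property} stand in for Record objects; the real comparators must also account for wild fields as described in the next section.

```java
import java.util.Comparator;

class OrderingSketch {
  // Compare rows by the given field positions, each later field
  // breaking ties left by the earlier ones.
  static Comparator<String[]> byFields(int first, int second, int third) {
    return (x, y) -> {
      int c = x[first].compareTo(y[first]);
      if (c != 0) {
        return c;
      }
      c = x[second].compareTo(y[second]);
      if (c != 0) {
        return c;
      }
      return x[third].compareTo(y[third]);
    };
  }

  // Field positions: 0 = entity, 1 = relation, 2 = property.
  static final Comparator<String[]> ERP = byFields(0, 1, 2);
  static final Comparator<String[]> RPE = byFields(1, 2, 0);
  static final Comparator<String[]> PER = byFields(2, 0, 1);

  public static void main(String[] args) {
    String[] abc = {"a", "b", "c"};
    String[] cba = {"c", "b", "a"};
    System.out.println(ERP.compare(abc, cba)); // -2, from "a".compareTo("c")
  }
}
```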

3.6 Wildcards and Comparisons

When implementing query, one typically wants to find the "first" record that matches a query with a wildcard. This can be done by iterating through the tree from the beginning but such an approach would not meet the target complexity.

Instead, production trees like TreeSet provide facilities to obtain sorted views of the tree starting at different elements, for instance starting at the first stored element bigger than a given element. One can use this facility along with properly implemented comparators to accomplish the fast query for this project.

If a comparator treats wildcards as less than any other string then asking a properly sorted tree for the first entry bigger than the query will yield the first matching Record if one exists. One can then start an in-order traversal to find any other matching Records for the query results. This means the comparators need to be aware of the wildcard in Record and use it during comparisons.

Consider the tree sorted on entity,relation,property and the call t.query("Lucky","*","*").

erp-rbtree.png

Figure 3: Red-Black Tree with Entity-Relation-Property Sorting

The existing records involving Lucky are

Lucky EATS catfood
Lucky ISA cat
Lucky LIKESTO purr

If the wildcards in ("Lucky","*","*") are considered less than the strings in the relation and property fields of these records, then the first record "bigger" than the query according to the tree diagram will be Lucky EATS catfood and an in-order traversal from there will find all Lucky records. The query Lucky * * would fit "in between" Alf ISA alien and Lucky EATS catfood in the tree above if it were allowed to be added (which it is not).

The following examples illustrate the intention for wildcards in comparisons.

> Record r1 = Record.makeRecord("Lucky","ISA","cat")
> r1
   Lucky      ISA        cat
> Record r2 = Record.makeRecord("Lucky","LIKESTO","purr")
> r2
   Lucky  LIKESTO       purr
> Record q = Record.makeQuery("*","Lucky","*","*")
> q
   Lucky        *          *
> Record.ERPCompare.compare(r1, r2)
-3
> Record.ERPCompare.compare(q, r1)
-1
> Record.ERPCompare.compare(q, r2)
-1
> Record.PERCompare.compare(r1, r2)
-13
> Record.PERCompare.compare(q, r1)
-1
> Record.PERCompare.compare(q, r2)
-1
> Record r3 = Record.makeRecord("Alf","ISA","alien")
> r3
     Alf      ISA      alien
> Record.ERPCompare.compare(q, r3)
11
> Record r4 = Record.makeRecord("Lucky","EATS","catfood")
> r4
   Lucky     EATS    catfood
> Record.ERPCompare.compare(q, r4)
-1

Though the situation should never arise when Records are used with TripleStore, two wild fields are considered equal to one another regardless of what their underlying strings might be.

// All wild fields
> Record q = Record.makeQuery("*","*","*","*")
> Record v = Record.makeQuery("*","*","*","*")
> Record.ERPCompare.compare(q,v)
0

// Same wilds except for cat
> v = Record.makeQuery("*","*","*","cat")
> Record.ERPCompare.compare(q,v)
-1
> Record.ERPCompare.compare(v,q)
1

// Two records with wild fields but different wild strings are
// equal to one another
> q = Record.makeQuery("*","*","*","*")
> v = Record.makeQuery("wild","wild","wild","wild")
> Record.ERPCompare.compare(v,q)
0

> q = Record.makeQuery("*","*","*","cat")
> v = Record.makeQuery("wild","wild","wild","cat")
> Record.ERPCompare.compare(v,q)
0
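
The per-field rule illustrated by these examples can be sketched as a single helper. This is an assumption-laden illustration: WildCompareSketch and compareField() are hypothetical names, and passing wild flags as separate boolean arguments is only one way to make a comparator aware of wild fields.

```java
class WildCompareSketch {
  // Compare one field of two records, given whether each side is wild.
  static int compareField(String a, boolean aWild, String b, boolean bWild) {
    if (aWild && bWild) {
      return 0;  // two wild fields are equal, whatever their strings
    }
    if (aWild) {
      return -1; // wild sorts before any ordinary string
    }
    if (bWild) {
      return 1;
    }
    return a.compareTo(b); // neither side wild: ordinary string order
  }

  public static void main(String[] args) {
    System.out.println(compareField("*", true, "ISA", false));       // -1
    System.out.println(compareField("ISA", false, "LIKESTO", false)); // -3
  }
}
```

A full comparator would apply this helper to the three fields in its own order (for example entity, relation, property), returning the first nonzero result.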

Note: In addition to the compare() method, the Comparator interface specifies that Comparators should have an equals(Object o) method. All classes inherit equals() from Object and for this project there is no need to override the default equals() implementation; only an implementation of compare(r1,r2) is required.

4 TripleStore Class Implementation

public class TripleStore

Triplestores are databases that have three columns (each row has three fields). Our implementation will enforce that each record in the database is unique. The operations supported are insertion, lookup/query, and removal based on a query. This implementation will support logarithmic time operations for all three operations. To achieve this performance, the database must store its Records in three separate balanced binary search trees. A good choice for the BST is java.util.TreeSet which is implemented as a Red-Black tree. TreeSet provides a variety of useful methods that you should review via its javadocs.

4.1 TripleStore Constructor and Basic methods

public TripleStore()

Create a TripleStore. Initialize any trees you will use in the constructor. Make sure to pass in relevant arguments to the tree constructors such as Comparators.

In addition, specify a few simple methods to get and set the wild card for a triplestore.

public String getWild()
public void setWild(String w)
  • These methods allow the current wild card to be inspected and changed.
  • The default wild card is required to be the asterisk string *
  • The following example illustrates the independence of wild cards between separate triplestore instances.
TripleStore t1 = new TripleStore();
t1.add("Willie","ISA","human");

TripleStore t2 = new TripleStore();
t2.add("Willie","ISA","human");
t2.setWild("@");

t1.query("Willie","*","*");
// [   Willie      ISA      human]

t2.query("Willie","@","@");
// [   Willie      ISA      human]

t1.query("Willie","@","@");
// []

t2.query("Willie","*","*");
// []

4.2 TripleStore.toString()

  // TARGET COMPLEXITY: O(N)
  // N: the number of records stored in the TripleStore

Create a string representation of the TripleStore suitable for printing. This should be done by building a String containing the results of each Record.toString() which is currently stored and separating these with a newline. Proper implementation of Record.toString() is essential for TripleStore.toString().

Notes

  • Records MUST appear in sorted order based on a correct implementation of ERPComparator. See Record Comparators for details on this order. Warning: Some of the demo displays may not show records exactly in this ordering as the demos were generated using a prototype, but it is a requirement for your implementation.
  • Conversion to a string should not repeatedly call query but should instead directly walk through the underlying data, likely via a traversal of one of the index trees.
  • To meet the given complexity bound, you will need to find an efficient way to concatenate Strings. The typical use of + is NOT efficient and will result in \(O(N^2)\) performance. Your textbook contains information on efficiently building Strings up.
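
One common way to avoid the quadratic cost of repeated + is java.lang.StringBuilder, which appends into a growable buffer. The sketch below is a generic illustration rather than the required toString(): JoinSketch and joinLines are hypothetical names, the input is any in-order sequence of already formatted lines, and ending every line with a newline is an assumption about the desired format.

```java
import java.util.Arrays;

class JoinSketch {
    // Build one String from many lines. Each append costs amortized time
    // proportional to the appended text, so the total work is linear in
    // the size of the output.
    static String joinLines(Iterable<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines(Arrays.asList(
            "     Alf      ISA      alien",
            "  Willie      ISA      human")));
    }
}
```

By contrast, result = result + line copies the whole accumulated string on every iteration, which is where the quadratic behavior comes from.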

4.3 TripleStore.add()

  // Target Complexity: O(log N)
  // N: number of records in the TripleStore

Add a single record into the TripleStore.

  • Use the three argument Strings to create a valid Record
  • It is not an error to add a Record with the current wild card
  • If the TripleStore already contains an identical record, do not add the new duplicate information and return false
  • Add the Record and ensure that any and all index trees are updated.
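
The duplicate check and the multi-tree update can be sketched as below. This is an illustration under simplifying assumptions, not the required design: plain String[] rows stand in for Records, wild cards are ignored, and ThreeIndexSketch and byOrder are hypothetical names. It relies on TreeSet.add() returning false when an element that compares equal is already present.

```java
import java.util.Comparator;
import java.util.TreeSet;

class ThreeIndexSketch {
    // Build a comparator that visits field positions in the given order,
    // e.g. (0,1,2) for Entity-Relation-Property.
    static Comparator<String[]> byOrder(int... order) {
        return (a, b) -> {
            for (int i : order) {
                int c = a[i].compareTo(b[i]);
                if (c != 0) return c;
            }
            return 0;
        };
    }

    // One tree per sort order; each tree receives its comparator at construction.
    final TreeSet<String[]> erp = new TreeSet<String[]>(byOrder(0, 1, 2));
    final TreeSet<String[]> rpe = new TreeSet<String[]>(byOrder(1, 2, 0));
    final TreeSet<String[]> per = new TreeSet<String[]>(byOrder(2, 0, 1));

    // Three O(log N) insertions; a duplicate is detected by the first one.
    boolean add(String e, String r, String p) {
        String[] row = { e, r, p };
        if (!erp.add(row)) {
            return false;   // identical row already stored
        }
        rpe.add(row);
        per.add(row);
        return true;
    }
}
```

Because every comparator here examines all three fields, a row that is a duplicate in one tree is a duplicate in all of them, so checking a single tree suffices in this sketch.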

Examples from 1.3

> TripleStore t = new TripleStore();
> t

> // Empty TripleStore

> // --- ADD --- 
> boolean b = t.add("Willie","ISA","human");
> b
true
> // Successful add returns true

> t.add("Alf","ISA","alien");
> t
     Alf      ISA      alien
  Willie      ISA      human
> // Two Records in the TripleStore

> b = t.add("Willie","ISA","human");
> b
false
> // Duplicates are not allowed
> t
     Alf      ISA      alien
  Willie      ISA      human
> // Still only two Records in the TripleStore

> t.add("Lynn","ISA","human");
> t.add("Lucky","ISA","cat");
> t.add("Alf","EATS","cat");
> t.add("Lynn","EATS","veggies");
> t.add("Lucky","LIKESTO","purr");
> t.add("Lucky","EATS","catfood");
> t.add("Alf","EATS","veggies");
> t
     Alf     EATS        cat
     Alf     EATS    veggies
     Alf      ISA      alien
   Lucky     EATS    catfood
   Lucky      ISA        cat
   Lucky  LIKESTO       purr
    Lynn     EATS    veggies
    Lynn      ISA      human
  Willie      ISA      human

4.4 TripleStore.query()

  // TARGET COMPLEXITY: O(K + log N) 
  // K: the number of matching records 
  // N: the number of records in the triplestore.
  public List<Record> query(String entity, String relation, String property)

Return a List of the records that match the given query parameters. Any of the arguments entity,relation,property may be the wild card for the Triplestore.

  • To meet the complexity bound, you will need to identify which index to use for the initial search into the tree.
  • Use methods of your tree (probably TreeSet) to get a view of the tree starting at a certain key and which allow the tree to be traversed in order. This will allow you to meet the target complexity.
  • You'll likely need to use Record.makeQuery(..) to create a record with the wild fields matching the wild card associated with the triplestore.
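
The "start at a key, then walk in order" idea can be sketched with TreeSet.tailSet(). The code below is a simplified illustration, not the required query(): String[] rows stand in for Records, QuerySketch, cmp, and matches are hypothetical names, the wild card is fixed to *, and wild fields are assumed to sort before concrete ones. It uses only an ERP-ordered tree, so it is correct only when the concrete fields of the query lead the ERP order (for example Alf,*,* or Alf,EATS,*); other patterns need a tree with a different order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class QuerySketch {
    static final String WILD = "*";

    // Compare one field; wild equals wild, and wild sorts before concrete.
    static int cmp(String a, String b) {
        boolean aw = a.equals(WILD);
        boolean bw = b.equals(WILD);
        if (aw && bw) return 0;
        if (aw) return -1;
        if (bw) return 1;
        return a.compareTo(b);
    }

    static final Comparator<String[]> ERP = (a, b) -> {
        for (int i = 0; i < 3; i++) {
            int c = cmp(a[i], b[i]);
            if (c != 0) return c;
        }
        return 0;
    };

    // A row matches when every non-wild query field equals the row's field.
    static boolean matches(String[] query, String[] row) {
        for (int i = 0; i < 3; i++) {
            if (!query[i].equals(WILD) && !query[i].equals(row[i])) return false;
        }
        return true;
    }

    // Jump to the first row >= the query, then traverse in order and stop
    // at the first row that fails to match.
    static List<String[]> query(TreeSet<String[]> erp, String e, String r, String p) {
        String[] q = { e, r, p };
        List<String[]> out = new ArrayList<>();
        for (String[] row : erp.tailSet(q, true)) {
            if (!matches(q, row)) break;
            out.add(row);
        }
        return out;
    }
}
```

Since the query sorts just before its matching rows, all matches are contiguous in the view and the loop touches only the K matching rows plus one more.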

Examples from TripleStore Demo

> // --- QUERY ---
> import java.util.*;
> List<Record> results;
> results = t.query("Alf","ISA","alien");
> results
[     Alf      ISA      alien]
> // query returns a List
> // Records match exactly, 1 result

> t.getWild()
"*"
> results = t.query("Alf","ISA","*")
[     Alf      ISA      alien]
> // query with a wild card matched 1 record
> results = t.query("Alf","EATS","*")
[     Alf     EATS        cat,      Alf     EATS    veggies]
> // query with a wild card matched 2 records
> results = t.query("Alf","*","*")
[     Alf     EATS        cat,      Alf     EATS    veggies,      Alf      ISA      alien]
> // query with several wild cards matched 3 records

> t.query("*","ISA","human")
[    Lynn      ISA      human,   Willie      ISA      human]
> t.query("*","*","human")
[    Lynn      ISA      human,   Willie      ISA      human]
> t.query("*","*","cat")
[     Alf     EATS        cat,    Lucky      ISA        cat]
> t.query("*","ISA","*")
[     Alf      ISA      alien,    Lucky      ISA        cat,     Lynn      ISA      human,   Willie      ISA      human]
> // wildcards can appear in any of entity,relation,property

4.5 TripleStore.remove()

  // TARGET COMPLEXITY: O(K * log N)
  // K: the number of matching records 
  // N: the number of records in the triplestore.
  public int remove(String e, String r, String p)

Note: remove() does not need to be as fast as query(): compare their target complexities carefully

  • query(): \(O(K + \log{N})\)
  • remove(): \(O(K \times \log{N})\)

Remove records from the TripleStore. Any of the arguments entity,relation,property may be wild. All records that match the parameters will be removed.

Return the number of elements removed which may be 0 if no records matched the parameters.
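
One way to arrive at the O(K * log N) bound is to first collect the K matching rows and then delete each of them from every index tree, paying O(log N) per deletion. The sketch below illustrates only the second step under simplifying assumptions: String[] rows stand in for Records, wild cards are ignored, the list of matches is handed in by the caller, and RemoveSketch, byOrder, and removeMatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class RemoveSketch {
    static Comparator<String[]> byOrder(int... order) {
        return (a, b) -> {
            for (int i : order) {
                int c = a[i].compareTo(b[i]);
                if (c != 0) return c;
            }
            return 0;
        };
    }

    // All index trees, each holding the same rows in a different order.
    final List<TreeSet<String[]>> indexes = new ArrayList<>();

    RemoveSketch() {
        indexes.add(new TreeSet<String[]>(byOrder(0, 1, 2))); // ERP
        indexes.add(new TreeSet<String[]>(byOrder(1, 2, 0))); // RPE
        indexes.add(new TreeSet<String[]>(byOrder(2, 0, 1))); // PER
    }

    void add(String e, String r, String p) {
        String[] row = { e, r, p };
        for (TreeSet<String[]> index : indexes) {
            index.add(row);
        }
    }

    // Remove every listed row from every tree and count the rows that
    // were actually present.
    int removeMatches(List<String[]> matches) {
        int removed = 0;
        for (String[] row : matches) {
            boolean present = false;
            for (TreeSet<String[]> index : indexes) {
                present |= index.remove(row);
            }
            if (present) removed++;
        }
        return removed;
    }

    int size() {
        return indexes.get(0).size();
    }
}
```

Collecting the matches into a separate list before deleting also avoids modifying a TreeSet while one of its views is still being iterated.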

Examples from TripleStore Demo

> // --- REMOVE ---
> t
     Alf     EATS        cat
     Alf     EATS    veggies
     Alf      ISA      alien
   Lucky     EATS    catfood
   Lucky      ISA        cat
   Lucky  LIKESTO       purr
    Lynn     EATS    veggies
    Lynn      ISA      human
  Willie      ISA      human

> int nrm = t.remove("Alf","EATS","veggies")
1
> // successful removal, changes database
> t
     Alf     EATS        cat
     Alf      ISA      alien
   Lucky     EATS    catfood
   Lucky      ISA        cat
   Lucky  LIKESTO       purr
    Lynn     EATS    veggies
    Lynn      ISA      human
  Willie      ISA      human

> nrm = t.remove("Alf","EATS","fruit")
0
> // unsuccessful remove
> nrm = t.remove("Alf","EATS","veggies")
0
> // record no longer exists

> nrm = t.remove("Alf","*","*")
2
> t
   Lucky     EATS    catfood
   Lucky      ISA        cat
   Lucky  LIKESTO       purr
    Lynn     EATS    veggies
    Lynn      ISA      human
  Willie      ISA      human

> t.remove("*","*","*")
6
> t

> // wild cards can remove many records

5 Grading

Grading for this HW will be divided into three distinct parts:

  • Part of your grade will be based on passing some automated test cases by an early "milestone" deadline. See the top of the HW specification for
  • Part of your grade will be based on passing all automated test cases by the final deadline
  • Part of your grade will be based on a manual inspection of your code and analysis documents by the teaching staff to determine quality and efficiency.

5.1 Final Automated Tests (50%)

  • JUnit test cases will be provided to detect errors in your code. These will be run by a grader on submitted HW after the final deadline.
  • Tests may not be available on initial release of the HW but will be posted at a later time.
  • Tests may be expanded as the HW deadline approaches.
  • It is your responsibility to get and use the freshest set of tests available.
  • Tests will be provided in source form so that you will know what tests are doing and where you are failing.
  • It is up to you to run the tests to determine whether you are passing or not. If your code fails to compile against the tests, little credit will be garnered for this section
  • Most of the credit will be divided evenly among the tests; e.g. 50% / 25 tests = 2% per test. However, the teaching staff reserves the right to adjust the weight of test cases after the fact if deemed necessary.
  • Test cases are typically run from the command line using the following invocation which you should verify works as expected on your own code.
    UNIX Command line instructions
      Compile
    > javac -cp .:junit-cs310.jar *.java
    
    Run tests
    > java -cp .:junit-cs310.jar SomeTests
    
    WINDOWS Command line instructions: replace colon with semicolon
    Compile
    > javac -cp .;junit-cs310.jar *.java
    
    Run tests
    > java -cp .;junit-cs310.jar SomeTests
    

5.2 Final Manual Inspection (50%)

  • Graders will manually inspect your code and analysis documents looking for a specific set of features after the final deadline.
  • Most of the time the requirements for credit will be posted along with the assignment, though these may be revised as the HW deadline approaches.
  • Credit will be awarded for good coding style which includes
    • Good indentation and curly brace placement
    • Comments describing private internal fields
    • Comments describing a complex section of code and invariants which must be maintained for classes
    • Use of internal private methods to decompose the problem beyond what is required in the spec
  • Some credit will be awarded for clearly adhering to the target complexity bounds specified in certain methods. If the specification lists the target complexity of a method as O(N) but your implementation is actually O(N log N), credit will be deducted. If the implementation complexity is too difficult to determine due to poor coding style, credit will likely be deducted.

    All TARGET COMPLEXITIES are worst-case run-times.

  • Some credit will be awarded for turning in any analysis documents that are required by the HW specification. These typically involve analyzing how fast a method should run or how much memory a method requires and are reported in a text document submitted with your code.

6 Final Manual Inspection Criteria

6.1 (10%) Record Design

  • The design of the Record class is documented via comments on class fields.
  • It is clear how the notion of wild cards is supported.
  • It is clear that any string can be used to create a query with wild fields and that there are no external dependencies of wildness on other classes such as TripleStore.
  • There is adequate documentation of the matches(..) method so that it is easy to understand how it operates.

6.2 (10%) Record Comparator Design and Use

  • The static fields housing the comparators are present and usable
  • It is clear how each comparator implements compare(..) differently to sort Records in alternative orders. Comments and clear code are present to show how the different sorting orders are achieved.
  • When creating trees that store Records in different orders, TripleStore adheres to the public interface of Record laid out in the HW specification. TripleStore should not use any classes that are internal to Record except via public fields mentioned in the HW specification.

6.3 (20%) TripleStore Method Runtime Complexities

  // Target Complexity: O(log N)
  // N: number of records in the TripleStore
  public boolean add(String entity, String relation, String property)
  • It is clear that no duplication of records occurs during add.
  • Data is added to all internal trees to support fast queries later.
  • The overall method clearly meets the target runtime complexity.
  // TARGET COMPLEXITY: O(K + log N) 
  // K: the number of matching records 
  // N: the number of records in the triplestore.
  public List<Record> query(String entity, String relation, String property)
  • Code which does tree selection is present and clear. This code decides which tree among ERP, RPE, and PER will efficiently answer a query with the given wild card pattern and selects it.
  • Methods of TreeSet are used effectively to meet the target runtime complexity for query().
  • Iterators are employed to avoid repeatedly searching through the tree and instead perform an in-order traversal to visit appropriate elements.
  • The overall method clearly meets the target runtime complexity.
  // TARGET COMPLEXITY: O(K * log N)
  // K: the number of matching records 
  // N: the number of records in the triplestore.
  public int remove(String e, String r, String p)
  • The records that must be removed are determined efficiently, potentially by using other methods of TripleStore.
  • Data is removed from all internal trees to support fast queries later.
  • The overall method clearly meets the target runtime complexity.
  // TARGET COMPLEXITY: O(N)
  // N: the number of records stored in the TripleStore
  public String toString()
  • Make sure to use an efficient means to construct the string representation of the Triple Store.

6.4 (5%) Coding Style and Readability

This is a larger project. It will require discipline and effort to keep track of how all the pieces fit together. Commenting your own code to keep track of the purpose of fields and methods will pay much higher dividends in this project than was previously the case. Your code will be inspected for clarity in the following categories.

  • Code Cleanliness (1%): Indent and {bracket} code uniformly throughout the program to improve readability.
  • Class Documentation (1%): Each class has an initial comment indicating its intended purpose, how it works, and how it relates or uses other classes in the project.
  • Field Documentation (1%): Each field of a class is documented to indicate what piece of data is tracked and how it will be used, regardless of visibility.
  • Method documentation (1%): Each method has a short description indicating its intended purpose and how it gets its job done. (These are also needed for your own helper methods you choose to add).
  • Proper Use of Generics (1%): The point of using Java generics is to get good compile-time checks and avoid runtime casting. Effective use of generics will mean very few runtime casts will be needed. To that end, runtime casts will be penalized unless there is no other way around them, such as in the implementation of ArrayList. Unless otherwise noted, the classes required for this HW do not require runtime casts.
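
As a small illustration of the generics point, the sketch below contrasts a raw collection, which forces a runtime cast, with a parameterized one, which does not. GenericsSketch and its methods are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsSketch {
    // Raw type: elements come back as Object, so a runtime cast is needed
    // and a wrong element type is only detected when the program runs.
    @SuppressWarnings({ "rawtypes", "unchecked" })
    static int rawLength() {
        List raw = new ArrayList();
        raw.add("human");
        String s = (String) raw.get(0);
        return s.length();
    }

    // Parameterized type: the compiler knows the element type, so no cast
    // is written and mistakes are caught at compile time.
    static int typedLength() {
        List<String> typed = new ArrayList<>();
        typed.add("human");
        return typed.get(0).length();
    }
}
```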

6.5 (5%) Correct Project Setup

Correctly setting up the directory structure for each project greatly eases the task of grading lots of projects. Graders get cranky when strange directory structures or missing files are around and you do not want cranky graders. The following will be checked for setup on this project.

  • The Setup instructions were followed closely
  • The Project Directory is named according to the specification
  • There is an Identity Text File present with the required information in it
  • Code can be compiled and tests run from the command line

7 Setup and Submission

7.1 HW Directory

There is no code distribution for this assignment.

Create a directory named masonid-hwX where masonid is your mason ID. My mason ID is ckauffm2 so I would create the directory ckauffm2-hw4.

This is your HW directory. Everything concerning your assignment will go in this directory.

7.2 ID.txt

Create a text file in your HW directory called ID.txt which has identifying information in it. My ID.txt looks like this:

Chris Kauffman
ckauffm2
G001234567

It contains my full name, my mason ID, and my G#. The presence of a correct ID.txt helps immensely when grading lots of assignments.

7.3 Penalties

Make sure to

  • Set up your HW directory correctly
  • Include an ID.txt
  • Indent your code and make comments

Failure to do so may be penalized by a 5% deduction.

7.4 Submission: Blackboard

Do not e-mail the professor or TAs your code.

Create a ZIP file of your HW directory and submit it to the course blackboard page. Do not submit multiple files manually through blackboard as this makes it hard to unpack large numbers of assignments. Learn how to create a zip and submit only that file.

On Blackboard

  • Click on the Assignments section
  • Click on the HW4 link
  • Scroll down to "Attach a File"
  • Click "Browse My Computer"
  • Select your ZIP file

You can resubmit to blackboard as many times as you like up to the deadline.


Author: Chris Kauffman and Richard Carver (kauffman@cs.gmu.edu)
Date: 2016-11-29 Tue 17:34