Programming Project 2

This semester you will be writing a program that models elements of recognizing and creating characters. Optical character recognition is an important area of research that allows photographs or printed documents to be digitized; by doing so, these documents are made available for machine-based searching. On the flip side, CAPTCHA (http://en.wikipedia.org/wiki/CAPTCHA) is a system for differentiating between humans and computers: the goal here is to generate an image that a human can identify but a machine cannot read. CAPTCHA helps reduce the amount of spam on the Internet.

We will implement a highly limited type of image matching, processing, and creation this semester. Rather than write this project all at once, we will break it down into several two-week sub-projects that are due throughout the semester. The rest of this document details the first such assignment.

PROJECT UPDATES AND CLARIFICATIONS

This first assignment will ask you to write some code that will help you determine if a pixel is part of a number. You will complete two functions provided in the template file project2_template.py. You should save that template file under the name project2.py.

You will have to do four things for this assignment:

Step 1: Writing Test Cases
Ideally, you should write your test cases before you write your code. In doing so, your test cases will all initially fail (because you haven't written any code), but that's fine. So, next we're going to explain what the two functions above are supposed to do, and then we'll explain how to format your test cases so you can get credit for them on Marmoset, and how you can run these test cases on code you will write.

First, we're going to write a function to check whether a pixel is inside the image of a number. Since we want to keep this project simple, we'll use very small 3x4 images (3 tiles wide and 4 tiles tall), and think of them as grids. We'll say that a grid is made up of tiles. The checkTile function takes two arguments: the tile number, and the number itself, which is a string naming one of the numbers one through five. The tiles of an image are always numbered starting at 1. On our 3x4 grids, tiles are labeled as:
 1   2   3 
 4   5   6 
 7   8   9 
 10  11  12
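To make the numbering concrete, here is a minimal sketch of how a tile number relates to its row and column. The helper name tile_to_row_col and its width parameter are made up for this illustration and are not part of the template; rows and columns are counted from 1, like the tiles.

```python
def tile_to_row_col(tile, width=3):
    """Map a 1-based tile number to a 1-based (row, column) pair.

    Hypothetical helper for illustration only; it is not in the template file.
    """
    row = (tile - 1) // width + 1   # how many full rows come before this tile
    col = (tile - 1) % width + 1    # position within the tile's own row
    return row, col


print(tile_to_row_col(1))    # (1, 1): top-left tile
print(tile_to_row_col(6))    # (2, 3): end of the second row
print(tile_to_row_col(10))   # (4, 1): start of the last row
```

This only describes the grid layout; deciding which tiles are colored for each number is still up to you.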

We will define the five images of the numbers "one" through "five" as the following:

"one"
         X 
         X 
         X 
         X 


"two"
 X   X   X 
     X     
 X         
 X   X   X 


"three"
 X   X   X 
     X   X 
         X 
 X   X   X 


"four"
 X       X 
 X   X   X 
         X 
         X 


"five"
 X   X   X 
 X   X   X 
         X 
 X   X   X 

The X characters indicate when that tile would be occupied by a colored pixel (as opposed to a white pixel). Your function should determine whether the given tile is a non-white pixel in the number, and return either True or False (as boolean values), depending on the arguments passed in. For example, a call to the function with checkTile(5,"four") would return the value True. A call to checkTile(7,"four") would return the value False. The function will return True when the specified tile occupies a colored pixel in a 3x4 image of the numbers one through five as specified above; in all other instances, it will return False. You may assume that the function will only ever be called with the numbers "one" through "five", and that all tile inputs are integers.

IMPORTANT: To make this problem more reasonable than just using a "brute force" algorithm (which would require more code and potentially take much longer to write), and to practice some of the boolean logic constructs we've discussed, your code must not use more than four boolean operators in any conditional statement; otherwise, you will not receive credit for your function, even if it passes the test cases on Marmoset. For example, the following conditional:
if ((A and B) or ((C or D) and E)):
uses the maximum four boolean operators allowed (and, or, or, and).

This conditional:
if (A and B or (C or D and E and F)):
uses five boolean operators, and is disallowed (would NOT earn credit).
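If you want to double-check your own count, the sketch below tallies the and/or operators in a condition written as a string. It relies on Python's standard ast module, which goes beyond what we have covered, so treat it as an optional checking aid rather than something to use in project2.py. The function name count_boolean_operators is made up for this example, and the sketch assumes that only and and or count toward the limit (it ignores not, which the rule above does not mention).

```python
import ast


def count_boolean_operators(condition):
    """Count the 'and'/'or' operators in a Python expression given as a string.

    Hypothetical checking aid; 'not' is deliberately left out of the count.
    """
    tree = ast.parse(condition, mode="eval")
    count = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.BoolOp):
            # A chain such as 'D and E and F' is stored as one node with
            # three operands, which corresponds to two operators.
            count += len(node.values) - 1
    return count


print(count_boolean_operators("(A and B) or ((C or D) and E)"))     # 4
print(count_boolean_operators("A and B or (C or D and E and F)"))   # 5
```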

Remember that simply putting operators on different lines in Python (we have not learned how to do this, but it is possible) does NOT count as reducing the number of operators, as your conditional still requires the same number of operators. In addition, your function must be fewer than 25 lines of code; otherwise it's easy to just brute-force the solution.

For the second function, we're going to write a very specific function to identify empty pixels for just the number three. The isEmptyOnThree function takes as arguments a tile and a width, and returns True or False depending on whether the specified tile number is empty (non-colored) on an image with a height of four and a width as specified by the argument.

How do you know what other shapes of three look like? Here are some rules:
  • The shape will always be "three"
  • The height will always be 4
  • The top and bottom rows will always contain all colored pixels
  • The second row will have all white pixels left of the half of the row, and all colored pixels right of the half of the row. If the width is an odd number, the middle pixel will also be colored.
  • The third row will have all white pixels, except the last pixel on the right

For example, isEmptyOnThree(8,3) would return True. You may assume the arguments to the function will always be of integer type (so no need to test other types), but your code should return invalid for all invalid inputs, which includes, but is not limited to, the width being less than 3. Your code must handle all integer inputs for both arguments without crashing; it must always return True, False, or invalid.
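When working out expected answers by hand, it can help to draw the shape for a given width. The sketch below builds the four rows as strings, following the rules listed above, with X for a colored pixel and . for a white one. The name render_three is made up for this illustration. It is a drawing aid, not the isEmptyOnThree function: it works on whole rows rather than tile numbers, and it raises an error for widths below 3 instead of returning invalid. For even widths it assumes "left of the half" means the left half of the columns.

```python
def render_three(width):
    """Return the rows of a height-4 'three' as strings ('X' colored, '.' white).

    Hypothetical drawing aid for checking test cases by eye.
    """
    if width < 3:
        raise ValueError("widths below 3 are invalid for this shape")
    full_row = "X" * width                  # top and bottom rows are all colored
    half = width // 2                       # number of columns left of the middle
    second_row = "." * half + "X" * (width - half)
    third_row = "." * (width - 1) + "X"     # only the last pixel is colored
    return [full_row, second_row, third_row, full_row]


for row in render_three(3):
    print(row)
```

For a width of 3 this prints XXX, .XX, ..X, XXX on four lines, which matches the "three" image shown earlier; tile 8 is the middle of the third row and is white.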

Now you should be ready to write test cases for your code, once you understand what arguments each function is expecting, and what it will return and when. For this project, we will expect your test cases to be written to a file called tests.txt (right mouse click and save the file - do not try to click on it in your browser and copy the contents - you will miss the newline at the end). The example shows you the format for one test case of checkTile and one test case of isEmptyOnThree. Each test case is two lines long: the first line contains the name of the function, followed by a space, followed by all of the arguments to the function, each separated by a single space, and terminated by a newline. Then, on the next line, you should provide the expected answer for that function call, terminated by a newline. In your tests, make sure you include spaces where they need to be, otherwise they will not pass on Marmoset. Make sure to include a newline (blank line) after your last test case.
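Because formatting mistakes are easy to make, here is a minimal sketch of a checker for the layout described above: two lines per test, single spaces, and a newline at the end of the file. The function find_format_problems is made up for this illustration and is not the DriverBuild program or anything Marmoset runs. The sample text assumes the number name is written without quotation marks; take the exact argument format from the provided tests.txt, not from this sketch.

```python
KNOWN_FUNCTIONS = ("checkTile", "isEmptyOnThree")


def find_format_problems(text):
    """Return a list of problems found in the contents of a tests file.

    Hypothetical checker; it only covers the layout rules described in the text.
    """
    problems = []
    if not text.endswith("\n"):
        problems.append("file does not end with a newline")
    lines = text.split("\n")
    # Drop the empty entries produced by the trailing newline or blank line.
    while lines and lines[-1] == "":
        lines.pop()
    if len(lines) % 2 != 0:
        problems.append("odd number of lines: each test needs a call line and an answer line")
    for number, line in enumerate(lines, start=1):
        if line != line.strip() or "  " in line:
            problems.append("line %d has leading, trailing, or doubled spaces" % number)
        # Odd-numbered lines are the call lines and must name a function.
        if number % 2 == 1 and line.split(" ")[0] not in KNOWN_FUNCTIONS:
            problems.append("line %d does not start with a known function name" % number)
    return problems


sample = "checkTile 5 four\nTrue\nisEmptyOnThree 8 3\nTrue\n"
print(find_format_problems(sample))   # []
```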

You should write at least 50 cases, and up to 300, for this project. (You can write more than 300, but Marmoset will only grade the first 300.) This should not take very long, as each test case is just two lines long. Your test cases will be graded on Marmoset and are due a week before the project due date. When you are satisfied you have enough quality test cases, you will have to convert your tests.txt file to something that Marmoset will understand. We have provided a file for you, called DriverBuild.class (right-mouse click and save this file into the same directory as all of your other files for this project). To use this file, from a terminal in that directory, type:

java DriverBuild tests.txt

This will create a file called DriverJava.java that you will submit as your test cases for this project. Note that you do not need to know or worry about how DriverBuild.class works (you will be learning about Java in CS211 if you take it), except that each time you type the command above, it will overwrite any older version of DriverJava.java in that directory. Note that this is a delicate process, and if your test cases are malformed in any way, the DriverBuild program will not be able to correctly convert them to something that Marmoset understands. If you see any error messages from Marmoset regarding "compilation", this means that your test file was not properly formatted, so you need to go back, fix the format, re-run DriverBuild, and resubmit to Marmoset. Marmoset will grade the quality of your test suite.

See the submission instructions at the bottom of this page for how to submit your test cases. Because each test (public and private) must run your entire test suite, this will be a slow process (it took over a minute for me to grade my test suite on Marmoset, when no one else was using the system). Do not wait until the last minute to do this part of the assignment!

Step 2: Writing Code
Once you have written your test cases (and submitted just the test cases to Marmoset), you can begin to write your code for the project in the file project2.py (remember the link to the template for this file above). You should get started writing your Python code as soon as you finish your test suite and it passes both the public and release tests on Marmoset -- do not wait!

You must come up with your own formula for the two functions. This problem is mathematically very simple, and can be solved with just addition/subtraction, multiplication/division, modulus, and if-else statements. You will need to derive this formula on your own; discussing it with other students (including on Piazza) is considered an Honor Code violation.

Where do you start writing your code? Pseudocode! First, you will want to think of some mathematical formulas or rules for determining when a tile is part of a shape, and when a tile is on a certain row or column.

You will also have to use if-else statements. You will need the return statement to get your functions to return a value - DO NOT use the print statement for this!
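The difference between returning and printing is easy to see in a small example. The two functions below are made up for this illustration and have nothing to do with the project's formulas: one prints its answer and the other returns it.

```python
def is_even_printed(n):
    # Shows the answer on the screen, but nothing is handed back to the caller.
    print(n % 2 == 0)


def is_even_returned(n):
    # Hands the answer back to the caller, which is what a test driver needs.
    return n % 2 == 0


shown = is_even_printed(4)     # prints True
given = is_even_returned(4)    # prints nothing
print(shown)                   # None
print(given)                   # True
```

A function that only prints gives its caller the value None, so any test that compares the result with True or False will fail.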


Step 3: Testing Your Code on Your Test Suite
Once you have finished writing your python code, you will first want to test your python code with the test suite you wrote, before testing your python code on Marmoset. Testing your python code at home will give you instant results, while you'll have to wait a bit to test it on Marmoset. Ideally, your test suite is well written, and it's testing for almost everything (or even everything!) the release tests for the python code on Marmoset are checking too. To help you use your test cases, I have included a driver.py file that will use your tests.txt file and print out whether or not the tests you wrote passed or failed, on the python code you wrote. To use this driver, make sure your project2.py, tests.txt, and driver.py are all in the same directory, and from a terminal in that directory, type:

python driver.py

If you want this driver to stop running after a certain test (so you don't have to scroll through everything when you debug), simply put the word stop on a single line after the last test you want run.
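To give a feel for what a driver of this kind does, here is a rough sketch. It is not the provided driver.py, whose internals are not described here; the function run_tests, the helper convert, and the stand-in always_true are all made up for this illustration. The sketch assumes that arguments which look like integers are passed as integers and everything else as strings, and that the expected answer is compared as text.

```python
def convert(token):
    """Turn a token into an int when possible, otherwise keep it as a string."""
    try:
        return int(token)
    except ValueError:
        return token


def run_tests(text, functions):
    """Run two-line test cases from text against a dict of name -> function.

    Hypothetical stand-in for a test driver; stops at a line reading 'stop'.
    """
    lines = text.split("\n")
    results = []
    index = 0
    while index + 1 < len(lines):
        call_line = lines[index]
        if call_line.strip() == "stop":
            break
        expected = lines[index + 1]
        parts = call_line.split(" ")
        name = parts[0]
        args = [convert(token) for token in parts[1:]]
        actual = str(functions[name](*args))
        results.append((call_line, expected, actual, actual == expected))
        index += 2
    return results


def always_true(tile, number):
    # Deliberately wrong stand-in so that one of the sample tests fails.
    return True


sample = "checkTile 5 four\nTrue\ncheckTile 7 four\nFalse\nstop\ncheckTile 1 four\nTrue\n"
for call_line, expected, actual, passed in run_tests(sample, {"checkTile": always_true}):
    print("%s: %s" % (call_line, "PASS" if passed else "FAIL"))
```

With the stand-in function, the first test passes, the second fails, and the test after the stop line is never run.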


Step 4: Submitting Your Code to Marmoset
Once your code passes all your tests at home, you're ready to submit to Marmoset. See the instructions at the bottom of this page for Marmoset submission.

In an effort to get students to do their testing at home using their test suite, rather than on Marmoset, you will need to comment out all of your debug print statements from your code before submitting to Marmoset. Do not delete your debugging statements (you may need them later); just comment them out using #. Note that you DO NOT have to comment out your print statements to run your test suite at home using driver.py. Marmoset submission should be a last and rare step!


Sample Input and Output

Your python code will be tested on Marmoset in the same manner as your test cases. Two sample tests have been provided in the tests.txt above: these tests are the Public Tests on Marmoset.

Release tests for the coding portion will be made available the Thursday morning before projects are due. Coding projects are used as both learning and assessment tools, and in order to get students to test their own code, and come up with their own solutions in a timely manner, the professor or TAs will not answer questions about release tests until they are publicly posted. However, we are very happy to answer questions about why your code doesn't pass *your* tests, at any time. Please get started on projects early, and write and use a high-quality test suite of your own, so you pass most or all of the release tests before the Thursday the project is due.

Project Hints and Guidelines

Remember, when designing your own test cases, try to do so in a thoughtful and structured approach, as we have done in class. What is the smallest possible board you could call your functions on? What is the next smallest one? What are all the corner cases?

Other hints and guidelines:

Project Grading

The project will be worth 130 points:

Project Submission

Do not submit the same files multiple times to try to get Marmoset to grade them faster (it does not work that way). You will only slow down the results for yourself and the rest of the class. All requests are handled in order.

There are two due dates for the two different parts of the project (test cases and code). Once a due date passes you cannot resubmit that part of the assignment. However, you may submit either or both parts of the assignment before the first due date, if you are done early.

DUE DATE 1: Thursday 2/11/2016 at 4:55pm: Test cases due. Submit ONLY your DriverJava.java file (see above for how to convert your tests.txt file into this DriverJava.java file) on Marmoset following the link to CS112-2T. (The T stands for TEST)

DUE DATE 2: Thursday 2/18/2016 at 4:55pm: Code is due. In order to have your code work with Marmoset, you must also submit two files: code.py and SystemCall.java. Again, you do not need to know how these files work (or even open them); just make sure not to change them, and submit them with your project2.py. Normally, Marmoset is set up to work with Java files, which is why we have this workaround. Submit ONLY your project2.py, code.py and SystemCall.java files on Marmoset following the link to CS112-2C. (The C stands for CODE.) Find the link on Marmoset to submit Project 2. Once you pass all the public tests, use your tokens wisely to start examining the release tests. Do not change the names of the files.

You may make as many submissions to Marmoset as you like before the due date, but we will only grade the highest score. Remember to read and adhere to all of the information regarding projects and their submission and grading on the course syllabus. Remember that Marmoset can be slow due to heavy load, and under no circumstances will project due dates be extended because of this: get started early.

Allowable resources: Class textbook, slides and templates from class, and Lab Instructors. You may not use Python syntax you are familiar with unless it appears on the slides, templates, or course exercises. You may not look at or share other students' code in any manner. You may not look at or share test cases with other students. You may NOT work together or talk to other people (including outside sources besides the professor and GTA/UTAs) about the project. All work must be your own.