Programming Project 2

This semester you will be writing a program that models elements of recognizing and creating characters. Optical character recognition is an important area of research that allows photographs or printed documents to be digitized, which makes those documents available for machine-based searching. On the flip side, CAPTCHA (http://en.wikipedia.org/wiki/CAPTCHA) is a system for distinguishing humans from computers: the goal is to generate an image that a human can identify but a machine cannot read. CAPTCHA helps reduce the amount of spam on the Internet.

We will implement a highly limited form of image matching, processing, and creation this semester. Rather than write the whole project at once, we will break it into several two-week sub-projects that are due throughout the semester. The rest of this document details the first of these assignments.

This first assignment asks you to write code that determines whether a pixel is part of a number. You will complete two functions provided in the template file project2_template.py. Save that template file under the name project2.py.

You will have to do four things for this assignment:

Step 1: Writing Test Cases
Ideally, you should write your test cases before you write your code. Your test cases will all fail at first (because you haven't written any code yet), but that's fine. Next, we explain what the two functions are supposed to do, then how to format your test cases so you can get credit for them on Marmoset, and how to run those test cases on the code you write.

First, we're going to write a function that checks whether a pixel is inside the image of a number. To keep this project simple, we'll use very small 3x4 images (3 tiles wide and 4 tiles tall) and think of them as grids made up of tiles. The checkTile function takes two arguments: the tile number, and the number itself, given as one of the strings "one" through "five". The tiles of an image are always numbered starting at 1, left to right and top to bottom. On our 3x4 grids, the tiles are labeled as:
 1   2   3 
 4   5   6 
 7   8   9 
 10  11  12
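
The labels above follow a simple pattern: counting from 1, left to right and top to bottom, the tile in row r and column c (both counted from 1) of a grid w tiles wide is (r - 1) * w + c. The short sketch below reproduces the labels for any width; the helper name tile_grid is hypothetical and not part of the template.

```python
def tile_grid(width, height):
    """Return the tile numbers of a width x height image, one list per row.

    Rows and columns are 0-based inside the comprehension, so the tile in
    row `row`, column `col` is row * width + col + 1.
    """
    return [[row * width + col + 1 for col in range(width)]
            for row in range(height)]


for labels in tile_grid(3, 4):
    print(labels)  # one row of tile numbers per line
```

The same numbering applies to the wider images used by isEmptyOnThree, which is why the width is kept as a parameter.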

We define the five images of the numbers "one" through "five" as follows:

"one"
         X 
         X 
         X 
         X 


"two"
 X   X   X 
     X     
 X         
 X   X   X 


"three"
 X   X   X 
     X   X 
         X 
 X   X   X 


"four"
 X       X 
 X   X   X 
         X 
         X 


"five"
 X   X   X 
 X   X   X 
         X 
 X   X   X 

The X characters mark tiles occupied by a colored pixel (as opposed to a white pixel). Your function should determine whether the given tile is a colored (non-white) pixel in the given number, and return either True or False (as boolean values). For example, checkTile(5,"four") returns True, and checkTile(7,"four") returns False. In other words, the function returns True when the specified tile is a colored pixel in the 3x4 image of the specified number as defined above, and False in all other cases. You may assume that only the numbers "one" through "five" will ever be passed, and that all tile arguments are integers.
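
When writing expected answers for your tests, it can help to transcribe the five pictures into a lookup table. The sketch below does that, reading the "one" image as tiles 3, 6, 9, and 12 (the right-hand column). The names COLORED_TILES and expected_check_tile are hypothetical. Note that this table is exactly the brute-force approach that the boolean-operator and line-count rules for this assignment are meant to rule out, so use it only to double-check test answers, not as your checkTile.

```python
# Colored tiles of each 3x4 image, transcribed from the pictures above.
# This is a brute-force lookup: NOT an acceptable checkTile submission.
COLORED_TILES = {
    "one":   {3, 6, 9, 12},
    "two":   {1, 2, 3, 5, 7, 10, 11, 12},
    "three": {1, 2, 3, 5, 6, 9, 10, 11, 12},
    "four":  {1, 3, 4, 5, 6, 9, 12},
    "five":  {1, 2, 3, 4, 5, 6, 9, 10, 11, 12},
}


def expected_check_tile(tile, number):
    """Expected checkTile result; any tile not in the table (including
    numbers outside 1-12) is treated as white, so the answer is False."""
    return tile in COLORED_TILES[number]


print(expected_check_tile(5, "four"))  # True
print(expected_check_tile(7, "four"))  # False
```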

IMPORTANT: To make this problem more interesting than a "brute force" solution (which would require more code and could take much longer to write), and to practice the boolean logic constructs we've discussed, your code must not use more than four boolean operators in any single conditional statement. For example, the following conditional:
if ((A and B) or ((C or D) and E)):
uses the maximum four boolean operators allowed (and, or, or, and).

By contrast, this conditional:
if (A and B or (C or D and E and F)):
uses five boolean operators, and is disallowed.
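
If you want to double-check a condition, Python's standard ast module can count the operators for you. The sketch below is a hypothetical helper (count_boolean_operators is not part of the template). It counts and/or operators; whether not also counts toward the limit is not stated here, so that is left as an opt-in flag and is an assumption.

```python
import ast


def count_boolean_operators(condition, include_not=False):
    """Count the and/or operators in a Python condition (optionally not too)."""
    tree = ast.parse(condition, mode="eval")
    count = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.BoolOp):
            # "a and b and c" is ONE BoolOp node with three operands,
            # which contains two operators: operands minus one.
            count += len(node.values) - 1
        elif (include_not and isinstance(node, ast.UnaryOp)
              and isinstance(node.op, ast.Not)):
            count += 1
    return count


print(count_boolean_operators("(A and B) or ((C or D) and E)"))    # 4
print(count_boolean_operators("A and B or (C or D and E and F)"))  # 5
```

Because the count comes from the parsed expression, wrapping a condition across several lines inside parentheses, for example "(A and\n B)", gives the same count as writing it on one line.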

Note that splitting a conditional across several lines in Python (we have not covered how to do this, but it is possible) does NOT reduce its operator count; the conditional still uses the same number of operators. In addition, your function must be fewer than 25 lines of code; otherwise it would be easy to brute-force the solution.

For the second function, we're going to write a very specific function that identifies empty pixels for just the number three. The isEmptyOnThree function takes two arguments, a tile number and a width, and returns True if the specified tile is empty (not colored) on an image of "three" with a height of four and the given width, or False if the tile is colored.

How do you know what "three" looks like at other widths? Here are the rules:
  • The shape will always be "three"
  • The height will always be 4
  • The top and bottom rows will always contain all colored pixels
  • The second row will have all white pixels in its left half and all colored pixels in its right half. If the width is odd, the middle pixel will also be colored.
  • The third row will have all white pixels, except the last pixel on the right
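
To see what these rules produce, the sketch below draws the "three" image for a given width, using X for colored and . for white. It assumes "left half" means the first width // 2 pixels, which keeps the middle pixel colored for odd widths and matches the 3-wide picture above. The helper name render_three is hypothetical, and the sketch only makes sense for widths of 3 or more.

```python
def render_three(width):
    """Return the four rows of the "three" image ('X' colored, '.' white)."""
    half = width // 2  # white pixels at the left of row 2
    return [
        "X" * width,                        # row 1: all colored
        "." * half + "X" * (width - half),  # row 2: left half white, rest colored
        "." * (width - 1) + "X",            # row 3: only the rightmost pixel colored
        "X" * width,                        # row 4: all colored
    ]


for row in render_three(5):
    print(row)
```

For width 5 this prints XXXXX, ..XXX, ....X, XXXXX, which is handy for working out expected answers by hand.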

For example, isEmptyOnThree(8,3) would return True. You may assume both arguments will always be integers (so there is no need to test other types), but your code must return invalid for every invalid input, which includes, but is not limited to, a width less than 3. Your code must handle any integer values for both arguments without crashing; it must always return True, False, or invalid.
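
A validity check is a natural first step, before any arithmetic that divides by the width (a width of 0 would otherwise raise ZeroDivisionError). The sketch below is hypothetical: the width rule comes from the text above, but treating a tile outside 1 to 4 * width as invalid is an assumption, and how invalid itself is represented is left to the template.

```python
def is_valid_three_input(tile, width):
    """Hypothetical validity check for isEmptyOnThree arguments.

    Assumes a valid call needs width >= 3 (from the assignment) and a tile
    that exists on a 4-row image of that width, i.e. 1 <= tile <= 4 * width
    (an assumption, not stated explicitly).
    """
    return width >= 3 and 1 <= tile <= 4 * width


print(is_valid_three_input(8, 3))   # True
print(is_valid_three_input(13, 3))  # False: past the last tile
print(is_valid_three_input(1, 2))   # False: width too small
```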

Once you understand what arguments each function expects, and what it returns and when, you are ready to write test cases for your code. For this project, we expect your test cases to be written to a file called tests.txt. For the example file, right-click and save it; do not open it in your browser and copy the contents, or you will miss the newline at the end. The example shows the format for one test case of checkTile and one test case of isEmptyOnThree. Each test case is two lines long: the first line contains the name of the function, followed by a space, followed by all of the arguments to the function, separated by single spaces, and terminated by a newline. The next line contains the expected answer for that function call, terminated by a newline. Make sure your tests include spaces exactly where they are required; otherwise they will not pass on Marmoset. Also make sure to include a newline (blank line) after your last test case.
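
If you generate test cases with a script instead of typing them, the sketch below builds one two-line case in the format just described. The helper name format_test_case is hypothetical. It writes arguments with str(), so a string argument such as four appears without quotes; whether checkTile's string argument should be quoted in tests.txt is not specified here, so follow the example file.

```python
def format_test_case(function_name, args, expected):
    """Return one test case in the two-line tests.txt format.

    Line 1: the function name and its arguments, separated by single spaces.
    Line 2: the expected result.
    Both lines end with a newline character.
    """
    call_line = " ".join([function_name] + [str(arg) for arg in args])
    return call_line + "\n" + str(expected) + "\n"


print(format_test_case("isEmptyOnThree", [8, 3], True), end="")
```

This prints isEmptyOnThree 8 3 on one line and True on the next.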

You should write at least 50 and at most 300 test cases for this project. This should not take very long, since each test case is just two lines.

Step 2: Writing Code
Once you have written your test cases, you can begin writing the code for the project in the file project2.py (created from the template file project2_template.py, as described above).

This problem is mathematically very simple: it can be solved using only addition, subtraction, multiplication, division, modulus, and if-else statements.

Where do you start writing your code? Pseudocode! First, you will want to think of some mathematical formulas or rules for determining when a tile is part of a shape, and when a tile is on a certain row or column.
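
One such rule is how to find a tile's row and column using only subtraction, integer division, and modulus. The sketch below (tile_row_col is a hypothetical helper name) assumes a valid tile and a positive width: a width of 0 raises ZeroDivisionError, and negative values give meaningless results, so check validity first.

```python
def tile_row_col(tile, width):
    """Return the (row, column) of a tile, both counted from 1.

    Tiles are numbered from 1, left to right and top to bottom, so
    subtracting 1 gives a 0-based index that divides cleanly by the width.
    """
    index = tile - 1
    return index // width + 1, index % width + 1


print(tile_row_col(8, 3))   # (3, 2)
print(tile_row_col(12, 3))  # (4, 3)
```

With the row and column in hand, a rule such as "the third row is white except its last pixel" becomes a condition on those two values.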

You will also have to use if-else statements, and you will need the return statement to make your functions return a value. DO NOT use print for this!
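
The difference matters because any code that calls your function, including a test driver, sees only the value the function returns. In the sketch below (both function names are hypothetical), the printing version displays True but actually returns None.

```python
def returns_result(tile):
    return tile == 5   # hands the value back to the caller


def prints_result(tile):
    print(tile == 5)   # only displays the value; the function returns None


answer = prints_result(5)  # displays True...
print(answer)              # ...but answer is None
```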


Step 3: Testing Your Code on Your Test Suite
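
As a rough picture of what running your own suite involves, the sketch below reads test cases in the two-line tests.txt format, calls the named function, and reports mismatches. It is a hypothetical stand-in, not the course-provided driver, and it assumes arguments are either integers or plain strings (optionally quoted). It compares the text form of each result with the expected line, so it does not need to know how invalid is represented.

```python
def parse_token(token):
    """Convert one argument token to int when possible, else keep the string."""
    try:
        return int(token)
    except ValueError:
        # Assumption: a string argument may or may not be wrapped in quotes.
        return token.strip('"')


def run_tests(test_text, functions):
    """Run two-line test cases and return a list of failure messages.

    test_text: the contents of a tests.txt-style file.
    functions: maps each function name (e.g. "checkTile") to the function.
    """
    # Drop blank lines so a trailing blank line does not break the pairing.
    lines = [line for line in test_text.splitlines() if line.strip()]
    failures = []
    for i in range(0, len(lines) - 1, 2):
        call_line, expected = lines[i], lines[i + 1]
        name, *tokens = call_line.split(" ")
        try:
            actual = functions[name](*[parse_token(t) for t in tokens])
        except Exception as error:  # record crashes instead of stopping
            failures.append(f"{call_line}: raised {error!r}")
            continue
        if str(actual) != expected:
            failures.append(f"{call_line}: expected {expected}, got {actual}")
    return failures


# Example use (assumes project2.py is in the same directory):
# import project2
# with open("tests.txt") as f:
#     funcs = {"checkTile": project2.checkTile,
#              "isEmptyOnThree": project2.isEmptyOnThree}
#     for failure in run_tests(f.read(), funcs):
#         print(failure)
```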

Step 4: Testing Your Code on Our Test Suite
Once you've written your code and tested it on your own test suite, check that it also passes our release tests: save release_tests.txt to the same directory as your other files, and choose y when you run the driver.

Project Hints and Guidelines

Remember to design your own test cases in a thoughtful, structured way, as we have done in class. What is the smallest image you could call your functions on? What is the next smallest? What are all the corner cases?
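
One way to enumerate corner cases for isEmptyOnThree systematically is sketched below (corner_case_inputs is a hypothetical helper). For each width it tries tiles just outside the image, the first and last tiles, and the tiles on either side of the boundary between rows 1 and 2. The default widths cover invalid widths (-1, 0, 2), the smallest valid width (3), the smallest even width (4), and the next odd width (5). You still work out each expected answer by hand.

```python
def corner_case_inputs(widths=(-1, 0, 2, 3, 4, 5)):
    """List (tile, width) pairs near the edges of each image size."""
    cases = []
    for width in widths:
        last_tile = 4 * width  # every "three" image is 4 rows tall
        for tile in (-1, 0, 1, width, width + 1, last_tile, last_tile + 1):
            if (tile, width) not in cases:  # skip duplicates for tiny widths
                cases.append((tile, width))
    return cases


for tile, width in corner_case_inputs((3,)):
    print(tile, width)
```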