Just moozing

Before you can check your notes, you must make them…

TDD python programming example


Continuing the work on my archiving system, we have now come to the point where the actual programming is done. Last week, I discussed what I wanted to program and how I got organized. This blog entry is about programming and unit testing.

Unit testing setup

I have created a script called RunTests.py that I use as the default script to run. Geany defaults to executing the current file, so in the project settings I have specified that I want it to run python RunTests.py instead. The script is located in the project root directory.

The code for RunTests.py is shown below:

# RunTests.py
# top level 'run-all' script

# include program sourcecode in search path
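import sys
sys.path.append('.')  # assumption: the original search path line was lost; point this at the source directory
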
import unittest
from tests.testProcessFile import testProcessFile

if __name__ == '__main__':
    suite = unittest.TestLoader().loadTestsFromTestCase(testProcessFile)
    # run the suite and print one line per test
    unittest.TextTestRunner(verbosity=2).run(suite)

The import of sys at the top of the script, together with the search path adjustment that goes with it, is there to circumvent issues related to directories and Python search paths. This also means that the unit test scripts cannot be run by themselves (in their current form).

The rest of the script is about using the testProcessFile class for unit testing.
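As a minimal sketch of what the loader and the runner do, here is the same pattern applied to a stand-in test case. The class testExample and the helper run_suite are hypothetical and not part of the project; they are only there to make the sketch self-contained.

```python
import io
import unittest


class testExample(unittest.TestCase):
    """Hypothetical stand-in for testProcessFile."""

    def testAddition(self):
        self.assertEqual(1 + 1, 2)


def run_suite(test_case_class):
    """Load every test* method of a TestCase class and run it."""
    suite = unittest.TestLoader().loadTestsFromTestCase(test_case_class)
    # Collect the report in a buffer so the caller decides where it goes
    stream = io.StringIO()
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    return result, stream.getvalue()


result, report = run_suite(testExample)
print(report)
```

The returned result object tells you whether everything passed (result.wasSuccessful()) and how many tests were run (result.testsRun).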

Implementing ProcessFile

Last week, I showed two activity diagrams. I start with ProcessFile, since it is the simplest and the easiest to test. Besides, ProcessDirectory relies on it, and circumventing that dependency would be extra work.

I have written the tests, seen them fail, and then corrected either the test (yes, I do make the occasional error in my tests) or ProcessFile to make it pass. The tests look as follows:

# imports
import unittest
from ProcessFile.ProcessFile import ProcessFile

# test data
TestBaseDir = './tests/data/ProcessFile/'
TestFileA = TestBaseDir + 'ExistingTestFile.txt'    # file with some blabla content
TestFileA_md5 = 'bb1fc7a8eef4e86a76f9ccd65e603f51'  # md5 calculated using "md5sum" command
NonexistingTestFile = TestBaseDir + 'NonexistingTestFile.txt' # non-existent, but in existing dir
TestListWithTestFileA = [ {'md5': TestFileA_md5, 'Filename': TestFileA } ]
TestListWithoutTestFileA = [ {'md5': 'VeryBadMd5Sum', 'Filename': NonexistingTestFile } ]

# tests
class testProcessFile(unittest.TestCase):
    def testExistingFile( self ):
        """ Simple function run with existing file"""
        ProcessFile( TestFileA )

    def testNonExistingFile( self ):
        """ Simple function run with bad filename. Raises IOError exception"""
        self.assertRaises( IOError, ProcessFile, NonexistingTestFile )

    def testVerifyMD5sum( self ):
        """ Check MD5 result against known MD5"""
        Res = ProcessFile( TestFileA )
        self.assertEqual( Res['md5'], TestFileA_md5 )

    def testIsInList( self ):
        """ Check if file is found in supplied list"""
        Res = ProcessFile( TestFileA, TestListWithTestFileA )
        self.assertEqual( Res['duplicates'], [ 0 ] )

    def testIsNotInList( self ):
        """ Check if file is not found in supplied list"""
        Res = ProcessFile( TestFileA, TestListWithoutTestFileA )
        self.assertEqual( Res['duplicates'], [ ] )

I have written five tests. That is sufficient for me to be fairly certain that ProcessFile functions as intended. I have used the formula of making a positive test (does it work as intended?) and a negative test (does it fail when it should?). I ought to include a test for a badly formed list and, if I were a TDD purist, I would have included far more tests. This level of testing is enough to know that we have something that works and that we will catch most programming errors. Besides, when I start coding the ProcessDirectory part, I will make errors that break ProcessFile, and solving those will involve writing more tests for ProcessFile.

When doing tests, an important point is the test data. I find making proper test data (both data on disk and in variables) to be very non-trivial. If you use data sets for your tests, your tests cannot be better than the test data. Exceptions and rare occurrences can only be handled properly if you know that they will occur, i.e. you have an example in the test data (live data is preferred, but generated data is useful too).

Sanitizing data is also important. It matters for data that goes into databases, but depending on your module, you must also perform some input verification of the data that the user (or programmer) supplies to your module. The test data ought to cover the extremes of input (e.g. a string that is null or very, very long), but that has been omitted in the example too.
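To make the tests above more concrete, here is a rough sketch of a function that would satisfy them. It is not my actual ProcessFile; the name process_file_sketch, the parameter name known_files and the 'Filename' key in the result are assumptions made for illustration.

```python
import hashlib


def process_file_sketch(filename, known_files=None):
    """Hypothetical stand-in for ProcessFile, not the real implementation.

    Returns a dict with the file name, its MD5 hex digest, and the indexes
    of the entries in known_files that have the same digest.
    """
    md5 = hashlib.md5()
    # open() raises IOError (an alias of OSError in Python 3) if the file is missing
    with open(filename, 'rb') as handle:
        # Read in chunks so large files do not have to fit in memory
        for chunk in iter(lambda: handle.read(65536), b''):
            md5.update(chunk)
    digest = md5.hexdigest()

    duplicates = [
        index
        for index, entry in enumerate(known_files or [])
        if entry['md5'] == digest
    ]
    return {'Filename': filename, 'md5': digest, 'duplicates': duplicates}
```

Returning the list indexes of the matches is what lets testIsInList compare against [ 0 ] and testIsNotInList against an empty list.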
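A sketch of what such tests of the input extremes could look like is given here. The function check_filename and its 4096 character limit are made up for the example; my module has no such function.

```python
import unittest


def check_filename(filename):
    """Hypothetical input check; not part of the original module."""
    if not isinstance(filename, str):
        raise TypeError('filename must be a string')
    if filename == '':
        raise ValueError('filename must not be empty')
    if len(filename) > 4096:  # assumed limit, chosen for illustration only
        raise ValueError('filename is too long')
    return filename


class testCheckFilename(unittest.TestCase):
    def testNone(self):
        """ None instead of a string is rejected"""
        self.assertRaises(TypeError, check_filename, None)

    def testEmptyString(self):
        """ Empty string is rejected"""
        self.assertRaises(ValueError, check_filename, '')

    def testVeryLongString(self):
        """ Very, very long string is rejected"""
        self.assertRaises(ValueError, check_filename, 'a' * 100000)

    def testNormalName(self):
        """ Ordinary name is passed through unchanged"""
        self.assertEqual(check_filename('notes.txt'), 'notes.txt')
```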

Implementing ProcessDir

A quick statistic that I noticed: the tests have 83 lines of code (plus test data), and the source file has 53 lines of code. This could seem like too many tests and too little code. In this blog entry, the author says something like 60-80% of the code should be test code. I'm currently at 61% (83 of 136 lines), and I am not doing that many tests.

The tests for ProcessDir are as follows (test data has been omitted):
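For reference, the percentage is simply the test lines divided by all lines of code. The helper name share_of_test_code is only for this illustration.

```python
def share_of_test_code(test_lines, source_lines):
    """Return the percentage of all code lines that are test code."""
    return 100.0 * test_lines / (test_lines + source_lines)


# 83 lines of tests and 53 lines of source, as counted above
print(round(share_of_test_code(83, 53)))  # prints 61
```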

# tests
class testProcessDir(unittest.TestCase):
     def setUp( self ):
         # Running a script to init data.
         os.system( 'sh ./tests/ProcessDirDataInit.sh' )

     def tearDown( self):
         # run script to undo stuff in setUp
         os.system( 'sh ./tests/ProcessDirDataCleanup.sh' )

     def testExistingDir( self ):
         """ Simple function run with existing directory"""
         ProcessDir( TestDirA )

     def testNonExistingDir( self ):
         """ Simple function run with non-existing directory"""
         self.assertRaises( IOError, ProcessDir, NonexistentDir )

     def testSingleDepth( self ):
         """ Check return value of single depth dir tree """
         self.assertEqual( ProcessDir( TestDirA ), TestDirA_list )

     def testRecursive( self ):
         """ Check return value of multilevel dir tree """
         self.assertEqual( ProcessDir( TestDirB ), TestDirB_list )

     def testFindDuplicates( self ):
         """ Check to find duplicate files (based on md5)"""
         self.assertEqual( ProcessDir( TestDirB, TestDirA_list ), TestDirB_duppA_list )

     def testProgressIndicator( self ):
         """ check that all data is piped to progress indicator also """
         Res = ProcessDir( TestDirA, ProgressFunction = SaveOutputToList )
         self.assertEqual( Res, SavedList )

I have written 6 tests, but it was the test data that was the time-consuming part. In the test case I have included setUp() and tearDown(). These two are special functions that relate to the test fixture: setUp() is called before each test and tearDown() is called after each test. Here they are used to initialize and clean up the test data (through shell scripts). This was done because, when recursion was enabled, the tests included the .svn directories, which resulted in failed tests. Since I cannot remove them (they are part of the version control system) and I want version control on my test data as well, I opted for the solution where the test data is copied to a temporary location (and removed after use).

One of the really strong points of TDD is that you choose the parameters and the output format of the function before coding. It makes the test data serve as a reference, and it also helps to keep the API clean. When the classes start to get big and complex, having a clean interface is an important feature. TDD helps to keep a good level of cohesion (at least on the function level), and testable code has a tendency to be more loosely coupled. See this blog for more on the same topic.
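The same copy-to-a-temporary-location idea can also be done without shell scripts. The following is a sketch of a Python-only alternative, not what my scripts do line for line; the class name and the generated sample data are assumptions made so the example can run on its own.

```python
import os
import shutil
import tempfile
import unittest


class testWithTemporaryCopy(unittest.TestCase):
    # Assumption: SourceDir would normally point at the version-controlled
    # test data. Here it is generated so the sketch is self-contained.
    SourceDir = None

    @classmethod
    def setUpClass(cls):
        cls.SourceDir = tempfile.mkdtemp()
        os.mkdir(os.path.join(cls.SourceDir, '.svn'))
        with open(os.path.join(cls.SourceDir, 'a.txt'), 'w') as handle:
            handle.write('blabla')

    @classmethod
    def tearDownClass(cls):
        shutil.rmtree(cls.SourceDir)

    def setUp(self):
        # Runs before every test method: copy the data, leaving out .svn
        self.WorkDir = os.path.join(tempfile.mkdtemp(), 'data')
        shutil.copytree(self.SourceDir, self.WorkDir,
                        ignore=shutil.ignore_patterns('.svn'))

    def tearDown(self):
        # Runs after every test method: remove the temporary copy
        shutil.rmtree(os.path.dirname(self.WorkDir))

    def testNoSvnDirInCopy(self):
        """ The working copy contains the data but no .svn directory"""
        self.assertEqual(os.listdir(self.WorkDir), ['a.txt'])
```

Each test gets a fresh copy, so a test that modifies or deletes files cannot influence the next one.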
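To illustrate how the tests pin down the interface before any code exists, here is a sketch of a function with the call signature that the ProcessDir tests imply. It is not my implementation; the name process_dir_sketch, the parameter name known_files and the exact layout of each result entry are assumptions.

```python
import hashlib
import os


def process_dir_sketch(directory, known_files=None, ProgressFunction=None):
    """Hypothetical stand-in for ProcessDir; the real return format may differ."""
    if not os.path.isdir(directory):
        # os.walk silently yields nothing for a missing directory
        raise IOError('no such directory: ' + directory)

    results = []
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # fixed traversal order keeps results comparable in tests
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, 'rb') as handle:
                digest = hashlib.md5(handle.read()).hexdigest()
            entry = {
                'Filename': path,
                'md5': digest,
                'duplicates': [i for i, known in enumerate(known_files or [])
                               if known['md5'] == digest],
            }
            if ProgressFunction is not None:
                ProgressFunction(entry)  # report each file as it is processed
            results.append(entry)
    return results
```

Passing list.append of an empty list as ProgressFunction collects the same entries that the function returns, which is the idea behind testProgressIndicator.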

User script

Having implemented both ProcessFile and ProcessDir, I am now able to put the two together and make the top-level script that the user is to use. Since all the processing is in the modules, the script is fairly simple. Command line parameters are implemented using the standard Python library optparse, and the rest is output and passing the correct directories to ProcessDir.

As I tested the program, I was reminded of one of the points that I mentioned earlier: the program is no better than the test data. The current implementation cannot handle links. They are read as entries in the directories, but they cannot be opened, so the program breaks with an IOError exception. Next, I will create some test data and a test that recreates the error, and then fix the code and rerun the test.
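A minimal sketch of command line handling with optparse could look like the following. The option names (-r/--reference and -q/--quiet) and the function name parse_command_line are invented for the example and are not necessarily the ones my script uses.

```python
from optparse import OptionParser


def parse_command_line(argv):
    """Parse arguments for a hypothetical top-level script."""
    parser = OptionParser(usage='usage: %prog [options] DIRECTORY...')
    # Option names below are illustrative assumptions
    parser.add_option('-r', '--reference', dest='reference', default=None,
                      metavar='DIR',
                      help='directory to compare against when looking for duplicates')
    parser.add_option('-q', '--quiet', action='store_true', dest='quiet',
                      default=False, help='suppress progress output')
    options, directories = parser.parse_args(argv)
    if not directories:
        # Prints the usage text and exits with an error code
        parser.error('at least one directory is required')
    return options, directories


# In a real script argv would be sys.argv[1:]
options, directories = parse_command_line(['-r', '/archive', 'incoming'])
print(options.reference, options.quiet, directories)
```

The remaining positional arguments come back as a plain list, which is what would then be handed to ProcessDir one directory at a time.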
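A test that recreates the problem could be sketched as below, assuming that the failing links are symbolic links whose target is missing and that the tests run on a system where os.symlink is available (e.g. Linux). The guard function is_readable_file is a hypothetical fix, not the one I have settled on.

```python
import os
import tempfile
import unittest


def is_readable_file(path):
    """Hypothetical guard: True only for regular files that exist."""
    # os.path.isfile follows links, so a dangling link gives False
    return os.path.isfile(path)


class testDanglingLink(unittest.TestCase):
    def setUp(self):
        self.WorkDir = tempfile.mkdtemp()
        self.Link = os.path.join(self.WorkDir, 'dangling')
        # Create a link to a target that does not exist
        os.symlink(os.path.join(self.WorkDir, 'missing.txt'), self.Link)

    def tearDown(self):
        os.remove(self.Link)
        os.rmdir(self.WorkDir)

    def testLinkIsListedButCannotBeOpened(self):
        """ Recreate the error: the entry is listed, but open() fails"""
        self.assertEqual(os.listdir(self.WorkDir), ['dangling'])
        self.assertRaises(IOError, open, self.Link, 'rb')

    def testGuardSkipsLink(self):
        """ The guard reports the dangling link as not readable"""
        self.assertFalse(is_readable_file(self.Link))
```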

I wanted to link to the code, but I am not allowed to link to .tgz files, so I need to find another way. Perhaps I should find some code hosting service, but that will have to wait for a later time.


Written by moozing

July 19, 2010 at 09:00
