Getting the data out of the tests

Hello everybody!

A few days ago, while searching for ways to improve the tests in our projects, I found an interesting article with some ideas. Its main concept is to keep a test’s behavior, data and actual implementation separate from each other. The original article can be found here. In this post, I’ll talk about the first of those ideas that we decided to implement here: separating the data from the test implementations.

First, if the data is not going to live inside the test implementation, it has to live somewhere else. Where should it be? The answer is: anywhere that is easy to change. Better yet if it can be changed by non-programmers as well. In our case (and in the mentioned article’s example) we are using Excel spreadsheets.

Here is an example of what this looks like:

Data in an Excel spreadsheet

Almost anyone can edit a file like this. So if you have someone who is not involved in programming (like a client), this person should be able to edit the spreadsheet pretty easily.

Once the data is ready, you need to access it somehow. To do this, we are using the Apache POI project, which makes reading this file (and other MS Office files) pretty easy. The code below reads all cells in the spreadsheet and prints them out.

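import java.io.FileInputStream;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;

// Open the .xls workbook and pick the sheet that holds the test data
HSSFWorkbook wb = new HSSFWorkbook(new FileInputStream(fileName));
HSSFSheet sheet = wb.getSheet(sheetName);

// A sheet iterates over its rows, and a row over its cells
for (Row row : sheet) {
  for (Cell cell : row) {
    // print (not println) keeps all cells of a row on one line
    System.out.print(cell.toString() + " ");
  }
  System.out.println();
}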

Now, if you want to know if the correct user listing is being returned from some business logic implementation, you would only need to change that code to compare the results read from the files with the ones returned from your business class. The business rules changed? Change the file and the new expectation will be in place, without even having to touch the test code, unless of course it is a change in the structure of the information.
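A minimal sketch of that comparison, assuming the rows have already been read from the sheet into lists of strings (for example with a loop like the POI one above) and that the business class returns users with a name and an e-mail. The names `ExpectedDataCheck`, `User`, `toRow` and `firstMismatch`, as well as the column order, are hypothetical and only here for illustration:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: compares rows read from the spreadsheet with the
// rows produced by the business logic.
final class ExpectedDataCheck {

  // Hypothetical value object standing in for whatever the business class returns.
  static final class User {
    final String name;
    final String email;

    User(String name, String email) {
      this.name = name;
      this.email = email;
    }
  }

  // Flattens a user into the column order assumed for the sheet: name, e-mail.
  static List<String> toRow(User user) {
    return Arrays.asList(user.name, user.email);
  }

  // Returns a description of the first difference, or null when everything matches.
  static String firstMismatch(List<List<String>> expectedRows, List<User> actualUsers) {
    if (expectedRows.size() != actualUsers.size()) {
      return "expected " + expectedRows.size() + " rows but got " + actualUsers.size();
    }
    for (int i = 0; i < expectedRows.size(); i++) {
      List<String> actualRow = toRow(actualUsers.get(i));
      if (!expectedRows.get(i).equals(actualRow)) {
        return "row " + i + ": expected " + expectedRows.get(i) + " but got " + actualRow;
      }
    }
    return null;
  }
}
```

In a test, the result of `firstMismatch` can simply be asserted to be null, so that a failure message says which row of the spreadsheet disagrees with the business logic.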

The next problem that may arise is that you probably have a LOT of tests. Or at least you should have… Anyway, having a spreadsheet for each one would be suicide. Tons of files to handle! So what we can do is create one spreadsheet per test class. Inside the file, we create one sheet per test. The footer of the spreadsheet then looks like this:

Multiple sheets in a spreadsheet
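One way to wire this up is to derive the file name and the sheet name from the test itself. The sketch below assumes a naming convention that is not spelled out above: the workbook is named after the test class and each sheet after the test method. `TestDataLocation` and `forTest` are hypothetical names. The length check is there because Excel limits sheet names to 31 characters.

```java
// Hypothetical naming convention: one workbook per test class,
// one sheet per test method.
final class TestDataLocation {
  // Excel does not allow sheet names longer than 31 characters.
  static final int MAX_SHEET_NAME_LENGTH = 31;

  final String fileName;
  final String sheetName;

  private TestDataLocation(String fileName, String sheetName) {
    this.fileName = fileName;
    this.sheetName = sheetName;
  }

  // Maps a test class and test method name to the workbook and sheet holding its data.
  static TestDataLocation forTest(Class<?> testClass, String testMethodName) {
    if (testMethodName.length() > MAX_SHEET_NAME_LENGTH) {
      throw new IllegalArgumentException(
          "sheet name too long for Excel: " + testMethodName);
    }
    return new TestDataLocation(testClass.getSimpleName() + ".xls", testMethodName);
  }
}
```

The resulting `fileName` and `sheetName` can then be passed to the POI code that opens the workbook and selects the sheet.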

And that’s it! What do you think? Any ideas on how to improve this even more? Don’t be shy and post a comment!


2 Responses to Getting the data out of the tests

Alisson "The_Linux_Lich" says:

31 Oct 2008 at 1:57 am

I think this is really too much “mental masturbation” to generate test data. It reminds me of the old Rails days (1~2 years ago), when we were using YAML fixtures and some autotest magic (a similar approach, less headache, but still boring; have a look at http://manuals.rubyonrails.com/read/chapter/26)

If you stick to this technique, there is also FitNesse (http://fitnesse.org/)

But it seems so awkward: as the system grows, the test data (and its associations) gets bloated. I recommend you get to know a new kid on the block, based on TDD:

Behavior-Driven Development
http://behaviour-driven.org/

BDD with user stories really kicks ass! It lowers the communication barrier between client and dev. Take a look at easyb, a BDD framework for Java inspired by RSpec: http://www.easyb.org

Or Cucumber (this one is awesome!):
http://github.com/aslakhellesoy/cucumber/wikis/home

Also, take a look at the factory_girl concept for generating test data on the fly:
http://giantrobots.thoughtbot.com/2008/6/6/waiting-for-a-factory-girl

Wow, I should create a blog to write about this 😀

Paulo Renato says:

31 Oct 2008 at 12:13 pm

Creating the test data is one of the most boring things we have to do. But it can’t be avoided.

Personally, I find BDD just one more buzzword; there is too much hype around it. It isn’t that much more than TDD after all. Its best point is that it drives us toward the “domain-specific language” concept, i.e. tests written in natural or “almost” natural language, so to speak. Of course, I haven’t studied it much, so maybe I’m underestimating it.

Also, it isn’t something you can insert out of the blue in an ongoing project. You must go through a thinking and experimenting phase, which isn’t easy on a tight schedule, since it consumes time. The approach I suggest in this post is one that can be used really quickly in almost any project. It’s a fast first step.

Anyway, this leads us to another of the ideas pointed out in the original article I mentioned: getting the “Intention” of the tests out of the implementation. We should also do this at some point. I just don’t know when. We have a list of things to improve here =)

Also, I’m not that inclined toward FitNesse; it seems cumbersome to integrate into our build. But of course this may not be true for other projects.

By the way, you just gave me an idea for a future post, about tools. Wait for something in the next few days! =)
