Wednesday, May 31, 2006

Data comparison? Python file comparison to the rescue!

I had to update some data in a production database for a new functionality. I had already done that in the test database during development, but it was not just a simple insert. The steps that I did went something like this:

1. Generate a file with a set of insert statements (used Python for this!).
2. Manually change some data in the insert statements.
3. Run the insert statements

Now since I had to do the same in production, I wanted to make sure that after this insert, the data would look exactly the same as it did now in test. In other words, I wanted to be able to repeatedly run the process above, and make sure that the steps produced always the same set of data. This would guard against any mistakes being done during step 2, which is a manual edit of a file and prone to error.
Sounds simple - I could generate the file of insert statements, and then compare it against the file that was already changed and used to load the data in the table. Problem is, I no longer had that file I could compare against. So instead of comparing insert statements, I would have to compare data. The process I did went something like this:

1. Export current data in the table to a file, say file1.txt
2. Delete the inserted rows from the table.
3. Generate insert statements again, manually edit, and insert rows into the table.
4. Export data in the table to a file, say file2.txt

Now all I had to do, was compare file1.txt and file2.txt. If they are equivalent, it means my manual edits were on target.

So how did I compare the two files? Python to the rescue! This is all it took:

import filecmp
filecmp.cmp('file1.txt', 'file2.txt')

If they are equal it returns true, else false.

I have to admit that that the only reason I thought about comparing data rather than the insert statements was because I was sure Python would have something neat for file comparison. The database-oriented guys might have fired up some magic by using stored procedures and temporary tables, and it would have been equally satisfying to them.

Its all a matter of what you are comfortable with, I guess.

No comments: