Cleaning up after your code is any activity you do to erase the trace of your code execution. Usually you do this to prevent a side effect that will affect execution of another piece of code later.
Side effect? Aren’t we supposed to minimize those?
Yes, we are. So cleaning up could be a sign you are not doing something right. But sometimes you do not have a lot of choice.
Let’s give a few examples. Frequently I have to clean databases of data that I inserted but that is no longer needed or could clash with data inserted later. Another case is deleting temporary files left behind by scripts executed on remote servers, for example when a server is set up or an application is deployed or configured.
We often clean up after our code and this is often seen as a good thing.
In many cases this is only a habit: there is no real need for it, and it should not be done.
Plan for failure
What if things do not go according to plan? When your code fails, what do you do? At that moment you will want all the evidence possible, but you just cleaned it away. No matter how hard we try to prevent it, things quite often do not go according to plan, so we need to plan for that.
Unless you absolutely must do it, leave all traces of your code behind.
Strive to reduce changing and deleting anything as much as possible.
Be prepared when the inevitable bug shows its ugly head from whatever corner of your code it was hiding until now.
What if I need to clean something?
There are cases where your code will require a clean state for it to run.
First ask yourself why you are cleaning up after your code. Can you change this? Quite often this is possible with little effort.
If you absolutely need a clean state anyway, then go and ensure things are clean for your code to run.
The difference is that this time you ensure the state is clean just before you execute the code that requires it. Before, not after. You know what the code requires and where, so it shouldn’t be hard to do.
My favorite example is integration test code that touches the database. It seems straightforward, but frequently I see this in it:
- Not generating proper test data
- Using the same database for development and testing purposes
- Working with a transaction you do not plan to commit
- Deleting all test-produced data after the test is done
Here are the issues with each of these, along with some proposed solutions:
Not generating proper test data:
Test data should be as close to real as possible. Where something should be unique, make it so, and avoid storing just values of 'testname', 'testaddress', 1, 0, 'test', 'test'. Append random characters to your strings and take some effort to randomize other values as well. An important note: make sure it is random and not
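As a minimal sketch of what randomized test data could look like in Python, assuming a made-up customer record (the field names and the helpers `random_suffix` and `make_test_customer` are illustrative only, not taken from any real schema):

```python
import secrets
import string


def random_suffix(length: int = 8) -> str:
    """Return a random lowercase alphanumeric string of the given length."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def make_test_customer() -> dict:
    """Build one customer record whose values differ on every call."""
    tag = random_suffix()
    return {
        # Hypothetical fields; replace them with whatever your tables hold.
        "name": f"testname-{tag}",
        "address": f"{secrets.randbelow(999) + 1} Test Street {tag}",
        "email": f"test-{tag}@example.com",
        "age": 18 + secrets.randbelow(70),
    }


first = make_test_customer()
second = make_test_customer()
print(first)
print(second)
```

The readable prefix keeps test rows easy to spot, while the random suffix keeps values in unique columns from clashing with rows left over from earlier runs.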
Using the same database for development and testing purposes:
Frequently we run applications on our own development machines. For that we may need a database up and running. Take time to set up two databases, so your tests will not interfere with your “developer” database and destroy what you have set up there. Destroy your test db when you do not need it.
Working with a transaction you do not plan to commit:
This is usually connected with the previous point: if you do not commit, you will not pollute your development database. Once you have a separate test database that reason is gone, so commit and store your data. Unless you run your tests in an in-memory db that does not plan to stick around 🙂
Deleting all test-produced data after the test is done:
You are not deleting your production data, are you? Save and store your test data as well. When a test fails, you will have all the evidence. This could be a lot of work, but at the very least clean your data before your tests rather than after them, so you ensure the state for each one is clean. Otherwise you will have to comment out all the “cleaning” code and rerun the test when it fails.
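To illustrate cleaning before rather than after, here is a rough sketch using Python's built-in `sqlite3` module. The file location `TEST_DB`, the `customers` table and both helper functions are assumptions made up for this example; a real project would use its own schema and test framework.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical dedicated test database, kept apart from the development one.
TEST_DB = Path(tempfile.gettempdir()) / "cleanup_example_test.sqlite3"


def open_clean_test_db(path: Path) -> sqlite3.Connection:
    """Open the test database and reset its state BEFORE the test runs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (name TEXT UNIQUE, age INTEGER)"
    )
    conn.execute("DELETE FROM customers")  # leftovers from the previous run go now
    conn.commit()
    return conn


def run_customer_test(path: Path) -> int:
    """A stand-in test: insert a row, commit it, and leave it behind."""
    conn = open_clean_test_db(path)
    try:
        conn.execute("INSERT INTO customers VALUES (?, ?)", ("testname-a1b2", 42))
        conn.commit()  # committed on purpose so the data survives a failure
        return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    finally:
        conn.close()  # closes the connection only; no rows are deleted


# Running twice does not clash on the UNIQUE column, and the rows from the
# last run stay in the file for inspection.
print(run_customer_test(TEST_DB))
print(run_customer_test(TEST_DB))
```

Because the reset happens at the start, a failing run leaves its rows in the file, and you can open it afterwards to see exactly what the test wrote.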
It is not only for db integration tests
It was a matter of time before I started applying the same concept in other areas.
At work I have to care for our little server garden (cool, I am a Server Gardener!). For that we have developed a lot of little scripts and stored them in a software repository.
Developing deployment and installation scripts for our servers was a slower process compared to running code in a code editor. Testing is harder and, as a result, feedback is slow.
Often when I had an issue with a script, I had to look for information that was not there anymore. So I tried to apply the same principles:
- Stopped sharing folders between scripts
- Avoided moving and deleting files, just copied what I needed
- Strived to make all scripts idempotent; using tools like Ansible that embrace this helps a lot
Following these really helped and my Server Gardener career is going well as a result.
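The three habits above can be sketched in a few lines of Python. This is only an illustration under assumed names (`stage_file`, `app.conf`, `deploy-script-work`), not one of the actual scripts, and a tool like Ansible would express the same idea declaratively.

```python
import shutil
import tempfile
from pathlib import Path


def stage_file(source: Path, work_dir: Path) -> Path:
    """Copy source into work_dir; safe to call repeatedly with the same arguments."""
    work_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    target = work_dir / source.name
    shutil.copy2(source, target)  # copy, not move: the original stays in place
    return target


# Demo with throwaway paths; all names here are hypothetical.
base = Path(tempfile.mkdtemp(prefix="gardener-"))  # mkdtemp does not delete it later
source = base / "app.conf"
source.write_text("port=8080\n")

work_dir = base / "deploy-script-work"  # owned by one script, not shared
first = stage_file(source, work_dir)
second = stage_file(source, work_dir)  # second run ends in the same state

print(first == second, source.exists(), second.read_text())
```

Running the step again produces the same end state, and both the source file and the staged copy remain on disk if something later goes wrong.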
- When you clean up after your code, stop and think if this is necessary
- Avoid sharing state and side effects, so you never have to clean anything
- Plan for failure, do not destroy evidence
- Strive to make your code idempotent
Thank you for reading.