Remember your first programs? Your first Hello World? You launched them from the command line and they just printed something. Over time your programs became more complex. You still printed a lot of stuff: at the beginning of your program, when your program finished, the parameters passed to a function you thought was important, and so on. You did this because you were running your program a lot, trying to get feedback on how it was working.
Eventually you discovered logging tools. They are cool and let you log more easily, and they tell you exactly when the code was invoked and from where, in a consistent format. So you started using those as well and stopped relying on System.out.println and the like. They also often let you record everything in a file, so you can check it later.
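As a reminder of what that switch looks like, here is a minimal sketch using java.util.logging from the JDK. The class name HelloLogging and the greet method are made up for illustration; your logging tool of choice may look different.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class HelloLogging {
    // One logger per class, named after the class, so every record says where it came from.
    private static final Logger LOG = Logger.getLogger(HelloLogging.class.getName());

    // Hypothetical function we consider important enough to log its parameter.
    static String greet(String name) {
        LOG.log(Level.INFO, "greet called with name={0}", name);
        return "Hello, " + name + "!";
    }

    public static void main(String[] args) {
        LOG.info("Program started");
        System.out.println(greet("World"));
        LOG.info("Program finished");
    }
}
```

With the default JDK configuration the records go to the console with a timestamp and the calling class and method. Attaching a java.util.logging.FileHandler is one way to also keep them in a file.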
Years pass and you still write logs when you start developing a program. You want to get a sense of what your program is doing, and debugging is sometimes hard, so you just add a log here and there.
You watch logs to check the flow of your program.
Then there comes a time when you are done working on the code and it is time to commit to your source repository. What do you do before that? Often you remove all that extra logging, or make sure it is disabled. It has served its purpose, so it is no longer needed. Or is it?
All our servers belong to you. You care for them, install stuff on them and make sure they help your business operate. Without you, everything will soon fail.
Most of the time everything is fine, but sometimes there are issues and when you have issues, you need to fix them. What do you do then?
You watch logs to find what went wrong.
Sometimes logs will tell you exactly what happened; you will restart a server or application and the problem will go away.
Other times a programmer will put too much noise in the application logs, recording all kinds of useless stuff and making your job harder. You will use your skills to filter logs for errors and hope the programmers who developed the product have the same understanding as you of what is an error, what should be a warning and what is an info message.
And of course sometimes there will be nothing in the logs and you will wander in the dark without guidance, trying this and that to see what works and what does not.
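Filtering by level only works if the code assigns levels sensibly. Here is a small sketch with java.util.logging of what an operator sees at a given threshold. The logger name, the messages and the CollectingHandler class are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LevelFilterDemo {
    // Keeps the records that pass the handler's level in memory so we can inspect them.
    static class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                messages.add(record.getLevel() + ": " + record.getMessage());
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    // Logs three hypothetical events and returns the ones visible at the given threshold.
    static List<String> run(Level threshold) {
        Logger log = Logger.getLogger("orders.demo");
        log.setUseParentHandlers(false); // keep the demo output off the console
        log.setLevel(Level.ALL);
        CollectingHandler handler = new CollectingHandler();
        handler.setLevel(threshold);
        log.addHandler(handler);
        try {
            log.info("Order 42 received");
            log.warning("Payment service slow, retrying");
            log.severe("Cannot save order 42");
        } finally {
            log.removeHandler(handler);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        // Only the warning and the severe record pass a WARNING threshold.
        run(Level.WARNING).forEach(System.out::println);
    }
}
```

If the "Order 42 received" message had been logged as severe, no threshold could separate it from the real failure. That is the shared understanding the filter depends on.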
There is this weird bug in production and you need to fix it. Your operations colleague has an outline of what he thinks happened and has extracted the log data for that day. He complained that the log file is a bit big. You try to narrow the problem down to a specific place in the code.
You watch logs to find the code that got executed.
Thankfully all logs contain the name of the class that invoked them. Of course it is not all good, as the particular line of code that initializes the logging in a class was sometimes copy-pasted without modification. As a result, that useful information is not always accurate. After a short struggle you manage to find the few places that log this message, and finally you narrow down the problem code and produce a fix.
Your small team believes that whoever builds it needs to maintain it. You put effort into monitoring your applications in production, and part of that is staring at logs.
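The copy-paste trap is easy to reproduce. The sketch below uses java.util.logging with three made-up service classes: one correct, one with the pasted line left unedited, and one that uses MethodHandles.lookup().lookupClass() so the line can be pasted anywhere without changes.

```java
import java.lang.invoke.MethodHandles;
import java.util.logging.Logger;

public class LoggerNaming {
    static class OrderService {
        // Correct: the logger is named after the class that owns it.
        static final Logger LOG = Logger.getLogger(OrderService.class.getName());
    }

    static class InvoiceService {
        // Pasted from OrderService and never edited: records look like they come from OrderService.
        static final Logger LOG = Logger.getLogger(OrderService.class.getName());
    }

    static class ShippingService {
        // Copy-paste safe: lookupClass() resolves to the class this line is written in.
        static final Logger LOG =
                Logger.getLogger(MethodHandles.lookup().lookupClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(OrderService.LOG.getName());
        System.out.println(InvoiceService.LOG.getName());  // same name as the line above
        System.out.println(ShippingService.LOG.getName());
    }
}
```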
They say what you measure you will also strive to improve. And a funny thing happened. Your code is working, users are not complaining, yet you see some errors in the logs. Do you have a problem?
Your team members are smart, they value quality and take pride in doing the right thing. You use test automation, test-driven development, code reviews, all the best practices.
Nothing will ever fully prepare your application code for the real world. You will always miss something. Edge cases will go unnoticed until a user uses your application in an unexpected way.
Your code will often recover. Sometimes the inconvenience is small, and sometimes the user will just not bother complaining at all.
But the bug will be there, hiding in the corner and waiting for the perfect moment to strike.
You watch logs so you can improve your code quality.
Squash the bugs before they grow bigger.
If a programmer's job is to write code, why are we spending so much time reading it? You have been looking at this code for some time now, trying to understand what is going on there. There is exception handling here, but what is it for? Oh, there is a log, let’s look at the message! “Cannot read orders from external system”, got it.
You look at where you log to understand what code is doing.
Log code is a form of documentation. No need for comments when you have proper logs.
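Here is roughly what such code might look like. Only the log message comes from the story above. The OrderReader class, the ExternalSystem interface and the choice to return an empty list are assumptions made for this sketch.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class OrderReader {
    private static final Logger LOG = Logger.getLogger(OrderReader.class.getName());

    // Hypothetical stand-in for the external system client.
    interface ExternalSystem {
        List<String> fetchOrders() throws IOException;
    }

    static List<String> readOrders(ExternalSystem external) {
        try {
            return external.fetchOrders();
        } catch (IOException e) {
            // The message tells the reader what this catch block is for,
            // and passing the exception keeps the stack trace in the log.
            LOG.log(Level.SEVERE, "Cannot read orders from external system", e);
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        List<String> orders = readOrders(() -> {
            throw new IOException("connection timed out");
        });
        System.out.println("Orders read: " + orders.size());
    }
}
```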
You learn the hard way that you can take microservices too far. Your team is good and every single service is high quality and well tested. Yet there are issues with integration and communication between all these services. Every bug often becomes a murder investigation spanning many places. On which of the 3 identical server instances did the problem happen? How do you find which service called the problem server? Everything is distributed!
So you have set up this log collection service to help you do just that. It's great and it helps.
But it is not enough.
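Collected logs answer the "which instance" and "who called" questions only if every line carries that information. A minimal sketch, assuming you control the message format: the field names instance, request and caller are invented here, and real setups often use the logging framework's context features instead of a helper like this.

```java
import java.util.UUID;

public class TraceableLog {
    // Prefixes a message with where it ran, which request it belongs to and who called.
    static String format(String instanceId, String requestId, String caller, String message) {
        return String.format("instance=%s request=%s caller=%s %s",
                instanceId, requestId, caller, message);
    }

    public static void main(String[] args) {
        // The request id would normally be created by the first service
        // and passed along with every downstream call.
        String requestId = UUID.randomUUID().toString();
        System.out.println(format("orders-2", requestId, "checkout", "Order saved"));
    }
}
```

Searching the collected logs for one request id then shows the path of a single request across all services.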
You have just done a great job. That external system you need to connect to was throwing errors at you. Your logs were filled with “You are not authorized” errors. You fixed it!
What if it happens again?
You need to watch your logs. You want to know when something happens.
You need alarms.
You watch your logs to be sure a problem stays fixed.
So you set up an alarm that will watch for “You are not authorized” and send your team an email when this happens. Now, if it happens again, you will know.
Why do we log?
To achieve great results, you need to know why you do something. Why do we need to look at and watch our log files? I think the answer to this question will help us do a better job.
Here are all the answers again, gathered from the stories above:
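The core of such an alarm is small. In this sketch the notifier is just a callback. In a real setup it would send the email, and the log lines would come from your log collection service rather than a hard-coded list. All names here are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class UnauthorizedAlarm {
    static final String PATTERN = "You are not authorized";

    // Counts matching lines and notifies once if there is at least one match.
    static int scan(List<String> logLines, Consumer<String> notifier) {
        int hits = 0;
        for (String line : logLines) {
            if (line.contains(PATTERN)) {
                hits++;
            }
        }
        if (hits > 0) {
            notifier.accept(hits + " log line(s) contained \"" + PATTERN + "\"");
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "INFO Order 42 received",
                "ERROR External system said: You are not authorized");
        // Stand-in for sending the team an email.
        scan(lines, message -> System.out.println("ALARM: " + message));
    }
}
```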
- You watch log files to check the flow of your program.
- You watch logs to find what went wrong.
- You watch logs to find the code that got executed.
- You watch logs so you can improve your code quality.
- You look at where you log to understand what code is doing.
- You watch your logs to be sure a problem stays fixed.
What about performance?
Logging can be expensive. But is the issue really in writing to the logs, or somewhere else? Before you fix anything, measure and find your bottleneck.
Even if logs slow you down, what choice do you have?
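One place worth checking while you measure is the cost of building messages that are never written. The sketch below uses java.util.logging, whose Logger accepts a Supplier for exactly this reason. The expensiveDump method is a made-up stand-in for any costly message, and the counter just makes the difference visible.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLogging {
    static final AtomicInteger BUILDS = new AtomicInteger();

    // Hypothetical stand-in for a message that is costly to build.
    static String expensiveDump() {
        BUILDS.incrementAndGet();
        return "big state dump";
    }

    // Returns how many times the message was built while FINE logging was disabled.
    static int buildsWhenFineIsDisabled() {
        Logger log = Logger.getLogger("perf.demo");
        log.setLevel(Level.INFO); // FINE records are discarded
        BUILDS.set(0);
        log.fine("Eager: " + expensiveDump());      // built first, then thrown away
        log.fine(() -> "Lazy: " + expensiveDump()); // supplier is never called
        return BUILDS.get();
    }

    public static void main(String[] args) {
        System.out.println("Messages built: " + buildsWhenFineIsDisabled());
    }
}
```

This is only one possible source of overhead. Whether it matters in your application is something a profiler should tell you, not a guess.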
It is not good to be faster, but blind.
How do you use your logs?
Thank you for reading.