Introduction

Logging is an essential part of any modern Java application. Logs allow us to understand the runtime behaviour of our applications. They also help us capture the context when errors occur. Without logging debugging a production issue will often be quite tricky. Throughout my career, I have found specific logging patterns more valuable than others when on-call. And some - not so much. In this blog post, I will talk about a few of them with examples.

Context

All patterns or anti-patterns always have an associated context. In the following discussion, I assume a high-traffic web application is doing all the logging. By “high-traffic”, I mean a web application serving at least 30K requests per second.

1. Avoiding duplicate error logs while preserving the context

Consider the following code -

public SomeResult performSomeOperation(SomeData someData) {
    try {
        return someService.doSomething(someData);
    } catch (SomeCommonException ex) {
        log.error("Error occurred. Data - {}", someData, ex);
        throw new AnotherCommonException("Some error occurred");
    }
}

There are two reasons this pattern is not helpful when I am on call. First, it results in unnecessary noise in the production log. In this example, we log an error message immediately after catching the exception. But then we are throwing a new exception - AnotherCommonException. Now the client of “performSomeOperation” would need to deal with this exception -

public SomeResult clientOfPerformSomeOperation(SomeData someData) {
    try {
        return someService.performSomeOperation(someData);
    } catch (AnotherCommonException ex) {
        // log it? Throw a new error? Do both?
    }
}

In the worst case, each client will log an error message and rethrow the exception. As a result, we will end up with two or more log messages for a single error. If 1% of the incoming requests encounter SomeCommonException, then that would mean six hundred error logs per second being logged, three hundred of which are duplicates. That’s a lot of unnecessary error messages! On top of that, a 1% failure is often considered acceptable in a distributed application, in which case logging these “acceptable failures” might even mislead me to think that this might be the source of an active alert.

The second reason is that we have no link between the root cause and the follow-up exception. For a high-traffic application, the first and the second log message will be far apart in the production log. And chances of linking them together will be low. As a result, the production log will contain random exceptions without any context, which might also be a source of misdirection for me.

The on-call-friendlier version of the above example would be -

public SomeResult performSomeOperation(SomeData someData) {
    try {
        return someService.doSomething(someData);
    } catch (SomeCommonException ex) {
        throw new AnotherCommonException(
            "Error occurred. Data - {}" + someData,
            ex
        );
    }
}

This logging does not have the above problems. It does not create duplicate log messages. It also preserves the original error context. Identifying and filtering them out from the log is also relatively easy.

Having precisely one log message per error has other advantages. It makes it easy to count the exact number of errors at a particular time. If the exception is occurring due to a timed-out API call, then searching the log for ERROR levels within a, say, 30-minute time window will return the exact number of failed API calls within the period.

2. Logging “acceptable failures” with WARN level

Consider the following API call -

public String get() {
    try {
        return restTemplate.getForEntity(
            "https://httpstat.us/500",
            String.class
        );
    } catch (Exception ex) {
        log.error("Error occurred", ex);
        return "A default response";
    }
}

Making remote API calls 100% reliable is practically impossible. And so it’s expected that a certain percentage of this call will fail. For our high-traffic web application, a 0.5% failure rate here would mean 150 errors per second. Logging each one as an error would thus generate high-volume noise in the production log.

Rather than logging it as an error, I would rather recommend logging it as a WARN -

public String get() {
    try {
        return restTemplate.getForEntity(
            "https://httpstat.us/500",
            String.class
        );
    } catch (Exception ex) {
        log.warn(
            "API call failure. Context data should also be logged",
            ex
        );
        return "A default response";
    }
}

Logging these failures as WARN still tracks our API failures while at the same time making it easy to filter them out.

This example also demonstrates one of my logging heuristics - I always consider an ERROR log worthy of keeping me awake at night. If this is not the case, I will not log it as an ERROR.

3. Avoid logging validation failures as errors

Consider the following code:

@RestController
public class SomeController {
    
    @PostMapping
    public ResponseEntity<?> createOrder(OrderDTO dto) {
        var validationResult = validateDto(dto);
        if (validationResult.hasError()) {
            log.error("Validation error: {}",
                validationResult.toError());
            return ResponseEntity.badRequest()
                .build();
        }
        // create order logic
    }
}

Logging a validation error as an error in the log is 100% noise. Validation errors represent an incomplete request; the application cannot process the request in its current form. And so, logging these requests as errors only pollute the log. I recommend removing this logging altogether.

There is a rare exception to this. Often, clients would ask for a log of a particular request to debug their code. For example, if this API serves mobile apps and there is a logical bug in the validation on the mobile app, then this error logging might be helpful. In that case, we might consider keeping this log but with a different level -

@RestController
public class SomeController {
    
    @PostMapping
    public ResponseEntity<?> createOrder(OrderDTO dto) {
        var validationResult = validateDto(dto);
        if (validationResult.hasError()) {
            log.info("Validation error: {}",
                validationResult.toError());
            return ResponseEntity.badRequest()
                .build();
        }
        // create order logic
    }
}

This logging might be acceptable if the web application receives a low traffic volume. For a high-traffic website, this would, however, require a considerable amount of log storage. In such cases, it is always better to rely on an observability tool which can perform after-the-fact sampling (sampling 1% of the requests that generate 400).