Don't Leave Logging for Later: Lessons From 1M Users
The first project was at a fintech company: an expense categorization app for millions of existing users. The code shipped, went to production, and 1M+ users started using it. Everything was fine until an incident happened.
Opening the logs revealed inconsistent formats, incomplete data, and redundant entries with nothing useful.
"Log Later" - Technical Debt
Back then, logging was considered something that could be added later. Ship features first.
But when production had issues, the missing logs hurt in two ways:
- Debugging in the dark: logs only captured generic INFO-level messages, with no user_id, no request context, and no flow tracing
- No reporting possible: there was no way to tell who was affected or how many users were impacted
S3 + Athena: Right Strategy, Wrong Usage
S3 + Athena is a solid logging strategy, and many large companies still use it for long-term storage and analytics. The problem wasn't the tool; it was how it was used.
Back then, logs were written as plain text, with each service logging in a different format. Once they were exported to S3, Athena couldn't parse them because the schema was inconsistent. Every query required a format transformation first, so a single report took weeks to produce.
If logs had been structured from the start, S3 + Athena would have worked smoothly. Clean schema, SQL queries run directly, no transformation needed.
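To make the point concrete, here is a minimal Python sketch (a local stand-in, not Athena itself) of how a "who was affected?" report falls out of logs that share one schema. The sample lines and the field names `level` and `userId` are assumptions for illustration.

```python
import json
from collections import Counter

# Hypothetical sample: one JSON object per line, same schema from every service.
SAMPLE_LOG_LINES = """\
{"level": "error", "message": "Categorization failed", "userId": "u_1", "requestId": "req_a"}
{"level": "info", "message": "Expense categorized", "userId": "u_2", "requestId": "req_b"}
{"level": "error", "message": "Categorization failed", "userId": "u_3", "requestId": "req_c"}
{"level": "error", "message": "Categorization failed", "userId": "u_1", "requestId": "req_d"}
"""


def affected_users(lines: str, level: str = "error") -> Counter:
    """Count log entries at the given level, grouped by user."""
    counts: Counter = Counter()
    for line in lines.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("level") == level:
            counts[entry.get("userId", "unknown")] += 1
    return counts


if __name__ == "__main__":
    counts = affected_users(SAMPLE_LOG_LINES)
    print(f"{len(counts)} users affected: {dict(counts)}")
```

Because every line parses the same way, no per-service transformation step is needed before counting; the same idea is what lets a SQL engine query the files directly.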
What Good Logging Looks Like
After that lesson, good logging has three clear elements:
Structured Format
Logs should be JSON, not plain text; CloudWatch Insights and Athena both handle this format well. Each entry is an object with a clear schema:
{
  "level": "info",
  "message": "Payment processed",
  "userId": "u_123",
  "requestId": "req_abc",
  "action": "payment.create",
  "amount": 250000,
  "timestamp": "2026-05-14T07:00:00.000Z"
}
Most languages have good structured-logging libraries: Node.js has Winston and Pino, Go has zerolog, ...
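As a rough sketch of what this looks like without any third-party library, the following uses only Python's standard `logging` module. `JsonFormatter`, `build_logger`, and the convention of passing fields under `extra={"context": ...}` are hypothetical names chosen for this example, not an established API.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(timespec="milliseconds"),
        }
        # Assumed convention: callers pass extra fields as extra={"context": {...}}.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


def build_logger(name: str = "app", stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info(
        "Payment processed",
        extra={"context": {"userId": "u_123", "requestId": "req_abc",
                           "action": "payment.create", "amount": 250000}},
    )
```

Keeping the formatter in one shared module is one way to ensure every service emits the same schema.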
Correlation ID
Every request needs a unique ID. This ID flows through the entire pipeline: gateway → service A → service B → database → response. Query by requestId and you can trace the full journey.
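One possible way to carry the ID through a single service is sketched below with Python's `contextvars` and a `logging.Filter`. The `X-Request-ID` header name, the `req_` prefix, and the helper names are assumptions for illustration; use whatever your gateway actually sends.

```python
import contextvars
import json
import logging
import uuid
from typing import Optional

# Holds the ID of the request currently being handled ("-" when there is none).
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.requestId = request_id_var.get()
        return True  # never drop the record, only enrich it


class JsonFormatter(logging.Formatter):
    """Minimal JSON output that includes the request ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname.lower(),
            "requestId": getattr(record, "requestId", "-"),
            "message": record.getMessage(),
        })


def start_request(incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's ID if one was sent; otherwise generate a new one."""
    request_id = incoming_id or f"req_{uuid.uuid4().hex}"
    request_id_var.set(request_id)
    return request_id


def outgoing_headers() -> dict:
    """Headers to forward so the next service logs the same ID (header name assumed)."""
    return {"X-Request-ID": request_id_var.get()}


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("service_a")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    start_request("req_abc")          # ID received from the gateway
    logger.info("Payment processed")  # this entry carries requestId "req_abc"
    print(outgoing_headers())         # forward these headers to service B
```

Setting the ID once at the edge of each service and reading it in a filter means individual log calls do not have to pass it around by hand.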
Right Log Level
| Level | When to use |
|---|---|
| ERROR | Incidents requiring immediate attention |
| WARN | Anomalies but system still running |
| INFO | Normal flow |
| DEBUG | Technical details, enable only when debugging |
Common mistake: logging everything as ERROR. When everything is an error, nothing is.
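A small sketch of the levels in use, assuming a hypothetical `categorize_expense` function and a hypothetical `LOG_LEVEL` environment variable that lets DEBUG be switched on only when needed:

```python
import logging
import os
from typing import Optional

LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warn": logging.WARNING,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def resolve_level(name: Optional[str]) -> int:
    """Map a level name to a logging constant, defaulting to INFO."""
    return LEVELS.get((name or "").lower(), logging.INFO)


def categorize_expense(logger: logging.Logger, amount: int, retries: int) -> bool:
    """Hypothetical business function showing one call per level."""
    logger.debug("Categorizer input: amount=%s", amount)  # technical detail
    if retries > 0:
        # Anomaly, but the system is still running.
        logger.warning("Categorizer needed %d retries", retries)
    if amount < 0:
        # Incident that needs attention.
        logger.error("Negative amount rejected: %s", amount)
        return False
    logger.info("Expense categorized")  # normal flow
    return True


if __name__ == "__main__":
    logging.basicConfig(level=resolve_level(os.environ.get("LOG_LEVEL")))
    app_logger = logging.getLogger("expenses")
    categorize_expense(app_logger, 250000, retries=0)
    categorize_expense(app_logger, -1, retries=2)
```

With the level left at INFO, the `debug` call costs almost nothing and produces no output; setting `LOG_LEVEL=debug` turns it on for an investigation.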
Minimum Logs You Need
Any backend system must have these 4 log types from day one:
- Access log: every incoming request, with method, path, status code, response time, user_id. Foundation for traffic analysis
- Application log: business events such as user created, payment processed. Enough context for auditing
- Error log: full stack trace + request context + user context. Capture system state at the time of error
- Infrastructure log: slow queries, cache miss rate, memory spikes. Early warning before incidents happen
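The first and third items in the list above can be sketched as a framework-agnostic wrapper around a request handler. This is only an illustration: `handle_with_logging`, the `emit` callback, and the field names are hypothetical, and a real web framework would normally provide middleware hooks for this.

```python
import json
import logging
import time
import traceback

logger = logging.getLogger("http")


def default_emit(entry: dict) -> None:
    """Write an entry as one JSON line at a level matching its type."""
    level = logging.ERROR if entry["type"] == "error" else logging.INFO
    logger.log(level, json.dumps(entry))


def handle_with_logging(method, path, user_id, handler, emit=default_emit):
    """Run handler() and emit an access entry; on failure, emit an error entry first.

    `handler` is any zero-argument callable returning (status_code, body).
    """
    start = time.perf_counter()
    status = 500  # assume failure until the handler returns normally
    context = {"method": method, "path": path, "userId": user_id}
    try:
        status, body = handler()
        return status, body
    except Exception as exc:
        # Error log: stack trace plus request and user context.
        emit({"type": "error", **context, "error": repr(exc),
              "stack": traceback.format_exc()})
        return 500, "Internal Server Error"
    finally:
        # Access log: always written, whether the request succeeded or failed.
        duration_ms = round((time.perf_counter() - start) * 1000, 2)
        emit({"type": "access", **context, "status": status,
              "durationMs": duration_ms})


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    handle_with_logging("GET", "/expenses", "u_123", lambda: (200, "ok"))
```

Writing the access entry in `finally` guarantees that failed requests still show up in traffic analysis, with the error entry available alongside for the details.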
Invest Early, Save Later
Logging is insurance: cheap when done early, expensive when deferred. The first project taught that the hard way.
The simplest lesson: before shipping your first feature to production, set up structured logging with correlation IDs. Log in the right format from the start, and S3 + Athena will work as intended. Log in the wrong format, and every tool becomes a pain point.