Don't Leave Logging for Later: Lessons From 1M Users

The first project was an expense categorization app at a fintech company, built for millions of existing users. The code shipped, went to production, and more than 1M users started using it. Everything was fine until an incident happened.

The logs turned out to be close to useless: inconsistent formats, incomplete data, and redundant entries that carried no information.

"Log Later" - Technical Debt

Back then, logging was considered something that could be added later. Ship features first.

But when production had issues, two problems surfaced:

Debugging in the dark: logs only captured generic INFO-level messages, with no user_id, no request context, and no flow tracing
No reporting possible: there was no way to tell who was affected or how many users were impacted

S3 + Athena: Right Strategy, Wrong Usage

S3 + Athena is a solid logging strategy, and many large companies still use it for long-term storage and analytics. The problem wasn't the tool; it was how it was used.

Back then, logs were written as plain text, and each service logged in a different format. Once they were exported to S3, Athena couldn't query them because there was no consistent schema. Every query first required a format transformation, so a single report took weeks to produce.

If logs had been structured from the start, S3 + Athena would have worked smoothly. Clean schema, SQL queries run directly, no transformation needed.
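To make the difference concrete, here is a minimal Python sketch of what "format transformation" means in practice. The sample log lines, the regex patterns, and the function names are all hypothetical, invented for illustration; they are not the formats from the original project.

```python
import json
import re

# Hypothetical plain-text lines, one per service, each in its own format.
PLAIN_TEXT_LINES = [
    "2026-05-14 07:00:00 INFO payment processed for u_123",
    "[INFO] 14/05/2026 07:00:01 user=u_456 action=payment.create",
]

# Every format needs its own hand-written pattern before it is queryable.
PLAIN_TEXT_PATTERNS = [
    re.compile(r"^(?P<timestamp>\S+ \S+) (?P<level>\w+) .* for (?P<userId>\S+)$"),
    re.compile(r"^\[(?P<level>\w+)\] (?P<timestamp>\S+ \S+) user=(?P<userId>\S+) .*$"),
]


def parse_plain_text(line):
    """Try each known pattern in turn; return None for an unknown format."""
    for pattern in PLAIN_TEXT_PATTERNS:
        match = pattern.match(line)
        if match:
            return match.groupdict()
    return None  # the line is invisible to any report


# The same events as JSON Lines: one object per line, one shared schema.
STRUCTURED_LINES = [
    '{"level": "info", "userId": "u_123", "action": "payment.create"}',
    '{"level": "info", "userId": "u_456", "action": "payment.create"}',
]


def parse_structured(line):
    """No per-service pattern needed; every line parses the same way."""
    return json.loads(line)


affected_users = {parse_structured(line)["userId"] for line in STRUCTURED_LINES}
```

With the plain-text version, each new service or format change means another pattern to write and maintain. With the structured version, "who was affected" is a one-line set comprehension here, and a plain `SELECT DISTINCT` in a SQL engine.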

What Good Logging Looks Like

That lesson points to three elements of good logging:

Structured Format

Logs should be JSON, not plain text. CloudWatch Logs Insights and Athena both handle JSON well. Each entry is an object with a clear schema:

{
  "level": "info",
  "message": "Payment processed",
  "userId": "u_123",
  "requestId": "req_abc",
  "action": "payment.create",
  "amount": 250000,
  "timestamp": "2026-05-14T07:00:00.000Z"
}

Most languages have good structured logging libraries: Node.js has Winston and Pino, Go has zerolog, and so on.
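A dedicated library is not strictly required, though. As a rough sketch, the same kind of entry can be produced with nothing but Python's standard `logging` module. The `JsonFormatter` class, the `build_logger` helper, and the `"payments"` logger name are made up for this example; the field names follow the sample entry above.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    # Context fields copied from the record when the caller supplied them.
    CONTEXT_FIELDS = ("userId", "requestId", "action", "amount")

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            # ISO 8601 in UTC; isoformat writes "+00:00" rather than "Z".
            "timestamp": created.isoformat(timespec="milliseconds"),
        }
        for field in self.CONTEXT_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)


def build_logger(name, stream=None):
    """Hypothetical helper: a logger whose only handler emits JSON lines."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate plain-text output via root
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


logger = build_logger("payments")
logger.info(
    "Payment processed",
    extra={
        "userId": "u_123",
        "requestId": "req_abc",
        "action": "payment.create",
        "amount": 250000,
    },
)
```

The `extra` argument is the standard way to attach custom attributes to a log record. Keys must not collide with built-in record attributes such as `message` or `name`.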

Correlation ID

Every request needs a unique ID that flows through the entire pipeline: gateway → service A → service B → database → response. Query by requestId and you can trace the full journey.
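One possible way to carry that ID without passing it through every function call is a context variable plus a logging filter. The sketch below assumes the ID travels between services in an `X-Request-Id` HTTP header, which is a common convention but an assumption here; the function names and the `service_a` logger are likewise hypothetical.

```python
import contextvars
import logging
import uuid

# Holds the correlation ID of the request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default=None)


class RequestIdFilter(logging.Filter):
    """Stamp the current correlation ID onto every record that passes."""

    def filter(self, record):
        record.requestId = request_id_var.get()
        return True  # never drop the record, only annotate it


def start_request(incoming_headers):
    """Reuse the caller's ID if one was sent, otherwise mint a new one."""
    request_id = incoming_headers.get("X-Request-Id") or f"req_{uuid.uuid4().hex}"
    request_id_var.set(request_id)
    return request_id


def outgoing_headers():
    """Headers to forward so the next service logs under the same ID."""
    return {"X-Request-Id": request_id_var.get()}


logger = logging.getLogger("service_a")
logger.addFilter(RequestIdFilter())

# At the edge (gateway): no incoming ID, so a new one is generated.
start_request({})
logger.warning("Calling service B")  # record now carries requestId
downstream = outgoing_headers()      # pass these along with the call
```

Context variables keep their own value per thread and per asyncio task, so concurrent requests do not overwrite each other's IDs.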

Right Log Level

Level	When to use
`ERROR`	Incidents requiring immediate attention
`WARN`	Anomalies but system still running
`INFO`	Normal flow
`DEBUG`	Technical details, enable only when debugging

Common mistake: logging everything as ERROR. When everything is an error, nothing is.
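As an illustration of the table, here is a small Python sketch that keeps DEBUG off by default and turns it on through configuration. The `LOG_LEVEL` environment variable, the `level_from_env` helper, and the example messages are assumptions made for this sketch.

```python
import logging
import os

# Accepted names for the threshold; "WARN" is kept as an alias.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def level_from_env(default=logging.INFO):
    """Read the threshold from LOG_LEVEL; fall back on unknown values."""
    name = os.environ.get("LOG_LEVEL", "").upper()
    return LEVELS.get(name, default)


logger = logging.getLogger("payments.levels")
logger.setLevel(level_from_env())

# Technical detail: dropped unless LOG_LEVEL=debug is set.
logger.debug("Card token lookup took 12 ms")
# Normal flow.
logger.info("Payment processed")
# Anomaly, but the system is still running.
logger.warning("Payment gateway timeout, retrying (attempt 2 of 3)")
# Incident that needs immediate attention.
logger.error("Payment gateway unreachable after 3 attempts")
```

Keeping the threshold in configuration means DEBUG can be enabled during an incident without a code change or redeploy of new code.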

Minimum Logs You Need

Any backend system should have these four log types from day one:

Access log — every incoming request: method, path, status code, response time, user_id. Foundation for traffic analysis
Application log — business events: user created, payment processed. Enough context for auditing
Error log — full stack trace + request context + user context. Capture system state at the time of error
Infrastructure log — slow queries, cache miss rate, memory spikes. Early warning before incidents happen

Invest Early, Save Later

Logging is insurance — cheap when done early, expensive when deferred. The first project taught that the hard way.

The simplest lesson: before shipping your first feature to production, set up structured logging with correlation IDs. Log in the right format from the start, and S3 + Athena will work as intended. Log in the wrong format, and every tool becomes a pain point.