Don't Leave Logging for Later: Lessons From 1M Users

Karify98 & Amy 🌸

The first project was at a fintech company: an expense categorization app for millions of existing users. The code shipped to production, and 1M+ users started using it. Everything was fine until an incident happened.

Opening the logs revealed inconsistent formats, incomplete data, and redundant entries with nothing useful.

"Log Later" - Technical Debt

Back then, logging was considered something that could be added later. Ship features first.

But when production had issues, the gaps became obvious:

  • Debugging in the dark: logs only captured generic INFO-level messages, with no user_id, no request context, and no flow tracing
  • No reporting possible: there was no way to tell who was affected or how many users were impacted

S3 + Athena: Right Strategy, Wrong Usage

S3 + Athena is a solid logging strategy; many large companies still use it for long-term storage and analytics. The problem wasn't the tool, it was how it was used.

Back then, logs were written as plain text, and each service logged in a different format. Once they were exported to S3, Athena couldn't parse them because the schema was inconsistent. Every query required a format transformation first, so a single report took weeks to produce.

If logs had been structured from the start, S3 + Athena would have worked smoothly. Clean schema, SQL queries run directly, no transformation needed.
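As a rough illustration of "structured from the start", here is a minimal sketch of preparing log entries for S3 so that every row has the same columns. Athena's JSON SerDes expect one JSON record per line, so the sketch writes newline-delimited JSON. The field list, the `to_ndjson` and `s3_key` helpers, and the `logs/dt=.../` key layout are assumptions made for this example, not the setup used in the project.

```python
import json
from typing import Dict, Iterable

# Hypothetical shared schema: every service emits at least these fields.
REQUIRED_FIELDS = ("timestamp", "level", "message", "requestId", "userId")


def to_ndjson(entries: Iterable[Dict]) -> str:
    """Serialize entries as one JSON object per line with a fixed column set."""
    lines = []
    for entry in entries:
        # Missing fields become null so every row exposes the same columns.
        row = {field: entry.get(field) for field in REQUIRED_FIELDS}
        lines.append(json.dumps(row, separators=(",", ":")))
    return "\n".join(lines) + "\n"


def s3_key(service: str, day: str) -> str:
    """Hypothetical Hive-style key so a table can be partitioned by date."""
    return f"logs/dt={day}/{service}.json"


batch = to_ndjson([
    {"timestamp": "2026-05-14T07:00:00.000Z", "level": "info",
     "message": "Payment processed", "requestId": "req_abc", "userId": "u_123"},
    {"level": "warn", "message": "Retrying gateway call"},
])
print(s3_key("payments", "2026-05-14"))
print(batch, end="")
```

The upload itself is left out on purpose; the point is only that a consistent, line-delimited shape is what lets SQL run directly on the files.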

What Good Logging Looks Like

After that lesson, good logging has three clear elements:

Structured Format

Logs should be JSON, not plain text; CloudWatch Logs Insights and Athena both handle this format well. Each entry is an object with a clear schema:

{
  "level": "info",
  "message": "Payment processed",
  "userId": "u_123",
  "requestId": "req_abc",
  "action": "payment.create",
  "amount": 250000,
  "timestamp": "2026-05-14T07:00:00.000Z"
}

Every major language has good logging libraries: Node.js has Winston and Pino, Go has zerolog, and so on.
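For readers without one of those libraries at hand, the same idea can be sketched with Python's standard `logging` module alone. `JsonFormatter`, `build_logger`, and the convention of passing fields under a `context` key are names invented for this example; the output mirrors the fields of the JSON entry shown earlier.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            "timestamp": timestamp.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        }
        # Anything passed as extra={"context": {...}} lands on the record.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


def build_logger(name: str = "app") -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False
    return logger


logger = build_logger()
logger.info(
    "Payment processed",
    extra={"context": {
        "userId": "u_123",
        "requestId": "req_abc",
        "action": "payment.create",
        "amount": 250000,
    }},
)
```

A dedicated library would likely be faster and more complete; the sketch only shows that the schema, not the tool, is what matters.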

Correlation ID

Every request needs a unique ID. This ID flows through the entire pipeline: gateway → service A → service B → database → response. Query by requestId and you can trace the full journey.
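One way this could look inside a single service is sketched below, using Python's `contextvars` so the ID does not have to be passed through every function call. The helper names and the `X-Request-ID` header are assumptions for illustration; the header is a common convention rather than a standard.

```python
import contextvars
import json
import uuid
from typing import Dict, Optional

# Holds the ID of the request being handled; "-" means "outside any request".
request_id_var = contextvars.ContextVar("request_id", default="-")


def start_request(incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's ID if one was sent, otherwise mint a new one."""
    request_id = incoming_id or f"req_{uuid.uuid4().hex}"
    request_id_var.set(request_id)
    return request_id


def log(message: str, **fields) -> str:
    """Emit a JSON line that always carries the current requestId."""
    line = json.dumps({"message": message, "requestId": request_id_var.get(), **fields})
    print(line)
    return line


def outgoing_headers() -> Dict[str, str]:
    """Headers to attach when calling the next service in the chain."""
    return {"X-Request-ID": request_id_var.get()}


start_request("req_abc")           # the gateway forwarded an ID
log("Charging card", service="A")  # carries requestId "req_abc"
print(outgoing_headers())          # pass the same ID on to service B
```

Reusing the incoming ID instead of always generating a fresh one is what keeps the trail unbroken across services.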

Right Log Level

Level   When to use
ERROR   Incidents requiring immediate attention
WARN    Anomalies, but the system is still running
INFO    Normal flow
DEBUG   Technical details; enable only when debugging

Common mistake: logging everything as ERROR. When everything is an error, nothing is.
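A small sketch of choosing the level from the outcome, rather than reaching for ERROR by default, might look like this. The `level_for` and `report_payment` functions and the retry scenario are made up for the example; note that Python's `logging` module spells the WARN level as `WARNING`.

```python
import logging

logging.basicConfig(format="%(levelname)s %(message)s")
logger = logging.getLogger("payments")
logger.setLevel(logging.INFO)  # production threshold: DEBUG is dropped


def level_for(succeeded: bool, attempts: int) -> int:
    """Pick a level from what actually happened."""
    if not succeeded:
        return logging.ERROR    # incident: needs immediate attention
    if attempts > 1:
        return logging.WARNING  # anomaly: it worked, but only after retries
    return logging.INFO         # normal flow


def report_payment(succeeded: bool, attempts: int) -> None:
    # Technical detail; only emitted if the threshold is lowered to DEBUG.
    logger.debug("gateway attempts=%d", attempts)
    logger.log(level_for(succeeded, attempts), "payment finished succeeded=%s", succeeded)


report_payment(True, 1)   # logged at INFO
report_payment(True, 3)   # logged at WARNING
report_payment(False, 3)  # logged at ERROR
```

With the threshold at INFO, switching to `logger.setLevel(logging.DEBUG)` during an investigation is enough to surface the extra detail.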

Minimum Logs You Need

Any backend system must have these 4 log types from day one:

  • Access log: every incoming request, with method, path, status code, response time, and user_id. Foundation for traffic analysis
  • Application log: business events such as user created or payment processed. Enough context for auditing
  • Error log: full stack trace + request context + user context. Capture system state at the time of the error
  • Infrastructure log: slow queries, cache miss rate, memory spikes. Early warning before incidents happen
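To make the first and third items concrete, here is a minimal, framework-free sketch of a wrapper that always emits an access log entry and adds an error log entry with the stack trace when the handler fails. The `handle` function, the entry shapes, and the `userId` field name (chosen to match the JSON example earlier) are assumptions for illustration; a real service would normally do this in middleware.

```python
import json
import time
import traceback
from typing import Callable, Dict, List, Optional, Tuple


def handle(method: str, path: str, user_id: Optional[str],
           handler: Callable[[], int]) -> Tuple[int, List[Dict]]:
    """Run a handler and return (status code, log entries emitted)."""
    entries: List[Dict] = []
    started = time.perf_counter()
    try:
        status = handler()
    except Exception as exc:
        status = 500
        # Error log: stack trace plus request and user context.
        entries.append({
            "type": "error", "level": "error", "message": str(exc),
            "stack": traceback.format_exc(),
            "method": method, "path": path, "userId": user_id,
        })
    # Access log: written for every request, successful or not.
    entries.append({
        "type": "access", "level": "info",
        "method": method, "path": path, "status": status,
        "responseTimeMs": round((time.perf_counter() - started) * 1000, 2),
        "userId": user_id,
    })
    for entry in entries:
        print(json.dumps(entry))
    return status, entries


def failing_handler() -> int:
    raise RuntimeError("database unavailable")


handle("GET", "/expenses", "u_123", lambda: 200)
handle("POST", "/payments", "u_123", failing_handler)
```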

Invest Early, Save Later

Logging is insurance: cheap when done early, expensive when deferred. The first project taught that the hard way.

The simplest lesson: before shipping your first feature to production, set up structured logging with correlation IDs. Log in the right format from the start, and S3 + Athena will work as intended. Log in the wrong format, and every tool becomes a pain point.