# 🔥 TOP PRODUCTION SCENARIO QUESTIONS (Senior Level)


## 🚨 1. Performance Issues

* Your API response time suddenly increased from 200ms to 3 seconds. How will you debug it?
* One microservice is slow, but others are fine. How do you isolate the issue?
* Database CPU is 100%. What steps will you take?
* Application memory usage keeps increasing over time. How will you find the root cause?
* GC (garbage collection) activity is high in a .NET app. How do you handle it?

---

## 🔥 2. Production Failures / Downtime

* A critical microservice is down in production. What will you do?
* How do you design a system to handle partial failures?
* What happens if Kafka goes down in production?
* One dependency service is slow/unavailable—how do you protect your system?
* How do you ensure zero-downtime deployment?
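
For the slow or unavailable dependency question, one common answer is a circuit breaker. The sketch below is a minimal illustration, not a production implementation: the class name `CircuitBreaker`, its parameters `max_failures` and `reset_after`, and the injectable `clock` are all assumptions made up for this example.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures and allows a trial call after a cooldown."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cooldown in seconds while open
        self.clock = clock                # injectable so the logic can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of waiting on a dependency known to be unhealthy.
                raise RuntimeError("circuit open: dependency call skipped")
            # Cooldown elapsed: fall through and allow one trial call (half-open).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In an interview answer, this usually goes together with timeouts on the call itself and a fallback (cached data or a degraded response) for the fail-fast path.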

---

## 🧠 3. Microservices & Communication Issues

* One service is calling another service and causing cascading failure—how will you fix it?
* How do you handle synchronous vs asynchronous communication in production?
* How do you prevent tight coupling between microservices?
* What will you do if service-to-service latency increases?
* How do you handle version mismatch between microservices?

---

## 📡 4. Kafka / Messaging Issues (VERY IMPORTANT)

* Kafka consumer lag is increasing. How do you troubleshoot?
* Duplicate messages are being processed—how do you handle it?
* Messages are arriving out of order—what will you do?
* One partition is overloaded—how do you fix it?
* What happens if a consumer crashes mid-processing?
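
For the duplicate-message and mid-processing crash questions, a typical answer is an idempotent consumer. The sketch below is broker-agnostic and does not use any Kafka client API. `IdempotentHandler` and `message_id` are hypothetical names, and the in-memory set stands in for what would need to be a durable store (for example a database table with a unique key) in a real system.

```python
class IdempotentHandler:
    """Wraps a handler so that a redelivered message is processed at most once."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # assumption: stand-in for a durable deduplication store

    def process(self, message_id, payload):
        if message_id in self.seen:
            return False  # duplicate delivery: skip the side effects
        self.handler(payload)
        # Record the ID only after the handler succeeds. If the consumer crashes
        # mid-processing, the message is redelivered and handled again.
        self.seen.add(message_id)
        return True
```

Because the ID is recorded after the handler runs, this gives at-least-once behaviour with deduplication of completed work. Making the handler's side effect and the ID write atomic requires putting both in the same transaction.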

---

## 🗄️ 5. Database Production Issues

* Your SQL query is slow in production but fast in dev—why?
* Deadlocks are happening in production. How do you fix them?
* How do you handle high read/write load in SQL Server/PostgreSQL?
* Indexes are improving reads but slowing writes—what will you do?
* How do you handle database scaling?

---

## ☁️ 6. Cloud / Deployment Issues (Azure / Docker / K8s)

* A Kubernetes pod keeps restarting. How do you debug it?
* One service is not getting traffic after deployment—why?
* How do you roll back a failed deployment?
* How do you handle autoscaling in production?
* A Docker container works locally but fails in production. Why?

---

## 🔐 7. Security / Production Safety

* How do you secure APIs in production?
* How do you prevent SQL injection and XSS in real systems?
* How do you handle secrets in cloud environments?
* How do you implement rate limiting?
* How do you secure inter-service communication?
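
For the rate limiting question, the token bucket is one widely used algorithm. The sketch below is a single-process illustration only: `TokenBucket`, `rate`, `capacity`, and the injectable `clock` are names chosen for this example, and a real deployment would normally enforce limits at a gateway or in a shared store so that all instances see the same counters.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` of requests per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so the logic can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill based on elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would typically respond with HTTP 429
```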

---

## 🔄 8. Data Consistency / Distributed Systems

* How do you handle eventual consistency in microservices?
* What happens if one service updates its database but event publishing fails?
* How do you ensure data consistency across services?
* What is your strategy for retry and compensation logic?
* How do you handle duplicate event processing?
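
For the question about a database update succeeding while event publishing fails, a frequently cited answer is the transactional outbox pattern. The sketch below uses Python's built-in `sqlite3` purely to keep the example self-contained. The table names, the `OrderPlaced` event, and the functions `place_order` and `relay_outbox` are all hypothetical, and `publish` stands in for whatever broker client is actually used.

```python
import json
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, published INTEGER DEFAULT 0)"
    )


def place_order(conn, order_id):
    # One local transaction: the business row and the event row commit together,
    # or neither does. Nothing is sent to the broker at this point.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "PLACED"))
        event = {"type": "OrderPlaced", "order_id": order_id}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_outbox(conn, publish):
    """Publishes pending events in order and returns how many were pending."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # if this raises, the row stays pending for retry
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relay crashes after publishing but before marking the row, the event is sent again on the next run, so consumers still need to be idempotent.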

---

## 📊 9. Monitoring / Debugging / Observability

* How do you debug an issue in production across microservices?
* What tools do you use for logging and monitoring?
* How do you trace a request across services?
* How do you detect performance bottlenecks?
* What metrics do you monitor in production?

---

## 🧩 10. Real Incident / War Stories (MOST IMPORTANT)

* Tell me about a major production issue you handled.
* What was the worst production outage you faced?
* Have you ever caused a production issue? What did you learn?
* How do you handle on-call incidents?
* How do you communicate during production failures?

---

# 🚀 BONUS: VERY COMMON “TRICK SCENARIOS”

These are asked to test thinking:

* Design a system that handles **1 million requests/sec**
* What if the payment succeeds but the order service fails?
* How do you handle retry without duplicate processing?
* How do you design a system for **fault tolerance + scalability**?
* How do you reduce latency from 2 seconds to 200ms?



This was part of Interview Preparation With Bipin — Let’s Crack It!
