# Grafana Stack Integration

Clean Stack integrates with the Grafana observability stack (Grafana, Prometheus, Loki, and Tempo) to provide a complete observability solution.

## Stack Overview

- **Grafana**: dashboards, visualization, and alerting
- **Prometheus**: metrics collection and storage
- **Loki**: log aggregation and querying
- **Tempo**: distributed trace storage and search

## Quick Start
1. Start the observability stack:

   ```bash
   bun run platform:observability
   ```

2. Access Grafana:

   - URL: http://localhost:3000
   - Default credentials:
     - Username: `admin`
     - Password: `admin`
## Pre-configured Dashboards

1. **Service Overview**
   - Request rates and latencies
   - Error rates
   - Resource usage
   - Cache performance

2. **Distributed Tracing**
   - End-to-end request flows
   - Service dependencies
   - Performance bottlenecks
   - Error analysis

3. **Log Analytics**
   - Structured log search
   - Log correlation
   - Pattern analysis
   - Alert configuration
## Custom Dashboard Creation

### 1. Metrics Dashboard

```typescript
// Define a custom histogram for request durations
const requestDuration = metrics.createHistogram('http_request_duration', {
  description: 'HTTP request duration',
  unit: 'ms',
  boundaries: [10, 50, 100, 200, 500, 1000]
});

// Record a measurement for every request
app.use(async (ctx, next) => {
  const startTime = Date.now();
  try {
    await next();
  } finally {
    const duration = Date.now() - startTime;
    requestDuration.record(duration, {
      path: ctx.path,
      method: ctx.method,
      status: ctx.status
    });
  }
});
```
Then in Grafana:

- Add a new panel
- Query: `rate(http_request_duration_bucket[5m])`
- Visualization: Heatmap
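Heatmaps show the full latency distribution; for a single percentile line instead, the same buckets can be summarized with `histogram_quantile` (standard PromQL; the metric name assumes the histogram defined above):

```promql
histogram_quantile(0.95, sum(rate(http_request_duration_bucket[5m])) by (le))
```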
### 2. Trace Analysis

```typescript
// Add custom attributes to spans
const span = tracer.startSpan('process-order');
span.setAttribute('order.id', orderId);
span.setAttribute('customer.type', customerType);
span.setAttribute('order.value', orderValue);
// ... process the order ...
span.end();
```
In Grafana Tempo:

- Search by attribute
- Create a service graph
- Analyze flame graphs
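Conceptually, attribute search filters spans on the key-value pairs attached to them. A minimal sketch of that data model (the `SpanRecord` shape and helper below are illustrative, not the Tempo API):

```typescript
// A span carries key-value attributes; Tempo's attribute search is
// equivalent to filtering stored spans on those pairs.
interface SpanRecord {
  name: string;
  attributes: Record<string, string | number>;
}

const spans: SpanRecord[] = [
  { name: 'process-order', attributes: { 'order.id': 'order-123', 'customer.type': 'premium', 'order.value': 250 } },
  { name: 'process-order', attributes: { 'order.id': 'order-456', 'customer.type': 'standard', 'order.value': 40 } },
];

function searchByAttribute(key: string, value: string | number): SpanRecord[] {
  return spans.filter((span) => span.attributes[key] === value);
}

console.log(searchByAttribute('customer.type', 'premium').length); // 1
```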
 
### 3. Log Queries

```typescript
// Structured logging
logger.info('Order processed', {
  orderId: 'order-123',
  processingTime: 150,
  customerTier: 'premium'
});
```

In Grafana Loki:

```logql
{service="order-service"}
  | json
  | processingTime > 100
  | customerTier = "premium"
```
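The `| json` stage works because each log line is a single JSON object. A minimal sketch of a logger emitting that shape (the function name and field layout are illustrative, not the Clean Stack logger API):

```typescript
// Emit one JSON object per line: the format Loki's `| json` parser expects.
function logInfo(message: string, fields: Record<string, unknown>): string {
  const entry = {
    level: 'info',
    timestamp: new Date().toISOString(),
    message,
    ...fields,
  };
  return JSON.stringify(entry);
}

const line = logInfo('Order processed', {
  orderId: 'order-123',
  processingTime: 150,
  customerTier: 'premium',
});
console.log(line);
```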
## Alert Configuration

### 1. High Latency Alert

```yaml
# In Grafana UI:
alert:
  name: High Service Latency
  condition: avg_over_time(http_request_duration_seconds[5m]) > 0.5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Service experiencing high latency
```
### 2. Error Rate Alert

```yaml
alert:
  name: High Error Rate
  condition: |
    sum(rate(http_requests_total{status=~"5.."}[5m]))
    /
    sum(rate(http_requests_total[5m])) > 0.05
  for: 5m
  labels:
    severity: critical
```
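The condition above is a plain ratio: the rate of 5xx responses divided by the rate of all responses, compared against 5%. The same arithmetic as a sketch (the counts are made up):

```typescript
// Mirror of the PromQL error-rate condition: errors / total > 0.05.
function errorRatio(countsByStatus: Record<string, number>): number {
  let errors = 0;
  let total = 0;
  for (const [status, count] of Object.entries(countsByStatus)) {
    total += count;
    if (/^5\d\d$/.test(status)) errors += count; // mirrors status=~"5.."
  }
  return total === 0 ? 0 : errors / total;
}

// 70 errors out of 970 requests is roughly 7.2%, above the 5% threshold.
const ratio = errorRatio({ '200': 900, '500': 60, '503': 10 });
console.log(ratio > 0.05); // true
```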
## Data Retention

Default retention periods:

- Metrics (Prometheus): 15 days
- Logs (Loki): 7 days
- Traces (Tempo): 3 days

Prometheus retention is configured via a command-line flag in docker-compose:

```yaml
prometheus:
  command:
    - '--storage.tsdb.retention.time=30d'
```

Loki retention is set in its config file:

```yaml
table_manager:
  retention_period: 168h
```

Tempo retention is set in its config file:

```yaml
compactor:
  compaction:
    block_retention: 72h
```
## Best Practices

1. **Dashboard Organization**
   - Use folders for different teams/services
   - Standardize naming conventions
   - Include documentation panels

2. **Query Optimization**
   - Use recording rules for complex queries
   - Limit high-cardinality labels
   - Set appropriate time ranges

3. **Alert Management**
   - Define clear severity levels
   - Include runbooks in alerts
   - Configure proper notification channels
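The recording-rule recommendation above can be sketched as a Prometheus rule file entry (the group and rule names are illustrative):

```yaml
groups:
  - name: http_recording_rules
    rules:
      # Precompute the per-service request rate so dashboards query the
      # cheap recorded series instead of re-evaluating rate() on every refresh.
      - record: service:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (service)
```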
## Troubleshooting

### Common Issues

1. **Missing Data**
   - Check collector connectivity
   - Verify port configurations
   - Ensure correct label matching

2. **Dashboard Performance**
   - Optimize time ranges
   - Use appropriate refresh intervals
   - Minimize panel count

3. **Alert Issues**
   - Validate alert conditions
   - Check notification settings
   - Review alert history
## Security Considerations

1. **Access Control**

   ```yaml
   grafana:
     environment:
       GF_AUTH_DISABLE_LOGIN_FORM: "false"
       GF_AUTH_ANONYMOUS_ENABLED: "false"
   ```

2. **Network Security**
   - Use TLS for data transmission
   - Implement proper authentication
   - Restrict network access

3. **Data Protection**
   - Configure data retention
   - Implement log sanitization
   - Manage sensitive labels
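The log-sanitization point can be sketched as a redaction pass applied to structured log fields before they leave the service (the sensitive-field list is illustrative):

```typescript
// Redact known-sensitive fields before a structured log entry is shipped.
const SENSITIVE_FIELDS = new Set(['password', 'token', 'email', 'cardNumber']);

function sanitize(fields: Record<string, unknown>): Record<string, unknown> {
  const clean: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(fields)) {
    clean[key] = SENSITIVE_FIELDS.has(key) ? '[REDACTED]' : value;
  }
  return clean;
}

console.log(JSON.stringify(sanitize({ orderId: 'order-123', email: 'a@example.com' })));
// {"orderId":"order-123","email":"[REDACTED]"}
```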