Troubleshooting Guide

A comprehensive guide to diagnosing and resolving common issues with your SaaS infrastructure deployment.

🔍 Quick Diagnostics

System Health Check

# Check overall cluster status
kubectl get nodes
kubectl get pods --all-namespaces

# Check application status
kubectl get pods -n production
kubectl get services -n production
kubectl get ingress -n production

# Check resource usage (kubectl top requires metrics-server in the cluster)
kubectl top nodes
kubectl top pods -n production

Common Status Indicators

Status	Meaning	Action
`Running`	✅ Healthy	No action needed
`Pending`	⏳ Waiting for resources	Check resource availability
`CrashLoopBackOff`	🔄 Repeated failures	Check logs for errors
`ImagePullBackOff`	📦 Repeated image pull failures	Check image name, tag, and registry access
`ErrImagePull`	❌ Image pull failed	Verify image name/tag and image pull secrets
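On a busy namespace it can be quicker to filter the `kubectl get pods` output for anything that does not match the healthy row of the table above. The following is a minimal bash sketch that assumes the default table output of `kubectl get pods`. `unhealthy_pods` is a hypothetical helper name, not part of kubectl:

```bash
#!/usr/bin/env bash
# unhealthy_pods (hypothetical helper): reads `kubectl get pods` output on stdin and
# prints NAME STATUS READY for every pod that is not Running with all containers ready.
# Columns are located by header name, so output with --all-namespaces also works.
unhealthy_pods() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "NAME") name_col = i
        if ($i == "READY") ready_col = i
        if ($i == "STATUS") status_col = i
      }
      next
    }
    {
      status = $status_col
      # Pods of finished Jobs report Completed with 0/N ready; that is expected.
      if (status == "Completed") next
      split($ready_col, ready, "/")
      if (status != "Running" || ready[1] != ready[2]) print $name_col, status, $ready_col
    }
  '
}
```

Run it as `kubectl get pods -n production | unhealthy_pods`. A pod that is `Running` but not fully ready (for example `0/1`) is listed too. That state typically points to a failing readiness probe or a container that is still starting.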

🚨 Critical Issues

Issue: Pods Stuck in Pending State

Symptoms:

Pods remain in Pending status
No error messages in pod description

Diagnosis:

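# Check pod details, including the Events section
kubectl describe pod <pod-name> -n production

# List recent scheduling failures in the namespace
kubectl get events -n production --field-selector reason=FailedScheduling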

# Check node resources
kubectl describe nodes
kubectl top nodes

# Check storage classes
kubectl get storageclass
kubectl get pvc -n production

Common Causes & Solutions:

Insufficient CPU/Memory

# Check resource requests vs available
kubectl describe nodes | grep -A 5 "Allocated resources"

# Scale down resource requests in values.yaml
resources:
  requests:
    cpu: 100m      # Reduce from 500m
    memory: 128Mi  # Reduce from 512Mi
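When deciding how far to lower the requests, it helps to compare them with the node's allocatable capacity in the same unit. The following is a minimal bash sketch with hypothetical helper names. It handles only plain or decimal cores and the `m` suffix for CPU, and only the `Mi` and `Gi` suffixes for memory:

```bash
#!/usr/bin/env bash
# cpu_to_millicores / mem_to_mib (hypothetical helpers): convert Kubernetes resource
# quantities into plain integers so requests can be compared with node capacity.
# Only the suffixes shown here are handled.

# "500m" -> 500, "2" -> 2000, "0.5" -> 500
cpu_to_millicores() {
  local q="$1"
  case "$q" in
    *m) echo "${q%m}" ;;
    *) awk -v cores="$q" 'BEGIN { printf "%d\n", cores * 1000 + 0.5 }' ;;
  esac
}

# "512Mi" -> 512, "2Gi" -> 2048 (integer values only)
mem_to_mib() {
  local q="$1"
  case "$q" in
    *Gi) echo $(( ${q%Gi} * 1024 )) ;;
    *Mi) echo "${q%Mi}" ;;
    *) echo "unsupported memory quantity: $q" >&2; return 1 ;;
  esac
}
```

Consider three backend replicas requesting `500m` each: they need 3 × 500 = 1500 millicores. That will not fit on a node whose allocatable CPU is 1 core (1000m), even before other pods are counted.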

Storage Issues

# Check PVC status
kubectl get pvc -n production
kubectl describe pvc <pvc-name> -n production

# Verify storage class exists
kubectl get storageclass

Node Selector Issues

# Check node labels
kubectl get nodes --show-labels

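# Verify node selector in deployment
kubectl get deployment <deployment-name> -n production -o yaml | grep -A 5 nodeSelector

# List the nodes that carry a given label
kubectl get nodes -l <label-key>=<label-value>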

Issue: Application Not Accessible

Symptoms:

404 errors when accessing application
Ingress shows no endpoints
Services not responding

Diagnosis:

# Check ingress status
kubectl get ingress -n production
kubectl describe ingress -n production

# Check service endpoints
kubectl get endpoints -n production
kubectl describe service <service-name> -n production

# Check pod readiness
kubectl get pods -n production -o wide

Solutions:

Ingress Controller Issues

# Verify Traefik is running
kubectl get pods -n kube-system | grep traefik

# Check Traefik logs
kubectl logs -f deployment/traefik -n kube-system

# Restart Traefik if needed
kubectl rollout restart deployment/traefik -n kube-system

Service Configuration

# Verify service ports match pod ports
kubectl get service <service-name> -n production -o yaml
kubectl get pod <pod-name> -n production -o yaml | grep -A 5 ports

# Check if pods are ready
kubectl get pods -n production -l app=<app-label>
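Instead of reading both YAML documents by eye, you can extract the ports with kubectl's JSONPath output and compare them. The following is a minimal bash sketch. `missing_target_ports` is a hypothetical helper, while `.spec.ports[*].targetPort` and `.spec.containers[*].ports[*]` are standard Service and Pod fields:

```bash
#!/usr/bin/env bash
# missing_target_ports (hypothetical helper): prints every Service targetPort that is not
# among the pod's container ports. Both arguments are space-separated lists; include the
# container port *names* in the second list so named targetPorts (e.g. "http") match too.
# Returns 1 if any targetPort is missing.
missing_target_ports() {
  local target_ports="$1" container_ports="$2" tp cp found status=0
  for tp in $target_ports; do
    found=0
    for cp in $container_ports; do
      if [ "$tp" = "$cp" ]; then
        found=1
        break
      fi
    done
    if [ "$found" -eq 0 ]; then
      echo "targetPort $tp is not exposed by the pod"
      status=1
    fi
  done
  return "$status"
}
# Usage:
#   svc_ports=$(kubectl get service <service-name> -n production -o jsonpath='{.spec.ports[*].targetPort}')
#   pod_ports=$(kubectl get pod <pod-name> -n production \
#     -o jsonpath='{.spec.containers[*].ports[*].containerPort} {.spec.containers[*].ports[*].name}')
#   missing_target_ports "$svc_ports" "$pod_ports"
```

An empty result means every targetPort is backed by a container port. If endpoints are still missing, the Service selector probably does not match the pod labels.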

DNS Resolution

# Test DNS resolution
nslookup yourdomain.com
dig yourdomain.com

# Check if DNS records are properly configured
# A record should point to your cluster's external IP
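To check the A record against the address your ingress actually reports, you can compare the `dig +short` output with the ingress status. The following is a minimal bash sketch. It assumes your load balancer publishes an IP rather than a hostname. `check_a_record` is a hypothetical helper and `<ingress-name>` is a placeholder:

```bash
#!/usr/bin/env bash
# check_a_record (hypothetical helper): reads resolved addresses on stdin (one per line,
# as printed by `dig +short`) and succeeds if EXPECTED_IP is among them.
check_a_record() {
  local domain="$1" expected_ip="$2" resolved
  resolved=$(cat)
  if printf '%s\n' "$resolved" | grep -qxF -- "$expected_ip"; then
    echo "OK: $domain -> $expected_ip"
  else
    echo "MISMATCH: $domain resolves to [$(printf '%s' "$resolved" | tr '\n' ' ')] instead of $expected_ip"
    return 1
  fi
}
# Usage:
#   ingress_ip=$(kubectl get ingress <ingress-name> -n production \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
#   dig +short yourdomain.com A | check_a_record yourdomain.com "$ingress_ip"
```

CNAME lines that `dig +short` prints before the final addresses are ignored, because only exact matches on the expected IP count.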

Issue: SSL Certificate Problems

Symptoms:

Browser shows SSL errors
Certificate not found errors
HTTPS redirects failing

Diagnosis:

# Check certificate status
kubectl get certificates -n production
kubectl describe certificate -n production

# Check cert-manager status
kubectl get pods -n cert-manager
kubectl logs -f deployment/cert-manager -n cert-manager

# Check cluster issuer
kubectl get clusterissuer
kubectl describe clusterissuer letsencrypt-prod

Solutions:

Certificate Not Issued

# Check certificate events
kubectl describe certificate <cert-name> -n production

# List the hostnames the certificate must cover (each must resolve to your cluster)
kubectl get ingress -n production -o yaml | grep host

# Check for Let's Encrypt rate-limit errors
kubectl logs deployment/cert-manager -n cert-manager | grep -i rate

DNS Challenge Issues

# Verify DNS records for the ACME challenge (DNS-01 solver only)
# Should have TXT record: _acme-challenge.yourdomain.com

# Check if DNS propagation is complete
dig TXT _acme-challenge.yourdomain.com

Force Certificate Renewal

# Delete and recreate certificate
kubectl delete certificate <cert-name> -n production
kubectl apply -f certificate.yaml
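After forcing a renewal, confirm that the new certificate is the one actually being served. The following is a minimal bash sketch built on `openssl x509 -checkend`, which exits non-zero if the certificate expires within the given number of seconds. `cert_valid_for_days` is a hypothetical helper, and `<tls-secret-name>` stands for the secret named in your Certificate's `spec.secretName`:

```bash
#!/usr/bin/env bash
# cert_valid_for_days (hypothetical helper): reads a PEM certificate on stdin and succeeds
# if it is still valid DAYS from now. openssl x509 -checkend takes a number of seconds.
# Input that openssl cannot parse also makes it fail.
cert_valid_for_days() {
  local days="$1"
  openssl x509 -noout -checkend $(( days * 86400 )) >/dev/null
}
# Usage:
#   # Certificate actually served to clients
#   openssl s_client -connect yourdomain.com:443 -servername yourdomain.com </dev/null 2>/dev/null \
#     | cert_valid_for_days 30 && echo "valid for at least 30 more days"
#   # Certificate stored in the TLS secret
#   kubectl get secret <tls-secret-name> -n production -o jsonpath='{.data.tls\.crt}' \
#     | base64 -d | cert_valid_for_days 30
```

If the secret passes but the served certificate fails, the ingress controller is probably still serving an old or default certificate.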

🗄️ Database Issues

Issue: Database Connection Failed

Symptoms:

Backend pods in CrashLoopBackOff
Database connection timeout errors
Application startup failures

Diagnosis:

# Check database pod status
kubectl get pods -n production | grep mariadb
kubectl logs -f mariadb-0 -n production

# Check database service
kubectl get service mariadb-service -n production
kubectl describe service mariadb-service -n production

# Test database login (this runs inside the database pod, so it does not test the network path from the backend)
kubectl exec -it mariadb-0 -n production -- mysql -u root -p
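A login from inside `mariadb-0` proves the server is up, but not that `mariadb-service` routes to it. A plain TCP check against the Service helps separate the two. The following is a minimal bash sketch. It assumes MariaDB listens on the default port 3306 and that bash and coreutils `timeout` are available where you run it. `tcp_check` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# tcp_check (hypothetical helper): succeeds if a TCP connection to HOST:PORT opens
# within TIMEOUT seconds (default 3). It relies on bash's /dev/tcp pseudo-device,
# so it needs bash (not plain sh) and coreutils `timeout`.
tcp_check() {
  local host="$1" port="$2" timeout_s="${3:-3}"
  # Host and port are passed as $0/$1 so they are not spliced into the script text.
  timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
# Usage, from your workstation through a port-forward to the Service:
#   kubectl port-forward service/mariadb-service 3306:3306 -n production &
#   sleep 2
#   tcp_check 127.0.0.1 3306 && echo "mariadb-service accepts connections"
```

Going through a port-forward checks the Service-to-pod mapping, but it does not check network policies between the backend and the database.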

Solutions:

Database Pod Not Ready

# Check database pod details
kubectl describe pod mariadb-0 -n production

# Check persistent volume
kubectl get pvc -n production | grep mariadb
kubectl describe pvc mariadb-pvc -n production

# Check database logs
kubectl logs -f mariadb-0 -n production

Configuration Issues

# Verify database configuration
kubectl get configmap backend-config -n production -o yaml

# Check database credentials (values under data: are base64-encoded)
kubectl get secret backend-secrets -n production -o yaml
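A common cause of "correct" credentials being rejected is a value that was encoded with a trailing newline, or one that is empty. The following is a minimal bash sketch that flags those keys without printing the secret values. `check_secret_values` is a hypothetical helper that reads `KEY BASE64VALUE` lines, such as those produced by the go-template in the usage comment:

```bash
#!/usr/bin/env bash
# check_secret_values (hypothetical helper): reads "KEY BASE64VALUE" lines on stdin and
# reports keys whose values decode to an empty string or end with a newline, without
# printing the secret values themselves.
check_secret_values() {
  local key value decoded
  while read -r key value; do
    # The trailing "x" keeps command substitution from stripping a decoded newline.
    decoded=$(printf '%s' "$value" | base64 -d; printf x)
    decoded="${decoded%x}"
    if [ -z "$decoded" ]; then
      echo "$key: empty"
    elif [ "${decoded: -1}" = $'\n' ]; then
      echo "$key: trailing newline"
    fi
  done
}
# Usage:
#   kubectl get secret backend-secrets -n production \
#     -o go-template='{{range $k, $v := .data}}{{$k}} {{$v}}{{"\n"}}{{end}}' | check_secret_values
```

No output means every value is non-empty and free of trailing newlines.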

Database Initialization

# Check if database is initialized
kubectl exec -it mariadb-0 -n production -- mysql -u root -p -e "SHOW DATABASES;"

# Run database migrations
kubectl exec -it deployment/backend -n production -- php artisan migrate

Issue: Database Performance Problems

Symptoms:

Slow query responses
High CPU usage on database pod
Connection timeouts

Diagnosis:

# Check database resource usage
kubectl top pods -n production | grep mariadb

# Check database metrics
kubectl exec -it mariadb-0 -n production -- mysql -u root -p -e "SHOW PROCESSLIST;"
kubectl exec -it mariadb-0 -n production -- mysql -u root -p -e "SHOW STATUS LIKE 'Threads_connected';"
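`Threads_connected` on its own is hard to judge; comparing it with `max_connections` shows how close the server is to refusing connections. The following is a minimal bash sketch. `connection_usage` is a hypothetical helper that parses the tab-separated rows mysql prints in batch mode:

```bash
#!/usr/bin/env bash
# connection_usage (hypothetical helper): reads tab-separated "Variable_name<TAB>Value"
# rows, as printed by `mysql -N` in batch mode, and reports Threads_connected as a
# share of max_connections.
connection_usage() {
  awk -F'\t' '
    $1 == "Threads_connected" { used = $2 }
    $1 == "max_connections"   { max = $2 }
    END {
      if (max == 0) { print "max_connections not found"; exit 1 }
      printf "%d/%d connections in use (%d%%)\n", used, max, used * 100 / max
    }
  '
}
```

Generate the input with `mysql -N -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'; SHOW VARIABLES LIKE 'max_connections';"` run through `kubectl exec mariadb-0 -n production -- ...` without `-t`. Without `-t`, mysql prints tab-separated rows, but there is also no terminal for the `-p` prompt, so supply the password non-interactively. Values that stay close to 100% suggest connection exhaustion rather than slow queries.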

Solutions:

Resource Constraints

# Increase database resources
# Edit values.yaml in mariadb module
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 1000m
    memory: 2Gi

Connection Pool Issues

# Check connection pool settings in backend config
kubectl get configmap backend-config -n production -o yaml | grep -A 5 DB_

# Optimize database configuration
# Add to backend configuration:
DB_POOL_SIZE: "10"
DB_TIMEOUT: "60"

🔄 Queue and Cache Issues

Issue: Queue Jobs Not Processing

Symptoms:

Jobs stuck in queue
Failed job notifications
Queue worker pods not running

Diagnosis:

# Check queue worker status
kubectl get pods -n production | grep worker
kubectl logs -f deployment/worker -n production

# Check Redis connection
kubectl get pods -n production | grep redis
kubectl exec -it redis-0 -n production -- redis-cli ping

Solutions:

Queue Worker Not Running

# Check worker deployment
kubectl get deployment worker -n production
kubectl describe deployment worker -n production

# Restart worker deployment
kubectl rollout restart deployment/worker -n production

Redis Connection Issues

# Check Redis pod status
kubectl get pods -n production | grep redis
kubectl logs -f redis-0 -n production

# Test Redis connectivity
kubectl exec -it redis-0 -n production -- redis-cli ping

Queue Configuration

# Verify queue configuration
kubectl get configmap backend-config -n production -o yaml | grep -A 5 QUEUE_

# Check failed jobs
kubectl exec -it deployment/backend -n production -- php artisan queue:failed

Issue: Cache Not Working

Symptoms:

Slow application performance
Cache misses
Redis connection errors

Diagnosis:

# Check Redis status
kubectl get pods -n production | grep redis
kubectl logs -f redis-0 -n production

# Test cache functionality
kubectl exec -it deployment/backend -n production -- php artisan tinker
# In tinker: Cache::put('test', 'value', 60); Cache::get('test');

Solutions:

Redis Pod Issues

# Check Redis pod details
kubectl describe pod redis-0 -n production

# Check Redis logs
kubectl logs -f redis-0 -n production

# Restart Redis if needed
kubectl rollout restart statefulset/redis -n production

Cache Configuration

# Verify cache configuration
kubectl get configmap backend-config -n production -o yaml | grep -A 5 CACHE_

# Check Redis connection settings
kubectl get configmap backend-config -n production -o yaml | grep -A 5 REDIS_

📧 Email and External Services

Issue: Email Not Sending

Symptoms:

Email delivery failures
SMTP connection errors
Email queue not processing

Diagnosis:

# Check email configuration
kubectl get configmap backend-config -n production -o yaml | grep -A 10 MAIL_

# Check email logs
kubectl logs -f deployment/backend -n production | grep -i mail

# Test email sending
kubectl exec -it deployment/backend -n production -- php artisan tinker
# In tinker: Mail::raw('Test', function($msg) { $msg->to('<your-email>')->subject('Test'); });

Solutions:

AWS SES Configuration

# Verify SES credentials
kubectl get secret backend-secrets -n production -o yaml | grep -A 2 MAIL_

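# Check SES sending limits (requires AWS CLI credentials for the SES account)
aws ses get-send-quota

# Confirm the sending domain shows as verified in the AWS SES console
# (accounts still in the SES sandbox can only send to verified addresses)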

SMTP Settings

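# Update email configuration
# Ensure MAIL_HOST, MAIL_PORT, MAIL_USERNAME, and MAIL_PASSWORD are correct

# Test TCP connectivity to the SMTP server from the backend pod
# (assumes MAIL_HOST and MAIL_PORT are set as environment variables in the container)
kubectl exec -it deployment/backend -n production -- php -r 'var_dump((bool) @fsockopen(getenv("MAIL_HOST"), (int) getenv("MAIL_PORT"), $errno, $errstr, 5));'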

Issue: External API Failures

Symptoms:

Payment processing errors
Third-party service timeouts
API rate limit errors

Diagnosis:

# Check API configuration
kubectl get configmap backend-config -n production -o yaml | grep -A 5 STRIPE_
kubectl get secret backend-secrets -n production -o yaml | grep -A 5 STRIPE_

# Check application logs
kubectl logs -f deployment/backend -n production | grep -i api

Solutions:

API Key Issues

# Verify API keys are correct
kubectl get secret backend-secrets -n production -o yaml

# Secret data values must be base64-encoded; -n stops echo from adding a trailing newline to the key
echo -n "your-api-key" | base64

Rate Limiting

# Implement rate limiting in application
# Add delays between API calls
# Use exponential backoff for retries
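For scripted calls against a rate-limited API, exponential backoff can be sketched as a bash wrapper that retries a command with a doubling delay. `retry_with_backoff` is a hypothetical helper, and the URL in the usage comment is a placeholder. If the API sends a `Retry-After` header, honoring it is preferable to a fixed schedule:

```bash
#!/usr/bin/env bash
# retry_with_backoff (hypothetical helper):
#   retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping BASE, 2*BASE, 4*BASE, ... seconds
# between attempts, and gives up after MAX_ATTEMPTS. Delays are whole seconds.
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
  return 0
}
# Usage (curl -f turns HTTP errors such as 429 into a non-zero exit status):
#   retry_with_backoff 5 1 curl -fsS https://api.example.com/v1/resource
```

With `MAX_ATTEMPTS=5` and a 1-second base, the waits are 1, 2, 4, and 8 seconds before the final attempt.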

🔧 Application-Specific Issues

Issue: Laravel Backend Errors

Symptoms:

500 server errors
Application exceptions
Database migration failures

Diagnosis:

# Check Laravel logs
kubectl logs -f deployment/backend -n production

# Check application status
kubectl exec -it deployment/backend -n production -- php artisan about

# Check configuration values (config:show takes a config file name, e.g. database)
kubectl exec -it deployment/backend -n production -- php artisan config:show database

Solutions:

Configuration Issues

# Clear configuration cache
kubectl exec -it deployment/backend -n production -- php artisan config:clear
kubectl exec -it deployment/backend -n production -- php artisan cache:clear

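# Generate an application key only if APP_KEY is missing: a new key invalidates
# existing sessions and anything encrypted with the old key.
# Print the key and store it in backend-secrets instead of the container's .env
kubectl exec -it deployment/backend -n production -- php artisan key:generate --show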

Database Migration Issues

# Check migration status
kubectl exec -it deployment/backend -n production -- php artisan migrate:status

# Run pending migrations
kubectl exec -it deployment/backend -n production -- php artisan migrate

# Rollback if needed
kubectl exec -it deployment/backend -n production -- php artisan migrate:rollback

Issue: Next.js Frontend Errors

Symptoms:

Build failures
Runtime errors
Static asset issues

Diagnosis:

# Check frontend logs
kubectl logs -f deployment/frontend -n production

# Check pod events (scheduling, image pulls, probe failures, restarts)
kubectl describe pod -l app=frontend -n production

# Check static assets
kubectl exec -it deployment/frontend -n production -- ls -la /app/public

Solutions:

Build Issues

# Check build logs
kubectl logs -f deployment/frontend -n production | grep -i build

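# Verify environment variables
# (NEXT_PUBLIC_* variables are inlined at build time, so changing them requires rebuilding the image)
kubectl get configmap frontend-config -n production -o yaml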

# Restart deployment
kubectl rollout restart deployment/frontend -n production

Runtime Errors

# Check browser console for errors
# Verify API endpoints are accessible
# Check CORS configuration
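To check CORS without a browser, send the preflight request a browser would send and inspect `Access-Control-Allow-Origin` in the response. The following is a minimal bash sketch. `cors_allows_origin` is a hypothetical helper, and `https://api.yourdomain.com/` in the usage comment is a placeholder for your backend URL:

```bash
#!/usr/bin/env bash
# cors_allows_origin (hypothetical helper): reads HTTP response headers on stdin and
# succeeds if Access-Control-Allow-Origin equals ORIGIN or is the "*" wildcard.
cors_allows_origin() {
  local origin="$1" allowed
  allowed=$(tr -d '\r' | awk '
    tolower($0) ~ /^access-control-allow-origin:/ {
      sub(/^[^:]*:[ \t]*/, "")   # drop the header name
      sub(/[ \t]+$/, "")         # drop trailing whitespace
      print
      exit
    }
  ')
  [ -n "$allowed" ] && { [ "$allowed" = "$origin" ] || [ "$allowed" = "*" ]; }
}
# Usage:
#   curl -s -o /dev/null -D - -X OPTIONS \
#     -H "Origin: https://yourdomain.com" -H "Access-Control-Request-Method: GET" \
#     https://api.yourdomain.com/ | cors_allows_origin https://yourdomain.com
```

Header names are matched case-insensitively, because HTTP/2 responses use lowercase names.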

🛠️ Maintenance and Recovery

Emergency Recovery Procedures

Complete System Restart

# Restart infrastructure components first and wait for them to become ready
kubectl rollout restart statefulset/mariadb -n production
kubectl rollout status statefulset/mariadb -n production
kubectl rollout restart statefulset/redis -n production
kubectl rollout status statefulset/redis -n production

# Then restart all application deployments
kubectl rollout restart deployment/backend -n production
kubectl rollout restart deployment/frontend -n production
kubectl rollout restart deployment/worker -n production

Database Recovery

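# Create database backup
# (no -t, so TTY control characters and the password prompt stay out of the file;
# assumes the root password is exposed in the pod as $MARIADB_ROOT_PASSWORD)
kubectl exec mariadb-0 -n production -- sh -c 'mysqldump --single-transaction -u root -p"$MARIADB_ROOT_PASSWORD" myproject' > backup.sql

# Restore from backup (-i streams the local file to mysql's stdin)
kubectl exec -i mariadb-0 -n production -- sh -c 'mysql -u root -p"$MARIADB_ROOT_PASSWORD" myproject' < backup.sql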
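A dump that was cut short (for example by a dropped `kubectl exec` session) can look fine until you need it. By default, mysqldump ends a successful run with a `-- Dump completed` comment line, which gives a cheap completeness check. The following is a minimal bash sketch. `verify_dump` is a hypothetical helper, and the check does not apply to dumps taken with `--skip-comments`:

```bash
#!/usr/bin/env bash
# verify_dump (hypothetical helper): succeeds if FILE is non-empty and its last line is the
# "-- Dump completed" trailer mysqldump writes at the end of a successful run
# (the trailer is not written when --skip-comments is used).
verify_dump() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "backup is empty or missing: $file" >&2
    return 1
  fi
  if ! tail -n 1 "$file" | grep -q '^-- Dump completed'; then
    echo "backup looks truncated (no completion trailer): $file" >&2
    return 1
  fi
  return 0
}
# Usage: keep timestamped dumps and check each one before relying on it, e.g.
#   backup_file="backup-$(date +%Y%m%d-%H%M%S).sql"
#   (write the dump to "$backup_file" with the mysqldump command above)
#   verify_dump "$backup_file" && gzip "$backup_file"
```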

Configuration Reset

# Reset to default configuration
kubectl delete configmap backend-config -n production
kubectl apply -f config-maps/backend.yaml

# Restart applications
kubectl rollout restart deployment/backend -n production

Monitoring and Alerts

Set Up Monitoring

# Install monitoring stack into its own namespace
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace

Configure Alerts

# Create alert rules for:
# - Pod restarts
# - High resource usage
# - Service unavailability
# - Certificate expiration

📞 Getting Help

Information to Collect

When seeking help, collect the following information:

# System information
kubectl cluster-info
kubectl version

# Application status
kubectl get pods --all-namespaces
kubectl get services --all-namespaces
kubectl get ingress --all-namespaces

# Recent logs
kubectl logs --tail=100 deployment/backend -n production
kubectl logs --tail=100 deployment/frontend -n production

# Events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'
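To gather everything listed above in one go, a small bash function can write each command's output to a file and archive the result. This sketch rests on a few assumptions:

- `collect_diagnostics` is a hypothetical name.
- The `backend`, `frontend`, and `worker` deployments live in one namespace (default `production`).
- Setting `KUBECTL` lets you substitute a different kubectl binary or wrapper.

```bash
#!/usr/bin/env bash
# collect_diagnostics (hypothetical helper):
#   collect_diagnostics OUT_DIR [NAMESPACE]
# Saves cluster info, resource listings, events, and recent application logs into
# OUT_DIR and packs them into OUT_DIR.tar.gz. KUBECTL defaults to kubectl.
collect_diagnostics() {
  local out_dir="$1" ns="${2:-production}" kubectl="${KUBECTL:-kubectl}" app
  mkdir -p "$out_dir"
  "$kubectl" cluster-info > "$out_dir/cluster-info.txt" 2>&1
  "$kubectl" version > "$out_dir/version.txt" 2>&1
  "$kubectl" get pods,services,ingress --all-namespaces -o wide > "$out_dir/resources.txt" 2>&1
  "$kubectl" get events --all-namespaces --sort-by=.lastTimestamp > "$out_dir/events.txt" 2>&1
  for app in backend frontend worker; do
    "$kubectl" logs --tail=100 "deployment/$app" -n "$ns" > "$out_dir/logs-$app.txt" 2>&1
  done
  tar -czf "$out_dir.tar.gz" -C "$(dirname "$out_dir")" "$(basename "$out_dir")"
}
```

For example, `collect_diagnostics ./diagnostics` produces `./diagnostics.tar.gz`. Logs can contain tokens, email addresses, or other sensitive data, so review the files before attaching them to a public issue.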

Common Error Messages

Error	Cause	Solution
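`ImagePullBackOff`	Repeated image pull failures	Check image name, tag, and registry access
`CrashLoopBackOff`	Container keeps exiting after start	Check application logs (kubectl logs --previous shows the last crash)
`Pending`	Pod not yet scheduled	Check node resources and PVC status
`FailedScheduling`	No node satisfies requests, node selectors, or taints	Check node resources, labels, and taints
`ErrImagePull`	Wrong image name/tag or registry authentication	Check image name and image pull secrets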

For additional support, search the project's GitHub Issues or open a new issue that includes the diagnostic information collected above.