Troubleshooting Guide
A comprehensive guide to diagnosing and resolving common issues with your SaaS infrastructure deployment.
🔍 Quick Diagnostics
System Health Check

    # Check overall cluster status
    kubectl get nodes
    kubectl get pods --all-namespaces

    # Check application status
    kubectl get pods -n production
    kubectl get services -n production
    kubectl get ingress -n production

    # Check resource usage
    kubectl top nodes
    kubectl top pods -n production

Common Status Indicators
Status | Meaning | Action |
---|---|---|
Running | ✅ Healthy | No action needed |
Pending | ⏳ Waiting for resources | Check resource availability |
CrashLoopBackOff | 🔄 Repeated failures | Check logs for errors |
ImagePullBackOff | 📦 Image issues | Check image registry |
ErrImagePull | ❌ Image not found | Verify image name/tag |
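
The status table above can be applied mechanically. The following is a minimal sketch, assuming the default column layout of `kubectl get pods --all-namespaces --no-headers` (namespace, name, ready, status, then further columns); the function name `unhealthy_pods` is hypothetical.

```shell
# List pods whose STATUS is anything other than Running or Completed.
# Reads `kubectl get pods --all-namespaces --no-headers` output on stdin
# and prints "namespace/name<TAB>status" for each matching pod.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { printf "%s/%s\t%s\n", $1, $2, $4 }'
}

# Usage (requires cluster access):
#   kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```

This only inspects the STATUS column, so a pod that is Running but not Ready will not be listed.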
🚨 Critical Issues
Issue: Pods Stuck in Pending State
Symptoms:
- Pods remain in Pending status
- No error messages in pod description
Diagnosis:

    # Check pod details
    kubectl describe pod <pod-name> -n production

    # Check node resources
    kubectl describe nodes
    kubectl top nodes

    # Check storage classes
    kubectl get storageclass
    kubectl get pvc -n production

Common Causes & Solutions:
Insufficient CPU/Memory

    # Check resource requests vs available
    kubectl describe nodes | grep -A 5 "Allocated resources"

    # Scale down resource requests in values.yaml
    resources:
      requests:
        cpu: 100m      # Reduce from 500m
        memory: 128Mi  # Reduce from 512Mi

Storage Issues

    # Check PVC status
    kubectl get pvc -n production
    kubectl describe pvc <pvc-name> -n production

    # Verify storage class exists
    kubectl get storageclass

Node Selector Issues

    # Check node labels
    kubectl get nodes --show-labels

    # Verify node selector in deployment
    kubectl get deployment <deployment-name> -n production -o yaml | grep -A 5 nodeSelector

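
Comparing requests against what a node can allocate is easier once the quantities share a unit. The sketch below normalizes the quantity formats used in this guide. It is a simplification that assumes only the `m` suffix for CPU and integer `Ki`, `Mi`, or `Gi` values for memory; the function names are hypothetical.

```shell
# Convert a CPU quantity ("500m", "2", "0.5") to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%.0f\n", v * 1000 }' ;;
  esac
}

# Convert a memory quantity ("128Mi", "1Gi", "2048Ki") to whole MiB.
# Only integer values with binary suffixes are handled in this sketch.
mem_to_mib() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *Ki) echo $(( ${1%Ki} / 1024 )) ;;
    *)   echo "unsupported unit: $1" >&2; return 1 ;;
  esac
}
```

For example, `cpu_to_millicores 500m` and `cpu_to_millicores 0.5` both print `500`, which makes a request directly comparable with a node's remaining capacity.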
Issue: Application Not Accessible
Symptoms:
- 404 errors when accessing application
- Ingress shows no endpoints
- Services not responding
Diagnosis:

    # Check ingress status
    kubectl get ingress -n production
    kubectl describe ingress -n production

    # Check service endpoints
    kubectl get endpoints -n production
    kubectl describe service <service-name> -n production

    # Check pod readiness
    kubectl get pods -n production -o wide

Solutions:
Ingress Controller Issues

    # Verify Traefik is running
    kubectl get pods -n kube-system | grep traefik

    # Check Traefik logs
    kubectl logs -f deployment/traefik -n kube-system

    # Restart Traefik if needed
    kubectl rollout restart deployment/traefik -n kube-system

Service Configuration

    # Verify service ports match pod ports
    kubectl get service <service-name> -n production -o yaml
    kubectl get pod <pod-name> -n production -o yaml | grep -A 5 ports

    # Check if pods are ready
    kubectl get pods -n production -l app=<app-label>

DNS Resolution

    # Test DNS resolution
    nslookup yourdomain.com
    dig yourdomain.com

    # Check if DNS records are properly configured
    # A record should point to your cluster's external IP

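
A service whose selector matches no ready pod is a common cause of the "no endpoints" symptom. As a rough sketch, the filter below picks such services out of the endpoints listing. It assumes the default `kubectl get endpoints --no-headers` layout, where the second column reads `<none>` when there are no addresses; the function name is hypothetical.

```shell
# Print the names of Endpoints objects that have no addresses.
# Reads `kubectl get endpoints -n production --no-headers` output on stdin.
services_without_endpoints() {
  awk '$2 == "<none>" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get endpoints -n production --no-headers | services_without_endpoints
```

Any name it prints is worth checking against the service selector and the readiness of the matching pods.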
Issue: SSL Certificate Problems
Symptoms:
- Browser shows SSL errors
- Certificate not found errors
- HTTPS redirects failing
Diagnosis:

    # Check certificate status
    kubectl get certificates -n production
    kubectl describe certificate -n production

    # Check cert-manager status
    kubectl get pods -n cert-manager
    kubectl logs -f deployment/cert-manager -n cert-manager

    # Check cluster issuer
    kubectl get clusterissuer
    kubectl describe clusterissuer letsencrypt-prod

Solutions:
Certificate Not Issued

    # Check certificate events
    kubectl describe certificate <cert-name> -n production

    # Verify domain ownership
    kubectl get ingress -n production -o yaml | grep host

    # Check Let's Encrypt rate limits (no -f, so the command exits instead of streaming)
    kubectl logs deployment/cert-manager -n cert-manager | grep -i rate

DNS Challenge Issues

    # Verify DNS records for ACME challenge
    # Should have TXT record: _acme-challenge.yourdomain.com

    # Check if DNS propagation is complete
    dig TXT _acme-challenge.yourdomain.com

Force Certificate Renewal

    # Delete and recreate certificate
    kubectl delete certificate <cert-name> -n production
    kubectl apply -f certificate.yaml

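
To see at a glance which certificates still need attention, the listing can be filtered on its READY column. This is a minimal sketch that assumes the default cert-manager column layout for `kubectl get certificates --no-headers` (name, ready, secret, age); the function name is hypothetical.

```shell
# Print certificates whose READY column is not "True".
# Reads `kubectl get certificates -n production --no-headers` output on stdin.
certs_not_ready() {
  awk '$2 != "True" { printf "%s (secret: %s)\n", $1, $3 }'
}

# Usage (requires cluster access and cert-manager):
#   kubectl get certificates -n production --no-headers | certs_not_ready
```

Each certificate it reports can then be inspected with the `kubectl describe certificate` command shown above.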
🗄️ Database Issues
Issue: Database Connection Failed
Symptoms:
- Backend pods in CrashLoopBackOff
- Database connection timeout errors
- Application startup failures
Diagnosis:

    # Check database pod status
    kubectl get pods -n production | grep mariadb
    kubectl logs -f mariadb-0 -n production

    # Check database service
    kubectl get service mariadb-service -n production
    kubectl describe service mariadb-service -n production

    # Test database connectivity
    kubectl exec -it mariadb-0 -n production -- mysql -u root -p

Solutions:
Database Pod Not Ready

    # Check database pod details
    kubectl describe pod mariadb-0 -n production

    # Check persistent volume
    kubectl get pvc -n production | grep mariadb
    kubectl describe pvc mariadb-pvc -n production

    # Check database logs
    kubectl logs -f mariadb-0 -n production

Configuration Issues

    # Verify database configuration
    kubectl get configmap backend-config -n production -o yaml

    # Check if database credentials are correct
    kubectl get secret backend-secrets -n production -o yaml

Database Initialization

    # Check if database is initialized
    kubectl exec -it mariadb-0 -n production -- mysql -u root -p -e "SHOW DATABASES;"

    # Run database migrations
    kubectl exec -it deployment/backend -n production -- php artisan migrate

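
Secret values are shown base64-encoded in `-o yaml` output, which makes credentials hard to compare by eye. The helper below is a small sketch for decoding one key; the function name and the example key `DB_PASSWORD` are assumptions, and keys containing dots or other special characters would need jsonpath escaping that is not handled here.

```shell
# Print the decoded value of one key from a Secret.
# $1 = secret name, $2 = key, $3 = namespace (defaults to production)
secret_value() {
  kubectl get secret "$1" -n "${3:-production}" -o "jsonpath={.data.$2}" | base64 --decode
}

# Usage (requires cluster access; DB_PASSWORD is a hypothetical key name):
#   secret_value backend-secrets DB_PASSWORD
```

The decoded value is printed to the terminal, so avoid running this where the output is logged or shared.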
Issue: Database Performance Problems
Symptoms:
- Slow query responses
- High CPU usage on database pod
- Connection timeouts
Diagnosis:

    # Check database resource usage
    kubectl top pods -n production | grep mariadb

    # Check database metrics
    kubectl exec -it mariadb-0 -n production -- mysql -u root -p -e "SHOW PROCESSLIST;"
    kubectl exec -it mariadb-0 -n production -- mysql -u root -p -e "SHOW STATUS LIKE 'Threads_connected';"

Solutions:
Resource Constraints

    # Increase database resources
    # Edit values.yaml in mariadb module
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi

Connection Pool Issues

    # Check connection pool settings in backend config
    kubectl get configmap backend-config -n production -o yaml | grep -A 5 DB_

    # Optimize database configuration
    # Add to backend configuration:
    DB_POOL_SIZE: "10"
    DB_TIMEOUT: "60"

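
When deciding whether the connection pool is the bottleneck, it helps to see how many connections are idle versus active. The sketch below summarizes `SHOW PROCESSLIST` output by its Command column. It assumes tab-separated rows without a header, as produced by `mysql -N -B -e "SHOW PROCESSLIST;"`, and the function name is hypothetical.

```shell
# Count connections per Command (column 5 of SHOW PROCESSLIST), busiest first.
# Reads tab-separated process list rows on stdin.
processlist_summary() {
  awk -F '\t' '{ count[$5]++ } END { for (c in count) printf "%d\t%s\n", count[c], c }' |
    sort -rn
}
```

A large number of `Sleep` connections relative to the pool size suggests that connections are being held open rather than that queries are slow.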
🔄 Queue and Cache Issues
Issue: Queue Jobs Not Processing
Symptoms:
- Jobs stuck in queue
- Failed job notifications
- Queue worker pods not running
Diagnosis:

    # Check queue worker status
    kubectl get pods -n production | grep worker
    kubectl logs -f deployment/worker -n production

    # Check Redis connection
    kubectl get pods -n production | grep redis
    kubectl exec -it redis-0 -n production -- redis-cli ping

Solutions:
Queue Worker Not Running

    # Check worker deployment
    kubectl get deployment worker -n production
    kubectl describe deployment worker -n production

    # Restart worker deployment
    kubectl rollout restart deployment/worker -n production

Redis Connection Issues

    # Check Redis pod status
    kubectl get pods -n production | grep redis
    kubectl logs -f redis-0 -n production

    # Test Redis connectivity
    kubectl exec -it redis-0 -n production -- redis-cli ping

Queue Configuration

    # Verify queue configuration
    kubectl get configmap backend-config -n production -o yaml | grep -A 5 QUEUE_

    # Check failed jobs
    kubectl exec -it deployment/backend -n production -- php artisan queue:failed

Issue: Cache Not Working
Symptoms:
- Slow application performance
- Cache misses
- Redis connection errors
Diagnosis:

    # Check Redis status
    kubectl get pods -n production | grep redis
    kubectl logs -f redis-0 -n production

    # Test cache functionality
    kubectl exec -it deployment/backend -n production -- php artisan tinker
    # In tinker: Cache::put('test', 'value', 60); Cache::get('test');

Solutions:
Redis Pod Issues

    # Check Redis pod details
    kubectl describe pod redis-0 -n production

    # Check Redis logs
    kubectl logs -f redis-0 -n production

    # Restart Redis if needed
    kubectl rollout restart statefulset/redis -n production

Cache Configuration

    # Verify cache configuration
    kubectl get configmap backend-config -n production -o yaml | grep -A 5 CACHE_

    # Check Redis connection settings
    kubectl get configmap backend-config -n production -o yaml | grep -A 5 REDIS_

📧 Email and External Services
Issue: Email Not Sending
Symptoms:
- Email delivery failures
- SMTP connection errors
- Email queue not processing
Diagnosis:

    # Check email configuration
    kubectl get configmap backend-config -n production -o yaml | grep -A 10 MAIL_

    # Check email logs
    kubectl logs -f deployment/backend -n production | grep -i mail

    # Test email sending
    kubectl exec -it deployment/backend -n production -- php artisan tinker
    # In tinker: Mail::raw('Test', function($msg) { $msg->to('<recipient-email>')->subject('Test'); });

Solutions:
AWS SES Configuration

    # Verify SES credentials
    kubectl get secret backend-secrets -n production -o yaml | grep -A 2 MAIL_

    # Check SES sending limits
    # Verify domain verification in AWS SES console

SMTP Settings

    # Update email configuration
    # Ensure MAIL_HOST, MAIL_PORT, MAIL_USERNAME, MAIL_PASSWORD are correct

    # Test SMTP connection
    kubectl exec -it deployment/backend -n production -- php artisan tinker
    # Test SMTP connection manually

Issue: External API Failures
Symptoms:
- Payment processing errors
- Third-party service timeouts
- API rate limit errors
Diagnosis:

    # Check API configuration
    kubectl get configmap backend-config -n production -o yaml | grep -A 5 STRIPE_
    kubectl get secret backend-secrets -n production -o yaml | grep -A 5 STRIPE_

    # Check application logs
    kubectl logs -f deployment/backend -n production | grep -i api

Solutions:
API Key Issues

    # Verify API keys are correct
    kubectl get secret backend-secrets -n production -o yaml

    # Check if keys are base64 encoded (-n keeps echo's trailing newline out of the encoded value)
    echo -n "your-api-key" | base64

Rate Limiting

    # Implement rate limiting in application
    # Add delays between API calls
    # Use exponential backoff for retries

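
Exponential backoff means doubling the wait after each failed attempt so that a rate-limited service gets progressively more room to recover. The guide recommends implementing this in the application; the sketch below shows the same idea as a shell helper for operational scripts. The function name and its arguments are hypothetical.

```shell
# Run a command, retrying with exponentially growing delays on failure.
# $1 = maximum attempts, $2 = initial delay in seconds, rest = command to run
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))        # double the wait before the next try
    attempt=$((attempt + 1))
  done
}

# Usage (api.example.com is a placeholder host):
#   retry_with_backoff 5 1 curl -fsS https://api.example.com/health
```

With five attempts and a one-second initial delay, the waits between attempts are 1, 2, 4, and 8 seconds. Production code usually also adds random jitter, which this sketch omits.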
🔧 Application-Specific Issues
Issue: Laravel Backend Errors
Symptoms:
- 500 server errors
- Application exceptions
- Database migration failures
Diagnosis:

    # Check Laravel logs
    kubectl logs -f deployment/backend -n production

    # Check application status
    kubectl exec -it deployment/backend -n production -- php artisan about

    # Check environment configuration (config:show expects a config file or key, such as app or database)
    kubectl exec -it deployment/backend -n production -- php artisan config:show app

Solutions:
Configuration Issues

    # Clear configuration cache
    kubectl exec -it deployment/backend -n production -- php artisan config:clear
    kubectl exec -it deployment/backend -n production -- php artisan cache:clear

    # Regenerate application key (caution: this invalidates existing sessions and data encrypted with the old key)
    kubectl exec -it deployment/backend -n production -- php artisan key:generate

Database Migration Issues

    # Check migration status
    kubectl exec -it deployment/backend -n production -- php artisan migrate:status

    # Run pending migrations
    kubectl exec -it deployment/backend -n production -- php artisan migrate

    # Rollback if needed
    kubectl exec -it deployment/backend -n production -- php artisan migrate:rollback

Issue: Next.js Frontend Errors
Symptoms:
- Build failures
- Runtime errors
- Static asset issues
Diagnosis:

    # Check frontend logs
    kubectl logs -f deployment/frontend -n production

    # Check build status
    kubectl describe pod -l app=frontend -n production

    # Check static assets
    kubectl exec -it deployment/frontend -n production -- ls -la /app/public

Solutions:
Build Issues

    # Check build logs
    kubectl logs -f deployment/frontend -n production | grep -i build

    # Verify environment variables
    kubectl get configmap frontend-config -n production -o yaml

    # Restart deployment
    kubectl rollout restart deployment/frontend -n production

Runtime Errors

    # Check browser console for errors
    # Verify API endpoints are accessible
    # Check CORS configuration

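
CORS problems can also be checked from the command line by sending a request with an `Origin` header and inspecting the response headers. The following is a minimal sketch of the header check only; the function name is hypothetical, and the hostnames in the usage comment are placeholders.

```shell
# Print the value of the Access-Control-Allow-Origin response header.
# Reads raw HTTP response headers on stdin; exits non-zero if the header is absent.
cors_allowed_origin() {
  tr -d '\r' | awk '
    {
      i = index($0, ":")
      if (i == 0) next                      # status line or blank line
      name = tolower(substr($0, 1, i - 1))
      value = substr($0, i + 1)
      sub(/^[ \t]+/, "", value)
      if (name == "access-control-allow-origin") { print value; found = 1 }
    }
    END { exit !found }'
}

# Usage (placeholder hosts):
#   curl -s -o /dev/null -D - -H "Origin: https://app.example.com" \
#     https://api.example.com/health | cors_allowed_origin
```

If nothing is printed, the API did not return the header for that origin, which would match the browser console errors described above.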
🛠️ Maintenance and Recovery
Emergency Recovery Procedures
Complete System Restart

    # Restart all deployments
    kubectl rollout restart deployment/backend -n production
    kubectl rollout restart deployment/frontend -n production
    kubectl rollout restart deployment/worker -n production

    # Restart infrastructure components
    kubectl rollout restart statefulset/mariadb -n production
    kubectl rollout restart statefulset/redis -n production

Database Recovery

    # Create database backup
    # No -t here: a TTY would mix the password prompt and carriage returns into backup.sql
    # Assumes the root password is available in the pod as MARIADB_ROOT_PASSWORD
    kubectl exec mariadb-0 -n production -- sh -c 'mysqldump -u root -p"$MARIADB_ROOT_PASSWORD" myproject' > backup.sql

    # Restore from backup
    kubectl exec -i mariadb-0 -n production -- sh -c 'mysql -u root -p"$MARIADB_ROOT_PASSWORD" myproject' < backup.sql

Configuration Reset

    # Reset to default configuration
    kubectl delete configmap backend-config -n production
    kubectl apply -f config-maps/backend.yaml

    # Restart applications
    kubectl rollout restart deployment/backend -n production

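
`kubectl rollout restart` returns as soon as the restart is requested, not when the new pods are ready. The sketch below pairs it with `kubectl rollout status` so a script can wait for each workload in turn. The function name and the 180-second timeout are assumptions.

```shell
# Restart a workload and block until the rollout finishes or times out.
# $1 = resource (for example deployment/backend), $2 = namespace (defaults to production)
restart_and_wait() {
  kubectl rollout restart "$1" -n "${2:-production}" &&
    kubectl rollout status "$1" -n "${2:-production}" --timeout=180s
}

# Usage (requires cluster access): stop at the first workload that fails to come back.
#   for r in statefulset/mariadb statefulset/redis deployment/backend \
#            deployment/worker deployment/frontend; do
#     restart_and_wait "$r" || break
#   done
```

The usage example restarts the data stores before the applications that depend on them; that ordering is a suggestion, not something this guide prescribes.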
Monitoring and Alerts
Set Up Monitoring

    # Install monitoring stack
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm install monitoring prometheus-community/kube-prometheus-stack

Configure Alerts

    # Create alert rules for:
    # - Pod restarts
    # - High resource usage
    # - Service unavailability
    # - Certificate expiration

📞 Getting Help
Information to Collect
When seeking help, collect the following information:

    # System information
    kubectl cluster-info
    kubectl version

    # Application status
    kubectl get pods --all-namespaces
    kubectl get services --all-namespaces
    kubectl get ingress --all-namespaces

    # Recent logs
    kubectl logs --tail=100 deployment/backend -n production
    kubectl logs --tail=100 deployment/frontend -n production

    # Events
    kubectl get events --all-namespaces --sort-by='.lastTimestamp'

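
The commands above can be bundled so that everything lands in one directory ready to attach to an issue. This is a minimal sketch using the same commands; the function name, the default directory `diagnostics`, and the output file names are assumptions.

```shell
# Write each diagnostic command's output to its own file under $1.
collect_diagnostics() {
  out_dir=${1:-diagnostics}
  mkdir -p "$out_dir" || return 1
  kubectl cluster-info > "$out_dir/cluster-info.txt" 2>&1
  kubectl version > "$out_dir/version.txt" 2>&1
  kubectl get pods,services,ingress --all-namespaces > "$out_dir/resources.txt" 2>&1
  kubectl logs --tail=100 deployment/backend -n production > "$out_dir/backend.log" 2>&1
  kubectl logs --tail=100 deployment/frontend -n production > "$out_dir/frontend.log" 2>&1
  kubectl get events --all-namespaces --sort-by='.lastTimestamp' > "$out_dir/events.txt" 2>&1
  echo "Diagnostics written to $out_dir"
}

# Usage (requires cluster access):
#   collect_diagnostics ./diagnostics
```

Logs and events can contain hostnames, tokens, or customer data, so review the files before sharing them publicly.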
Common Error Messages
Error | Cause | Solution |
---|---|---|
ImagePullBackOff | Image not found | Check image name and registry |
CrashLoopBackOff | Application errors | Check application logs |
Pending | Resource constraints | Check node resources |
FailedScheduling | Node selector issues | Check node labels |
ErrImagePull | Registry authentication | Check image pull secrets |
For additional support, check the GitHub Issues or create a new issue with the collected diagnostic information.