
Troubleshooting Guide


A comprehensive guide to diagnosing and resolving common issues in your SaaS infrastructure deployment.

Quick Diagnostics

Terminal window
# Check overall cluster status
kubectl get nodes
kubectl get pods --all-namespaces
# Check application status
kubectl get pods -n production
kubectl get services -n production
kubectl get ingress -n production
# Check resource usage
kubectl top nodes
kubectl top pods -n production
| Status | Meaning | Action |
|--------|---------|--------|
| Running | ✅ Healthy | No action needed |
| Pending | ⏳ Waiting for resources | Check resource availability |
| CrashLoopBackOff | 🔄 Repeated failures | Check logs for errors |
| ImagePullBackOff | 📦 Image issues | Check image registry |
| ErrImagePull | ❌ Image not found | Verify image name/tag |
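
If a pod's status keeps changing, watching it live and checking recent namespace events usually reveals the cause faster than repeated polling:

Terminal window
# Watch pod status changes live (Ctrl+C to stop)
kubectl get pods -n production -w
# Show recent events for the namespace, newest last
kubectl get events -n production --sort-by='.lastTimestamp'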

Pods Stuck in Pending State

Symptoms:

  • Pods remain in Pending status
  • No error messages in pod description

Diagnosis:

Terminal window
# Check pod details
kubectl describe pod <pod-name> -n production
# Check node resources
kubectl describe nodes
kubectl top nodes
# Check storage classes
kubectl get storageclass
kubectl get pvc -n production

Common Causes & Solutions:

  1. Insufficient CPU/Memory

    Terminal window
    # Check resource requests vs available
    kubectl describe nodes | grep -A 5 "Allocated resources"
    # Scale down resource requests in values.yaml
    resources:
      requests:
        cpu: 100m # Reduce from 500m
        memory: 128Mi # Reduce from 512Mi
  2. Storage Issues

    Terminal window
    # Check PVC status
    kubectl get pvc -n production
    kubectl describe pvc <pvc-name> -n production
    # Verify storage class exists
    kubectl get storageclass
  3. Node Selector Issues

    Terminal window
    # Check node labels
    kubectl get nodes --show-labels
    # Verify node selector in deployment
    kubectl get deployment <deployment-name> -o yaml | grep -A 5 nodeSelector
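
If the node selector from item 3 points at a label no node carries, the scheduler will never place the pod. A minimal sketch for removing a stale selector with a JSON patch, assuming it sits at the default path in the pod template:

Terminal window
# Remove a stale nodeSelector from the pod template
# (only applies if a nodeSelector is actually set at this path)
kubectl patch deployment <deployment-name> -n production --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/nodeSelector"}]'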

Application Not Accessible (404 Errors)

Symptoms:

  • 404 errors when accessing application
  • Ingress shows no endpoints
  • Services not responding

Diagnosis:

Terminal window
# Check ingress status
kubectl get ingress -n production
kubectl describe ingress -n production
# Check service endpoints
kubectl get endpoints -n production
kubectl describe service <service-name> -n production
# Check pod readiness
kubectl get pods -n production -o wide

Solutions:

  1. Ingress Controller Issues

    Terminal window
    # Verify Traefik is running
    kubectl get pods -n kube-system | grep traefik
    # Check Traefik logs
    kubectl logs -f deployment/traefik -n kube-system
    # Restart Traefik if needed
    kubectl rollout restart deployment/traefik -n kube-system
  2. Service Configuration

    Terminal window
    # Verify service ports match pod ports
    kubectl get service <service-name> -o yaml
    kubectl get pods <pod-name> -o yaml | grep -A 5 ports
    # Check if pods are ready
    kubectl get pods -n production -l app=<app-label>
  3. DNS Resolution

    Terminal window
    # Test DNS resolution
    nslookup yourdomain.com
    dig yourdomain.com
    # Check if DNS records are properly configured
    # A record should point to your cluster's external IP
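
To confirm what the A record should point to, the external IP can be read straight from the ingress controller's service; a sketch assuming the default Traefik service in kube-system:

Terminal window
# Print the external IP of the ingress controller (service name is an assumption;
# some environments report a hostname instead of an IP)
kubectl get service traefik -n kube-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'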

SSL Certificate Issues

Symptoms:

  • Browser shows SSL errors
  • Certificate not found errors
  • HTTPS redirects failing

Diagnosis:

Terminal window
# Check certificate status
kubectl get certificates -n production
kubectl describe certificate -n production
# Check cert-manager status
kubectl get pods -n cert-manager
kubectl logs -f deployment/cert-manager -n cert-manager
# Check cluster issuer
kubectl get clusterissuer
kubectl describe clusterissuer letsencrypt-prod

Solutions:

  1. Certificate Not Issued

    Terminal window
    # Check certificate events
    kubectl describe certificate <cert-name> -n production
    # Verify domain ownership
    kubectl get ingress -n production -o yaml | grep host
    # Check Let's Encrypt rate limits
    kubectl logs -f deployment/cert-manager -n cert-manager | grep rate
  2. DNS Challenge Issues

    Terminal window
    # Verify DNS records for ACME challenge
    # Should have TXT record: _acme-challenge.yourdomain.com
    # Check if DNS propagation is complete
    dig TXT _acme-challenge.yourdomain.com
  3. Force Certificate Renewal

    Terminal window
    # Delete and recreate certificate
    kubectl delete certificate <cert-name> -n production
    kubectl apply -f certificate.yaml
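
An alternative to recreating the Certificate resource is deleting its backing TLS secret; cert-manager notices the missing secret and reissues the certificate. A sketch, assuming the secret name is taken from the certificate's spec.secretName:

Terminal window
# Find the secret backing the certificate, then delete it to trigger reissuance
kubectl get certificate <cert-name> -n production -o jsonpath='{.spec.secretName}'
kubectl delete secret <secret-name> -n production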

Database Connection Failures

Symptoms:

  • Backend pods in CrashLoopBackOff
  • Database connection timeout errors
  • Application startup failures

Diagnosis:

Terminal window
# Check database pod status
kubectl get pods -n production | grep mariadb
kubectl logs -f mariadb-0 -n production
# Check database service
kubectl get service mariadb-service -n production
kubectl describe service mariadb-service -n production
# Test database connectivity
kubectl exec -it mariadb-0 -n production -- mysql -u root -p
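
If the exec test above succeeds but the application still cannot connect, testing from a separate pod isolates service DNS or network policy problems; a sketch using a throwaway client pod (image tag and service name are assumptions):

Terminal window
# Test connectivity to the database service from a temporary pod
kubectl run -it --rm dbtest --image=mariadb:10.11 --restart=Never -n production -- \
  mysql -h mariadb-service -u root -p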

Solutions:

  1. Database Pod Not Ready

    Terminal window
    # Check database pod details
    kubectl describe pod mariadb-0 -n production
    # Check persistent volume
    kubectl get pvc -n production | grep mariadb
    kubectl describe pvc mariadb-pvc -n production
    # Check database logs
    kubectl logs -f mariadb-0 -n production
  2. Configuration Issues

    Terminal window
    # Verify database configuration
    kubectl get configmap backend-config -n production -o yaml
    # Check if database credentials are correct
    kubectl get secret backend-secrets -n production -o yaml
  3. Database Initialization

    Terminal window
    # Check if database is initialized
    kubectl exec -it mariadb-0 -n production -- mysql -u root -p -e "SHOW DATABASES;"
    # Run database migrations
    kubectl exec -it deployment/backend -n production -- php artisan migrate

Database Performance Issues

Symptoms:

  • Slow query responses
  • High CPU usage on database pod
  • Connection timeouts

Diagnosis:

Terminal window
# Check database resource usage
kubectl top pods -n production | grep mariadb
# Check database metrics
kubectl exec -it mariadb-0 -n production -- mysql -u root -p -e "SHOW PROCESSLIST;"
kubectl exec -it mariadb-0 -n production -- mysql -u root -p -e "SHOW STATUS LIKE 'Threads_connected';"

Solutions:

  1. Resource Constraints

    Terminal window
    # Increase database resources
    # Edit values.yaml in mariadb module
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi
  2. Connection Pool Issues

    Terminal window
    # Check connection pool settings in backend config
    kubectl get configmap backend-config -n production -o yaml | grep -A 5 DB_
    # Optimize database configuration
    # Add to backend configuration:
    DB_POOL_SIZE: "10"
    DB_TIMEOUT: "60"
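
Beyond connection counts, the slow query log usually identifies the offending queries. A sketch for enabling and locating it at runtime (the one-second threshold is an assumption):

Terminal window
# Enable the slow query log at runtime and log anything over 1 second
kubectl exec -it mariadb-0 -n production -- mysql -u root -p \
  -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 1;"
# Check where the log is written
kubectl exec -it mariadb-0 -n production -- mysql -u root -p \
  -e "SHOW VARIABLES LIKE 'slow_query_log_file';"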

Queue Processing Issues

Symptoms:

  • Jobs stuck in queue
  • Failed job notifications
  • Queue worker pods not running

Diagnosis:

Terminal window
# Check queue worker status
kubectl get pods -n production | grep worker
kubectl logs -f deployment/worker -n production
# Check Redis connection
kubectl get pods -n production | grep redis
kubectl exec -it redis-0 -n production -- redis-cli ping

Solutions:

  1. Queue Worker Not Running

    Terminal window
    # Check worker deployment
    kubectl get deployment worker -n production
    kubectl describe deployment worker -n production
    # Restart worker deployment
    kubectl rollout restart deployment/worker -n production
  2. Redis Connection Issues

    Terminal window
    # Check Redis pod status
    kubectl get pods -n production | grep redis
    kubectl logs -f redis-0 -n production
    # Test Redis connectivity
    kubectl exec -it redis-0 -n production -- redis-cli ping
  3. Queue Configuration

    Terminal window
    # Verify queue configuration
    kubectl get configmap backend-config -n production -o yaml | grep -A 5 QUEUE_
    # Check failed jobs
    kubectl exec -it deployment/backend -n production -- php artisan queue:failed
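
Once the underlying cause is fixed, failed jobs can be pushed back onto the queue with Laravel's built-in commands:

Terminal window
# Retry all failed jobs (or pass a specific job ID instead of "all")
kubectl exec -it deployment/backend -n production -- php artisan queue:retry all
# Remove failed jobs that should not be retried
kubectl exec -it deployment/backend -n production -- php artisan queue:flush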

Cache Issues

Symptoms:

  • Slow application performance
  • Cache misses
  • Redis connection errors

Diagnosis:

Terminal window
# Check Redis status
kubectl get pods -n production | grep redis
kubectl logs -f redis-0 -n production
# Test cache functionality
kubectl exec -it deployment/backend -n production -- php artisan tinker
# In tinker: Cache::put('test', 'value', 60); Cache::get('test');

Solutions:

  1. Redis Pod Issues

    Terminal window
    # Check Redis pod details
    kubectl describe pod redis-0 -n production
    # Check Redis logs
    kubectl logs -f redis-0 -n production
    # Restart Redis if needed
    kubectl rollout restart statefulset/redis -n production
  2. Cache Configuration

    Terminal window
    # Verify cache configuration
    kubectl get configmap backend-config -n production -o yaml | grep -A 5 CACHE_
    # Check Redis connection settings
    kubectl get configmap backend-config -n production -o yaml | grep -A 5 REDIS_
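
To judge whether the cache is actually being used, Redis exposes hit/miss counters and memory statistics:

Terminal window
# Cache hit/miss counters (a low hit rate suggests wrong cache keys or TTLs)
kubectl exec -it redis-0 -n production -- redis-cli info stats | grep keyspace
# Memory usage and eviction policy
kubectl exec -it redis-0 -n production -- redis-cli info memory | grep -E 'used_memory_human|maxmemory_policy'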

Email Delivery Issues

Symptoms:

  • Email delivery failures
  • SMTP connection errors
  • Email queue not processing

Diagnosis:

Terminal window
# Check email configuration
kubectl get configmap backend-config -n production -o yaml | grep -A 10 MAIL_
# Check email logs
kubectl logs -f deployment/backend -n production | grep -i mail
# Test email sending
kubectl exec -it deployment/backend -n production -- php artisan tinker
# In tinker: Mail::raw('Test', function($msg) { $msg->to('test@example.com')->subject('Test'); });

Solutions:

  1. AWS SES Configuration

    Terminal window
    # Verify SES credentials
    kubectl get secret backend-secrets -n production -o yaml | grep -A 2 MAIL_
    # Check SES sending limits
    # Verify domain verification in AWS SES console
  2. SMTP Settings

    Terminal window
    # Update email configuration
    # Ensure MAIL_HOST, MAIL_PORT, MAIL_USERNAME, MAIL_PASSWORD are correct
    # Test SMTP connection
    kubectl exec -it deployment/backend -n production -- php artisan tinker
    # For a manual SMTP connection test, see the sketch below
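
For the manual checks above, the AWS CLI can read SES limits and verification status, and openssl can exercise the SMTP endpoint directly. A sketch, assuming AWS credentials are available locally and an SES SMTP endpoint for your region:

Terminal window
# Check SES sending quota and domain verification (requires local AWS credentials)
aws ses get-send-quota
aws ses get-identity-verification-attributes --identities yourdomain.com
# Test the SMTP endpoint with STARTTLS (endpoint/region is an assumption)
openssl s_client -starttls smtp -connect email-smtp.eu-central-1.amazonaws.com:587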

External API Issues

Symptoms:

  • Payment processing errors
  • Third-party service timeouts
  • API rate limit errors

Diagnosis:

Terminal window
# Check API configuration
kubectl get configmap backend-config -n production -o yaml | grep -A 5 STRIPE_
kubectl get secret backend-secrets -n production -o yaml | grep -A 5 STRIPE_
# Check application logs
kubectl logs -f deployment/backend -n production | grep -i api

Solutions:

  1. API Key Issues

    Terminal window
    # Verify API keys are correct
    kubectl get secret backend-secrets -n production -o yaml
    # Encode a value for a Kubernetes secret (-n avoids a trailing newline in the encoding)
    echo -n "your-api-key" | base64
  2. Rate Limiting

    Terminal window
    # Implement rate limiting in application
    # Add delays between API calls
    # Use exponential backoff for retries (see the sketch below)
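
Backoff is usually implemented in application code, but the idea is simple enough to sketch as a shell loop (URL and retry limits are placeholders):

Terminal window
# Retry a flaky API call with exponential backoff: 1s, 2s, 4s, 8s, 16s
for attempt in 1 2 3 4 5; do
  curl -sf https://api.example.com/endpoint && break
  sleep $((2 ** (attempt - 1)))
done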

Backend Application Errors

Symptoms:

  • 500 server errors
  • Application exceptions
  • Database migration failures

Diagnosis:

Terminal window
# Check Laravel logs
kubectl logs -f deployment/backend -n production
# Check application status
kubectl exec -it deployment/backend -n production -- php artisan about
# Check environment configuration
kubectl exec -it deployment/backend -n production -- php artisan config:show

Solutions:

  1. Configuration Issues

    Terminal window
    # Clear configuration cache
    kubectl exec -it deployment/backend -n production -- php artisan config:clear
    kubectl exec -it deployment/backend -n production -- php artisan cache:clear
    # Regenerate application key (caution: this invalidates data encrypted with the old key and active sessions)
    kubectl exec -it deployment/backend -n production -- php artisan key:generate
  2. Database Migration Issues

    Terminal window
    # Check migration status
    kubectl exec -it deployment/backend -n production -- php artisan migrate:status
    # Run pending migrations
    kubectl exec -it deployment/backend -n production -- php artisan migrate
    # Rollback if needed
    kubectl exec -it deployment/backend -n production -- php artisan migrate:rollback
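
When a 500 error appears without an obvious stack trace in the live log, filtering recent backend logs narrows it down:

Terminal window
# Filter the last hour of backend logs for exceptions and errors
kubectl logs deployment/backend -n production --since=1h | grep -iE 'exception|error'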

Frontend Issues

Symptoms:

  • Build failures
  • Runtime errors
  • Static asset issues

Diagnosis:

Terminal window
# Check frontend logs
kubectl logs -f deployment/frontend -n production
# Check build status
kubectl describe pod -l app=frontend -n production
# Check static assets
kubectl exec -it deployment/frontend -n production -- ls -la /app/public

Solutions:

  1. Build Issues

    Terminal window
    # Check build logs
    kubectl logs -f deployment/frontend -n production | grep -i build
    # Verify environment variables
    kubectl get configmap frontend-config -n production -o yaml
    # Restart deployment
    kubectl rollout restart deployment/frontend -n production
  2. Runtime Errors

    Terminal window
    # Check browser console for errors
    # Verify API endpoints are accessible
    # Check CORS configuration (see the preflight sketch below)
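
The CORS check from item 2 can be run from a terminal by sending a preflight request and inspecting the response headers (domain and origin are placeholders):

Terminal window
# Send a CORS preflight and show the Access-Control-* response headers
curl -si -X OPTIONS https://api.yourdomain.com/api/health \
  -H 'Origin: https://yourdomain.com' \
  -H 'Access-Control-Request-Method: GET' | grep -i access-control
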
Emergency Procedures

  1. Complete System Restart

    Terminal window
    # Restart all deployments
    kubectl rollout restart deployment/backend -n production
    kubectl rollout restart deployment/frontend -n production
    kubectl rollout restart deployment/worker -n production
    # Restart infrastructure components
    kubectl rollout restart statefulset/mariadb -n production
    kubectl rollout restart statefulset/redis -n production
  2. Database Recovery

    Terminal window
    # Create database backup (no -t flag: a TTY would corrupt the dump output)
    kubectl exec mariadb-0 -n production -- mysqldump -u root -p"<root-password>" myproject > backup.sql
    # Restore from backup (password passed inline; with stdin redirected there is no prompt)
    kubectl exec -i mariadb-0 -n production -- mysql -u root -p"<root-password>" myproject < backup.sql
  3. Configuration Reset

    Terminal window
    # Reset to default configuration
    kubectl delete configmap backend-config -n production
    kubectl apply -f config-maps/backend.yaml
    # Restart applications
    kubectl rollout restart deployment/backend -n production
Monitoring and Prevention

  1. Set Up Monitoring

    Terminal window
    # Install monitoring stack
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    helm install monitoring prometheus-community/kube-prometheus-stack
  2. Configure Alerts

    Terminal window
    # Create alert rules for:
    # - Pod restarts
    # - High resource usage
    # - Service unavailability
    # - Certificate expiration
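
As a concrete starting point, a minimal sketch of an alert rule for pod restarts, applied as a PrometheusRule (names, namespace, release label, and thresholds are assumptions that must match your install):

Terminal window
# Alert when a container restarts more than 3 times in 15 minutes
cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-restart-alerts
  namespace: monitoring
  labels:
    release: monitoring # must match the Helm release so Prometheus discovers the rule
spec:
  groups:
    - name: pod-restarts
      rules:
        - alert: PodRestartingTooOften
          expr: increase(kube_pod_container_status_restarts_total{namespace="production"}[15m]) > 3
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} in production is restarting repeatedly"
EOF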

Getting Help

When seeking help, collect the following information:

Terminal window
# System information
kubectl cluster-info
kubectl version
# Application status
kubectl get pods --all-namespaces
kubectl get services --all-namespaces
kubectl get ingress --all-namespaces
# Recent logs
kubectl logs --tail=100 deployment/backend -n production
kubectl logs --tail=100 deployment/frontend -n production
# Events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'
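
To collect most of this in one step, kubectl can dump cluster state into a directory that can be attached to an issue:

Terminal window
# Dump cluster and namespace state into ./diag for sharing
kubectl cluster-info dump --namespaces production --output-directory=./diag
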
| Error | Cause | Solution |
|-------|-------|----------|
| ImagePullBackOff | Image not found | Check image name and registry |
| CrashLoopBackOff | Application errors | Check application logs |
| Pending | Resource constraints | Check node resources |
| FailedScheduling | Node selector issues | Check node labels |
| ErrImagePull | Registry authentication | Check image pull secrets |

For additional support, check the project's GitHub Issues or open a new issue that includes the diagnostic information collected above.