Logging Module (Graylog)

Complete guide to deploying and managing Graylog centralized logging with search, analysis, and alerting capabilities.

🏗️ Overview

The Graylog module deploys Graylog on Kubernetes as a centralized logging service. It collects logs from distributed applications, indexes them in Elasticsearch for search and analysis, stores its metadata in MongoDB, and raises alerts when configured conditions are met.

Architecture Components

Graylog Deployment:
├── Graylog Server Pods
├── Elasticsearch Backend
├── MongoDB Metadata Store
├── Web Interface
├── Input Collectors
├── Stream Processing
└── Alerting System

🚀 Features

Centralized Logging

Log Aggregation: Collect logs from multiple sources and applications
Real-time Processing: Process and index logs in real-time
Structured Data: Support for structured and unstructured log data
Multiple Inputs: Syslog, GELF, Beats, and custom inputs

Search & Analysis

Powerful Search: Full-text search with Lucene query syntax
Field Analysis: Extract and analyze specific log fields
Dashboards: Customizable dashboards for log visualization
Saved Searches: Reusable search queries and filters

Alerting & Monitoring

Conditional Alerts: Alert based on log patterns and thresholds
Notification Channels: Email, Slack, webhooks, and custom integrations
Alert History: Track and manage alert occurrences
Escalation Rules: Automated escalation for critical issues

Security & Compliance

Role-based Access: Granular permissions and user management
Audit Logging: Track user actions and system changes
Data Retention: Configurable log retention policies
Encryption: Encrypted data in transit and at rest

📦 Deployment Configuration

Helmfile Configuration

The Graylog deployment uses Helmfile for environment management:

releases:
  - name: graylog
    namespace: logging
    createNamespace: true
    chart: kongz/graylog
    values:
      - values.yaml

Core Configuration Values

# Basic configuration
replicaCount: 1

# MongoDB configuration
mongodb:
  enabled: true
  auth:
    enabled: false

# Elasticsearch configuration
elasticsearch:
  enabled: true
  replicas: 1

# Graylog configuration
graylog:
  image:
    repository: graylog/graylog
    tag: "5.3"

  service:
    type: ClusterIP

  ingress:
    enabled: false

  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "500m"

Advanced Configuration Options

# High-availability configuration
replicaCount: 3

# Performance tuning
resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"

# Security configuration
graylog:
  auth:
    enabled: true
    password: "secure-password"

# Monitoring
metrics:
  enabled: true
  serviceMonitor:
    enabled: true

# Web interface
ingress:
  enabled: true
  hostname: logs.theratap.de
  tls: true
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt

🔧 Graylog Configuration

Input Configuration

# Syslog input
inputs:
  - name: "Syslog UDP"
    type: "org.graylog2.inputs.syslog.udp.SyslogUDPInput"
    configuration:
      bind_address: "0.0.0.0"
      port: 514
      recv_buffer_size: 262144

  # GELF input
  - name: "GELF UDP"
    type: "org.graylog2.inputs.gelf.udp.GELFUDPInput"
    configuration:
      bind_address: "0.0.0.0"
      port: 12201
      recv_buffer_size: 262144

  # Beats input
  - name: "Filebeat"
    type: "org.graylog2.inputs.beats.BeatsInput"
    configuration:
      bind_address: "0.0.0.0"
      port: 5044
      tls_enabled: true

Stream Configuration

# Stream definitions
streams:
  - name: "Application Logs"
    description: "All application logs"
    rules:
      - field: "source"
        type: "EXACT"
        value: "application"
        inverted: false

  - name: "Error Logs"
    description: "Error and warning logs"
    rules:
      # Syslog severities are numeric and lower is more severe; 0-4 spans emergency through warning
      - field: "level"
        type: "SMALLER"
        value: "5"
        inverted: false
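
The `level` field in syslog and GELF messages is a numeric severity in which lower values are more severe, so a threshold rule compares numbers rather than names. A minimal sketch of the mapping, using the RFC 5424 severity names; the `syslog_severity` helper is hypothetical and not part of Graylog:

```shell
#!/bin/sh
# Map a severity name to its RFC 5424 numeric value (0 = most severe).
# Hypothetical helper for choosing stream rule thresholds.
syslog_severity() {
  case "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" in
    EMERGENCY|EMERG) echo 0 ;;
    ALERT)           echo 1 ;;
    CRITICAL|CRIT)   echo 2 ;;
    ERROR|ERR)       echo 3 ;;
    WARNING|WARN)    echo 4 ;;
    NOTICE)          echo 5 ;;
    INFO)            echo 6 ;;
    DEBUG)           echo 7 ;;
    *) echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Example: a rule matching ERROR and anything more severe needs level < 4.
echo "ERROR = $(syslog_severity ERROR)"
```

Applications that log the level as a string field instead would need an exact-match rule on that string.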

Alert Configuration

# Alert definitions
alerts:
  - name: "High Error Rate"
    description: "Alert when error rate exceeds threshold"
    condition:
      type: "field_value"
      field: "level"
      value: "ERROR"
      threshold: 10
      time_window: 300
    notifications:
      - type: "email"
        recipients: ["[email protected]"]
      - type: "slack"
        webhook_url: "https://hooks.slack.com/services/..."
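
The condition above reads as: fire when more than 10 messages with `level` equal to `ERROR` arrive within a 300-second window. A rough sketch of that evaluation over lines on standard input, assuming one message per line with the level as a whole word; `error_threshold_exceeded` is a hypothetical name, and this only illustrates the semantics, not how Graylog evaluates events internally:

```shell
#!/bin/sh
# Succeed (exit 0) when the number of input lines containing the word
# ERROR is greater than the threshold given as $1.
error_threshold_exceeded() {
  threshold=$1
  # grep -c still prints 0 when nothing matches; || true ignores its exit code.
  count=$(grep -cw 'ERROR' || true)
  [ "$count" -gt "$threshold" ]
}

# Example: three ERROR lines against a threshold of 2.
if printf 'ERROR a\nINFO b\nERROR c\nERROR d\n' | error_threshold_exceeded 2; then
  echo "alert: error threshold exceeded"
fi
```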

📊 Monitoring & Metrics

Health Checks

# Check Graylog service status
kubectl get pods -n logging -l app.kubernetes.io/name=graylog

# Check service endpoints
kubectl get endpoints -n logging -l app.kubernetes.io/name=graylog

# Test Graylog connectivity
kubectl exec -it graylog-0 -n logging -- curl -u admin:password http://localhost:9000/api/system/overview
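
Right after a deployment or restart the API may not answer yet, so a single check can fail spuriously. A small retry wrapper is one way to handle that; `retry` is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary:

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (not run here): wait up to about 5 minutes for the API to answer.
# retry 30 10 kubectl exec graylog-0 -n logging -- \
#   curl -sf -u admin:password http://localhost:9000/api/system/overview
```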

Performance Monitoring

# Check resource usage
kubectl top pods -n logging -l app.kubernetes.io/name=graylog

# Monitor log ingestion rate (messages per second on this node)
kubectl exec -it graylog-0 -n logging -- curl -u admin:password http://localhost:9000/api/system/throughput

# Check Elasticsearch health
kubectl exec -it graylog-0 -n logging -- curl -u admin:password http://localhost:9000/api/system/indexer/cluster/health

# Monitor MongoDB status
kubectl get pods -n logging -l app.kubernetes.io/name=mongodb

Key Metrics

# Log ingestion rate (messages per second on this node)
kubectl exec graylog-0 -n logging -- curl -s -u admin:password http://localhost:9000/api/system/throughput | jq '.throughput'

# Total messages
kubectl exec graylog-0 -n logging -- curl -s -u admin:password http://localhost:9000/api/count/total | jq '.events'

# Index size
kubectl exec elasticsearch-0 -n logging -- curl -s 'http://localhost:9200/_cat/indices?v&h=index,docs.count,store.size'

# JVM metrics
kubectl exec graylog-0 -n logging -- curl -s -u admin:password http://localhost:9000/api/system/stats | jq '.jvm'
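
Where a rate is not exposed directly, it can be derived from two readings of a cumulative message count taken a known interval apart. A small sketch of that arithmetic; `ingest_rate` is a hypothetical helper, and the counts would come from whichever total-message figure you sample:

```shell
#!/bin/sh
# ingest_rate FIRST_COUNT SECOND_COUNT INTERVAL_SECONDS
# Print the average messages per second between two samples (integer division).
ingest_rate() {
  first=$1; second=$2; interval=$3
  if [ "$interval" -le 0 ]; then
    echo "interval must be positive" >&2
    return 1
  fi
  echo $(( (second - first) / interval ))
}

# Example: 1000 messages at the first sample, 4000 thirty seconds later.
ingest_rate 1000 4000 30
```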

Web Interface

# Port forward web interface
kubectl port-forward -n logging svc/graylog 9000:9000

# Access web interface
open http://localhost:9000

# Default credentials
# Username: admin
# Password: (from graylog.auth.password in values.yaml)

Log Analysis

# View Graylog logs
kubectl logs -n logging -l app.kubernetes.io/name=graylog

# Follow logs in real-time
kubectl logs -f -n logging statefulset/graylog

# Check for errors (--tail=-1 because a label selector limits output to the last 10 lines per pod)
kubectl logs -n logging -l app.kubernetes.io/name=graylog --tail=-1 | grep ERROR
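
Beyond grepping for errors, a per-level tally gives a quick picture of how noisy the server log is. A sketch, assuming the level is the third whitespace-separated field of each line (the sample lines below are made up to match that assumption); `count_by_level` is a hypothetical helper:

```shell
#!/bin/sh
# Count log lines per level. Assumes the level is the third
# whitespace-separated field, e.g. "2024-01-01 12:00:00,000 ERROR: ...".
count_by_level() {
  awk '{ lvl = $3; sub(/:$/, "", lvl); n[lvl]++ }
       END { for (l in n) print l, n[l] }' | sort
}

# Example with sample lines; real input would be the kubectl logs output.
printf '%s\n' \
  '2024-01-01 12:00:00,000 INFO : org.example.A - started' \
  '2024-01-01 12:00:01,000 ERROR: org.example.B - failed' \
  '2024-01-01 12:00:02,000 ERROR: org.example.B - failed again' |
  count_by_level
```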

🚀 Deployment

Deploy Graylog Module

# Navigate to module directory
cd iac/modules/graylog

# Deploy using Helmfile
helmfile apply

# Verify deployment
kubectl get pods -n logging
kubectl get services -n logging

Verify Deployment

# Check pod status
kubectl get pods -n logging -l app.kubernetes.io/name=graylog

# Test Graylog API
kubectl exec -it graylog-0 -n logging -- curl -u admin:password http://localhost:9000/api/system/overview

# Check cluster health
kubectl exec -it graylog-0 -n logging -- curl -u admin:password http://localhost:9000/api/system/stats

Post-Deployment Setup

# Port forward for local access
kubectl port-forward -n logging svc/graylog 9000:9000

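# Test log ingestion (the GELF input listens on UDP, so nc needs -u; kubectl port-forward
# carries TCP only, so run this from a host or pod that can reach port 12201 directly)
echo '{"version": "1.1","host":"test","short_message":"Test message","level":1}' | \
  nc -u -w 1 localhost 12201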

# Verify log received
curl -u admin:password "http://localhost:9000/api/search/universal/relative?query=*&range=300"
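
For repeated ingestion tests it can help to build the GELF payload from parameters instead of editing the JSON by hand. A minimal sketch, assuming the host and message contain no double quotes or backslashes (no JSON escaping is done); `gelf_message` is a hypothetical helper:

```shell
#!/bin/sh
# gelf_message HOST SHORT_MESSAGE [LEVEL]
# Print a minimal GELF 1.1 payload; LEVEL is a syslog severity (default 6, info).
gelf_message() {
  host=$1; msg=$2; level=${3:-6}
  printf '{"version":"1.1","host":"%s","short_message":"%s","level":%d}\n' \
    "$host" "$msg" "$level"
}

# Example: print the payload. To send it to the GELF UDP input, pipe it to
#   nc -u -w 1 <reachable-address> 12201
gelf_message test "Test message" 3
```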

🔧 Maintenance Operations

Input Management

# List inputs
curl -u admin:password http://localhost:9000/api/system/inputs

# Create syslog input (Graylog rejects state-changing API requests that lack an X-Requested-By header)
curl -X POST -u admin:password -H "Content-Type: application/json" -H "X-Requested-By: cli" \
  -d '{"title":"Syslog UDP","type":"org.graylog2.inputs.syslog.udp.SyslogUDPInput","global":true,"configuration":{"bind_address":"0.0.0.0","port":514}}' \
  http://localhost:9000/api/system/inputs

# Start input
curl -X PUT -u admin:password -H "X-Requested-By: cli" http://localhost:9000/api/system/inputstates/INPUT_ID
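
The curl calls in this section repeat the same credentials, base URL, and headers. One way to cut the repetition is a thin wrapper; this is a sketch, and `graylog_api` plus the `GRAYLOG_URL`, `GRAYLOG_USER`, `GRAYLOG_PASS`, and `DRY_RUN` variables are hypothetical names, not part of Graylog or this module:

```shell
#!/bin/sh
# graylog_api METHOD PATH [JSON_BODY]
# Call the Graylog REST API with the headers it expects. With DRY_RUN set,
# print the curl command instead of running it.
graylog_api() {
  method=$1; path=$2; data=${3:-}
  set -- curl -s -u "${GRAYLOG_USER:-admin}:${GRAYLOG_PASS:-password}" \
    -X "$method" -H "Accept: application/json" -H "X-Requested-By: cli"
  if [ -n "$data" ]; then
    set -- "$@" -H "Content-Type: application/json" -d "$data"
  fi
  set -- "$@" "${GRAYLOG_URL:-http://localhost:9000}/api$path"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example: show the command that would list inputs, without contacting the server.
DRY_RUN=1 graylog_api GET /system/inputs
```

The dry-run output is unquoted and meant for inspection only, not for pasting back into a shell.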

Stream Management

# List streams
curl -u admin:password http://localhost:9000/api/streams

# Create stream (rule type 1 is an exact match; INDEX_SET_ID is the ID of an existing index set)
curl -X POST -u admin:password -H "Content-Type: application/json" -H "X-Requested-By: cli" \
  -d '{"title":"Error Logs","description":"Error and warning logs","index_set_id":"INDEX_SET_ID","rules":[{"type":1,"field":"level","value":"ERROR","inverted":false}]}' \
  http://localhost:9000/api/streams

# Start stream
curl -X POST -u admin:password -H "X-Requested-By: cli" http://localhost:9000/api/streams/STREAM_ID/resume

User Management

# List users
curl -u admin:password http://localhost:9000/api/users

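# Create user (recent Graylog versions expect first_name and last_name rather than full_name)
curl -X POST -u admin:password -H "Content-Type: application/json" -H "X-Requested-By: cli" \
  -d '{"username":"appuser","email":"[email protected]","password":"password","first_name":"Application","last_name":"User","roles":["Reader"],"permissions":[]}' \
  http://localhost:9000/api/users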

# Update user permissions
curl -X PUT -u admin:password -H "Content-Type: application/json" -H "X-Requested-By: cli" \
  -d '{"roles":["Reader","Streams"]}' \
  http://localhost:9000/api/users/USER_ID

Scaling Operations

# Scale Graylog replicas (pods named graylog-N belong to a StatefulSet)
kubectl scale statefulset graylog -n logging --replicas=3

# Scale Elasticsearch
kubectl scale statefulset elasticsearch -n logging --replicas=3

# Verify scaling
kubectl get pods -n logging

Update Operations

# Update Graylog version
helmfile apply

# Monitor update progress
kubectl rollout status statefulset/graylog -n logging

# Rollback if needed
kubectl rollout undo statefulset/graylog -n logging

🚨 Troubleshooting

Common Issues

1. Connection Problems

# Check service connectivity
kubectl get services -n logging

# Test network connectivity
kubectl exec -it graylog-0 -n logging -- nc -zv graylog 9000

# Verify DNS resolution
kubectl exec -it graylog-0 -n logging -- nslookup graylog

2. Elasticsearch Issues

# Check Elasticsearch health as seen by Graylog
kubectl exec -it graylog-0 -n logging -- curl -u admin:password http://localhost:9000/api/system/indexer/cluster/health

# Check Elasticsearch cluster status
kubectl exec -it elasticsearch-0 -n logging -- curl -s http://localhost:9200/_cluster/health

# Verify index creation
kubectl exec -it graylog-0 -n logging -- curl -u admin:password http://localhost:9000/api/system/indices/index_sets
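
The `_cluster/health` response carries a `status` of green, yellow, or red, and scripted checks usually only need that one field. A sketch that extracts it with `sed` so no JSON tool is required; `es_status` and `es_healthy` are hypothetical helpers, and treating yellow as acceptable is an assumption that suits a single-node cluster, where replica shards cannot be assigned:

```shell
#!/bin/sh
# Read a _cluster/health JSON document on stdin and print its status value.
es_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Exit 0 for green or yellow, 1 for red or anything unexpected.
es_healthy() {
  case "$(es_status)" in
    green|yellow) return 0 ;;
    *)            return 1 ;;
  esac
}

# Example with a sample response; real input would come from the curl call above.
echo '{"cluster_name":"graylog","status":"yellow","number_of_nodes":1}' | es_status
```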

3. MongoDB Issues

# Check MongoDB connectivity
kubectl exec -it graylog-0 -n logging -- curl -u admin:password http://localhost:9000/api/system/stats

# Check MongoDB logs
kubectl logs -n logging -l app.kubernetes.io/name=mongodb

# Verify MongoDB data
kubectl exec -it mongodb-0 -n logging -- mongo --eval "db.stats()"

4. Log Ingestion Issues

# Check input status
curl -u admin:password http://localhost:9000/api/system/inputs

# Test log ingestion (UDP input, so nc needs -u; run where port 12201 is reachable)
echo '{"version": "1.1","host":"test","short_message":"Test message"}' | \
  nc -u -w 1 localhost 12201

# Check for parsing errors
curl -u admin:password "http://localhost:9000/api/search/universal/relative?query=*&range=300"

Recovery Procedures

Emergency Recovery

# Force delete stuck pods
kubectl delete pod graylog-0 -n logging --grace-period=0 --force

# Restart Graylog service
kubectl rollout restart statefulset/graylog -n logging

# Verify recovery
kubectl exec -it graylog-0 -n logging -- curl -u admin:password http://localhost:9000/api/system/overview

🔒 Security Configuration

Authentication

# Enable authentication
graylog:
  auth:
    enabled: true
    password: "secure-password"

# Network security
networkPolicy:
  enabled: true
  allowExternal: false
  ingressRules:
    primaryAccessOnlyFrom:
      enabled: true
      namespaceSelector:
        matchLabels:
          name: production
      podSelector:
        matchLabels:
          app.kubernetes.io/name: backend
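
On the authentication setting above: Graylog's own server configuration stores the admin password as a SHA-256 hash (`root_password_sha2`) rather than in plain text. Whether this module's chart hashes `graylog.auth.password` for you is not stated here, so treat the following as a sketch for producing the hash yourself. `password_sha2` is a hypothetical helper name, and it assumes `sha256sum` from GNU coreutils is available:

```shell
#!/bin/sh
# Print the SHA-256 hex digest of the first argument.
# printf '%s' avoids hashing the trailing newline that echo would add.
password_sha2() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# Example with the placeholder value used in this guide; use a real secret in practice.
password_sha2 "secure-password"
```

Passing a secret as a command-line argument exposes it to other users in the process list, so on shared hosts read it from a file or prompt instead.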

SSL/TLS Configuration

# TLS configuration
tls:
  enabled: true
  secretName: graylog-tls

# Certificate configuration
certificatesSecret: "graylog-certs"

Access Control

# User roles and permissions
roles:
  - name: "Admin"
    permissions:
      - "users:list"
      - "users:edit"
      - "streams:read"
      - "streams:write"
      - "inputs:read"
      - "inputs:write"

  - name: "Reader"
    permissions:
      - "streams:read"
      - "messages:read"

📝 Configuration Examples

High-Performance Configuration

# High-performance values
resources:
  requests:
    memory: 4Gi
    cpu: 2000m
  limits:
    memory: 8Gi
    cpu: 4000m

# Performance tuning
graylog:
  configuration: |-
    elasticsearch_max_docs_per_index = 20000000
    elasticsearch_max_time_per_index = 1d
    elasticsearch_max_number_of_indices = 20
    elasticsearch_shards = 1
    elasticsearch_replicas = 0
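
These settings bound how much data is kept: the per-index limit multiplied by the maximum number of indices gives the retention ceiling. Which per-index limit applies depends on the rotation strategy in use (message count or time), which is not shown here. A quick check of the arithmetic for the values above:

```shell
#!/bin/sh
# Values copied from the configuration above.
max_docs_per_index=20000000
max_number_of_indices=20
days_per_index=1   # elasticsearch_max_time_per_index = 1d

# Ceiling under count-based rotation: documents kept before the
# retention strategy acts on the oldest index.
max_retained_docs=$((max_docs_per_index * max_number_of_indices))

# Ceiling under time-based rotation: days kept before the
# retention strategy acts on the oldest index.
max_retained_days=$((days_per_index * max_number_of_indices))

echo "count-based: up to $max_retained_docs documents"
echo "time-based:  up to $max_retained_days days"
```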

High-Availability Configuration

# High-availability setup
replicaCount: 3

elasticsearch:
  replicas: 3
  resources:
    requests:
      memory: 2Gi
      cpu: 1000m
    limits:
      memory: 4Gi
      cpu: 2000m

mongodb:
  replicas: 3

Monitoring-Optimized Configuration

# Enhanced monitoring
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    interval: 30s

# Custom dashboards
dashboards:
  - name: "System Overview"
    description: "System health and performance"
    widgets:
      - type: "search_result_chart"
        query: "*"
        timerange: "5m"

🔄 Maintenance Schedule

Daily Tasks

Monitor log ingestion rates
Check alert status
Review error logs
Verify cluster health

Weekly Tasks

Analyze performance metrics
Review log retention
Update user permissions
Check for updates

Monthly Tasks

Capacity planning review
Performance optimization
Security audit
Disaster recovery testing

📋 Operational Checklist

Pre-Deployment

Post-Deployment

Regular Maintenance

Related Documentation

Backend Application - Logging integration
Configuration Guide - Log configuration
Security Guide - Log security
Monitoring Guide - Log monitoring

The Graylog logging module provides enterprise-grade centralized logging with powerful search, analysis, and alerting capabilities.