Log Forwarding with Fluentd: Setting Up a Distributed Logging System on EC2

Effective log management is crucial for monitoring and troubleshooting distributed systems in the cloud. When running multiple EC2 instances, centralized logging becomes essential for maintaining visibility across your infrastructure. In this guide, I'll walk you through setting up Fluentd on two EC2 instances to create a simple yet powerful log forwarding system.

What is Fluentd?

"Fluentd is an open source data collector for unified logging layer. Fluentd allows you to unify data collection and consumption for a better use and understanding of data."

Fluentd acts as a data collector that can read from various sources, process the data, and forward it to different outputs. Its plugin-based architecture makes it extremely versatile and a perfect fit for creating logging pipelines in cloud environments.

Our Setup: Two-Server Log Forwarding Architecture

For this tutorial, we'll set up:

A source server that generates logs and sends them to the target
A target server that receives and stores logs from the source

This setup can be extended to multiple source servers all forwarding to a central logging server.

🖥️ Source Server

Runs your application
Generates log files
Runs Fluentd with tail input plugin
Forwards new log entries in real-time

🗄️ Target Server

Runs Fluentd as an aggregator
Receives logs from source server(s)
Stores logs centrally
Can forward to additional systems

Prerequisites

Before we begin, make sure you have:

Two running EC2 instances with Amazon Linux 2 or Ubuntu (I'll provide commands for both)
SSH access to both instances
Security groups configured to allow inbound TCP traffic on port 24224 (Fluentd's default forward port) to the target server from the source server
A sample application or log file on the source server
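
The security group rule from the list above could be created with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured on your workstation; both security group IDs are hypothetical placeholders and the helper name ingress_cmd is made up for illustration. It only prints the command so you can review it first.

```shell
# Hypothetical placeholders: substitute your own security group IDs.
TARGET_SG="sg-0123456789abcdef0"   # group attached to the target server
SOURCE_SG="sg-0fedcba9876543210"   # group attached to the source server(s)

# Print the AWS CLI call that opens TCP 24224 on the target
# to members of the source security group only.
ingress_cmd() {
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 24224 --source-group %s\n' "$1" "$2"
}

ingress_cmd "$TARGET_SG" "$SOURCE_SG"
```

Once the printed line looks right, piping it to sh would apply the rule. Referencing the source security group instead of an IP range keeps the rule valid when instances are replaced.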

Step 1: Installing Fluentd on Both Instances

Let's start by installing Fluentd on both our source and target EC2 instances. The installation process is similar for both servers.

For Amazon Linux 2

# Install ruby and development tools
sudo amazon-linux-extras install ruby2.6
sudo yum install -y ruby-devel gcc make

# Install Fluentd via Ruby's gem package manager
sudo gem install fluentd --no-document

# The forward and tail plugins used in this guide ship with Fluentd itself,
# so no extra plugin gem is needed. Confirm the installation instead:
fluentd --version

For Ubuntu

# Update package lists
sudo apt-get update

# Install ruby and development tools
sudo apt-get install -y ruby-full build-essential

# Install Fluentd via Ruby's gem package manager
sudo gem install fluentd --no-document

# The forward and tail plugins used in this guide ship with Fluentd itself,
# so no extra plugin gem is needed. Confirm the installation instead:
fluentd --version

Note: For production environments, Treasure Data (the company that created Fluentd) recommends the official packages (td-agent, which has since been succeeded by fluent-package) for stable releases. However, for learning purposes, the gem installation is simpler and works well.

Step 2: Configuring the Target Server

The target server will receive logs from our source server. Let's create a configuration file for it:

# Create a directory for Fluentd configurations
sudo mkdir -p /etc/fluentd

# Create the configuration file
sudo nano /etc/fluentd/target.conf

Add the following configuration:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match **>
  @type file
  path /var/log/fluentd/received
  time_slice_format %Y%m%d
  time_slice_wait 10m
  time_format %Y%m%dT%H%M%S%z
  compress gzip
  utc
</match>

This configuration:

Sets up a forward input source that listens on port 24224
Creates a match pattern that captures all logs
Writes the received logs to files in /var/log/fluentd/received
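
Before starting the daemon, a dry run can catch typos in the configuration. The sketch below assumes fluentd is on your PATH; validate_conf is a hypothetical helper name, while --dry-run is Fluentd's own flag for checking a setup without starting it.

```shell
# Validate a Fluentd configuration without starting the daemon.
validate_conf() {
  conf="$1"
  if ! command -v fluentd >/dev/null 2>&1; then
    echo "fluentd not found on PATH" >&2
    return 2
  fi
  if [ ! -f "$conf" ]; then
    echo "config file not found: $conf" >&2
    return 1
  fi
  # --dry-run parses the configuration, reports errors, and exits
  fluentd --dry-run -c "$conf"
}

# Example:
#   validate_conf /etc/fluentd/target.conf && echo "configuration looks valid"
```

The same helper works for source.conf later in this guide.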

Now create the output directory and start Fluentd:

# Create the log directory
sudo mkdir -p /var/log/fluentd

# Start Fluentd with our configuration
sudo fluentd -c /etc/fluentd/target.conf -d /var/run/fluentd.pid

Step 3: Configuring the Source Server

Now let's set up the source server to forward logs to our target server:

# Create a directory for Fluentd configurations
sudo mkdir -p /etc/fluentd

# Create the configuration file
sudo nano /etc/fluentd/source.conf

Add the following configuration:

<source>
  @type tail
  path /path/to/your/application.log
  pos_file /var/log/fluentd/application.log.pos
  tag app.logs
  <parse>
    # Capture the whole line as 'message'
    @type none
  </parse>
</source>

<match app.**>
  @type forward
  <server>
    # Replace TARGET_SERVER_IP with your target server's IP
    host TARGET_SERVER_IP
    port 24224
  </server>
  
  <buffer>
    @type memory
    flush_interval 5s
    retry_max_times 17
    retry_type exponential_backoff
  </buffer>
</match>

Replace TARGET_SERVER_IP with the private IP address of your target EC2 instance. Also, update /path/to/your/application.log to point to the actual log file you want to monitor.
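
If you prefer not to edit the file by hand, the placeholder can be swapped with sed. This is a small sketch: set_target_ip is a hypothetical helper, and 10.0.1.25 in the example is a made-up private IP.

```shell
# Replace the TARGET_SERVER_IP placeholder in a config file with a real address.
set_target_ip() {
  conf="$1"
  ip="$2"
  # Keep a .bak copy so the original template can be restored
  sed -i.bak "s/TARGET_SERVER_IP/$ip/g" "$conf"
}

# Example (needs write access to the file, so run it from a root shell):
#   set_target_ip /etc/fluentd/source.conf 10.0.1.25
```

On the target instance, hostname -I prints the addresses assigned to it, which is one way to look up the private IP.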

Let's create the required directories and start Fluentd:

# Create the directory that holds Fluentd's position file
sudo mkdir -p /var/log/fluentd

# Start Fluentd with our configuration
sudo fluentd -c /etc/fluentd/source.conf -d /var/run/fluentd.pid

Step 4: Testing the Setup

To test our setup, let's generate some log entries on the source server:

# Generate test log entries
for i in {1..10}; do 
  echo "Test log entry $i at $(date)" >> /path/to/your/application.log
done

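Now check the target server to see if the logs were received. Because the target configuration uses a daily time_slice_format, Fluentd keeps incoming entries in a buffer chunk and writes the final gzip-compressed file only after the day (in UTC) has ended and the 10-minute time_slice_wait has passed. Fresh entries therefore show up in a buffer file first:

# List everything Fluentd has written so far, including buffer chunks
sudo ls -laR /var/log/fluentd/

# View finished, compressed log files once they exist
sudo zcat /var/log/fluentd/received*.gz

You should see your test log entries in a buffer file within a few seconds, and in the compressed output once the time slice has closed.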
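
If nothing arrives, it helps to separate network problems from tail problems. One way is to send an event straight to the target with fluent-cat, a small client that is installed together with the fluentd gem. The sketch below makes some assumptions: make_event and send_test_event are hypothetical helper names, app.test is an arbitrary tag, and the message must not contain double quotes or backslashes.

```shell
# Build a single JSON record; fluent-cat reads JSON from standard input by default.
make_event() {
  printf '{"message":"%s"}\n' "$1"
}

# Send the record straight to the target's forward input, bypassing the tail plugin.
send_test_event() {
  make_event "$2" | fluent-cat --host "$1" --port 24224 app.test
}

# Example (run on the source server; TARGET_SERVER_IP is the placeholder from source.conf):
#   send_test_event TARGET_SERVER_IP "hello from fluent-cat"
```

If this event reaches the target but the tailed entries do not, the problem is most likely in the tail source (path, permissions, or pos_file) rather than in the network.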

Making Fluentd Start on Boot

To ensure Fluentd starts automatically when your EC2 instances reboot, let's create systemd service files.

On both servers:

# Create systemd service file
sudo nano /etc/systemd/system/fluentd.service

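For the source server, add the following (run which fluentd first and adjust the ExecStart path if it reports a different location):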

[Unit]
Description=Fluentd Log Forwarder
After=network.target

[Service]
ExecStart=/usr/local/bin/fluentd -c /etc/fluentd/source.conf
Restart=on-failure
User=root
Group=root

[Install]
WantedBy=multi-user.target

For the target server, add:

[Unit]
Description=Fluentd Log Aggregator
After=network.target

[Service]
ExecStart=/usr/local/bin/fluentd -c /etc/fluentd/target.conf
Restart=on-failure
User=root
Group=root

[Install]
WantedBy=multi-user.target

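Stop the instance you started by hand in the earlier steps, then enable and start the service:

# Stop the manually started daemon so it does not hold port 24224 or the pos_file
# (skip this if it is not running)
sudo kill "$(cat /var/run/fluentd.pid)"

# Make systemd pick up the new unit file
sudo systemctl daemon-reload

# Enable the service to start on boot
sudo systemctl enable fluentd

# Start the service
sudo systemctl start fluentd

# Check the status
sudo systemctl status fluentd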

Advanced Configuration Options

Filtering and Transforming Logs

You can insert filter sections between your source and match sections to transform logs:

<filter app.**>
  @type grep
  <regexp>
    key message
    pattern /ERROR|WARN/
  </regexp>
</filter>

This example only forwards log lines containing "ERROR" or "WARN".
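
To preview which lines such a filter would keep, you can run the same pattern over an existing log file with grep. This is only an approximation: Fluentd evaluates the pattern as a Ruby regular expression, but for a plain alternation like this one the result should be the same. preview_filter is a hypothetical helper name.

```shell
# Print the lines of a log file that match the filter's pattern.
preview_filter() {
  grep -E 'ERROR|WARN' "$1"
}

# Example:
#   preview_filter /path/to/your/application.log | head
```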

Adding Metadata

You can add EC2 metadata to your logs:

<filter app.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    instance_id "#{File.read('/var/lib/cloud/data/instance-id').strip rescue 'unknown'}"
    environment "production"
  </record>
</filter>
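
The instance_id expression reads a file written by cloud-init and falls back to "unknown" when it cannot be read. To check which value your instance would report, the same logic can be mirrored in the shell. This is a sketch; read_instance_id is a hypothetical helper, and the optional argument exists only to make it easy to try against another file.

```shell
# Mirror the filter's logic: read the instance ID file, or fall back to "unknown".
read_instance_id() {
  id_file="${1:-/var/lib/cloud/data/instance-id}"
  if [ -r "$id_file" ]; then
    # Strip whitespace, like the .strip call in the filter
    tr -d '[:space:]' < "$id_file"
    echo
  else
    # Same fallback as the rescue 'unknown' clause
    echo "unknown"
  fi
}

read_instance_id
```

If this prints unknown on your instance, the filter will record the same value, since the expression is evaluated when Fluentd loads the configuration.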

Multiple Log Sources

You can monitor multiple log files by adding additional source sections:

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/fluentd/nginx-access.log.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>

<source>
  @type tail
  path /var/log/nginx/error.log
  pos_file /var/log/fluentd/nginx-error.log.pos
  tag nginx.error
  <parse>
    @type regexp
    expression /^(?<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(?<log_level>\w+)\] (?<pid>\d+).(?<tid>\d+): (?<message>.*)$/
    time_format %Y/%m/%d %H:%M:%S
  </parse>
</source>
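
To get a feel for what the error-log expression extracts, here is a rough shell equivalent using sed. It is a translation, not the parser Fluentd runs: \d becomes [0-9], \w becomes [[:alnum:]_], and the named groups become numbered ones. The sample line is made up for illustration.

```shell
# Hypothetical sample line in the nginx error log layout the expression expects.
sample='2025/05/07 12:00:00 [error] 1234#5678: *1 open() failed'

# Split a line into the same five fields as the Fluentd expression.
parse_nginx_error() {
  printf '%s\n' "$1" | sed -E \
    's|^([0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) \[([[:alnum:]_]+)\] ([0-9]+).([0-9]+): (.*)$|time=\1 level=\2 pid=\3 tid=\4 message=\5|'
}

parse_nginx_error "$sample"
```

Lines that do not match are printed unchanged, which makes it easy to spot entries the expression would fail to parse.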

Security Considerations

When implementing this in a production environment, consider the following security measures:

Security Concern	Recommended Action
Network Security	Restrict the security group to only allow traffic on port 24224 from your source servers
Authentication	Add a shared key for authentication between source and target
Encryption	Set up TLS encryption for the communication between instances
Log Content	Ensure sensitive information is not being logged

To implement shared key authentication, modify your configurations:

Target Server:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
  <security>
    self_hostname target.example.com
    shared_key secret_string
  </security>
</source>

Source Server:

<match app.**>
  @type forward
  <server>
    host TARGET_SERVER_IP
    port 24224
  </server>
  <security>
    self_hostname source.example.com
    shared_key secret_string
  </security>
</match>
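
The secret_string value above is a placeholder. A random key could be generated as in the sketch below; generate_shared_key is a hypothetical helper, and the 32-byte length is my own choice rather than a Fluentd requirement.

```shell
# Print a random 64-character hexadecimal string (32 random bytes).
generate_shared_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback for hosts without openssl
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

generate_shared_key
```

Use the same generated value for shared_key on both the source and the target, and keep it out of version control.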

Troubleshooting

If you encounter issues with your Fluentd setup, here are some common problems and solutions:

Logs not forwarding

Check these potential issues:

Ensure Fluentd is running on both servers:
```
ps aux | grep fluentd
```
Verify network connectivity:
```
telnet TARGET_SERVER_IP 24224
```

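Check the Fluentd logs for errors. If Fluentd runs under systemd, read the service journal; otherwise stop the running instance and start Fluentd in the foreground so that errors print to the terminal:

sudo journalctl -u fluentd --no-pager -n 50

sudo fluentd -c /etc/fluentd/source.conf --no-supervisor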
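
On instances where telnet is not installed, the connectivity check could be done with nc instead. This is a sketch that assumes nc (netcat) is available; check_port is a hypothetical helper name.

```shell
# Report whether a TCP port accepts connections (defaults to Fluentd's port 24224).
check_port() {
  host="$1"
  port="${2:-24224}"
  # -z: only test whether the port accepts connections; -w 3: give up after 3 seconds
  if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
    echo "open"
  else
    echo "closed or unreachable"
  fi
}

# Example (run on the source server):
#   check_port TARGET_SERVER_IP
```

A "closed or unreachable" result usually points to the security group rule or to Fluentd not running on the target.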

Permission issues

If Fluentd can't read your log files:

# Check permissions on the log file
ls -la /path/to/your/application.log

# Give read access if needed
sudo chmod +r /path/to/your/application.log

Conclusion

You've now set up a distributed logging system with Fluentd on EC2 instances! This setup allows you to:

Monitor log files in real-time using the tail input plugin
Forward logs to a centralized server
Store and potentially process the logs further

This simple yet effective pattern can be extended to include multiple source servers, different types of log sources, and various output destinations such as Amazon S3, Elasticsearch, or monitoring systems.

As your infrastructure grows, consider using containerized Fluentd deployments or managed services like AWS CloudWatch Logs for more scalable logging solutions. However, the fundamental concepts you've learned here will remain applicable regardless of the scale.

Happy logging!

Last updated: Wednesday, May 7, 2025