Log Forwarding with Fluentd: Setting Up a Distributed Logging System on EC2
- Author: Nguyen Phuc Cuong
Effective log management is crucial for monitoring and troubleshooting distributed systems in the cloud. When running multiple EC2 instances, centralized logging becomes essential for maintaining visibility across your infrastructure. In this guide, I'll walk you through setting up Fluentd on two EC2 instances to create a simple yet powerful log forwarding system.
What is Fluentd?
"Fluentd is an open source data collector for unified logging layer. Fluentd allows you to unify data collection and consumption for a better use and understanding of data."
Fluentd acts as a data collector that can read from various sources, process the data, and forward it to different outputs. Its plugin-based architecture makes it extremely versatile and a perfect fit for creating logging pipelines in cloud environments.
Our Setup: Two-Server Log Forwarding Architecture
For this tutorial, we'll set up:
- A source server that generates logs and sends them to the target
- A target server that receives and stores logs from the source
This setup can be extended to multiple source servers all forwarding to a central logging server.
🖥️ Source Server
- Runs your application
- Generates log files
- Runs Fluentd with tail input plugin
- Forwards new log entries in real-time
🗄️ Target Server
- Runs Fluentd as an aggregator
- Receives logs from source server(s)
- Stores logs centrally
- Can forward to additional systems
Prerequisites
Before we begin, make sure you have:
- Two running EC2 instances with Amazon Linux 2 or Ubuntu (I'll provide commands for both)
- SSH access to both instances
- Security groups configured to allow inbound TCP traffic on port 24224 (Fluentd's default forward port) on the target server, from the source server
- A sample application or log file on the source server
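If you manage security groups from the command line, the port 24224 rule can be sketched with the AWS CLI. This is a minimal sketch, not a definitive setup: the function name allow_forward_cmd and both security group IDs are hypothetical placeholders, and the script only prints the command so you can review it before running it.

```shell
#!/usr/bin/env bash
# Hypothetical security group IDs; replace with your own.
TARGET_SG="${TARGET_SG:-sg-0123456789abcdef0}"
SOURCE_SG="${SOURCE_SG:-sg-0fedcba9876543210}"

# allow_forward_cmd TARGET_SG SOURCE_SG
# Print the AWS CLI call that opens TCP 24224 on the target's group
# to instances that belong to the source's group only.
allow_forward_cmd() {
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 24224 --source-group %s\n' "$1" "$2"
}

# Print the command for review instead of executing it.
allow_forward_cmd "$TARGET_SG" "$SOURCE_SG"
```

Referencing the source server's security group rather than an IP range keeps the rule valid if the source instance's private IP changes.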
Step 1: Installing Fluentd on Both Instances
Let's start by installing Fluentd on both our source and target EC2 instances. The installation process is similar for both servers.
For Amazon Linux 2
# Install ruby and development tools
sudo amazon-linux-extras install ruby2.6
sudo yum install -y ruby-devel gcc make
# Install Fluentd via Ruby's gem package manager
sudo gem install fluentd --no-document
# The forward and tail plugins are bundled with Fluentd, so no extra plugin gem is required
For Ubuntu
# Update package lists
sudo apt-get update
# Install ruby and development tools
sudo apt-get install -y ruby-full build-essential
# Install Fluentd via Ruby's gem package manager
sudo gem install fluentd --no-document
# The forward and tail plugins are bundled with Fluentd, so no extra plugin gem is required
Note: For production environments, the official packages (td-agent, succeeded by fluent-package in newer releases) are recommended for stable releases. However, for learning purposes, the gem installation is simpler and works well.
Step 2: Configuring the Target Server
The target server will receive logs from our source server. Let's create a configuration file for it:
# Create a directory for Fluentd configurations
sudo mkdir -p /etc/fluentd
# Create the configuration file
sudo nano /etc/fluentd/target.conf
Add the following configuration:
<source>
@type forward
port 24224
bind 0.0.0.0
</source>
<match **>
@type file
path /var/log/fluentd/received
time_slice_format %Y%m%d
time_slice_wait 10m
time_format %Y%m%dT%H%M%S%z
compress gzip
utc
</match>
This configuration:
- Sets up a forward input source that listens on port 24224
- Creates a match pattern that captures all logs
- Writes the received logs to gzip-compressed daily files whose names start with /var/log/fluentd/received
Now create the output directory and start Fluentd:
# Create the log directory
sudo mkdir -p /var/log/fluentd
# Start Fluentd with our configuration
sudo fluentd -c /etc/fluentd/target.conf -d /var/run/fluentd.pid
Step 3: Configuring the Source Server
Now let's set up the source server to forward logs to our target server:
# Create a directory for Fluentd configurations
sudo mkdir -p /etc/fluentd
# Create the configuration file
sudo nano /etc/fluentd/source.conf
Add the following configuration:
<source>
@type tail
path /path/to/your/application.log
pos_file /var/log/fluentd/application.log.pos
tag app.logs
<parse>
@type none # This captures the whole line as 'message'
</parse>
</source>
<match app.**>
@type forward
<server>
host TARGET_SERVER_IP # Replace with your target server's IP
port 24224
</server>
<buffer>
@type memory
flush_interval 5s
retry_max_times 17
retry_type exponential_backoff
</buffer>
</match>
Replace TARGET_SERVER_IP with the private IP address of your target EC2 instance. Also, update /path/to/your/application.log to point to the actual log file you want to monitor.
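If you set up several source servers, filling in the two placeholders by hand gets repetitive. Here is a minimal sketch of a helper that prints the source configuration with both values substituted. The function name render_source_conf and the example IP and log path are assumptions for illustration, not part of Fluentd.

```shell
#!/usr/bin/env bash
# render_source_conf TARGET_IP LOG_PATH
# Print a source-side Fluentd config with the two placeholders filled in.
render_source_conf() {
  local target_ip="$1" log_path="$2"
  cat <<EOF
<source>
  @type tail
  path ${log_path}
  pos_file /var/log/fluentd/application.log.pos
  tag app.logs
  <parse>
    @type none
  </parse>
</source>

<match app.**>
  @type forward
  <server>
    host ${target_ip}
    port 24224
  </server>
  <buffer>
    @type memory
    flush_interval 5s
    retry_max_times 17
    retry_type exponential_backoff
  </buffer>
</match>
EOF
}

# Example values (assumptions): a private IP and an application log path.
render_source_conf 10.0.1.25 /var/log/myapp/app.log
```

To write the result to disk, pipe the output through sudo tee /etc/fluentd/source.conf.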
Let's create the required directories and start Fluentd:
# Create the log and position file directories
sudo mkdir -p /var/log/fluentd
# Start Fluentd with our configuration
sudo fluentd -c /etc/fluentd/source.conf -d /var/run/fluentd.pid
Step 4: Testing the Setup
To test our setup, let's generate some log entries on the source server:
# Generate test log entries
for i in {1..10}; do
echo "Test log entry $i at $(date)" >> /path/to/your/application.log
done
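Wait a few seconds for the source buffer to flush (flush_interval is 5s), then check the target server:
# List the buffer directory and any flushed files
ls -la /var/log/fluentd/received*
# View entries still held in the file buffer (not yet flushed to a daily file)
sudo cat /var/log/fluentd/received/buffer.*.log
# View flushed daily files, which are gzip-compressed
sudo zcat /var/log/fluentd/received.*.log.gz
You should see your test log entries in the output! With the daily time slice configured in Step 2, entries typically stay in the buffer directory until the day's slice is flushed, so the .log.gz files may not exist yet.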
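With the target configuration from Step 2 (no custom format section), each stored line should hold the time, the tag, and the record as JSON, separated by tabs. A small sketch for pulling out just the JSON record follows. The function name record_json is hypothetical, and the sample line is illustrative rather than captured output.

```shell
#!/usr/bin/env bash
# record_json: read stored log lines on stdin and print only the
# third tab-separated field, which holds the JSON record.
record_json() {
  cut -f3
}

# Illustrative sample line (time, tag, record), not captured output.
printf '20240101T120000+0000\tapp.logs\t{"message":"Test log entry 1"}\n' | record_json
```

You can pipe the contents of a received file through record_json in the same way.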
Making Fluentd Start on Boot
To ensure Fluentd starts automatically when your EC2 instances reboot, let's create systemd service files.
On both servers:
# Create systemd service file
sudo nano /etc/systemd/system/fluentd.service
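For the source server, add the following. The ExecStart path assumes the gem installed the executable at /usr/local/bin/fluentd; confirm with which fluentd and adjust it on both servers if it differs: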
[Unit]
Description=Fluentd Log Forwarder
After=network.target
[Service]
ExecStart=/usr/local/bin/fluentd -c /etc/fluentd/source.conf
Restart=on-failure
User=root
Group=root
[Install]
WantedBy=multi-user.target
For the target server, add:
[Unit]
Description=Fluentd Log Aggregator
After=network.target
[Service]
ExecStart=/usr/local/bin/fluentd -c /etc/fluentd/target.conf
Restart=on-failure
User=root
Group=root
[Install]
WantedBy=multi-user.target
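Enable and start the services. If Fluentd is still running from the earlier steps, stop it first (for example, sudo kill $(cat /var/run/fluentd.pid)) so the service can bind its port:
# Reload systemd so it picks up the new unit file
sudo systemctl daemon-reload
# Enable the service to start on boot
sudo systemctl enable fluentd
# Start the service
sudo systemctl start fluentd
# Check the status
sudo systemctl status fluentd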
Advanced Configuration Options
Filtering and Transforming Logs
You can insert filter sections between your source and match sections to transform logs:
<filter app.**>
@type grep
<regexp>
key message
pattern /ERROR|WARN/
</regexp>
</filter>
This example forwards only log lines whose message field contains "ERROR" or "WARN"; all other lines are dropped.
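Before deploying the filter, you can preview which lines it would keep by trying the same pattern with grep. This is a rough sketch only: the function name would_forward and the sample lines are made up for illustration, and grep matches the whole line while the Fluentd filter matches the message field (with the none parser these are the same text).

```shell
#!/usr/bin/env bash
# would_forward: keep only stdin lines that match the filter's pattern.
would_forward() {
  grep -E 'ERROR|WARN'
}

# Made-up sample lines; only the WARN and ERROR lines are printed.
printf '%s\n' \
  'INFO service started' \
  'WARN disk usage at 85%' \
  'ERROR connection refused' \
  'DEBUG cache hit' | would_forward
```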
Adding Metadata
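You can add EC2 metadata to your logs. The embedded Ruby expressions are evaluated when Fluentd loads the configuration; the instance ID is read from the file that cloud-init writes and falls back to "unknown" if that file cannot be read: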
<filter app.**>
@type record_transformer
<record>
hostname "#{Socket.gethostname}"
instance_id "#{File.read('/var/lib/cloud/data/instance-id').strip rescue 'unknown'}"
environment "production"
</record>
</filter>
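To check what values the filter above would attach on a given instance, you can reproduce the two lookups in the shell. This is a sketch under assumptions: the function name instance_id is hypothetical, and uname -n stands in for Ruby's Socket.gethostname.

```shell
#!/usr/bin/env bash
# instance_id [FILE]
# Print the file's contents with whitespace removed, or "unknown"
# if the file is not readable (mirrors the rescue fallback).
instance_id() {
  local file="${1:-/var/lib/cloud/data/instance-id}"
  if [ -r "$file" ]; then
    tr -d '[:space:]' < "$file"
    echo
  else
    echo unknown
  fi
}

echo "hostname=$(uname -n)"
echo "instance_id=$(instance_id)"
```

On a machine that is not an EC2 instance, the second line should read instance_id=unknown.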
Multiple Log Sources
You can monitor multiple log files by adding additional source sections:
<source>
@type tail
path /var/log/nginx/access.log
pos_file /var/log/fluentd/nginx-access.log.pos
tag nginx.access
<parse>
@type nginx
</parse>
</source>
<source>
@type tail
path /var/log/nginx/error.log
pos_file /var/log/fluentd/nginx-error.log.pos
tag nginx.error
<parse>
@type regexp
expression /^(?<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(?<log_level>\w+)\] (?<pid>\d+).(?<tid>\d+): (?<message>.*)$/
time_format %Y/%m/%d %H:%M:%S
</parse>
</source>
Security Considerations
When implementing this in a production environment, consider the following security measures:
Security Concern | Recommended Action |
---|---|
Network Security | Restrict the security group to only allow traffic on port 24224 from your source servers |
Authentication | Add a shared key for authentication between source and target |
Encryption | Set up TLS encryption for the communication between instances |
Log Content | Ensure sensitive information is not being logged |
To implement shared key authentication, modify your configurations as follows. Replace secret_string with your own secret and use the same value on both servers:
Target Server:
<source>
@type forward
port 24224
bind 0.0.0.0
<security>
self_hostname target.example.com
shared_key secret_string
</security>
</source>
Source Server:
<match app.**>
@type forward
<server>
host TARGET_SERVER_IP
port 24224
</server>
<security>
self_hostname source.example.com
shared_key secret_string
</security>
</match>
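The secret_string value in the two configurations is a placeholder. One way to produce a random value for it is sketched below; the function name gen_shared_key is hypothetical, and any sufficiently random string works just as well.

```shell
#!/usr/bin/env bash
# gen_shared_key: print 32 random bytes as 64 lowercase hex characters.
gen_shared_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

gen_shared_key
```

Generate the key once and paste the same value into the shared_key line on both servers.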
Troubleshooting
If you encounter issues with your Fluentd setup, here are some common problems and solutions:
Logs not forwarding
Check these potential issues:
Ensure Fluentd is running on both servers:
ps aux | grep fluentd
Verify network connectivity:
telnet TARGET_SERVER_IP 24224
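Run Fluentd in the foreground to see errors directly in the terminal (stop any running instance first so the two do not conflict):
sudo fluentd -c /etc/fluentd/source.conf --no-supervisor
If Fluentd runs under systemd, its output is also available through the journal:
sudo journalctl -u fluentd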
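If telnet is not installed on the source server, the connectivity check can be approximated with bash's built-in /dev/tcp redirection. This is a minimal sketch: the function name check_port and the TARGET_HOST variable are assumptions, and the default of 127.0.0.1 is only there so the script runs as-is.

```shell
#!/usr/bin/env bash
# check_port HOST PORT
# Succeed if a TCP connection to HOST:PORT opens within 3 seconds.
check_port() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# TARGET_HOST is an assumed variable; set it to the target's private IP.
if check_port "${TARGET_HOST:-127.0.0.1}" 24224; then
  echo "port 24224 reachable"
else
  echo "port 24224 not reachable"
fi
```

A "not reachable" result from the source server usually points at the security group rule or at Fluentd not listening on the target.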
Permission issues
If Fluentd can't read your log files:
# Check permissions on the log file
ls -la /path/to/your/application.log
# Give read access if needed
sudo chmod +r /path/to/your/application.log
Conclusion
You've now set up a distributed logging system with Fluentd on EC2 instances! This setup allows you to:
- Monitor log files in real-time using the tail input plugin
- Forward logs to a centralized server
- Store and potentially process the logs further
This simple yet effective pattern can be extended to include multiple source servers, different types of log sources, and various output destinations such as Amazon S3, Elasticsearch, or monitoring systems.
As your infrastructure grows, consider using containerized Fluentd deployments or managed services like Amazon CloudWatch Logs for more scalable logging solutions. However, the fundamental concepts you've learned here will remain applicable regardless of the scale.
Happy logging!