Docker Log Management Using Fluentd

Docker is an open-source project for easily creating lightweight, portable and self-sufficient containers for applications. Docker allows you to run many isolated applications on a single host without the overhead of running virtual machines.

One of the problems with the current versions of docker is managing logs. Each container runs a single process and the output of that process is saved by docker to a location on the host.

There are a few operational issues with this currently:

  • This log file grows indefinitely. Docker logs each line as a JSON message, and because the file is not rotated automatically it can grow quickly and exhaust the disk space on the host.
  • The docker logs command returns all recorded logs each time it’s run. The output of any long-running process that is even a little verbose can be difficult to examine.
  • Logs under the container’s /var/log or other locations are not easily visible or accessible.
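As an illustration of the second point, the sketch below reads only the most recent entries from a container's JSON log file instead of replaying everything. It is a minimal example, not a Docker feature: the function name last_entries and the log_path argument are hypothetical, and it assumes the file holds one JSON object per line.

```python
import json
from collections import deque


def last_entries(log_path, count=10):
    """Return the last `count` parsed records from a JSON-lines log file."""
    with open(log_path, encoding="utf-8") as handle:
        # A deque with maxlen drops older lines as newer ones are read,
        # so memory use stays bounded even when the file is very large.
        recent = deque(handle, maxlen=count)
    return [json.loads(line) for line in recent]
```

The file is still read from start to end, so this only bounds memory, not I/O. A production tool would seek backwards from the end of the file instead.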

Docker Logging Options

While logging in docker is evolving, there are several approaches to handling logs with docker currently:

  • Collection Inside Container - Each container starts up a log collection process in addition to the application that will be running. baseimage-docker uses runit along with syslog as an example.
  • Collection Outside Container - A single collection agent runs on the host and containers have a volume mounted from the host where they write their logs.
  • Collection In Separate Container - This is a slight variation of running the collection agent on the host. The collection agent also runs in a container, and volumes from that container are bound to any application containers using the --volumes-from option of docker run. This Docker and logstash article has an example of this approach.

These approaches work but also have some drawbacks. If collection is performed inside the container, then each container is running duplicate processes that can waste resources. (Running multiple processes in a container seems to be a debated subject even though the docker docs use supervisor as an example.)

If collection is run outside the container using volumes, you still need to make sure your application logs to those volumes and not to stdout/stderr. This might not be possible with all applications. Finally, each running container still has its JSON log file, which also grows unbounded.

Using Fluentd With Docker

Another variation of collection outside the container can be done with a centralized logging agent and without binding volumes to the containers. This method works directly against the container’s JSON log file on the host.

When you run a container, the state of the container lives under /var/lib/docker/containers/<id>.

root@precise64:/var/lib/docker/containers/fe38c4124f36d0a5b2a38ea7dd58fe88ac92980286f1f6a7b7ed3ced7c994374# ls -la
total 44
drwx------  3 root root  4096 Mar 14 19:56 .
drwx------ 83 root root 12288 Mar 14 21:53 ..
-rw-r--r--  1 root root   106 Mar 14 19:56 config.env
-rw-r--r--  1 root root  1522 Mar 14 19:56 config.json
-rw-------  1 root root   241 Mar 14 19:56 fe38c4124f36d0a5b2a38ea7dd58fe88ac92980286f1f6a7b7ed3ced7c994374-json.log
-rw-r--r--  1 root root   126 Mar 14 19:56 hostconfig.json
-rw-r--r--  1 root root    13 Mar 14 19:56 hostname
-rw-r--r--  1 root root   181 Mar 14 19:56 hosts
drwxr-xr-x  2 root root  4096 Mar 14 19:56 root

The file fe38c4124f36d0a5b2a38ea7dd58fe88ac92980286f1f6a7b7ed3ced7c994374-json.log is the container log file. Each line is a JSON object and there is a line for every line of input and output from the container.

{"log":"root@c835298de6dd:/# ls\r\n","stream":"stdout","time":"2014-03-14T22:15:15.155863426Z"}
{"log":"bin  boot  dev\u0009etc  home  lib\u0009lib64  media  mnt  opt\u0009proc  root  run  sbin  selinux\u0009srv  sys  tmp  usr  var\r\n","stream":"stdout","time":"2014-03-14T22:15:15.194869963Z"}
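Because each line is a self-contained JSON object, any JSON parser can read the file. The following is a minimal Python sketch, assuming the three fields shown above (log, stream, time). The function name parse_log_line is made up for illustration.

```python
import json
from datetime import datetime, timezone


def parse_log_line(line):
    """Parse one line of a container's -json.log file into a dict."""
    record = json.loads(line)
    # Docker writes nanosecond timestamps, but datetime keeps microseconds,
    # so trim (or pad) the fractional part to six digits before parsing.
    stamp = record["time"].rstrip("Z")
    date_part, _, fraction = stamp.partition(".")
    fraction = (fraction + "000000")[:6]
    when = datetime.strptime(date_part + "." + fraction, "%Y-%m-%dT%H:%M:%S.%f")
    return {
        "message": record["log"].rstrip("\r\n"),
        "stream": record["stream"],
        "time": when.replace(tzinfo=timezone.utc),
    }


sample = '{"log":"root@c835298de6dd:/# ls\\r\\n","stream":"stdout","time":"2014-03-14T22:15:15.155863426Z"}'
entry = parse_log_line(sample)
print(entry["stream"], entry["time"].isoformat(), entry["message"])
```

Truncating the timestamp loses the last three digits of precision, which is usually acceptable for log viewing.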

fluentd is an open-source data collector that works natively with lines of JSON, so you can run a single fluentd instance on the host and configure it to tail each container’s JSON file.

If you need to tail a log file somewhere on the container’s file system, you can use the root subdirectory as well. All of the tailed files can then be forwarded to a centralized logging system.

This is a sample fluentd.conf file that tails each container’s logs and sends them to stdout.

## File input
## read docker logs with tag=docker.container

<source>
  type tail
  format json
  time_key time
  path /var/lib/docker/containers/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898-json.log
  pos_file /var/lib/docker/containers/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898/c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898-json.log.pos
  tag docker.container.c835298de6dd
  rotate_wait 5
</source>

<source>
  type tail
  format json
  time_key time
  path /var/lib/docker/containers/965c22a2ad1e935cb1476772ebe1ebef0050559b4cbcc7775b936348e7822347/965c22a2ad1e935cb1476772ebe1ebef0050559b4cbcc7775b936348e7822347-json.log
  pos_file /var/lib/docker/containers/965c22a2ad1e935cb1476772ebe1ebef0050559b4cbcc7775b936348e7822347/965c22a2ad1e935cb1476772ebe1ebef0050559b4cbcc7775b936348e7822347-json.log.pos
  tag docker.container.965c22a2ad1e
  rotate_wait 5
</source>

<source>
  type tail
  format json
  time_key time
  path /var/lib/docker/containers/889fe291f590c2c2aa2852856687efbb3e6fdd2faeca84da4bc0be2263f37953/889fe291f590c2c2aa2852856687efbb3e6fdd2faeca84da4bc0be2263f37953-json.log
  pos_file /var/lib/docker/containers/889fe291f590c2c2aa2852856687efbb3e6fdd2faeca84da4bc0be2263f37953/889fe291f590c2c2aa2852856687efbb3e6fdd2faeca84da4bc0be2263f37953-json.log.pos
  tag docker.container.889fe291f590
  rotate_wait 5
</source>

<match docker.**>
  type stdout
</match>

In a live environment, the actual log contents could be sent to an elasticsearch cluster and viewed with kibana or graylog2. Alternatively, there are hosted services that can work with JSON as well.

Since container IDs are unwieldy to work with, I created a simple golang project called docker-gen that can generate arbitrary files using a template from the running docker container data. The example fluentd template in the project was used to generate the sample above.
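To make the templating idea concrete, here is a rough Python sketch that renders the same kind of fluentd.conf from a list of container IDs. It is not docker-gen's template or its implementation: the names SOURCE_TEMPLATE, MATCH_BLOCK and render_fluentd_conf are hypothetical, and the IDs are passed in by hand rather than read from the Docker API.

```python
SOURCE_TEMPLATE = """<source>
  type tail
  format json
  time_key time
  path {base}/{cid}/{cid}-json.log
  pos_file {base}/{cid}/{cid}-json.log.pos
  tag docker.container.{short}
  rotate_wait 5
</source>
"""

MATCH_BLOCK = """<match docker.**>
  type stdout
</match>
"""


def render_fluentd_conf(container_ids, base="/var/lib/docker/containers"):
    """Return a fluentd.conf string with one tail source per container ID."""
    sources = [
        # The tag uses the 12-character short ID, as in the sample config.
        SOURCE_TEMPLATE.format(base=base, cid=cid, short=cid[:12])
        for cid in container_ids
    ]
    return "\n".join(sources + [MATCH_BLOCK])


ids = ["c835298de6dde500c78a2444036101bf368908b428ae099ede17cf4855247898"]
conf = render_fluentd_conf(ids)
print(conf)
```

Regenerating the file whenever containers start or stop, and then reloading fluentd, is the part a tool like docker-gen is meant to automate.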

Although not shown, docker-gen could also generate logrotate config files to rotate the container JSON files to avoid running out of disk space on the host. Hopefully, the docker project will address this in a future release.
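As a sketch of what such a generated file might contain, the Python below renders a single logrotate stanza covering every container's JSON log. The schedule and retention (daily, seven copies) are arbitrary illustrative choices, and copytruncate is used on the assumption that Docker keeps the log file open while the container runs. The function name render_logrotate_conf is hypothetical.

```python
def render_logrotate_conf(pattern="/var/lib/docker/containers/*/*-json.log", keep=7):
    """Return a logrotate stanza for the container JSON log files."""
    lines = [
        pattern + " {",
        "  daily",
        "  rotate %d" % keep,   # number of rotated copies to retain
        "  compress",
        "  missingok",          # do not complain if no file matches
        "  notifempty",         # skip empty logs
        "  copytruncate",       # truncate in place instead of moving the file
        "}",
    ]
    return "\n".join(lines) + "\n"


print(render_logrotate_conf())
```

Note that copytruncate can lose a few lines written between the copy and the truncate, so this trades a small risk of lost log lines for not having to signal the writer.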

Conclusion

This approach provides the following benefits:

  • The host is able to forward any container’s logs to a central log server using a single collection agent.
  • It does not require the applications to use syslog or write to a certain volume.
  • The host can access the container logs as well as any log files on the container’s filesystem.
  • The host can rotate logs for the containers.

The one drawback to this approach is that it accesses the docker file system directly instead of going through the API. It could therefore break if a future docker release changes how container logs are stored on the host’s file system.
