
Heroku Logs Explained (and How To Stream Into BigQuery)
Published on Nov 6th, 2021 by Steven Maguire
Heroku is a fantastic platform, especially if you are managing an app that doesn’t need a ton of oversight. One of the main value propositions of Heroku is a “set it and forget it” philosophy. You can get your app up and running in a matter of minutes, with only a few clicks. Its suite of tools and features has logical defaults that provide value immediately. You can, of course, fine-tune things as you like, but the idea is soup right from the can. That is, until you attempt to do anything meaningful with the server logs it produces. The feature that time forgot.
What are Heroku server logs?
A server log is a simple text snippet that contains details of a specific server activity. Heroku produces server logs for each of the services they manage for you.
Heroku Log Types
When accessing Heroku logs, you can expect to find four types of Heroku Runtime logs:
- App logs - logging output from the application itself,
- System logs - messages about Heroku platform infrastructure actions,
- API logs - messages about administrative actions taken by you and,
- Add-on logs - messages from add-on services.
Heroku also produces Build logs. Build logs capture data for every build attempt - successful and unsuccessful. Heroku Build logs are only available from your app’s Activity tab in the Heroku Web Dashboard.
Heroku Log Format
Heroku logs contain some important and well structured information. You can expect to find the following in Heroku logs:
- Timestamp - the date and time recorded when the log line gets captured,
- Source - your dynos all have the source “app,” while Heroku’s system components all have the source “heroku,”
- Dyno - the name of the dyno or component that wrote the log line and,
- Message - the content of the log line.
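To make those four fields concrete, here is a minimal Python sketch that splits one line of `heroku logs` output into timestamp, source, dyno, and message. The pattern is inferred from the sample output in this article, not from an official grammar, and the names `LOG_LINE` and `parse_cli_line` are made up for illustration.

```python
import re
from typing import Optional

# Layout assumed from the `heroku logs` samples in this article:
#   <timestamp> <source>[<dyno>]: <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+) (?P<source>[^\[\s]+)\[(?P<dyno>[^\]]+)\]: (?P<message>.*)$"
)


def parse_cli_line(line: str) -> Optional[dict]:
    """Return the four fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "2010-09-16T15:13:46.677023+00:00 app[web.1]: "
        "Rendering template within layouts/application"
    )
    print(parse_cli_line(sample))
```

Lines that do not fit the layout come back as `None`, so you can count or log them instead of silently dropping them.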
How to access Heroku Logs?
Heroku produces these logs for you and makes them available in several ways:
- via Command Line Interface (CLI),
- via Web Dashboard and,
- via Logging Add-on (connected by a Drain).
Heroku Log Retrieval via CLI
Heroku publishes a robust and powerful CLI utility. If you are not yet familiar with it, consider introducing yourself ASAP. You will become fast friends.
To view your app’s most recent logs, use the heroku logs CLI command:
$ heroku logs
2010-09-16T15:13:46.677020+00:00 app\[web.1\]: Processing PostController#list (for 208.39.138.12 at 2010-09-16 15:13:46) \[GET\]
2010-09-16T15:13:46.677023+00:00 app\[web.1\]: Rendering template within layouts/application
2010-09-16T15:13:46.677902+00:00 app\[web.1\]: Rendering post/list
2010-09-16T15:13:46.678990+00:00 app\[web.1\]: Rendered includes/\_header (0.1ms)
2010-09-16T15:13:46.698234+00:00 app\[web.1\]: Completed in 74ms (View: 31, DB: 40) | 200 OK \[http://myapp.heroku.com/\]
2010-09-16T15:13:46.723498+00:00 heroku\[router\]: at=info method=GET path="/posts" host=myapp.herokuapp.com fwd="204.204.204.204" dyno=web.1 connect=1ms service=18ms status=200 bytes=975
2010-09-16T15:13:47.893472+00:00 app\[worker.1\]: 2 jobs processed at 16.6761 j/s, 0 failed ...
Take note of the different types of logs that get returned: some come from the app’s web dynos, some from the Heroku HTTP router, and some from the app’s worker dynos.
By default, the logs command retrieves 100 log lines. You can retrieve up to 1,500 lines by using the --num (or -n) option.
$ heroku logs -n 200
As discussed, Heroku produces four log types. You can use the CLI to filter the results, targeting those log types. Use the --source (or -s) and --dyno (or -d) CLI filtering arguments:
$ heroku logs --dyno router
2012-02-07T09:43:06.123456+00:00 heroku\[router\]: at=info method=GET path="/stylesheets/dev-center/library.css" host=devcenter.heroku.com fwd="204.204.204.204" dyno=web.5 connect=1ms service=18ms status=200 bytes=13
2012-02-07T09:43:06.123456+00:00 heroku\[router\]: at=info method=GET path="/articles/bundler" host=devcenter.heroku.com fwd="204.204.204.204" dyno=web.6 connect=1ms service=18ms status=200 bytes=20375
$ heroku logs --source app
2012-02-07T09:45:47.123456+00:00 app\[web.1\]: Rendered shared/\_search.html.erb (1.0ms)
2012-02-07T09:45:47.123456+00:00 app\[web.1\]: Completed 200 OK in 83ms (Views: 48.7ms | ActiveRecord: 32.2ms)
2012-02-07T09:45:47.123456+00:00 app\[worker.1\]: \[Worker(host:465cf64e-61c8-46d3-b480-362bfd4ecff9 pid:1)\] 1 jobs processed at 23.0330 j/s, 0 failed ...
2012-02-07T09:46:01.123456+00:00 app\[web.6\]: Started GET "/articles/buildpacks" for 4.1.81.209 at 2012-02-07 09:46:01 +0000
$ heroku logs --source app --dyno worker
2012-02-07T09:47:59.123456+00:00 app\[worker.1\]: \[Worker(host:260cf64e-61c8-46d3-b480-362bfd4ecff9 pid:1)\] Article#record\_view\_without\_delay completed after 0.0221
2012-02-07T09:47:59.123456+00:00 app\[worker.1\]: \[Worker(host:260cf64e-61c8-46d3-b480-362bfd4ecff9 pid:1)\] 5 jobs processed at 31.6842 j/s, 0 failed ...
Heroku Log Retrieval Via the Web Dashboard
Log in to Heroku and navigate to your Heroku app dashboard. On this page, select “more” to open a drop-down menu, then select “View logs” from that menu. You can now view your logs using the web interface.
Log Retrieval Via a Logging Add-on
Heroku supports a robust Add-on platform. This platform makes it easy for you to integrate your app with first- and third-party services. Heroku recommends integrating with a third-party Add-on service to help with log aggregation. Why do they recommend this? Because, again, their logging feature is a blind spot on the platform. You will need to take it upon yourself to extract value from the massive stream of log data produced by Heroku. Let’s talk about why this is a blind spot and how you can extract value.
Heroku does NOT store logs for you.
Heroku’s logging system goes by the name “Logplex.” Logplex only collates and routes log messages; it does not store them. At most, it retains the most recent 1,500 lines of your consolidated logs, which expire after 1 week.
This means that you must do something proactive to consume and store your logs.
Also, keep in mind, Heroku does not give you access to tools to query or dig into those 1,500 most recent log lines they keep.
Heroku has made it clear: their position on logs is to produce them and stream them. That’s it.
Why should you store Heroku logs?
Earlier, we mentioned server logs contain valuable details about every server activity. Storing those logs, even for a short period of time, gives you the option to dig deeper should a need arise. Why would you need to dig deeper? Some common use cases for investigating server logs, to name a few:
- you have a data breach to investigate,
- you have an intrusion detection policy,
- you have a fraud detection policy,
- you have a data security & compliance policy,
- you have a chargeback case to document or,
- you have a customer support case to document.
If you are running a mature, legal-conscious application, you SHOULD store Heroku logs. And, not only store them - you should be able to query them with ease.
Store and query Heroku logs with BigQuery
Google’s BigQuery product is an amazing, powerful, and affordable option for Heroku logs. You can store and query millions of BigQuery records for less than a few dollars per month. Plus, the query language is very approachable if you have some experience with SQL.
Unfortunately, Heroku does not offer a direct-to-BigQuery logging add-on. So, it does need a bit of elbow grease to hook things up.
Choose a streaming strategy
There are two approaches to streaming Heroku logs into BigQuery:
- use the CLI to pull logs on your own or,
- use a log drain to let Heroku push logs to you.
Option 1, while possible and popular on other blogs, is fraught with issues. Because of Heroku’s 1,500-line retention cap, you could miss logs for a variety of reasons. For example, your app might experience a surge in logging events that outpaces your polling cadence, so lines fall out of the buffer before your utility retrieves them.
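To see how a surge can cost you lines, here is a small Python simulation. It is a simplified model, not Heroku’s implementation: the buffer is a plain bounded queue holding the 1,500 most recent lines, the one-week expiry is ignored, and `simulate_polling` is a made-up name.

```python
from collections import deque

RETENTION_CAP = 1500  # most recent lines retained, per the figure quoted above


def simulate_polling(lines_per_interval):
    """Return how many lines a poller never sees.

    `lines_per_interval` lists how many lines the app writes between
    consecutive polls. Each poll reads everything still in the buffer.
    """
    buffer = deque(maxlen=RETENTION_CAP)  # oldest lines drop off when full
    next_id = 0      # sequence number of the next line written
    last_seen = -1   # highest sequence number the poller has read
    missed = 0
    for produced in lines_per_interval:
        for _ in range(produced):
            buffer.append(next_id)
            next_id += 1
        if buffer:
            oldest = buffer[0]
            # Anything between last_seen and the oldest retained line is gone.
            if oldest > last_seen + 1:
                missed += oldest - (last_seen + 1)
            last_seen = buffer[-1]
    return missed


if __name__ == "__main__":
    # Two quiet intervals, one surge of 5,000 lines, then quiet again.
    print(simulate_polling([1000, 1000, 5000, 200]))
```

In this model, any interval that produces more than 1,500 lines loses the excess, no matter how reliable the poller itself is.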
Option 2 is a more manageable configuration. Adding a log drain to your app is easy. Also, Heroku’s Logplex will batch log message delivery. If your app experiences a surge in logging events, it will not result in a DDoS-like load on your log drain. You will receive steady message deliveries, each with one or more log events. While this is the preferred option, it does come with some other considerations. For example, how will you authenticate log delivery?
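Because each delivery can carry more than one log event, your drain has to split the request body into individual messages. The sketch below assumes the body is a run of length-prefixed frames (a decimal byte count, a space, then that many bytes), which is how syslog octet counting works. Check this against Heroku’s current drain documentation before relying on it. `split_frames` and `make_frame` are illustrative names.

```python
from typing import List


def split_frames(body: bytes) -> List[bytes]:
    """Split a drain request body into individual syslog messages.

    Assumed framing: b"<length> <message>" repeated back to back.
    """
    frames = []
    pos = 0
    while pos < len(body):
        space = body.find(b" ", pos)
        if space == -1 or not body[pos:space].isdigit():
            raise ValueError("missing or invalid length prefix")
        length = int(body[pos:space])
        start, end = space + 1, space + 1 + length
        if end > len(body):
            raise ValueError("frame is shorter than its declared length")
        frames.append(body[start:end])
        pos = end
    return frames


def make_frame(message: bytes) -> bytes:
    """Build one length-prefixed frame (handy for local testing)."""
    return str(len(message)).encode("ascii") + b" " + message


if __name__ == "__main__":
    first = b"<190>1 2012-02-07T09:45:47.123456+00:00 host app web.1 - one\n"
    second = b"<190>1 2012-02-07T09:45:47.223456+00:00 host app web.1 - two\n"
    for frame in split_frames(make_frame(first) + make_frame(second)):
        print(frame)
```

Reading by declared length, instead of splitting on newlines, keeps multi-line messages intact.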
Verify authenticity of incoming logs
For option 1, this is easy. You are initiating the extraction via CLI. Heroku’s CLI requires authentication. There is no question where the logs are coming from.
For option 2, using Basic Auth is a good approach to protecting your log drain. This will prevent bad actors or peers from storing log data from unexpected apps. You should be in control of the flow of data streaming from Heroku to BigQuery.
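A Basic Auth check on the drain endpoint can be quite small. The sketch below assumes the credentials arrive in a standard `Authorization: Basic ...` header, which is what you get when the drain URL embeds a user and password. The function name and the credentials are placeholders.

```python
import base64
import hmac
from typing import Optional


def is_authorized(header: Optional[str], user: str, password: str) -> bool:
    """Return True only if the Authorization header carries the expected pair."""
    if not header:
        return False
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return False
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return False
    got_user, sep, got_password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    user_ok = hmac.compare_digest(got_user.encode(), user.encode())
    password_ok = hmac.compare_digest(got_password.encode(), password.encode())
    return user_ok and password_ok


if __name__ == "__main__":
    header = "Basic " + base64.b64encode(b"drain:s3cret").decode("ascii")
    print(is_authorized(header, "drain", "s3cret"))
```

Reject the request with a 401 before parsing the body, so unauthenticated traffic never reaches your BigQuery insert path.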
Ensure your Heroku logs data is accurate
Regardless of the option you choose, there are a few things you will need to do to succeed in storing Heroku logs:
- you will need to parse the log messages and convert them to well structured JSON records,
- you will need to insert those well structured JSON records into BigQuery and,
- you will need to deduplicate the records to ensure an accurate data set.
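Heroku logs use the syslog standard. This means the format of the log messages follows common, predictable conventions. When a line is delivered to a drain, it starts with a number used for framing (the length of the message that follows). After that come the priority in angle brackets, the syslog version, the timestamp, the hostname, the app name, the process ID, a “-” placeholder, and the message. With this in mind, you can convert the following log line example into a JSON record.
173 <190>1 2012-02-07T09:45:47.123456+00:00 host app web.1 - Rendered shared/\_search.html.erb (1.0ms)
{
"priority": 190,
"version": 1,
"timestamp": "2012-02-07T09:45:47.123456+00:00",
"hostname": "host",
"app": "app",
"process\_id": "web.1",
"message": "Rendered shared/\_search.html.erb (1.0ms)"
}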
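Here is one way that conversion could look in Python. It is a sketch under a few assumptions: the leading length number has already been stripped, the value in angle brackets is the syslog priority, the digit after it is the version, and the lone “-” is a placeholder field that can be dropped. `FRAME` and `parse_frame` are illustrative names.

```python
import json
import re

# <priority>version timestamp hostname app process_id - message
FRAME = re.compile(
    r"^<(?P<priority>\d{1,3})>(?P<version>\d+) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app>\S+) (?P<process_id>\S+) "
    r"\S+ (?P<message>.*)$",
    re.DOTALL,  # let the message span several lines
)


def parse_frame(frame: str) -> dict:
    """Turn one syslog-formatted log message into a JSON-ready dict."""
    match = FRAME.match(frame.rstrip("\n"))
    if match is None:
        raise ValueError("unrecognized log frame: %r" % frame[:80])
    return {
        "priority": int(match["priority"]),
        "version": int(match["version"]),
        "timestamp": match["timestamp"],
        "hostname": match["hostname"],
        "app": match["app"],
        "process_id": match["process_id"],
        "message": match["message"],
    }


if __name__ == "__main__":
    line = (
        "<190>1 2012-02-07T09:45:47.123456+00:00 host app web.1 - "
        "Rendered shared/_search.html.erb (1.0ms)"
    )
    print(json.dumps(parse_frame(line), indent=2))
```

Keeping the timestamp as the original string avoids losing precision; you can let BigQuery parse it into a TIMESTAMP column on insert.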
With well formatted JSON records in hand, you can push them into BigQuery. BigQuery has a robust HTTP API that makes it easy to manage your BigQuery data.
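As a rough sketch of the insert and deduplication steps, the code below hashes each record into a stable ID, drops repeats within a batch, and passes the IDs to the `insert_rows_json` method of the `google-cloud-bigquery` client as `row_ids`. BigQuery’s own deduplication on those IDs is best-effort, so treat it as a safety net and not a guarantee. The helper names and the table ID are placeholders, and hashing the whole record assumes two identical lines with the same timestamp really are duplicates.

```python
import hashlib
import json


def insert_id(record: dict) -> str:
    """Derive a stable ID from the record's content."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe(records):
    """Return (ids, rows) with repeated records removed, order preserved."""
    seen = set()
    ids, rows = [], []
    for record in records:
        rid = insert_id(record)
        if rid not in seen:
            seen.add(rid)
            ids.append(rid)
            rows.append(record)
    return ids, rows


def stream_to_bigquery(client, table_id: str, records) -> list:
    """Insert unique records; return BigQuery's per-row errors (empty on success).

    `client` is expected to be a google.cloud.bigquery.Client, for example:
        from google.cloud import bigquery
        client = bigquery.Client()
    """
    ids, rows = dedupe(records)
    if not rows:
        return []
    return list(client.insert_rows_json(table_id, rows, row_ids=ids))
```

Passing the client in as an argument keeps the function easy to test with a stand-in object before you point it at a real table such as the hypothetical `my-project.logs.heroku`.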
Once your logs begin streaming into BigQuery you can do whatever you like with the data and sleep easy.
Stream your Heroku logs into BigQuery in minutes
If that sounds like a problem you’d like to solve and you’d like some help, consider BigQuery Log Drain.
BigQuery Log Drain is a simple solution that takes the effort off your backlog. BigQuery Log Drain will give you an authenticated log drain URL to add to your app.
All your logs will get streamed into your BigQuery tables. Solving this problem is the ONLY thing BigQuery Log Drain does. Plus, you will not get sales emails and your data will not get saved outside of your BigQuery tables.
Don't waste any more time on this trivial problem.
This problem, while common and mildly frustrating, is likely not the biggest thing on your plate. You could be streaming your Heroku logs into your BigQuery tables by now.