How do I use Amazon Athena to analyze my Application Load Balancer access logs?


I want to use Amazon Athena to analyze my Application Load Balancer access logs.

Short description

Elastic Load Balancing doesn't activate access logs by default. When you activate access logs, you must specify an Amazon Simple Storage Service (Amazon S3) bucket where the load balancer stores the logs. You can then use Athena to analyze Application Load Balancer and Classic Load Balancer access logs in the Amazon S3 bucket.

Note: This resolution applies only to Application Load Balancers.

Resolution

Create a database and table for Application Load Balancer logs

Complete the following steps:

  1. Open the Athena console.
  2. To create a database, run the following command in the Query Editor:
    CREATE DATABASE alb_db
    Note: It's a best practice to create the database in the same AWS Region as the Amazon S3 bucket.
  3. In the database, create an alb_logs table for the Application Load Balancer logs.
    Example query:
    CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs(
        type string,
        time string,
        elb string,
        client_ip string,
        client_port int,
        target_ip string,
        target_port int,
        request_processing_time double,
        target_processing_time double,
        response_processing_time double,
        elb_status_code int,
        target_status_code string,
        received_bytes bigint,
        sent_bytes bigint,
        request_verb string,
        request_url string,
        request_proto string,
        user_agent string,
        ssl_cipher string,
        ssl_protocol string,
        target_group_arn string,
        trace_id string,
        domain_name string,
        chosen_cert_arn string,
        matched_rule_priority string,
        request_creation_time string,
        actions_executed string,
        redirect_url string,
        lambda_error_reason string,
        target_port_list string,
        target_status_code_list string,
        classification string,
        classification_reason string,
        conn_trace_id string
    )
    PARTITIONED BY(day STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES(
        'serialization.format' = '1',
        'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)?( .*)?')
    LOCATION 's3://<your-alb-logs-directory>/AWSLogs/<ACCOUNT-ID>/elasticloadbalancing/<REGION>/'
    TBLPROPERTIES(
        "projection.enabled" = "true",
        "projection.day.type" = "date",
        "projection.day.range" = "2022/01/01,NOW",
        "projection.day.format" = "yyyy/MM/dd",
        "projection.day.interval" = "1",
        "projection.day.interval.unit" = "DAYS",
        "storage.location.template" = "s3://<your-alb-logs-directory>/AWSLogs/<ACCOUNT-ID>/elasticloadbalancing/<REGION>/${day}"
    )
    Note: In the preceding query, replace the table name and S3 locations with your table name and S3 locations. For projection.day.range, replace 2022/01/01 with your start date. This table uses partition projection, which improves query performance. For example queries that you can run against this table, see the Example queries section that follows these steps.
  4. Alternatively, to create a table with partitions that are stored in the AWS Glue Data Catalog, run the following query:
    CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs_partitioned(
        type string,
        time string,
        elb string,
        client_ip string,
        client_port int,
        target_ip string,
        target_port int,
        request_processing_time double,
        target_processing_time double,
        response_processing_time double,
        elb_status_code int,
        target_status_code string,
        received_bytes bigint,
        sent_bytes bigint,
        request_verb string,
        request_url string,
        request_proto string,
        user_agent string,
        ssl_cipher string,
        ssl_protocol string,
        target_group_arn string,
        trace_id string,
        domain_name string,
        chosen_cert_arn string,
        matched_rule_priority string,
        request_creation_time string,
        actions_executed string,
        redirect_url string,
        lambda_error_reason string,
        target_port_list string,
        target_status_code_list string,
        classification string,
        classification_reason string,
        conn_trace_id string
    )
    PARTITIONED BY(day string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES(
        'serialization.format' = '1',
        'input.regex' ='([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)?( .*)?')
    LOCATION 's3://<your-alb-logs-directory>/AWSLogs/<ACCOUNT-ID>/elasticloadbalancing/<REGION>/'
    Note: In the preceding query, replace the table name and S3 locations with your table name and S3 locations.
  5. (Optional) To load the partitions, run the following ALTER TABLE ADD PARTITION DDL statement:
    ALTER TABLE alb_logs_partitioned ADD PARTITION (day = '2022/05/21') LOCATION 's3://<your-alb-logs-directory>/AWSLogs/<ACCOUNT-ID>/elasticloadbalancing/<REGION>/2022/05/21/'
    Note: If your table uses partition projection, then skip the load partitions step. Don't use an AWS Glue crawler on the Application Load Balancer logs.
  6. In the navigation pane, under Tables, choose Preview table.
  7. In the Results window, view the data from the Application Load Balancer access logs.
    Note: To run SQL statements on the table, use the Query Editor.

Related information

Access log entries

Querying Application Load Balancer logs

AWS OFFICIAL
Updated a month ago
13 Comments

This is great but requires some corrections:

For example, SELECT * from alb_log where elb_status_code = '503'; will fail with a "TYPE_MISMATCH: line 1:46: Cannot apply operator: integer = varchar(3)" error.

We should use the following instead: SELECT * from alb_log where elb_status_code = 503;

Rez
replied a year ago

Also this query:

SELECT COUNT(request_verb) AS count, request_verb, client_ip FROM alb_log_partition_projection WHERE day = '2022/05/21' GROUP BY request_verb, client_ip;

Needs to be changed to:

SELECT COUNT(request_verb) AS count, request_verb, client_ip FROM alb_log WHERE day = '2022/05/21' GROUP BY request_verb, client_ip;

Not a major issue, it's just the table name according to your tutorial.

Rez
replied a year ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS MODERATOR
replied a year ago

Create table queries missing REGION from the S3 path

replied a year ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS MODERATOR
replied a year ago

The region is still missing from the S3 paths in the first CREATE TABLE query. Without this, the table doesn't work and returns no results.

replied a year ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS MODERATOR
replied a year ago

The first create table in step 3 has an extra single quote in the LOCATION line:

LOCATION ''s3://<your-al...

replied 8 months ago

This example in the article:

SELECT target_ip, (Count(target_ip)* 100.0 / (Select Count(*) From alb_logs)) as backend_traffic_percentage FROM alb_logs GROUP by target_ip< ORDER By count() DESC;

Gives this error when running:

error - line 5:1: mismatched input '<'. Expecting: <predicate>

I was able to resolve it by removing the < after target_ip on line 4.

SELECT target_ip, (Count(target_ip)* 100.0 / (Select Count(*) From alb_logs)) as backend_traffic_percentage FROM alb_logs GROUP by target_ip ORDER By count() DESC;

AWS
replied 6 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS MODERATOR
replied 6 months ago

Is there something that needs to be done to load the log files into Athena? I followed this article step by step. I have the table created, but I don't see the logs. I have logs in S3. What is the secret?

replied 5 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS MODERATOR
replied 5 months ago