Step 4: CloudWatch Automated Response

Author: Dean Suzuki (Last Updated: 6/23/20)

Abstract: Automated Alerting, Dashboards, and Analytics

In the previous lab, you learned how to deploy the CloudWatch agent to capture logs and metrics from instances running in your environment. However, if you have hundreds or thousands of servers, it is too overwhelming to review all the CloudWatch logs by hand. You need to build automation for alerting and remediation.

In this lab, you will:

  • Learn how to automate responses to events using two methods:

    • Method 1: Using CloudWatch event rules

    • Method 2: Using CloudWatch metrics and alarms

  • Create a CloudWatch dashboard to see the trends of CloudWatch metrics over time.

  • Use CloudWatch Logs Insights to analyze the data in your logs.

Prerequisites

This lab builds upon the Step 3: CloudWatch Logging lab.

Section 1: Creating New Metrics from Log Data

In the last lab, you configured the CloudWatch agent to capture the IIS logs from the web servers. Suppose you want to count the number of 404 (page not found) errors in the IIS logs. You can do this by creating a CloudWatch metric that keeps track of the number of times the 404 error shows up in the log. In this section, you will walk through this procedure. More broadly, the goal of this lab is to show you how to create metrics from key data found in CloudWatch log files, so that you can use this procedure for any other logs that you may be capturing in your environment.

Please note that a metric filter only analyzes log data that arrives after the filter has been created. It does not retroactively search past logs for the filter condition (e.g. 404 error codes).

  1. Open the AWS Console and go to the CloudWatch console.

  2. On the left navigation, select Logs > Log groups > IIS Logs.

  3. Once inside the IIS Logs log group, click IIS-Log-Stream.

  4. In the filter events search box, type 404. You will see the 404 events in the log files.

  5. Go back and, on the left navigation, select Logs > Log groups, and then click the IIS Logs link.

  6. In the middle of the screen, select the Metric filters tab.

  7. On the right, click the Create metric filters button.

  8. In the Filter Pattern field, paste the following filter pattern.

    [time, date, sip, csmethod, csuristem, csuriquery, sport, csusername, cip,
    csuseragent, csreferer, csstatus = 404, cssubstatus, scwin32status, timetaken]
    

    The above filter pattern is used to parse the IIS log entry, which is a space-delimited entry. Below is an example of an IIS log entry.

2020-03-12 22:29:40 172.16.22.0 HEAD /robots.txt - 80 - 172.16.54.112 - - 404 0 2 0

The terms in the filter pattern correspond one-to-one with the fields in the log entry, so each value is assigned a name. The condition csstatus = 404 keeps only the entries whose status field is 404. One key point is that the names in the filter pattern cannot contain a dash, so the IIS field names have been merged into single words (e.g. s-ip, the server IP, becomes sip). To learn more about filter and pattern syntax, please see this documentation.
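To make the positional matching more concrete, here is a minimal Python sketch of what the filter pattern does with the sample entry above. It is only an illustration that runs locally, not how CloudWatch implements filters: the function names are hypothetical, and the status check is a plain string comparison rather than CloudWatch's numeric comparison.

```python
# Labels copied from the filter pattern above. They are assigned purely by
# position, so the Nth label names the Nth space-delimited value.
FIELD_NAMES = [
    "time", "date", "sip", "csmethod", "csuristem", "csuriquery", "sport",
    "csusername", "cip", "csuseragent", "csreferer", "csstatus",
    "cssubstatus", "scwin32status", "timetaken",
]

# The sample IIS log entry shown above.
SAMPLE_ENTRY = (
    "2020-03-12 22:29:40 172.16.22.0 HEAD /robots.txt - 80 - "
    "172.16.54.112 - - 404 0 2 0"
)


def parse_entry(line):
    """Split a space-delimited entry and pair each value with its label."""
    values = line.split()
    if len(values) != len(FIELD_NAMES):
        return None  # wrong number of fields, so the pattern would not match
    return dict(zip(FIELD_NAMES, values))


def is_404(line):
    """Approximate the `csstatus = 404` condition of the filter pattern."""
    fields = parse_entry(line)
    return fields is not None and fields["csstatus"] == "404"


if __name__ == "__main__":
    print(parse_entry(SAMPLE_ENTRY)["csuristem"])
    print(is_404(SAMPLE_ENTRY))
```

An entry with a different status code, or with a different number of fields, would not be counted by this sketch.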

  9. Press Next.

  10. On the Assign metric screen,

    • For Filter Name: 404-Errors-In-IISLogs

    • Metric Namespace: Lab-LogMetrics

    • Metric Name: 404-Page-Not-Found

    • Confirm that Metric Value is 1

    • Set Default Value to 0.

  11. Press Next.

  12. On the Review and create screen, press Create metric filter.
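If you want to script this step instead of using the console, the same settings could be expressed as a CloudWatch Logs PutMetricFilter request. The sketch below is a hedged example, not part of the lab: it assumes the AWS SDK for Python (boto3) is installed and credentials are configured, the function names are hypothetical, and you must pass in the actual name of your IIS log group as it was created in the previous lab.

```python
# Filter pattern from step 8, written on a single line.
FILTER_PATTERN = (
    "[time, date, sip, csmethod, csuristem, csuriquery, sport, csusername, "
    "cip, csuseragent, csreferer, csstatus = 404, cssubstatus, "
    "scwin32status, timetaken]"
)


def build_metric_filter_request(log_group_name):
    """Build the keyword arguments for a PutMetricFilter call."""
    return {
        "logGroupName": log_group_name,  # assumption: supplied by the caller
        "filterName": "404-Errors-In-IISLogs",
        "filterPattern": FILTER_PATTERN,
        "metricTransformations": [
            {
                "metricNamespace": "Lab-LogMetrics",
                "metricName": "404-Page-Not-Found",
                "metricValue": "1",   # publish 1 for every matching entry
                "defaultValue": 0.0,  # publish 0 when nothing matches
            }
        ],
    }


def create_metric_filter(log_group_name):
    """Send the request. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("logs")
    return client.put_metric_filter(**build_metric_filter_request(log_group_name))
```

Keeping the request builder separate from the API call lets you inspect or print the request before anything is created in your account.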

Section 1.1: Generate Some 404 Errors

Next, you will generate some 404 errors to test the metric filter.

  1. Open the EC2 console in a new tab.

  2. In the left navigation, select Load Balancers.

  3. There should be a load balancer that was created as part of the lab setup. Select it and go to the Description panel below. Look for the DNS field and copy its value to a notepad file.

  4. Open a new tab in your browser, and paste the DNS name in the URL field. A page should be displayed with content.

  5. Paste the URL again, add “/yourname” to the end, and then press Enter. This should generate a 404 file or directory not found error. Repeat this a couple more times with different words to generate more data.
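If you prefer to generate the test traffic from a script rather than a browser, the steps above could be sketched as follows. This is an optional illustration using only the Python standard library. The function names are hypothetical, the DNS name is whatever you copied from your load balancer, and the sketch assumes the load balancer answers plain HTTP.

```python
import urllib.error
import urllib.request


def missing_page_urls(dns_name, words):
    """Build one URL per word, each pointing at a page that should not exist."""
    return [f"http://{dns_name}/{word}" for word in words]


def status_for(url, timeout=10):
    """Return the HTTP status code for a URL, including error codes like 404."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return err.code


def generate_404s(dns_name, words):
    """Request each missing page and report the status code that came back."""
    return {url: status_for(url) for url in missing_page_urls(dns_name, words)}
```

For example, calling generate_404s with your load balancer DNS name and a list such as ["yourname", "test1", "test2"] should return a 404 status for each URL if the pages do not exist.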

Section 1.2: Review the 404 Errors in CloudWatch Metrics

  1. Switch back to the CloudWatch tab.

  2. Click Metrics.

  3. You may need to click All to go back to the top level.

  4. In the middle, click Lab-LogMetrics to select the metric namespace that you created earlier.

  5. Select Metrics with no dimensions.

  6. Select the metric that you created, 404-Page-Not-Found.

  7. Select the Graphed metrics tab. On the right side of the menu bar,

    • Notice the different options for Statistic. Select Sum to count the number of 404 errors.

    • For Period, change the period to 30 seconds.

Now you have created a new metric from data found in the logs. Next, you will create a dashboard so that you can refer to this information at any time.

Section 1.3: Create CloudWatch Dashboard

In this section, you will learn how to create dashboards to see trends in your data.

  1. Click the Actions button and select Add to dashboard.
  2. On the Add to dashboard dialog,

    • Select Create new

    • For dashboard name, enter WebServers-Dashboard

    • Press the checkmark after entering the name.

    • Review the other settings and then press Add to dashboard. Notice the new dashboard that has been created.

  3. Press Add widget. Select Line. Press Configure.

  4. Select CWAgent. Select ImageId, InstanceId..

  5. Select the metric for LogicalDisk and LogicalDisk % Free Space.

  6. Press Create widget.

  7. Press Save dashboard. Now you have created a dashboard showing the new metric that you created from the logs and a metric that was captured by the CloudWatch agent (LogicalDisk % Free Space).

Section 1.4: Create CloudWatch Alarm

In this section, you will create an alarm if the 404 metric goes above a certain threshold.

  1. In the CloudWatch console, click Metrics.

  2. Select the Lab-LogMetrics 404-Page-Not-Found metric that you created and click the bell icon to create an alarm.

  3. On the Specify metrics and conditions screen, review the options available under the Conditions area.

    • Set Threshold type: Static

    • Select “Whenever 404-Page-Not-Found is”: Greater

    • Set “than” to: 1 (For lab purposes, we are setting a low threshold.)

    • Open the Additional Configuration area, and review the settings available.

    • Press Next.

  4. On the Configure actions screen,

    • Notice the alarm states that you can configure the alarm to act on.

    • Notice also that you can send a notification using Amazon Simple Notification Service (SNS). You can also initiate an Auto Scaling action. For example, in a web farm scenario, you may want to initiate an Auto Scaling action when CPU utilization goes above a certain threshold.

  5. Select Cancel. We are not going to configure an email notification, but we wanted to show you what the alarm options are.
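Although the lab cancels out of the wizard, the conditions you reviewed map roughly onto a CloudWatch PutMetricAlarm request. The sketch below shows one way that request might look. It is an illustration under stated assumptions: the alarm name is hypothetical, the Sum statistic, 60-second period, single evaluation period, and missing-data handling are choices made for this example rather than settings from the lab, and the final call requires boto3 and AWS credentials.

```python
def build_alarm_request(alarm_name, threshold=1, period_seconds=60):
    """Build the keyword arguments for a PutMetricAlarm call."""
    return {
        "AlarmName": alarm_name,  # hypothetical name chosen by the caller
        "Namespace": "Lab-LogMetrics",
        "MetricName": "404-Page-Not-Found",
        "Statistic": "Sum",                    # assumption: count 404s per period
        "Period": period_seconds,              # assumption: one-minute periods
        "EvaluationPeriods": 1,                # assumption: alarm on one breach
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",  # "Greater than" 1
        "TreatMissingData": "notBreaching",    # assumption: no data is not an alarm
    }


def create_alarm(alarm_name):
    """Send the request. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("cloudwatch")
    return client.put_metric_alarm(**build_alarm_request(alarm_name))
```

To send a notification when the alarm fires, you could also add an AlarmActions list containing the ARN of an SNS topic, which corresponds to the notification option on the Configure actions screen.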

Section 2: Using CloudWatch Events to generate automated responses

CloudWatch has another tool to help with automated alerting and automated responses called CloudWatch Events. You will work with CloudWatch Events and trigger an event when the state of an EC2 instance changes to Shutting-down. The following is an adaptation of the tutorial (here).

  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/

  2. In the left navigation pane, choose Events and then Rules under it. Select Create rule.

  3. For Event source, notice that you can trigger a CloudWatch event based upon some event pattern or on a schedule. The schedule is a useful option if you want to trigger some action to occur on a periodic basis. You can create a CloudWatch event that triggers on a schedule to initiate that action.

  4. For the lab, choose Event Pattern.

  5. For the Build event pattern to match events by service,

    1. For Service, choose EC2.

    2. For Event Type, choose EC2 Instance State-change Notification.

    3. Choose Specific state(s), Shutting-down. Suppose in this scenario, you want to be alerted any time one of your production servers is being shut down.

    4. Select Any instance.

    5. Notice in the Event Pattern Preview window that you can see the pattern code.

  6. On the right side under Targets, select Add target. Review the different options that are available. With Lambda, you can create a function to execute some program. You could send an SNS notification. There is also integration with Systems Manager, so you can execute an SSM Run Command or an SSM Automation document.

  7. Select Cancel. You are not going to create the email notification. We wanted to share with you the CloudWatch event rule capabilities.
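The pattern code shown in the Event Pattern Preview window for these choices should look similar to the EVENT_PATTERN dictionary below. To illustrate how such a pattern selects events, the sketch includes a deliberately simplified matcher written for this example. It is not the service's matching logic, which supports more operators, and the instance ID in the sample event is a made-up placeholder.

```python
# Event pattern for EC2 instances entering the shutting-down state.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["shutting-down"]},
}

# Trimmed-down sample event; the instance ID is a hypothetical placeholder.
SAMPLE_EVENT = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "shutting-down"},
}


def matches(pattern, event):
    """Simplified matcher: every key in the pattern must be present in the
    event; a list means "any of these values"; a nested dict is checked
    recursively. Keys that appear only in the event are ignored."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    print(matches(EVENT_PATTERN, SAMPLE_EVENT))
```

Because the pattern does not name an instance ID, any instance that enters the shutting-down state matches, which corresponds to the Any instance choice in the console.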

In this section, you learned how CloudWatch Events rules can trigger actions and alerts based upon certain event patterns or on a schedule. In most cases, a CloudWatch Events rule triggers sooner after an event occurs than the earlier method of creating a metric and then creating an alarm on that metric.

For more information on CloudWatch Events, please review the documentation.

Section 3: Using CloudWatch Logs Insights to analyze the data in your logs

In the prior lab, you installed and configured the CloudWatch agent on your servers so that you could capture the logs from the Windows hosts. In this section, you will get hands-on experience with another CloudWatch tool called CloudWatch Logs Insights to analyze the data inside your logs. Suppose you wanted to query all the security event logs across all your servers for failed login attempts. In this lab, you will instead query the security event logs across all your servers for login events, so that the query returns some data.

  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/

  2. In the left navigation pane, choose Logs and then Insights under it.

  3. In the Select log groups dropdown, select Security.

  4. In the query box, paste the following query:

    fields @timestamp, @logStream | filter @message like "<EventID>4624</EventID>"
    

In this scenario, you are querying the Security event logs for successful logon events (event ID 4624). You could refine the query to look for failed logon events (event ID 4625) instead. Here, you are looking for all logon events so that the query returns some results.

  5. In the time range box to the right of the log group, specify the time/date period that you are in the lab.

  6. Press Run query.

  7. In the area below the Logs window, you should see the results returned from the query across your log groups. In my lab, I had two results. You can expand the arrows to drill down into the returned results.
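The same query can also be run outside the console through the CloudWatch Logs StartQuery and GetQueryResults operations. The following is a rough sketch rather than a finished tool: the function names and the three-hour default window are assumptions made for this example, the log group name is passed in by you (Security in this lab), and running the query requires boto3 and AWS credentials.

```python
import time
from datetime import datetime, timedelta, timezone

# The query from step 4.
QUERY = 'fields @timestamp, @logStream | filter @message like "<EventID>4624</EventID>"'


def build_insights_request(log_group_name, hours_back=3, now=None):
    """Build the keyword arguments for a StartQuery call.

    The time range is given as epoch seconds and covers the last
    `hours_back` hours before `now` (default: the current UTC time)."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours_back)
    return {
        "logGroupName": log_group_name,
        "startTime": int(start.timestamp()),
        "endTime": int(end.timestamp()),
        "queryString": QUERY,
    }


def run_query(log_group_name, poll_seconds=1):
    """Start the query and poll until it finishes. Requires boto3."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("logs")
    query_id = client.start_query(**build_insights_request(log_group_name))["queryId"]
    while True:
        response = client.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response  # finished, failed, or cancelled
        time.sleep(poll_seconds)
```

The results field of the returned response holds one list of field/value pairs per matching log event, which corresponds to the rows you expand in the console.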

Congratulations!

In this lab, you learned how to use some of the CloudWatch tools to help automate the monitoring of your environment by building automated alerts and responses. You learned how to configure CloudWatch metrics and alarms. You also got hands-on experience with CloudWatch Events. At the end of the lab, you walked through using CloudWatch Logs Insights to run queries against the data inside your logs.