Step 7 - Athena

AmazonAthena

EcommCo’s Data Lake leverages Amazon Athena for ad-hoc analyses, data validation and exploration.

The diagram below illustrates how ECommCo uses Athena to query S3 directly.

a. Inspect tables in data catalog

{% include 'error_box.html' %}

AWS Glue automaticaly discovered {{ datalake_submissions_database_name }} and {{ curated_datasets_database_name }}. They can be viewed in Amazon Athena console.

b. Run Athena queries

Visit Amazon Athena in the AWS Management Console to perform the queries below.

First, select the {{ curated_datasets_database_name }} Database in the dropdown menu.

Example 1: Explore orders across regions

Let's look at number of orders across different customers' regions.

SELECT region,
         count(*) AS orders_within_region
FROM orders_20170601_json o
JOIN customers_20170601_json c
    ON o.customer_id = c.customer_id
GROUP BY c.region
ORDER BY count(*) desc;

Copy and paste queries into Athena and click “Run query.”

Make sure that {{ curated_datasets_database_name }} Database is selected prior to running query.

Example 2: Explore household income across states

Let's look at household income across all states.

SELECT state,
        cast(avg(household_income) as bigint) as household_income
FROM demographics_20170520_json
GROUP BY state
ORDER BY household_income desc;

Example 3: Explore orders across product categories

Let's see how orders distribute across different product categories.

SELECT p.product_category,
         count(*) AS number_of_orders
FROM orders_20170601_json o join products_20170601_json p on o.sku = p.sku
GROUP BY product_category
ORDER BY count(*) desc;