EcommCo’s Data Lake leverages Amazon Athena for ad-hoc analyses, data validation and exploration.
The diagram below illustrates how ECommCo uses Athena to query S3 directly.
AWS Glue automaticaly discovered {{ datalake_submissions_database_name }} and {{ curated_datasets_database_name }}. They can be viewed in Amazon Athena console.
Visit Amazon Athena in the AWS Management Console to perform the queries below.
First, select the {{ curated_datasets_database_name }} Database in the dropdown menu.Let's look at number of orders across different customers' regions.
SELECT region, count(*) AS orders_within_region FROM orders_20170601_json o JOIN customers_20170601_json c ON o.customer_id = c.customer_id GROUP BY c.region ORDER BY count(*) desc;
Copy and paste queries into Athena and click “Run query.”
Make sure that {{ curated_datasets_database_name }} Database is selected prior to running query.
Let's look at household income across all states.
SELECT state, cast(avg(household_income) as bigint) as household_income FROM demographics_20170520_json GROUP BY state ORDER BY household_income desc;
Let's see how orders distribute across different product categories.
SELECT p.product_category, count(*) AS number_of_orders FROM orders_20170601_json o join products_20170601_json p on o.sku = p.sku GROUP BY product_category ORDER BY count(*) desc;