EcommCo’s Data Lake leverages Amazon Redshift and Amazon Redshift Spectrum capabilities to transform, aggregate and analyze Curated Datasets. Redshift Spectrum allows data to be analyzed directly from S3, thus it can scale to support querying exabytes of data.
The diagram below illustrates how ECommCo uses Redshift Spectrum to join data that reside in S3 and Redshift.
Curated Datasets in S3 and Curated Datasets in Redshift can be transformed, joined, analyzed using Redshift Spectrum. Partitioned data on S3 can help optimize query performance by only scanning through partitions of interest.
One of ECommCo’s business stakeholder’s key questions is, “What is our expected Customer Lifetime Value (CLV)?”
An analytics process queries orders and products to aggregate per customer revenue and show averages across different customer segments:
EcommCo product catalogs are released annually. The products table is partitioned by date of a catalog release. EcommCo uses Redshift Spectrum to only analyze products for the most recent product catalog.
The Products table has 4 partitions. Since ECommCo’s business users are most interested in the most recent product catalog released at the end of 2016 year, Redshift Spectrum only analyzes 1 partition. This cuts down the time and required resources required to process a query.
ECommCo’s business users are very interested in seeing sales over time. Spectrum is used to analyze and aggregate orders on a year and month basis to generate sales over time analytics.
ECommCo’s business users are very interested in tracking their most popular SKUs. This Spectrum analytics process takes advantage of data partitioning of products in order to access only most recent product catalog.
When you click this button, the following steps will be performed within your AWS account:
Visit S3 in the AWS Management Console to see Spectrum analytics outputs stored as new datasets in {{ curated_bucket_name }} Bucket.
Navigate to Elasticsearch in the AWS Management Console and review at Top Curated Datasets Summary chart to see new Curated Datasets resulting from the Redshift Spectrum analytics.