EcommCo’s Data Lake uses Amazon S3 and Amazon Redshift to manage datasets and to transform data ingested from Data Providers. EcommCo’s internal Data Providers include its CRM system and its e-commerce platform. EcommCo also uses publicly available demographic datasets to gain further insight into its customers.
Submissions are provided to the Data Lake in their native format to keep the “price of entry” low for Data Providers. Because data is not reshaped on arrival, business users can commission transforms as needed, only for the analytics that offer business value, without having to know all of their requirements up front.
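Keeping submissions in their native format means parsing is deferred until a transform actually needs the data (often called schema-on-read). The sketch below illustrates that idea in plain Python under stated assumptions: the object keys, file contents, and the `load` helper are hypothetical and are not part of the lab. Raw bytes are stored untouched, and a parser is picked by file suffix only at read time.

```python
import csv
import io
import json
from pathlib import PurePosixPath

# Hypothetical raw submissions, stored exactly as each provider sent them.
RAW_SUBMISSIONS = {
    "crm/customers.csv": b"customer_id,email\n1,ana@example.com\n2,bo@example.com\n",
    "ecommerce/orders.json": b'[{"order_id": 10, "customer_id": 1, "total": 25.0}]',
}


def read_csv(raw: bytes) -> list[dict]:
    """Parse a CSV submission only when a transform asks for it."""
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))


def read_json(raw: bytes) -> list[dict]:
    """Parse a JSON-array submission only when a transform asks for it."""
    return json.loads(raw)


# The parser is chosen at read time from the native format, not at ingestion.
READERS = {".csv": read_csv, ".json": read_json}


def load(key: str) -> list[dict]:
    """Hypothetical helper: apply a schema to a raw submission on read."""
    suffix = PurePosixPath(key).suffix
    return READERS[suffix](RAW_SUBMISSIONS[key])
```

For example, `load("crm/customers.csv")` yields one dict per row with string values, because CSV carries no types; any casting belongs in the transform that a business user commissions later.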
Submissions can be added to S3 Buckets through a variety of mechanisms, which can differ from provider to provider (e.g., SFTP or an API).
Datasets under management are stored in S3 Buckets.
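Whatever mechanism delivers a submission, it ends up as an object in an S3 Bucket, so a predictable object-key layout makes submissions easy to find per provider and per day. The following is a minimal sketch of one such layout; the `raw/` prefix, the `ingest_date=` segment, and the `submission_key` function are assumptions for illustration, not the lab’s actual bucket structure.

```python
from datetime import date

# Hypothetical top-level prefix for untouched submissions.
RAW_PREFIX = "raw"


def submission_key(provider: str, dataset: str, ingest_date: date, filename: str) -> str:
    """Build a hypothetical S3 object key grouping submissions by provider, dataset and day."""
    # Reject empty parts or embedded slashes so each component stays one path segment.
    for part in (provider, dataset, filename):
        if not part or "/" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return (
        f"{RAW_PREFIX}/{provider}/{dataset}/"
        f"ingest_date={ingest_date.isoformat()}/{filename}"
    )
```

`submission_key("crm", "customers", date(2024, 1, 15), "customers.csv")` returns `"raw/crm/customers/ingest_date=2024-01-15/customers.csv"`. With boto3, a key like this would be passed as `Key=` to `put_object`, alongside whichever bucket name the lab creates.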
Curated Datasets are a key concept of Data Lakes. They can include:
The diagram below illustrates submissions from EcommCo’s Data Providers and how they are transformed into Curated Datasets.
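In this Data Lake, transformation is handled by Amazon Redshift; the plain-Python sketch below only mirrors the shape of such a transform and is not the lab’s Redshift job. It assumes hypothetical parsed submissions from the CRM system and the e-commerce platform, with made-up field names, and joins them into one curated row per customer.

```python
from collections import defaultdict

# Hypothetical parsed submissions from two Data Providers.
crm_customers = [
    {"customer_id": "1", "email": "ana@example.com"},
    {"customer_id": "2", "email": "bo@example.com"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 15.5},
    {"order_id": 12, "customer_id": 2, "total": 8.0},
]


def curate_customer_spend(customers: list[dict], orders: list[dict]) -> list[dict]:
    """Join CRM customers with orders into a hypothetical curated dataset."""
    spend = defaultdict(float)
    count = defaultdict(int)
    for order in orders:
        # Providers disagree on key types (str vs int), so normalise to str.
        cid = str(order["customer_id"])
        spend[cid] += order["total"]
        count[cid] += 1
    return [
        {
            "customer_id": c["customer_id"],
            "email": c["email"],
            "order_count": count[c["customer_id"]],
            "total_spend": round(spend[c["customer_id"]], 2),
        }
        for c in customers
    ]
```

Normalising the join key is the kind of reconciliation a curated dataset exists to do once, so downstream analytics do not each repeat it.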
When you click this button, the following steps will be performed within your AWS account:
Visit S3 in your AWS Management Console and review the following Buckets: