Business Use Case

What products do our customers like to buy?
To answer this question, we first need to look at the transaction data, which indicates what customers like and actually end up buying.

Ingest Structured Data

To analyze the transactional data, we will ingest it using Data Stores in QDS, a feature that lets you easily transfer structured data from an RDBMS. In this section, we will:

Create Data Store using Explore Interface in QDS

  1. Switch to the Explore interface and select + Add Data Store from the dropdown
  2. Enter the following details:
    • Data Store Name: {{ config['qubole_data_store_name'] }}
    • Database Type: MySQL
    • Database Name: {{ config['rds_database_name'] }}
    • Host Address: {{ config['rds_hostname'] }}
    • Port: {{ config['rds_port'] }}
    • Username: {{ config['rds_username'] }}
    • Password: [Password for your database]
    • Skip Validation: [Check this field]
  3. Click Save

By creating this Data Store, QDS can import and access structured data stored in the MySQL database named {{ config['rds_database_name'] }}.

Now let’s switch over to the Analyze interface. This interface enables data engineers to run complex queries and commands using multiple engines (Hive, Hadoop, Presto, Spark), as well as import and export data from and to external data stores, as demonstrated below.

Create Database

  1. Switch to the Analyze interface
  2. Paste the following query into the QDS query editor and click the Run button on the right:
    CREATE DATABASE {{ config['qubole_database_name'] }}
  3. To confirm, click on the Tables tab and hit the refresh button - you should see the {{ config['qubole_database_name'] }} database.
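The same confirmation can also be done from the query editor itself. A minimal sketch - the IF NOT EXISTS guard is an optional addition for safe re-runs, not part of the steps above:

```sql
-- Safe to re-run thanks to the IF NOT EXISTS guard
CREATE DATABASE IF NOT EXISTS {{ config['qubole_database_name'] }};
-- List all databases; the new one should appear
SHOW DATABASES;
```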

Next, we will create Hive tables in the {{ config['qubole_database_name'] }} database and import the MySQL data stored in {{ config['rds_database_name'] }} using the Data Store we created earlier.

Import Structured Data from MySQL

  1. Switch to the Analyze interface
  2. Click on + Compose
  3. Select Data Import from the right-hand panel dropdown under “Hive Query”
  4. Select Data Store: {{ config['qubole_data_store_name'] }}
  5. Select the first table (categories) from the DB Table dropdown
  6. Select Hive Database: {{ config['qubole_database_name'] }}
  7. Enter a Hive Table Name identical to the table name in the Data Store and select Create this table from the dropdown
  8. Scroll to the top of the page
  9. Click Run
  10. Repeat steps 2–9 to import the rest of the tables

Note: This import process will take a few minutes to execute. You can track the progress of the import queries in the Analyze History tab.
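Under the hood, each Data Import command materializes a Hive table in the target database. As a rough sketch of what the resulting definition might look like for categories - the column names and the ORC storage format here are illustrative assumptions, not the exact DDL QDS generates:

```sql
-- Illustrative only: column names are assumed; QDS generates the actual DDL
CREATE TABLE IF NOT EXISTS {{ config['qubole_database_name'] }}.categories (
  category_id            INT,
  category_department_id INT,
  category_name          STRING
)
STORED AS ORC;  -- columnar file format, as mentioned in the summary
```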

To confirm that the Hive tables were created successfully, click on the Tables tab on the top left and refresh {{ config['qubole_database_name'] }}. You can also run the Hive command ‘use {{ config['qubole_database_name'] }};’ followed by ‘show tables;’. In either case, you should see the following tables:

  • categories
  • customers
  • departments
  • order_items
  • orders
  • products
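Beyond listing the tables, a quick row-count spot check confirms that data actually landed. The categories table is used here only as an example; any of the six tables above works:

```sql
USE {{ config['qubole_database_name'] }};
SHOW TABLES;                      -- should list the six tables above
SELECT COUNT(*) FROM categories;  -- a non-zero count confirms the import ran
```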

What you’ve learned

By completing these exercises, you have learned the initial steps of ingesting structured data into QDS, transforming it into a desired file format (ORC in our case), and creating a unified Hive schema. This schema will be used to query the data across Hadoop, Spark, and/or Presto clusters.
Next, we will see how to query the structured data using Hive in QDS.