What products do our customers like to buy? To answer this question, we first need to look at the transaction data, which should indicate what customers like and what they actually end up buying.
To analyze the transactional data, we will ingest it using the Data Stores feature in QDS, which lets you easily transfer structured data from an RDBMS. In this section, we will create a Data Store pointing to the MySQL database and import its tables into Hive.
Once this Data Store is created, QDS can import and access structured data stored in the MySQL database named {{ config['rds_database_name'] }}.
Now let’s switch over to the Analyze interface. This interface enables data engineers to run complex queries and commands using multiple engines (Hive, Hadoop, Presto, Spark), as well as import and export data from and to external data stores, as demonstrated below.
CREATE DATABASE {{ config['qubole_database_name'] }}
Next, we will create Hive tables in the {{ config['qubole_database_name'] }} database and import the MySQL data stored in {{ config['rds_database_name'] }} using the Data Store we created earlier.
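As a rough sketch of what this step produces, a Hive table stored as ORC might look like the following. Note that the table and column names here are illustrative assumptions, not the actual tutorial schema:

```sql
-- Hypothetical example only: the table and column names are assumptions
-- chosen to illustrate a typical transactions table.
CREATE TABLE IF NOT EXISTS {{ config['qubole_database_name'] }}.transactions (
  transaction_id BIGINT,
  customer_id    BIGINT,
  product_id     BIGINT,
  quantity       INT,
  purchase_date  STRING
)
STORED AS ORC;  -- columnar format for efficient analytic queries
```

Storing the imported data as ORC is what enables the same tables to be queried efficiently later from Hive, Spark, or Presto.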
Note: This import process will take a few minutes to execute. You can track the progress of the import queries in the History tab of the Analyze interface.
To confirm that the Hive tables were created successfully, click the Tables tab at the top left and refresh {{ config['qubole_database_name'] }}. Alternatively, run the Hive command 'use {{ config['qubole_database_name'] }};' followed by 'show tables;'. In either case, you should see the following tables:
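The two verification commands mentioned above can be run together as a single statement sequence in the Analyze query composer:

```sql
-- Switch to the tutorial database, then list the tables it contains.
USE {{ config['qubole_database_name'] }};
SHOW TABLES;
```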
By completing these exercises, you have learned the initial steps of ingesting structured data into QDS, transforming it into the desired file format (ORC in our case), and creating a unified Hive schema. This schema will be used to query the data across Hadoop, Spark, and/or Presto clusters. Next, we will see how to query the structured data using Hive in QDS.