When a curated data set is needed for reporting, a common design pattern is to create a view in the same database as the warehouse. This view serves as the endpoint or “head” for BI tools and end-users. This “one view in the warehouse db” pattern (a view in the warehouse’s database) provides a layer of abstraction from underlying warehouse structures–the facts and dimensions in a star schema–but is not always the best approach due to several weaknesses (detailed below). Here we propose a high-level “two necks” design pattern with six parts: a dedicated reporting data mart database, a view in this reporting database, two staging tables, a view on top that serves as the single head or endpoint for users, and a processing approach that swaps between one of the two neck tables.
A view in the warehouse’s database is simple and can be implemented quickly, but it has weaknesses:
The weaknesses above can be mitigated with the two neck design shown in the diagram at the beginning of this article. Here’s how you can implement this solution architecture in Redshift:
1. Create a reporting database. Example:
CREATE SCHEMA report_schema;
2. Create a reporting view to serve as the integration component between the reporting data mart and the warehouse:
CREATE VIEW report_schema.v_example_dashboard_loader AS /*enter your logic here*/
Our view used 4 tables (1 fact and 3 dimensions), with 2 dimension tables with outer joins. We had 6 calculated fields with aggregations (3 DENSE_RANK functions, 2 SUM, and 1 MAX). and 24 columns. The view returned 539,000 rows in 2.376 seconds using a dc1.large single cluster.
3. Create the loader and backup tables. These two tables are the two necks.
CREATE TABLE report_schema.t_example_dashboard_loader
AS
SELECT *
FROM report_schema.v_example_dashboard_loader;
4. CREATE TABLE report_schema.t_example_dashboard_loader_backup
AS
SELECT *
FROM report_schema.t_example_dashboard_loader;–notice backup looks at the local primary table, not the view that hits the warehouse with a query that can have expensive logic
5. Create the “head” view that services as the endpoint for Tableau:
CREATE OR REPLACE VIEW report_schema.v_example_dashboard_userfacing
AS
SELECT *
FROM report_schema.t_example_dashboard_loader;
6. In the etl sql script, use logic that takes a backup, then swaps the user facing view to the backup, then loads the primary, then swaps to the primary. (Note: merge/upsert approaches can potentially be used here as well.)
TRUNCATE TABLE report_schema.t_example_dashboard_loader_backup;
INSERT INTO report_schema.t_example_dashboard_loader_backup
SELECT * FROM report_schema.t_example_dashboard_loader;
VACUUM report_schema.t_example_dashboard_loader_backup;
CREATE OR REPLACE VIEW report_schema.v_example_dashboard_userfacing
AS
SELECT *
FROM report_schema.t_example_dashboard_loader_backup;
TRUNCATE TABLE report_schema.t_example_dashboard_loader;
INSERT INTO report_schema.t_example_dashboard_loader
SELECT * FROM report_schema.v_example_dashboard_loader;
VACUUM report_schema.t_example_dashboard_loader;
CREATE OR REPLACE VIEW report_schema.v_example_dashboard_userfacing
AS
SELECT * FROM report_schema.t_example_dashboard_loader;
If we define “one neck” as one staging table (no view on top), and “two necks” as two staging/neck tables with one head view on top (which swaps between the neck tables), we can compare processing time.
a. One neck
TRUNCATE TABLE report_schema.t_example_dashboard_loader;
INSERT INTO report_schema.t_example_dashboard_loader
SELECT * FROM report_schema.v_example_dashboard_loader;
1. 7686ms
2. 6271ms
3. 6109ms
average: 6689ms, or about 6.689 seconds
b. Two necks (only the duration of the statements that affect the user-facing table need to be compared, because only these steps affect users).
CREATE OR REPLACE VIEW report_schema.v_example_dashboard_userfacing
AS
SELECT *
FROM report_schema.t_example_dashboard_loader_backup;
CREATE OR REPLACE VIEW report_schema.v_asset_dashboard_userfacing
AS
SELECT *
FROM report_schema.t_asset_dashboard_loader;
2462ms
1221ms
1405ms
average: 1696ms, or 1.696 seconds
Note: when we doubled or even quadrupled the processing power for the cluster size, and added another cluster node, we did not see clear performance improvements.
Conclusion
In our test results, the “two necks” approach addresses all four antipattern points introduced above: avoid tight coupling, decrease process time and user waiting, have a backup, and speed up user queries. We share these test results as an example where this pattern worked for us to challenge a common pattern (a view in the warehouse) so you can consider this two necks pattern for your own solution architecture.
Regarding processing time, the fastest run of two necks (with swapping the view to the staging table not undergoing processing) is more than 5x faster than the fastest run of the old way. On average, the new way is almost 4x faster. We expect these numbers to be more significant with larger data sets in billions of rows, after accumulation for months or years of warehouse data, rather than in our warehouse that was new. Our test results and sql durations are based on actual client test results based on one view, and our results are not necessarily representative of yours.
For further help with using Tableau with Redshift, you can refer to Tableau’s documentation here. Plaster Group’s Data and Analytics Team partners with Tableau and is happy to help you with any additional questions you might have.