Question

On our testing DWH server, we're using Vertica Community Edition. We're very pleased with the performance of our ETL process and queries.

We import data from a few sources (Informix, SQL Server, MySQL, Google Docs) into a single schema, using the prefix bussinessProcessName_stage_.

At the end of the ETL process there are many tables named bussinessProcessName_fact and bussinessProcessName_dim_dimName, plus a few shared_dim_dimName tables.

Is there a need to separate these tables into different schemas, or even different databases? The same question applies to the data marts inside the DWH.

We mostly use star schemas, a snowflake schema in a few cases, and there is even one data mart designed as a single flat table.


Solution

Most database systems treat the database as the highest level of the logical hierarchy. Vertica, however, allows only one database to be running at a time, so logical design takes place at the schema level. For example, where a traditional database system might have separate customers and orders databases, in Vertica these would be customers and orders schemas.

Logical organization and naming conventions vary between organizations. What matters is that the convention is standard and used consistently. Look at which tables logically belong together and group them accordingly. For example, each business process can have its own schema (business_process_name.fact_table). It's better practice to be explicit about this grouping than to keep everything in a single schema, even if you only have a few tables: if you add more tables in the future, they will be easier to manage.
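As a rough illustration of the per-process grouping described above, the sketch below derives a target schema from the question's naming convention (bussinessProcessName_stage_*, bussinessProcessName_fact, bussinessProcessName_dim_*, shared_dim_*) and prints the DDL to move each table. It is a sketch under stated assumptions, not a tested migration: the helper names target_for and migration_ddl, the source schema public, and the example table names are all hypothetical, and the generated statements assume Vertica's CREATE SCHEMA IF NOT EXISTS, ALTER TABLE ... SET SCHEMA and ALTER TABLE ... RENAME TO. Review the output before running it against a real database.

```python
import re
from typing import List, Tuple

# Hypothetical: the single schema that currently holds every table.
SOURCE_SCHEMA = "public"

# Matches <process>_stage_<name>, <process>_fact and <process>_dim_<name>.
# "shared_dim_<name>" yields the process "shared", so shared dimensions
# land in their own "shared" schema. The non-greedy process group lets
# process names contain underscores (e.g. "order_processing_fact").
TABLE_NAME = re.compile(r"^(?P<process>.+?)_(?P<rest>stage_.+|fact|dim_.+)$")


def target_for(table: str) -> Tuple[str, str]:
    """Split a prefixed table name into (schema, unprefixed table name)."""
    match = TABLE_NAME.match(table)
    if match is None:
        raise ValueError(f"table name does not follow the convention: {table}")
    return match.group("process"), match.group("rest")


def migration_ddl(tables: List[str], source_schema: str = SOURCE_SCHEMA) -> List[str]:
    """Build DDL that moves each table into its business-process schema."""
    # One CREATE SCHEMA per distinct process, in a stable (sorted) order.
    statements = [
        f"CREATE SCHEMA IF NOT EXISTS {schema};"
        for schema in sorted({target_for(t)[0] for t in tables})
    ]
    for table in tables:
        schema, new_name = target_for(table)
        # Move the table first, then drop the now-redundant prefix.
        statements.append(f"ALTER TABLE {source_schema}.{table} SET SCHEMA {schema};")
        statements.append(f"ALTER TABLE {schema}.{table} RENAME TO {new_name};")
    return statements


if __name__ == "__main__":
    example = ["sales_stage_orders", "sales_fact", "sales_dim_customer", "shared_dim_date"]
    for statement in migration_ddl(example):
        print(statement)
```

Moving and renaming are kept as separate statements so each step is easy to review; any ETL jobs or reports that still reference the old prefixed names would need to be updated afterwards.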

Schemas also help with administration: backups and maintenance tasks can be performed at the schema level.

Licensed under: CC-BY-SA with attribution