An overview of the Redshift integration with Seda
There are three steps to connect Redshift with Secoda:
Create a database user
Connect Redshift to Secoda
Whitelist Secoda IP Address
The username and password you’ve already created for your cluster is your admin password, which you should keep for your own usage. For Secoda, and any other 3rd-parties, it is best to create distinct users. This will allow you to isolate queries from one another using WLM and perform audits easier.
To create a new user, you’ll need to log into the Redshift database directly and run the following SQL commands:
Secoda only uses the system tables for our metadata extraction, the extraction query can be viewed here.
When connecting to Redshift in Secoda, use the username/password you’ve created here instead of your admin account.
After creating a Redshift warehouse, the next step is to connect Secoda:
In the Secoda App, select ‘Add Integration’ on the Integrations tab
Search for and select ‘Redshift’
Enter your Redshift credentials
Click 'Connect'
VPCs keep servers inaccessible to traffic from the internet. With VPC, you’re able to designate specific web servers access to your servers. In this case, you will be whitelisting the Secoda IPs to read from your data warehouse.
Redshift clusters can either be in a EC2 Classic subnet or VPC subnet.
If your cluster has a field called Cluster Security Groups
, proceed to EC2 Classic
Or if your cluster has a field called VPC Security Groups
, proceed to EC2 VPC
Navigate to your Redshift Cluster settings: Redshift Dashboard > Clusters > Select Your Cluster
Click on the Cluster Security Groups
Open the Cluster Security Group
Click on “Add Connection Type”
Choose Connection Type CIDR/IP and authorize the Secoda IPs to read into your Redshift Port
Navigate to your Redshift Dashboard > Clusters > Select Your Cluster
Click on the VPC Security Groups
Select the “Inbound” tab and then “Edit”
Allow Secoda to read into your Redshift Port using the Secoda IP addresses.
You can find more information on that here.
Navigate back to your Redshift Cluster Settings: Redshift Dashboard > Clusters > Select Your Cluster
Select the “Cluster” button and then “Modify”
Make sure the “Publicly Accessible” option is set to “Yes”
List of all the metadata that Seda pulls from to Redshift
Seda pulls the following metadata from Redshift:
Tables
Name
Description
Schema
Database
External Usage (Popularity)
External Updated At
Views
Name
Description
Schema
Database
External Usage (Popularity)
External Updated At
Columns
Name
Description
Type
Foreign Key
Primary Key
Column Profile
Min
Max
Median
STD Deviation
Value Distribution
Statistic Value Count
Percent Filled
Unique
Creation Query
Common Queries
Lineage
Redshift Column <-> Redshift Column
Redshift View <-> Redshift View
Redshift Table <-> Redshift Table
Redshift Table <-> Redshift View
Redshift Table <-> Dashboards from other sources
Redshift Table <-> Tables from other sources
Redshift Table, Views <-> Jobs from other sources
Preview of first 50 rows (Optional)