Adding a data source connection
After you have installed and started DQO, this section describes how to add a connection to the BigQuery public dataset Austin Crime Data using the graphical interface.
For a full description of how to add a data source connection to other providers, or how to add a connection using the CLI, see the Working with DQO section. You can find more information about navigating the DQO graphical interface here.
Prerequisite credentials
To add a BigQuery data source connection to DQO you need the following:
- A BigQuery service account with the BigQuery > BigQuery Job User permission. You can create a free trial Google Cloud account here.
- A service account key in JSON format for JSON key authentication. For details, refer to Create and delete service account keys.
- A working Google Cloud CLI if you want to use Google Application Credentials authentication.
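Before adding the connection in DQO, you can verify that the service account and its key work on their own. The following is a minimal Python sketch, assuming the google-cloud-bigquery package is installed; the key file path and the billing project ID are placeholders to replace with your own values:

```python
# Minimal sketch: confirm the JSON service account key can run a BigQuery job.
# Assumes: pip install google-cloud-bigquery
# "service-account-key.json" and "my-billing-project" are placeholders.
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "service-account-key.json"  # path to your downloaded JSON key
)

# The query job runs in the billing project, so the service account needs
# the bigquery.jobs.create permission (BigQuery Job User role) there.
client = bigquery.Client(credentials=credentials, project="my-billing-project")

# Read a single row from the public Austin Crime dataset to verify access.
query = "SELECT unique_key FROM `bigquery-public-data.austin_crime.crime` LIMIT 1"
for row in client.query(query).result():
    print(row.unique_key)
```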
Adding a BigQuery connection using the graphical interface
1. Go to the Data Sources section and click the + Add connection button in the upper left corner.
2. Select the BigQuery database type.
3. Add connection settings.

    | BigQuery connection settings | Description |
    |------------------------------|-------------|
    | Connection name | The name of the connection that will be created in DQO. This will also be the name of the folder where the connection configuration files are stored. The name of the connection must be unique and consist of alphanumeric characters, hyphens and underscores. For example, "testconnection". |
    | Source GCP project ID | The name of the project that has the datasets that will be imported. In our example, it is "bigquery-public-data". |
    | Billing GCP project ID | The name of the project used as the default GCP project. The calling user must have the bigquery.jobs.create permission in this project. |
    | Authentication mode to the Google Cloud | The type of authentication to the Google Cloud. You can select from three options: Google Application Credentials, JSON Key Content, or JSON Key Path. |
    | Quota GCP project ID | The Google Cloud Platform project ID which is used for invocation. |
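    To illustrate how these project IDs relate, the hedged sketch below (independent of DQO) queries data that lives in one project while billing the job to another, using credentials of the Google Application Credentials kind set up with gcloud auth application-default login; the project IDs are placeholders:

    ```python
    # Illustrative sketch of the project ID roles, assuming Application Default
    # Credentials were set up with: gcloud auth application-default login
    # "my-billing-project" and "my-quota-project" are placeholders.
    import google.auth
    from google.cloud import bigquery

    # Pick up the Application Default Credentials from the environment.
    credentials, _ = google.auth.default()

    # Optional: attribute API quota usage to a specific project.
    credentials = credentials.with_quota_project("my-quota-project")

    # Billing GCP project ID: the query job runs (and is billed) here.
    client = bigquery.Client(credentials=credentials, project="my-billing-project")

    # Source GCP project ID: the queried data lives in bigquery-public-data.
    query = "SELECT COUNT(*) AS n FROM `bigquery-public-data.austin_crime.crime`"
    print(next(iter(client.query(query).result())).n)
    ```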
4. After filling in the connection settings, click the Test Connection button to test the connection.
5. Click the Save connection button when the test is successful; otherwise, check the details of what went wrong.
6. Import the "austin_crime" schema by clicking the Import Tables button.
7. There is only one table in the dataset. Import it by clicking the Import all tables button in the upper right corner.
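    If you want to double-check outside of DQO that the schema contains a single table, a sketch like the following can list it (again assuming the google-cloud-bigquery package; the project ID is a placeholder):

    ```python
    # Sketch: list the tables in the public austin_crime dataset.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-billing-project")  # placeholder project
    for table in client.list_tables("bigquery-public-data.austin_crime"):
        print(table.table_id)  # prints: crime
    ```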
8. You can check the details of the imported table by expanding the tree view on the left and selecting the "crime" table.
There are several tabs to explore:
- Table - provides details about the table and allows you to add filters or stage names (for example, "Ingestion")
- Schedule - allows setting a schedule for running checks. Learn how to configure schedules
- Comments - allows adding comments to your tables
- Labels - allows adding labels to your tables
- Data streams - allows configuring columns for data streams segmentation. Learn more about data streams segmentation in the Concepts section.
- Date and time columns - allows setting date and time columns for the partition checks type and the table timeliness checks subcategory (see the sketch after this list).
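For example, the "crime" table has a timestamp column that could serve as the event timestamp for timeliness checks. The sketch below only illustrates the kind of question such a check answers, namely how fresh the newest row is; it is not how DQO implements its checks, and the project ID is a placeholder:

```python
# Sketch: a freshness-style query of the kind a table timeliness check asks.
# This is an illustration, not DQO's implementation.
from google.cloud import bigquery

client = bigquery.Client(project="my-billing-project")  # placeholder project
query = """
    SELECT TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(timestamp), DAY) AS days_old
    FROM `bigquery-public-data.austin_crime.crime`
"""
row = next(iter(client.query(query).result()))
print(f"Newest record is {row.days_old} days old")
```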
Next step
Now that you have connected a data source, it is time to run data quality checks.