
Install Unravel for GCP BigQuery - Multiple key authentication method

You can install Unravel on a GCP instance and set it up to monitor BigQuery jobs, datasets, tables, and data through a service account. This section provides instructions to install Unravel for GCP BigQuery using the multiple-key authentication method.

Before setting up Unravel for BigQuery, you must complete the Prerequisites.

Follow the instructions to install and set up Unravel to receive BigQuery data.

Installing Unravel on the GCP VM

Do the following to set up Unravel on the GCP VM.

  1. From your GCP console, go to the GCE dashboard and click Create Instance.

  2. Select the following options based on Unravel's instance requirements:

    • Base OS

    • Instance type and size

    • Ports

    • Networking

      The instance must be publicly accessible over HTTPS.

    • Firewall rules or policies

      Sample inbound rule

      Type             Protocol  Port range  Source
      All traffic      All       All         For example, 10.10.0.0/16
      SSH              TCP       22          0.0.0.0/0 or trusted public IP for SSH access
      Custom TCP Rule  TCP       3000
      Custom TCP Rule  TCP       4043

      Sample outbound rule

      Type             Protocol  Port range  Destination
      All traffic      All       All         0.0.0.0/0

    Note

    The GCP VM should have full TCP access to the BigQuery cluster (server/parent or worker) nodes. You can grant access by adding firewall rules for the BigQuery server/parent and worker nodes that allow all TCP traffic on all port ranges.

    While creating the GCP VM, under Firewall, enable HTTP and HTTPS traffic. Then go to the Networking tab and add the network tags of the firewall rule that you have already created.

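If you prefer the gcloud CLI to the console, the sample inbound rules can be sketched as follows. This is a minimal sketch under stated assumptions: the rule names (unravel-allow-*), the network tag unravel, and the source ranges passed as arguments are hypothetical, and the sample table gives no source range for ports 3000 and 4043, so choose one according to your security policy. An explicit outbound rule is usually unnecessary because VPC networks include an implied rule that allows all egress traffic. Set DRY_RUN=1 to print the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command when DRY_RUN is set, otherwise runs it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# create_unravel_firewall NETWORK INTERNAL_CIDR SSH_CIDR UI_CIDR
# Rule names and the "unravel" target tag are assumptions, not Unravel defaults.
create_unravel_firewall() {
  local network="$1" internal_cidr="$2" ssh_cidr="$3" ui_cidr="$4"
  # All traffic from the internal range (for example, 10.10.0.0/16)
  run gcloud compute firewall-rules create unravel-allow-internal \
    --network="$network" --direction=INGRESS --allow=all \
    --source-ranges="$internal_cidr" --target-tags=unravel
  # SSH from a trusted public IP
  run gcloud compute firewall-rules create unravel-allow-ssh \
    --network="$network" --direction=INGRESS --allow=tcp:22 \
    --source-ranges="$ssh_cidr" --target-tags=unravel
  # Unravel UI (3000) and log receiver (4043)
  run gcloud compute firewall-rules create unravel-allow-ui-lr \
    --network="$network" --direction=INGRESS --allow=tcp:3000,tcp:4043 \
    --source-ranges="$ui_cidr" --target-tags=unravel
}
```

For example, create_unravel_firewall default 10.10.0.0/16 203.0.113.10/32 203.0.113.0/24, then add the unravel network tag to the VM so that the rules apply to it.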
Configuring the GCE instance
  1. Set SELinux to permissive mode.

    sudo setenforce Permissive
  2. To make the setting persist after a reboot, edit /etc/selinux/config and set SELINUX=permissive.

    sudo vi /etc/selinux/config
  3. Install libaio.x86_64, lzop.x86_64, and ntp.x86_64.

    sudo yum install -y libaio.x86_64
    sudo yum install -y lzop.x86_64
    sudo yum install -y ntp.x86_64
  4. Start ntpd and check the system time.

    sudo service ntpd start
    sudo ntpq -p
  5. Create a new Unravel user named unravel.

    sudo useradd unravel
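Steps 1 and 2 can be scripted so that the persistent setting does not depend on a manual vi edit. A minimal sketch, assuming GNU sed and the standard SELINUX= line in /etc/selinux/config; the function name is hypothetical, and the path is a parameter so that you can try it on a copy first.

```shell
#!/usr/bin/env bash
# set_selinux_permissive [CONFIG_PATH]  (hypothetical helper; run as root for /etc)
# Rewrites SELINUX=... to SELINUX=permissive, or appends the line if it is missing.
set_selinux_permissive() {
  local cfg="${1:-/etc/selinux/config}"
  if grep -q '^SELINUX=' "$cfg"; then
    # ^SELINUX= does not match SELINUXTYPE=, so the policy type is left alone
    sed -i 's/^SELINUX=.*/SELINUX=permissive/' "$cfg"
  else
    echo 'SELINUX=permissive' >> "$cfg"
  fi
}
```

After sudo setenforce Permissive, run the function as root against /etc/selinux/config; getenforce reports Permissive now, and the file keeps the setting across reboots.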

Download Unravel onto the VM instance that you have created.

Deploy Unravel on the GCP instance that you have created.

You can also install Unravel manually; refer to Run setup.

Note

You need to configure the HTTPS load balancer for the Unravel endpoint only when you use the Push model.

The Unravel Log Receiver (LR) endpoint must be available over a publicly accessible HTTPS endpoint to receive messages from BigQuery through Pub/Sub. A load balancer is an easier and more secure way to push log messages from Google Cloud Platform (GCP) to Unravel. Use the following instructions to configure an HTTPS load balancer for Unravel with a public endpoint and SSL termination.

You must have the following information handy before you configure the Load Balancer:

  • Region and Zone where the Unravel VM is running.

  • Network and subnetwork where the Unravel VM is running.

  • A valid SSL certificate in GCP.

Do the following to create a load balancer:

  1. Create an instance group. Refer to Create a managed instance group for detailed instructions.

    • On the New unmanaged instance group page, keep the following settings the same as those of the Unravel VM:

      • Location > Region

      • Location > Zone

      • Network and Instances > Network

      • Network and Instances > SubNetwork

    • Under Port Mapping, enter the following:

      • Port Name: http4043

      • Port Number: 4043

  2. Set up an HTTPS Load Balancer. Refer to Set up an HTTPS Load Balancer for detailed instructions. Make sure that you do the following:

    • Under Name, enter unravel-loadbalancer.

    • In Backends > New Backend > Instance groups, select the Unravel instance group that you created in Step 1.

    • Under Health check, do the following:

      • Select Create a health check, and then enter unravel-4043-hc as the name.

      • Set Protocol to HTTP and Port to 4043.

      • Set Request Path to /lr/status.

    • Ensure that Port is set to 443 to allow HTTPS traffic.

  3. After the Load Balancer is created, find its public IP address in the Frontend section of the Load Balancer. Map a valid DNS name to this IP address, for example with a DNS A record.
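The instance group, health check, and backend parts of these steps can also be done with gcloud. This is a sketch of the backend side only, under assumptions: the names unravel-ig and unravel-backend are hypothetical, and the frontend (port 443, SSL certificate, URL map, and forwarding rule) is still configured as described above. Set DRY_RUN=1 to print the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command when DRY_RUN is set, otherwise runs it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# setup_unravel_lb_backend ZONE VM_NAME
setup_unravel_lb_backend() {
  local zone="$1" vm="$2"
  # Unmanaged instance group in the Unravel VM's zone, containing the VM
  run gcloud compute instance-groups unmanaged create unravel-ig --zone="$zone"
  run gcloud compute instance-groups unmanaged add-instances unravel-ig \
    --zone="$zone" --instances="$vm"
  # Port mapping: http4043 -> 4043
  run gcloud compute instance-groups unmanaged set-named-ports unravel-ig \
    --zone="$zone" --named-ports=http4043:4043
  # HTTP health check against the LR status path
  run gcloud compute health-checks create http unravel-4043-hc \
    --port=4043 --request-path=/lr/status
  # Global backend service that uses the named port and the health check
  run gcloud compute backend-services create unravel-backend --global \
    --protocol=HTTP --port-name=http4043 --health-checks=unravel-4043-hc
  run gcloud compute backend-services add-backend unravel-backend --global \
    --instance-group=unravel-ig --instance-group-zone="$zone"
}
```

Before you create the load balancer, you can check the endpoint from the VM with curl http://localhost:4043/lr/status. The firewall must also allow the Google Cloud health-check ranges 35.191.0.0/16 and 130.211.0.0/22 to reach port 4043.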

Set up BigQuery for Unravel with multiple-key authentication

Unravel can be set up to automatically create and configure resources in more than 100 projects at a time. Depending on the authentication method that you selected while installing Unravel, you can add projects for Unravel monitoring with either customer-supplied credentials or Unravel-generated credentials. You can add a single project or multiple projects.

The multiple key-based authentication method lets you integrate Unravel using the key within each project.


Unravel ships the following resources, which are required to automatically set up Unravel to receive BigQuery data.

  • Terraform

    Terraform is open-source infrastructure provisioning software. It creates resources in your GCP account and facilitates the smooth integration of Unravel with your cloud platform.

    You can either use the Terraform that is bundled with the Unravel installer, or edit and use the external Terraform, which is provided separately from the installer.

  • gcloud CLI

    A set of tools to create and manage Google Cloud resources.

Complete the prerequisites before you set up BigQuery projects.

You can configure the BigQuery projects with customer-supplied credentials or with Unravel-generated credentials. To configure them with customer-supplied credentials, do the following:

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
  2. Configure the BigQuery projects.

    To configure the BigQuery projects manually and set up Unravel to receive BigQuery data, do the following:

    A project ID is a unique string used to differentiate your project from all others in Google Cloud. You cannot edit a Project ID after it is generated.

    Either create a project in your GCP account and note its Project ID, or note the Project ID of an existing project. You set this Project ID in Unravel later with the manager utility.

    Create a role in the Unravel project with the permissions required for Unravel monitoring. This role is assigned to the Unravel service account that is used to authenticate Unravel applications.

    1. On the GCP Console, go to the Roles page.

    2. Using the drop-down list at the top of the page, select the Unravel project in which you want to create a role.

    3. Click Create Role and enter a Name, Title, and Description for the role. Preferably, name the role unravel for easy identification. The description is optional; you can set it to Unravel monitoring.

      Set Role launch stage to General Availability.

      Caution

      The role name cannot be changed after the role is created.

    4. Click Add Permissions.

    5. In the Add Permissions dialog box, filter and select the following permissions for the role, and then click Add.

    6. Enable the BigQuery Data Transfer API.

    7. Click Create. The role is created.
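The role steps have a gcloud equivalent. A minimal sketch: the permission list is left as an argument because it comes from the permission list for this procedure and is not reproduced here, and the role ID unravel follows the naming recommended above. Set DRY_RUN=1 to print the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command when DRY_RUN is set, otherwise runs it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# create_unravel_role PROJECT_ID PERMISSIONS
# PERMISSIONS is a comma-separated list; supply the permissions required for
# Unravel monitoring (not reproduced in this sketch).
create_unravel_role() {
  local project="$1" permissions="$2"
  run gcloud iam roles create unravel --project="$project" \
    --title="unravel" --description="Unravel monitoring" \
    --permissions="$permissions" --stage=GA
  # Enable the BigQuery Data Transfer API in the same project
  run gcloud services enable bigquerydatatransfer.googleapis.com \
    --project="$project"
}
```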

    A service account can be attached to a VM so that applications running on that VM can authenticate as the service account. To set up Unravel for BigQuery monitoring, create a service account and then assign the role that you created to it.

    1. On the GCP Console, go to the Create service account page.

    2. Select the Unravel project.

    3. Under Service Account Details, enter a service account name to display on the GCP Console. Preferably, name the service account unravel-service-account.

      Note

      A service account ID is generated based on this name. Edit the ID if required. You cannot change the ID later.

      Optional: Enter a description of the service account. Preferably keep the description as Unravel monitoring.

    4. Click CREATE AND CONTINUE.

    5. Under Grant this service account access to this project, select the role that you created earlier (for example, unravel) with the associated permissions.

    6. Click Done. The service account is created, and the role is now attached to the service account.
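The same service account and role binding can be created with gcloud. A sketch assuming the recommended names unravel-service-account and unravel (the custom role created in the same project); the e-mail follows the NAME@PROJECT_ID.iam.gserviceaccount.com pattern shown in the sample key file. Set DRY_RUN=1 to print the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command when DRY_RUN is set, otherwise runs it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# unravel_sa_email PROJECT_ID -> e-mail of the recommended service account
unravel_sa_email() {
  printf 'unravel-service-account@%s.iam.gserviceaccount.com\n' "$1"
}

# create_unravel_sa PROJECT_ID
create_unravel_sa() {
  local project="$1"
  run gcloud iam service-accounts create unravel-service-account \
    --project="$project" --display-name="unravel-service-account" \
    --description="Unravel monitoring"
  # Grant the custom role "unravel" created earlier in this project
  run gcloud projects add-iam-policy-binding "$project" \
    --member="serviceAccount:$(unravel_sa_email "$project")" \
    --role="projects/${project}/roles/unravel"
}
```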

    Each service account is associated with a Google-managed public/private RSA key pair. The Service Account Credentials API uses this internal key pair to create short-lived service account credentials and to sign blobs and JSON Web Tokens (JWTs). For Unravel, you additionally create a user-managed key for the service account and download it as a JSON key file:

    1. On the GCP Console, go to the Service accounts page.

    2. Select the Unravel project.

    3. Click the email address of the service account which you have created for Unravel.

    4. Click the Keys tab.

    5. Click the Add key drop-down menu, then select Create new key.

    6. Select JSON as the Key type and click Create. A service account key file is downloaded. You can download this file only once. Store the key file securely and transfer it to the Unravel node.

      Following is a sample of the downloaded key file:

      {
        "type": "service_account",
        "project_id": "unravel-test-337406",
        "private_key_id": "b111a5ad112bf0000d5c03b28b9a1d1f8ac31af",
        "private_key": "-----BEGIN PRIVATE KEY-----\n<privatekey>\n-----END PRIVATE KEY-----\n",
        "client_email": "unravel-service-account@unravel-test-337406.iam.gserviceaccount.com",
        "client_id": "112180000016964232364",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://accounts.google.com/o/oauth2/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
      }

      The file path of the service account key must be set in Unravel using the manager utility.
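The key can also be created from the command line, restricted to its owner, and sanity-checked before you hand its path to the manager utility. A sketch: the key path and helper names are hypothetical, and the check only looks for the type, private_key, and client_email fields from the sample above; it does not validate the key itself. Set DRY_RUN=1 to print the gcloud command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command when DRY_RUN is set, otherwise runs it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# create_unravel_key PROJECT_ID KEY_PATH : create and download a JSON key
create_unravel_key() {
  local project="$1" key="$2"
  run gcloud iam service-accounts keys create "$key" \
    --iam-account="unravel-service-account@${project}.iam.gserviceaccount.com"
}

# secure_key_file KEY_PATH : make the key readable and writable by its owner only
secure_key_file() { chmod 600 "$1"; }

# looks_like_sa_key KEY_PATH : rough check of the fields shown in the sample
looks_like_sa_key() {
  grep -q '"type": *"service_account"' "$1" &&
    grep -q '"private_key"' "$1" &&
    grep -q '"client_email"' "$1"
}
```

On the Unravel node, you might also make the unravel user the owner of the file (chown unravel) before running secure_key_file on it.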

    When queries run, BigQuery generates logs. These logs are pushed to Unravel through a Pub/Sub topic, so you must create a topic to route the logs to Unravel.

    1. On the GCP console, go to the Pub/Sub topics page.

      Note

      The logs are pushed to Unravel via Pub/Sub topics.

    2. Click Create topic.

    3. In the Topic ID field, enter an ID for your topic and click Create Topic. Preferably, specify unravel-bigquery as the topic ID. The topic now appears in the list of topics.

    Sinks control how Cloud Logging routes logs. Using sinks, you can route logs to supported destinations.

    1. On the GCP console, go to the Logs Router page.

    2. Click Create Sink and provide a name and description for the sink. Preferably, name the sink unravel-sink.

    3. Under Sink Destination, from the Sink Service drop-down list, select Cloud Pub/Sub topic.

    4. From Select a Cloud Pub/Sub topic, select the Pub/Sub topic that you created earlier for the Unravel project.

    5. Under Choose logs to include in the Sink, add a filter that determines which logs are included in the log routing sink. For example:

       (resource.type="bigquery_resource" AND ((protoPayload.methodName="jobservice.insert" AND  protoPayload.serviceData.jobInsertResponse.resource.jobName.jobId :*) OR (protoPayload.methodName="jobservice.jobcompleted" AND
        protoPayload.serviceData.jobCompletedEvent.job.jobName.jobId :*))) OR (resource.type="bigquery_dts_config" AND (labels.run_id :* AND resource.labels.config_id :*)) 

      In this example:

      • bigquery_resource selects the BigQuery logs.

      • jobservice.insert selects jobs that are running.

      • jobservice.jobcompleted selects completed jobs.

      • bigquery_dts_config selects BigQuery Data Transfer Service logs that carry a run ID and a config ID.

    6. Click Create Sink.
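The topic and sink can also be created with gcloud. A minimal sketch, assuming the recommended names unravel-bigquery and unravel-sink and the example filter above collapsed onto one line. Unlike the console, gcloud may not grant the sink's writer identity permission to publish; if the create command asks you to grant a writer identity access, give it the Pub/Sub Publisher role (roles/pubsub.publisher) on the topic. Set DRY_RUN=1 to print the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command when DRY_RUN is set, otherwise runs it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Example filter from the sink steps, on a single line
UNRAVEL_LOG_FILTER='(resource.type="bigquery_resource" AND ((protoPayload.methodName="jobservice.insert" AND protoPayload.serviceData.jobInsertResponse.resource.jobName.jobId :*) OR (protoPayload.methodName="jobservice.jobcompleted" AND protoPayload.serviceData.jobCompletedEvent.job.jobName.jobId :*))) OR (resource.type="bigquery_dts_config" AND (labels.run_id :* AND resource.labels.config_id :*))'

# create_unravel_topic_and_sink PROJECT_ID
create_unravel_topic_and_sink() {
  local project="$1"
  run gcloud pubsub topics create unravel-bigquery --project="$project"
  # Route matching log entries to the topic
  run gcloud logging sinks create unravel-sink \
    "pubsub.googleapis.com/projects/${project}/topics/unravel-bigquery" \
    --project="$project" --log-filter="$UNRAVEL_LOG_FILTER"
}
```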

    A subscription receives the messages published to the topic and delivers the logs to Unravel. Do the following to create a subscription:

    1. On the GCP console, go to the Pub/Sub topics page.

    2. Select the topic that you created earlier for the Unravel project.

    3. Scroll down to the Subscriptions tab and click Create Subscription > Create Subscription.

    4. In the Subscription ID text box, provide a name for the subscription. Preferably, set the Subscription ID to unravel-bigquery-sub.

      Note

      You must set the subscription ID in Unravel using the manager utility.

    5. Under the Delivery type, select Push or Pull.

      If you selected Push, enter the Log Receiver (LR) endpoint URL for push messages in the Endpoint URL text box. The LR server receives the logs from Google Pub/Sub and is the entry point to Unravel.

      Specify the endpoint URL in the following format:

      https://<unravel-hostname>:4443/logs/bigquery/<gcp-project-id>/bigquery/bigquery

      For example:

      https://playground-bq-4770.unraveldata.com:4443/logs/bigquery/unravelsaas-329506/bigquery/bigquery
    6. Set the expiration period for the subscription.

    7. Under Message ordering, select the Order messages with an ordering key option. This ensures that messages tagged with the same ordering key are received in the order in which they were published.

      Note

      You can select this option only when you create the subscription; it cannot be changed later.

    8. Optionally, configure the Dead lettering and Retry policy options.

    9. Click Create.
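The subscription can also be created with gcloud, and the push URL built from the format above. A sketch assuming the recommended names unravel-bigquery and unravel-bigquery-sub; the helper names are hypothetical, and expiration, dead lettering, and retry settings are left at their defaults (gcloud accepts, for example, --expiration-period=never). Set DRY_RUN=1 to print the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command when DRY_RUN is set, otherwise runs it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# lr_endpoint UNRAVEL_HOST PROJECT_ID -> push URL in the documented format
lr_endpoint() {
  printf 'https://%s:4443/logs/bigquery/%s/bigquery/bigquery\n' "$1" "$2"
}

# create_unravel_subscription PROJECT_ID push|pull [UNRAVEL_HOST]
create_unravel_subscription() {
  local project="$1" mode="$2" host="${3:-}"
  local args=(unravel-bigquery-sub --project="$project"
    --topic=unravel-bigquery --enable-message-ordering)
  if [ "$mode" = "push" ]; then
    # Push delivery needs the public LR endpoint
    args+=(--push-endpoint="$(lr_endpoint "$host" "$project")")
  fi
  run gcloud pubsub subscriptions create "${args[@]}"
}
```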

  3. Add projects.

    Note

    To remove projects, you can refer to Remove BigQuery projects from Unravel.

  4. To verify, run <Unravel installation directory>/unravel/manager config bigquery show. The output is similar to the following:

    /opt/unravel/manager config bigquery show
    -- Running: config bigquery show
    BigQuery support: Enabled
    LR endpoint: Default
    Mode: pull
    Polling: Default
    
    Billing data location: Not configured
    
    Authentication mode: multi
    
    Project: prj-01
       integration: true
       is_admin: false
       is_monitoring: true
       subscription_id: unravel-bigquery-sub
    Project: prj-02
       integration: true
       is_admin: false
       is_monitoring: true
       subscription_id: unravel-bigquery-sub
    
  5. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
  6. Start Unravel.

    <Unravel installation directory>/unravel/manager start
  7. Verify BigQuery integration

    1. On the GCP console, run test queries from the project integrated with Unravel.

    2. Using a supported web browser, navigate to the Unravel URL (for example, https://<unravel-host>:3000) and log in to the Unravel UI with your credentials.

    3. Go to the Jobs tab and click All in the left panel. The queries that you ran from the GCP console are listed.


      To verify the integration of administrator projects, check the Reservation column from the Projects tab.

To configure the BigQuery projects with Unravel-generated credentials instead, do the following:

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
  2. Configure the BigQuery projects.

  3. Integrate BigQuery projects for Unravel monitoring. This step is mandatory for Unravel-generated credentials. The following command configures all the projects that you added to Unravel.

    Note

    Do not run the following command if you have used external Terraform to automatically create resources.

    <Unravel installation directory>/unravel/manager config bigquery integrate

    The output contains a URL.

    Note

    If you want to skip the interactive gcloud authentication by Unravel and handle the gcloud authentication on your own, then run the command as follows:

    <Unravel installation directory>/unravel/manager config bigquery integrate --skip-authorization
  4. Authenticate gcloud CLI.

    1. In a Google Chrome browser, open the URL provided in the output, and in the sign-in dialog box, click Allow. Make sure that you sign in to the gcloud CLI with an account that has the required permissions.

    2. In the Sign in to the gcloud CLI box, click Copy to copy the authorization code.

    3. Go back to the terminal, paste the authorization code in the Enter authorization code field, and press ENTER. This runs the following actions in the background:

      • Authenticate the user with Google Cloud.

      • Configure the required resources on the GCP.

      • Encrypt the credentials (service account keys) and then integrate them with Unravel.

      • Integrate all the added BigQuery projects with Unravel.

      • Securely sign out the end user from the gcloud session.

  5. To verify, run <Unravel installation directory>/unravel/manager config bigquery show. The output is similar to the following:

    /opt/unravel/manager config bigquery show
    -- Running: config bigquery show
    BigQuery support: Enabled
    LR endpoint: Default
    Mode: pull
    Polling: Default
    
    Billing data location: Not configured
    
    Authentication mode: multi
    
    Project: prj-01
       integration: true
       is_admin: false
       is_monitoring: true
       subscription_id: unravel-bigquery-sub
    Project: prj-02
       integration: true
       is_admin: false
       is_monitoring: true
       subscription_id: unravel-bigquery-sub
    
  6. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
  7. Start Unravel.

    <Unravel installation directory>/unravel/manager start
  8. Verify BigQuery integration

    1. On the GCP console, run test queries from the project integrated with Unravel.

    2. Using a supported web browser, navigate to the Unravel URL (for example, https://<unravel-host>:3000) and log in to the Unravel UI with your credentials.

    3. Go to the Jobs tab and click All in the left panel. The queries that you ran from the GCP console are listed.


      To verify the integration of administrator projects, check the Reservation column from the Projects tab.

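The stop, integrate, verify, apply, and start sequence above can be wrapped in one script. A sketch under assumptions: the function name is hypothetical, the manager path follows the <Unravel installation directory>/unravel/manager pattern, and the --skip-authorization branch assumes that a gcloud login you perform yourself (shown with gcloud auth login --no-launch-browser) is what the integrate command then uses; confirm this in your environment. Set DRY_RUN=1 to print the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command when DRY_RUN is set, otherwise runs it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# integrate_bigquery UNRAVEL_INSTALL_DIR [--skip-authorization]
integrate_bigquery() {
  local mgr="$1/unravel/manager" skip="${2:-}"
  run "$mgr" stop || return 1
  if [ "$skip" = "--skip-authorization" ]; then
    # Assumption: you handle gcloud authentication yourself
    run gcloud auth login --no-launch-browser || return 1
    run "$mgr" config bigquery integrate --skip-authorization || return 1
  else
    # Interactive: prints a URL and prompts for the authorization code
    run "$mgr" config bigquery integrate || return 1
  fi
  run "$mgr" config bigquery show &&
    run "$mgr" config apply &&
    run "$mgr" start
}
```

With --skip-authorization, you may want to sign out afterward with gcloud auth revoke, because the interactive flow signs you out automatically.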

You must separately configure the BigQuery projects that you want to track on the Data page. The Data page in the Unravel UI can show data for up to 100 BigQuery projects.

Also, refer to Remove BigQuery projects from the Data page on Unravel UI.

  1. Add projects to the Data page.

    • To add a single project to the Data page, run the following:

      <Unravel installation directory>/unravel/manager config bigquery enable-datapage <project-id>

      For example: /opt/unravel/manager config bigquery enable-datapage myproject

    • To add multiple projects to the Data page, run the following:

      <Unravel installation directory>/unravel/manager config bigquery enable-datapage --batch </path/to/project-id-file> 

      For example: /opt/unravel/manager config bigquery enable-datapage --batch /opt/unravel/project-id-file

  2. To verify that the project IDs are enabled for the Data page, run <Unravel installation directory>/unravel/manager config bigquery show. The output is similar to the following:

    /opt/unravel/manager config bigquery show
    -- Running: config bigquery show
    BigQuery support: Enabled
    LR endpoint: Default
    Mode: pull
    Polling: Default
    
    Billing data location: Not configured
    
    Authentication mode: multi
    
    Project: sbanawar-01
       integration: true
       is_admin: false
       is_monitoring: true
       subscription_id: unravel-bigquery-sub
    Project: sbanawar-02
       integration: true
       is_admin: false
       is_monitoring: true
       subscription_id: unravel-bigquery-sub
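The --batch option reads project IDs from a file. The file format is not described here; this sketch assumes one project ID per line. It checks each ID against the Google Cloud project ID rules (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen) before writing the file. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# is_valid_project_id ID : succeed if ID follows the GCP project ID rules
is_valid_project_id() {
  local re='^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
  [[ "$1" =~ $re ]]
}

# write_project_id_file OUT_FILE ID... : validate all IDs, then write one per line
write_project_id_file() {
  local out="$1"; shift
  [ "$#" -ge 1 ] || { echo "no project IDs given" >&2; return 1; }
  local id
  for id in "$@"; do
    if ! is_valid_project_id "$id"; then
      echo "invalid project ID: $id" >&2
      return 1
    fi
  done
  printf '%s\n' "$@" > "$out"
}
```

For example, write_project_id_file /opt/unravel/project-id-file prj-01 prj-02, then pass /opt/unravel/project-id-file to enable-datapage --batch.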

Note

When you remove a BigQuery project from the Data page, the associated data is also deleted from OpenSearch.

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
  2. Run the following command from the Unravel installation directory.

    • For a single project

      <Unravel installation directory>/unravel/manager config bigquery delete-datapage <project-ID>

      For example: /opt/unravel/manager config bigquery delete-datapage my-project

    • For multiple projects

      <Unravel installation directory>/unravel/manager config bigquery delete-datapage --batch </path/to/project-id-file>

      For example: /opt/unravel/manager config bigquery delete-datapage --batch /opt/unravel/my-projects.txt

  3. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
  4. Start Unravel.

    <Unravel installation directory>/unravel/manager start

Note

Only a single billing account is supported.

You must separately configure the billing data. For this, you must export the billing data to a BigQuery table and integrate Unravel with that table for Unravel to query the table.

The following information is required for integrating the billing data:

  • Project ID in which billing data is exported. This project ID should be integrated with Unravel for monitoring.

  • Dataset in which the billing data is exported.

  • Table in which billing data is exported.

To export billing data for BigQuery monitoring, do the following:

  1. Export billing data to a BigQuery dataset. Refer to Set up Cloud Billing data export to BigQuery for detailed instructions.

  2. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
  3. Run the following command to set the billing information.

    <Unravel installation directory>/unravel/manager config bigquery set-billing-data

    You are prompted to enter the following:

    • Project ID where billing export is enabled

    • Dataset ID where data is getting exported

    • Name of the table where data is getting exported

    Note

    To unset the properties that you have set for exporting billing data, run the following command:

    <Unravel installation directory>/unravel/manager config bigquery unset-billing-data
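When you are prompted for the table name, it helps to know how Cloud Billing names its export tables. As far as we know, the standard usage cost export table is named gcp_billing_export_v1_ followed by the billing account ID with hyphens replaced by underscores; confirm the exact name in your dataset, for example with bq ls. The helper names are hypothetical. Set DRY_RUN=1 to print the bq command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command when DRY_RUN is set, otherwise runs it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# billing_export_table BILLING_ACCOUNT_ID -> expected standard export table name
billing_export_table() {
  local account="$1"
  printf 'gcp_billing_export_v1_%s\n' "${account//-/_}"
}

# list_billing_tables PROJECT_ID DATASET_ID : list tables in the export dataset
list_billing_tables() {
  run bq ls "${1}:${2}"
}
```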

To verify BigQuery integration with Unravel, do the following:

  1. On the GCP console, run test queries from the project integrated with Unravel.

  2. Using a supported web browser, navigate to the Unravel URL (for example, https://<unravel-host>:3000) and log in to the Unravel UI with your credentials.

  3. Go to the Jobs tab. The queries that you ran from the GCP console are listed under the All tab.


    To verify the integration of administrator projects, check the Reservation column from the Projects tab.

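For a quick command-line check before you look in the UI, you can run a trivial query in the integrated project with the bq CLI, and check that the log receiver status path used by the load balancer health check (/lr/status on port 4043) responds. A sketch; the function names are hypothetical. Set DRY_RUN=1 to print the bq command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command when DRY_RUN is set, otherwise runs it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# run_test_query PROJECT_ID : run a trivial standard-SQL query in the project
run_test_query() {
  run bq --project_id="$1" query --nouse_legacy_sql 'SELECT 1 AS unravel_check'
}

# lr_status_ok URL : succeed only if the LR status endpoint answers HTTP 200
lr_status_ok() {
  local code
  # curl prints 000 when the connection fails; || true keeps that case non-fatal
  code="$(curl -s -o /dev/null -w '%{http_code}' "$1" || true)"
  [ "$code" = "200" ]
}
```

For example, run_test_query prj-01, then lr_status_ok http://localhost:4043/lr/status on the Unravel VM.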

Note

When you remove a BigQuery project from Unravel, the associated data is also deleted from OpenSearch.

Perform the following steps to remove BigQuery projects from Unravel. If you integrated BigQuery projects using Terraform, refer to Disintegrating GCP projects with Unravel instead.

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
  2. Run the following command from the Unravel installation directory.

    • For a single project

      <Unravel installation directory>/unravel/manager config bigquery remove <project-ID>

      For example: /opt/unravel/manager config bigquery remove my-project

    • For multiple projects

      <Unravel installation directory>/unravel/manager config bigquery remove --batch </path/to/project-id-file>

      For example: /opt/unravel/manager config bigquery remove --batch /opt/unravel/my-projects.txt

  3. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
  4. Start Unravel.

    <Unravel installation directory>/unravel/manager start
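The removal steps can be wrapped in one script that picks the single-project or --batch form. A sketch under assumptions: the function name is hypothetical, the batch file is assumed to hold one project ID per line, and the manager path follows the <Unravel installation directory>/unravel/manager pattern. Set DRY_RUN=1 to print the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command when DRY_RUN is set, otherwise runs it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# remove_bigquery_projects UNRAVEL_INSTALL_DIR PROJECT_ID...
remove_bigquery_projects() {
  local mgr="$1/unravel/manager"; shift
  [ "$#" -ge 1 ] || { echo "no project IDs given" >&2; return 1; }
  run "$mgr" stop || return 1
  if [ "$#" -eq 1 ]; then
    run "$mgr" config bigquery remove "$1" || return 1
  else
    local list rc
    list="$(mktemp)"
    printf '%s\n' "$@" > "$list"   # assumed format: one project ID per line
    run "$mgr" config bigquery remove --batch "$list"
    rc=$?
    rm -f "$list"
    [ "$rc" -eq 0 ] || return 1
  fi
  run "$mgr" config apply && run "$mgr" start
}
```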