Google BigQuery

Before setting up Unravel for BigQuery, ensure that the installation requirements are met.

You can install Unravel on a GCP instance. Unravel is then set up to access private data on behalf of a service account outside the Google Cloud environment. You must create a service account and download its private key as a JSON file. The path of the JSON file must be set in Unravel.

Follow the instructions to install and set up Unravel to receive BigQuery data.

Install and set up Unravel on GCP instance

Do the following to set up Unravel on the GCP instance.

1. Create and configure the GCP instance
  1. On your GCP console, go to the GCE dashboard and click Create Instance.

  2. Select the following options based on Unravel's instance requirements:

    • Base OS

    • Instance type and size

    • Ports

    • Networking

      The instance must be HTTPS and publicly accessible.

    • Firewall rules or policies

      Sample inbound rule

      Type              Protocol   Port range   Source
      All traffic       All        All          For example, 10.10.0.0/16
      SSH               TCP        22           0.0.0.0/0 or trusted public IP for SSH access
      Custom TCP Rule   TCP        3000
      Custom TCP Rule   TCP        4043

      Sample outbound rule

      Type              Protocol   Port range   Source
      All traffic       All        All          0.0.0.0/0

    Note

    The GCP instance should have all TCP access to the BigQuery cluster (server/parent or worker) nodes. You can grant access by adding firewall rules for the BigQuery server/parent and worker nodes that allow all TCP traffic on all ports.

    While creating the GCP instance, add the firewall properties: enable HTTP and HTTPS traffic, then go to the Network tab and add network tags. (This is the firewall rule that is already created.)

Configure the GCE instance
  1. Set SELinux to permissive mode for the current session.

    sudo setenforce Permissive
  2. Edit /etc/selinux/config and set SELINUX=permissive so that the setting persists after a reboot.

    sudo vi /etc/selinux/config
  3. Install libaio.x86_64, lzop.x86_64, and ntp.x86_64.

    sudo yum install -y libaio.x86_64
    sudo yum install -y lzop.x86_64
    sudo yum install -y ntp.x86_64
  4. Start ntpd and check the system time.

    sudo service ntpd start
    sudo ntpq -p
  5. Create a new Unravel user named unravel.

    sudo useradd unravel
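
The SELinux edit in step 2 can also be scripted instead of done by hand in vi. The following is a minimal sketch, assuming a sed that accepts -i with a backup suffix (GNU and BSD sed both do). The function name set_selinux_permissive and its optional file argument are illustrative and are not part of Unravel. The demo at the end runs against a temporary file, so nothing on the system changes.

```shell
# Hypothetical helper (not an Unravel command): rewrite the SELINUX= line
# so that permissive mode persists across reboots.
set_selinux_permissive() {
  cfg="${1:-/etc/selinux/config}"
  # -i.bak edits the file in place and keeps a backup next to it.
  # The pattern ends in "=", so the SELINUXTYPE= line is left alone.
  sed -i.bak 's/^SELINUX=.*/SELINUX=permissive/' "$cfg"
}

# Demo on a throwaway file that mimics /etc/selinux/config.
demo_cfg="$(mktemp)"
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo_cfg"
set_selinux_permissive "$demo_cfg"
cat "$demo_cfg"
```

On the instance itself, the same sed expression would be run with sudo against /etc/selinux/config.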
2. Download Unravel

Download Unravel onto the VM instance that you have created.

3. Deploy Unravel

Deploy Unravel on the GCP instance that you have created.

4. Run setup

The setup command installs Unravel on GCP. It does the following:

  • Runs Precheck automatically to detect issues that can prevent a successful installation and suggests how to resolve them. Refer to Precheck filters for the expected value for each filter.

  • Lets you pass extra parameters to integrate the database of your choice.

    The setup command allows you to use a managed database shipped with Unravel, or an external database. When run without any additional parameters, the Unravel managed PostgreSQL database is used. Otherwise, you can specify one of the following databases in the setup command:

    • MySQL (Unravel managed as well as external MySQL database)

    • MariaDB (Unravel managed as well as external MariaDB database)

    • PostgreSQL (External PostgreSQL)

    Refer to Integrate database for details.

  • Lets you specify a path for the data directory other than the default path.

    The Unravel data and configurations are located in the data directory. By default, the installer maintains the data directory under <Unravel installation directory>/data. You can also change the data directory's default location by running additional parameters with the setup command.

  • Provides more setup options.

To install Unravel with the setup command, do the following:

  1. Switch to Unravel user.

      su - <unravel user>

    Notice

    The Unravel user who owns the installation directory should run the setup command to install Unravel.

  2. Run setup command:

    Before running the setup command with any database other than the Unravel managed PostgreSQL that is shipped with the product, refer to the Integrate database topic and complete the prerequisites. Extra parameters must be passed with the setup command when you use another database.

    • PostgreSQL

      • Unravel managed PostgreSQL

        <unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-bigquery
      • External PostgreSQL

        <unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-bigquery --external-database postgresql <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
        
        ##The HOST, PORT, SCHEMA, USERNAME, PASSWORD are optional fields and are prompted if missing.
        
        ##For example:
        /opt/unravel/versions/abcd.992/setup --enable-bigquery --external-database postgresql xyz.unraveldata.com 5432 unravel_db_prod unravel unraveldata
        
    • MySQL

      • Unravel managed MySQL

        <unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-bigquery --extra /tmp/mysql
      • External MySQL

        <unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-bigquery --extra /tmp/<MySQL-directory> --external-database mysql <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
        
        ##The HOST, PORT, SCHEMA, USERNAME, PASSWORD are optional fields and are prompted if missing.
        
    • MariaDB

      • Unravel managed MariaDB

        <unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-bigquery --extra /tmp/mariadb
      • External MariaDB

        <unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-bigquery --extra /tmp/<MariaDB-directory> --external-database mariadb <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
        
        ##The HOST, PORT, SCHEMA, USERNAME, PASSWORD are optional fields and are prompted if missing.
        

    Precheck is automatically run when you run the setup command. Refer to Precheck filters for the expected value for each filter. Also, refer to the Precheck sample.

    Tip

    Optionally, if you want to provide a different data directory, you can pass an extra parameter (--data-directory) with the setup command as shown below:

    <unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-bigquery --data-directory /the/data/directory

    Similarly, you can configure separate directories for other unravel directories. Contact support for assistance.

    Note

    Refer to setup Options for all the additional parameters that can be run with the setup command.

  3. Start all the services.

    <unravel_installation_directory>/unravel/manager start 
    
  4. Check the status of services.

    <unravel_installation_directory>/unravel/manager report 
    

    The following service statuses are reported:

    • OK: Service is up and running.

    • Not Monitored: Service is not running. (Has stopped or has failed to start)

    • Initializing: Services are starting up.

    • Does not exist: The process unexpectedly disappeared. Restarts will be attempted 10 times.

    You can also get the status and information for a specific service. Run the manager report command as follows:

    <unravel_installation_directory>/unravel/manager report <service> 
    ## For example: /opt/unravel/manager report auto_action
    

The Precheck output displays the issues that prevent a successful installation and also provides suggestions to resolve them. You must resolve each of the issues before proceeding. See Precheck filters.

After the prechecks are resolved, you must re-login or reload the shell to execute the setup command again.

Note

In certain situations, you can skip the precheck using the setup --skip-precheck command.

For example:

/opt/unravel/versions/<Unravel version>/setup --skip-precheck

You can also skip specific checks that you know can fail. For example, to skip the Check limits and Disk freespace checks, take the names shown in parentheses next to these checks in the Precheck output and pass them to the setup command as follows:

setup --filter-precheck ~check_limits,~check_freespace 

Tip

Run the setup command with --help, alone or combined with other setup options, for complete usage details.

<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --help

The following is a sample run of the setup command:

/opt/unravel/versions/abcd.1004/setup
2021-04-05 15:51:30 Sending logs to: /tmp/unravel-setup-20210405-155130.log
2021-04-05 15:51:30 Running preinstallation check...
2021-04-05 15:51:31 Gathering information ................. Ok
2021-04-05 15:51:51 Running checks .................. Ok
--------------------------------------------------------------------------------
system
 Check limits        : PASSED
 Clock sync          : PASSED
 CPU requirement     : PASSED, Available cores: 8 cores
 Disk access         : PASSED, /opt/unravel/versions/develop.1004/healthcheck/healthcheck/plugins/system is writable
 Disk freespace      : PASSED, 229 GB of free disk space is available for precheck dir.
 Kerberos tools      : PASSED
 Memory requirement  : PASSED, Available memory: 79 GB
 Network ports       : PASSED
 OS libraries        : PASSED
 OS release          : PASSED, OS release version: centos 7.6
 OS settings         : PASSED
 SELinux             : PASSED
--------------------------------------------------------------------------------
Healthcheck report bundle: /tmp/healthcheck-20210405155130-xyz.unraveldata.com.tar.gz
2021-04-05 15:51:53 Prepare to install with: /opt/unravel/versions/abcd.1004/installer/installer/../installer/conf/presets/default.yaml
2021-04-05 15:51:57 Sending logs to: /opt/unravel/logs/setup.log
2021-04-05 15:51:57 Instantiating templates ................................................................................................................................................................................................................................ Ok
2021-04-05 15:52:05 Creating parcels .................................... Ok
2021-04-05 15:52:20 Installing sensors file ............................ Ok
2021-04-05 15:52:20 Installing pgsql connector ... Ok
2021-04-05 15:52:22 Starting service monitor ... Ok
2021-04-05 15:52:27 Request start for elasticsearch_1 .... Ok
2021-04-05 15:52:27 Waiting for elasticsearch_1 for 120 sec ......... Ok
2021-04-05 15:52:35 Request start for zookeeper .... Ok
2021-04-05 15:52:35 Request start for kafka .... Ok
2021-04-05 15:52:35 Waiting for kafka for 120 sec ...... Ok
2021-04-05 15:52:37 Waiting for kafka to be alive for 120 sec ..... Ok
2021-04-05 15:52:42 Initializing pgsql ... Ok
2021-04-05 15:52:46 Request start for pgsql .... Ok
2021-04-05 15:52:46 Waiting for pgsql for 120 sec ..... Ok
2021-04-05 15:52:47 Creating database schema ................. Ok
2021-04-05 15:52:50 Generating hashes .... Ok
2021-04-05 15:52:52 Loading elasticsearch templates ............ Ok
2021-04-05 15:52:55 Creating kafka topics .................... Ok
2021-04-05 15:53:36 Creating schema objects ........................................ Ok
2021-04-05 15:54:03 Request stop ....................................................... Ok
2021-04-05 15:54:16 Done
[unravel@xyz ~]$
Enable Transport Layer Security (TLS) for Unravel UI

Refer to Enabling Transport Layer Security (TLS) for Unravel UI.

Set up Unravel to receive BigQuery data

Do the following to set up Unravel to receive BigQuery data:

A project ID is a unique string used to differentiate your project from all others in Google Cloud. You cannot edit a Project ID after it is generated.

You can either create a project on the GCP account and get the Project ID or get the Project ID of an existing project and keep it handy. The Project ID must be set in Unravel later using the manager utility. See Add BigQuery details in Unravel.

Create a role for an Unravel project with the required permission for Unravel monitoring. This role is assigned to the Unravel service account that is used to authenticate Unravel applications.

  1. On the GCP Console, go to the Roles page.

  2. Using the drop-down list at the top of the page, select the Unravel project in which you want to create a role.

  3. Click Create Role and enter a Name, Title, and Description for the role. Preferably keep the role name as unravel for easy identification. The description is optional; you can set it to Unravel monitoring.

    Select Role launch stage as General Availability.

    Caution

    The role name cannot be changed after the role is created.

  4. Click Add Permissions.

  5. In the Add Permissions dialog box, filter and select the following permissions for the role, and then click Add.

    Permission                     Description

    bigquery.jobs.get              Gets the job details.

    bigquery.tables.get            Gets the table details.

    pubsub.subscriptions.consume   Consumes the message from the Google Pub/Sub topic.

    resourcemanager.projects.get   Gets the project details. For this permission, you must enable the Resource Manager API.

    bigquery.datasets.get          Permissions to get the tables and partitions metadata. These permissions are required for the Data page.
    bigquery.routines.get
    bigquery.routines.list
    bigquery.tables.getData
    bigquery.tables.list

    bigquery.jobs.create           Permissions to execute queries on BigQuery to fetch the metadata about the tables and partitions. These permissions are required for the Data page.

  6. Click Create. The role is created.
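
The same role can be created from the command line. The following is a sketch, assuming the gcloud CLI is installed and authenticated. The project ID my-unravel-project is a placeholder, and the helper names join_commas and create_role are illustrative. By default the script only prints the command; set GCLOUD=gcloud in the environment to execute it.

```shell
# Dry run by default: prints the command instead of running it.
GCLOUD="${GCLOUD:-echo gcloud}"
PROJECT_ID="${PROJECT_ID:-my-unravel-project}"   # placeholder project ID
ROLE_ID="${ROLE_ID:-unravel}"

# Join all arguments with commas (runs in a subshell so IFS is not leaked).
join_commas() ( IFS=,; printf '%s\n' "$*" )

# Permissions listed in step 5.
PERMISSIONS=$(join_commas \
  bigquery.jobs.get \
  bigquery.tables.get \
  pubsub.subscriptions.consume \
  resourcemanager.projects.get \
  bigquery.datasets.get \
  bigquery.routines.get \
  bigquery.routines.list \
  bigquery.tables.getData \
  bigquery.tables.list \
  bigquery.jobs.create)

create_role() {
  $GCLOUD iam roles create "$ROLE_ID" \
    --project="$PROJECT_ID" \
    --title="unravel" \
    --description="Unravel monitoring" \
    --permissions="$PERMISSIONS" \
    --stage=GA
}

create_role
```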

A service account can be attached to a VM so that applications running on that VM can authenticate as the service account. To set up Unravel for BigQuery monitoring, you must create a service account and then assign the role that you created to this service account.

  1. On the GCP Console, go to the Create service account page.

  2. Select the Unravel project.

  3. Under Service Account Details, enter a service account name to display on the GCP Console. Preferably keep the service account name as unravel-service-account.

    Note

    A service account ID is generated based on this name. Edit the ID if required. You cannot change the ID later.

    Optional: Enter a description of the service account. Preferably keep the description as Unravel monitoring.

  4. Click CREATE AND CONTINUE.

  5. Under Grant this service account access to this project, select the role, which was created in Step 1 with the associated permissions.

  6. Click Done. The service account is created, and the role is now attached to the service account.
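
As a command-line alternative to the console steps above, the service account can be created and bound to the custom role with gcloud. This is a sketch under assumptions: the project ID is a placeholder, the role ID unravel is the name suggested earlier, and create_service_account is an illustrative function name. The script prints the commands unless GCLOUD=gcloud is set.

```shell
# Dry run by default: prints the commands instead of running them.
GCLOUD="${GCLOUD:-echo gcloud}"
PROJECT_ID="${PROJECT_ID:-my-unravel-project}"   # placeholder project ID
SA_NAME="${SA_NAME:-unravel-service-account}"
ROLE_ID="${ROLE_ID:-unravel}"                    # custom role created earlier
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

create_service_account() {
  # Create the service account in the Unravel project.
  $GCLOUD iam service-accounts create "$SA_NAME" \
    --project="$PROJECT_ID" \
    --display-name="$SA_NAME" \
    --description="Unravel monitoring"
  # Grant the project-level custom role to the service account.
  $GCLOUD projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="projects/${PROJECT_ID}/roles/${ROLE_ID}"
}

create_service_account
```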

Each service account is associated with a public/private RSA key pair. The Service Account Credentials API uses this internal key pair to create short-lived service account credentials and to sign blobs and JSON Web Tokens (JWTs). This key pair is known as the Google-managed key pair. The Google-managed key pairs are used to authenticate calls to APIs.

  1. On the GCP Console, go to the Service accounts page.

  2. Select the Unravel project.

  3. Click the email address of the service account which you have created for Unravel.

  4. Click the Keys tab.

  5. Click the Add key drop-down menu, then select Create new key.

  6. Select JSON as the Key type and click Create. A service account key file is downloaded. You can download this file only once. Store the key file securely and transfer it to the Unravel node.

    Following is a sample of the downloaded key file:

    {
      "type": "service_account",
      "project_id": "unravel-test-337406",
      "private_key_id": "b111a5ad112bf0000d5c03b28b9a1d1f8ac31af",
      "private_key": "-----BEGIN PRIVATE KEY-----\n<privatekey>\n-----END PRIVATE KEY-----\n",
      "client_email": "unravel-service-account@unravel-test-337406.iam.gserviceaccount.com",
      "client_id": "112180000016964232364",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://accounts.google.com/o/oauth2/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
    }

    The file path of the service account key must be set in Unravel using the manager utility. See Add BigQuery details in Unravel.
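
After transferring the key file to the Unravel node, a quick sanity check can catch a wrong or world-readable file before its path is set in Unravel. The following is a minimal sketch. The function check_key_file is a hypothetical helper, not an Unravel command, and the demo uses a throwaway file with placeholder content.

```shell
# Hypothetical helper: confirm the file looks like a service account key
# and restrict it to the owning user.
check_key_file() {
  key="$1"
  if [ ! -r "$key" ]; then
    echo "cannot read key file: $key" >&2
    return 1
  fi
  if ! grep -q '"type": *"service_account"' "$key"; then
    echo "not a service account key: $key" >&2
    return 1
  fi
  # Only the owner may read or write the private key.
  chmod 600 "$key"
}

# Demo on a throwaway file shaped like the sample above.
demo_key="$(mktemp)"
printf '{\n  "type": "service_account",\n  "project_id": "unravel-test-337406"\n}\n' > "$demo_key"
chmod 644 "$demo_key"
check_key_file "$demo_key" && echo "key file looks usable: $demo_key"
```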

When queries are run, logs are generated in BigQuery. The logs can be pushed to Unravel via Pub/Sub topics. To route logs to Unravel, a Pub/Sub topic must be created.

  1. On the GCP console, go to the Pub/Sub topics page.

    Note

    The logs are pushed to Unravel via Pub/Sub topics.

  2. Click Create topic.

  3. In the Topic ID field, enter an ID for your topic and click Create Topic. Preferably specify unravel-bigquery as the topic ID. The topic is now listed in the list of topics.
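
The topic can also be created with gcloud. This sketch assumes an authenticated gcloud CLI; the project ID is a placeholder and create_topic is an illustrative function name. The command is only printed unless GCLOUD=gcloud is set.

```shell
# Dry run by default: prints the command instead of running it.
GCLOUD="${GCLOUD:-echo gcloud}"
PROJECT_ID="${PROJECT_ID:-my-unravel-project}"   # placeholder project ID
TOPIC_ID="${TOPIC_ID:-unravel-bigquery}"

create_topic() {
  $GCLOUD pubsub topics create "$TOPIC_ID" --project="$PROJECT_ID"
}

create_topic
```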

Sinks control how Cloud Logging routes logs. Using sinks, you can route the logs to supported destinations.

  1. On the GCP console, go to the Logs Router page.

  2. Click Create Sink and provide a name and description for the sink. Preferably name the Sink as unravel-sink.

  3. Under Sink Destination, from the Sink Service dropdown, select the Cloud Pub/Sub topic.

  4. From Select a Cloud Pub/Sub topic, select the Cloud Pub/Sub topic that you created for the Unravel project in Step 5.

  5. Under Choose logs to include in the Sink, add a filter to determine the logs that must be included in the log routing sink. For example:

    resource.type="bigquery_resource" AND (protoPayload.methodName="jobservice.insert" OR protoPayload.methodName="jobservice.jobcompleted")

    In this example:

    • bigquery_resource indicates the BigQuery logs.

    • jobservice.insert indicates the jobs of running type.

    • jobservice.jobcompleted indicates the jobs of completed type.

  6. Click Create Sink.
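
The sink can be sketched with gcloud as follows, reusing the filter from step 5. The project ID is a placeholder, the topic and sink names are the ones suggested in this guide, and create_sink is an illustrative function name. The command is printed rather than executed unless GCLOUD=gcloud is set.

```shell
# Dry run by default: prints the command instead of running it.
GCLOUD="${GCLOUD:-echo gcloud}"
PROJECT_ID="${PROJECT_ID:-my-unravel-project}"   # placeholder project ID
TOPIC_ID="${TOPIC_ID:-unravel-bigquery}"
SINK_NAME="${SINK_NAME:-unravel-sink}"

# Same filter as in step 5: BigQuery job insert and job completed events.
LOG_FILTER='resource.type="bigquery_resource" AND (protoPayload.methodName="jobservice.insert" OR protoPayload.methodName="jobservice.jobcompleted")'

create_sink() {
  $GCLOUD logging sinks create "$SINK_NAME" \
    "pubsub.googleapis.com/projects/${PROJECT_ID}/topics/${TOPIC_ID}" \
    --project="$PROJECT_ID" \
    --log-filter="$LOG_FILTER"
}

create_sink
```

When a sink is created from the command line, its writer identity usually still has to be granted the Pub/Sub Publisher role on the topic before logs are delivered.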

A subscription attaches to the topic and delivers the logs to Unravel. Do the following to create a subscription.

  1. On the GCP console, go to the Pub/Sub topics page.

  2. Select the topic that was created in Step 5.

  3. Scroll down to the Subscriptions tab and click Create Subscription > Create Subscription.

  4. In the Subscription ID text box, provide a name for the subscription. Preferably keep the Subscription ID as unravel-bigquery-sub.

    Note

    You must set the subscription ID in Unravel using the manager utility. See Add BigQuery details in Unravel.

  5. Under Delivery type, select Push, and in the Endpoint URL text box, provide the Log Receiver (LR) endpoint URL for push messages. The LR server receives the logs from Google Pub/Sub and is the entry point to Unravel. Specify the endpoint URL in the following format:

    https://<unravel-hostname>:4043/logs/bigquery/<gcp-project-id>/bigquery/bigquery

    For example:

    https://playground-bq-4730.unraveldata.com:4043/logs/bigquery/unravelsaas-329506/bigquery/bigquery
  6. Provide the expiry time for the subscription.

  7. Optionally, you can filter the Push messages via the Message ordering, Dead lettering, and Retry Policy options.

  8. Click Create.
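
The endpoint format in step 5 is easy to mistype, so it can help to build it from its two variable parts. The sketch below does that and shows how a push subscription could be created with gcloud. The host name and project ID defaults are placeholders, and lr_endpoint and create_subscription are illustrative function names. The gcloud command is printed unless GCLOUD=gcloud is set.

```shell
# Dry run by default: prints the command instead of running it.
GCLOUD="${GCLOUD:-echo gcloud}"
PROJECT_ID="${PROJECT_ID:-my-unravel-project}"        # placeholder project ID
UNRAVEL_HOST="${UNRAVEL_HOST:-unravel.example.com}"   # placeholder host name
TOPIC_ID="${TOPIC_ID:-unravel-bigquery}"
SUBSCRIPTION_ID="${SUBSCRIPTION_ID:-unravel-bigquery-sub}"

# Build the Log Receiver endpoint URL.
# $1 = Unravel host name, $2 = GCP project ID
lr_endpoint() {
  printf 'https://%s:4043/logs/bigquery/%s/bigquery/bigquery\n' "$1" "$2"
}

create_subscription() {
  $GCLOUD pubsub subscriptions create "$SUBSCRIPTION_ID" \
    --project="$PROJECT_ID" \
    --topic="$TOPIC_ID" \
    --push-endpoint="$(lr_endpoint "$UNRAVEL_HOST" "$PROJECT_ID")"
}

create_subscription
```

Calling lr_endpoint with the host and project from the example in step 5 reproduces the example URL.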

The following details of BigQuery configurations must be specified in Unravel through the Unravel manager utility.

  • Project ID

  • File path to Service key file or Credentials file

  • Subscription ID

To add these details, do the following:

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
  2. Run the following command from the Unravel installation directory.

    <Unravel installation directory>/unravel/manager config bigquery add
  3. Enter the following details when prompted:

    For example:

    Project id: unravel-test-331210
    Subscription id: bigquery-sub
    Credentials file: /tmp/unravel-test-331210-eoadad3feeea.json
  4. Optional: Using the manager utility, set how often Unravel polls project data such as the project name and state. The default is 1440 minutes (one day).

    <Unravel installation directory>/unravel/manager config properties set com.unraveldata.bigquery.project.details.poll.delay.mins 720

    Refer to BigQuery properties to edit and set other BigQuery properties.

  5. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
  6. Start Unravel.

    <Unravel installation directory>/unravel/manager start
Verify BigQuery integration

To verify BigQuery integration with Unravel, do the following:

  1. On the GCP console, run test queries from the project integrated with Unravel.

  2. Using a supported web browser, navigate to the Unravel URL (for example, https://<unravel-host>:3000) and log in to the Unravel UI with your credentials.

  3. Navigate to Jobs tab > Applications, and under Application type, select BigQuery. The details of the test queries run from the GCP console are listed under the All tab.

Remove BigQuery project from Unravel
  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
  2. Run the following command from the Unravel installation directory.

    <Unravel installation directory>/unravel/manager config bigquery remove <project-ID>
  3. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
  4. Start Unravel.

    <Unravel installation directory>/unravel/manager start