Friday, December 12, 2025


Building a Real-Time ETL Dashboard with Grafana and BigQuery

1. Overview: What You Are Building

A real-time ETL dashboard continuously shows data from streaming or frequently updated sources.

Using BigQuery as the analytical engine and Grafana as the visualization layer, you can build dashboards with:

Live ingestion (seconds or sub-second)

Transformation (cleaning, deduplication, aggregation)

Visualization (time-series charts, KPIs, alerts)

2. High-Level Architecture

Source Systems
(apps, APIs, IoT, events)
        |
        v
Real-Time Ingestion
(Pub/Sub, Kafka, Dataflow, Fivetran)
        |
        v
BigQuery
(streaming tables, materialized views, partitioned tables)
        |
        v
Grafana
(BigQuery connector, real-time panels, alerts)

3. Step 1: Set Up Real-Time Data Ingestion

You need to ingest data into BigQuery with low latency.

Option A: Google Pub/Sub (best for streaming)

Flow:

Events → Pub/Sub topic

Pub/Sub → Dataflow streaming job

Dataflow → BigQuery streaming inserts

Option B: Direct Streaming Inserts to BigQuery

Send JSON records directly via API:

BigQuery insertAll API

Latency: typically a few seconds

Option C: Managed Connectors

Fivetran real-time connectors

Stitch / Airbyte

Google Datastream (for CDC / databases)

4. Step 2: Build Your ETL Logic

Your ETL must run continuously. You can choose among:

4.1 Real-Time Transformation with Dataflow (Recommended)

Dataflow (Apache Beam) lets you:

Extract from Pub/Sub

Transform (clean, validate, enrich)

Load into BigQuery

Example operations:

Parse JSON

Add timestamps

Remove duplicates

Compute fields (e.g., total_price = qty × price)

4.2 Using BigQuery Itself for Transformations

BigQuery supports:

Materialized Views

Refreshed automatically in the background; the refresh interval is configurable, as often as every minute

Great for:

aggregations

time-series windows

rollups (SUM, COUNT, AVG)

Scheduled Queries (for near real-time)

Minimum frequency: every 1 minute

Good for:

incremental ETL

complex joins

daily/weekly aggregations
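As a sketch of what an incremental scheduled query can look like (table and column names are illustrative, following the schemas in Step 3), a MERGE can upsert only the rows that arrived since the last run:

```sql
-- Illustrative incremental ETL as a scheduled query: upsert rows
-- that landed in raw_events within the last scheduling window.
MERGE `project.dataset.events_clean` AS t
USING (
  SELECT
    event_id,
    event_timestamp AS ts,
    JSON_VALUE(data, '$.user_id') AS user_id,
    CAST(JSON_VALUE(data, '$.value') AS FLOAT64) AS value
  FROM `project.dataset.raw_events`
  WHERE ingestion_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE)
) AS s
ON t.event_id = s.event_id
WHEN NOT MATCHED THEN
  INSERT (event_id, ts, user_id, value)
  VALUES (s.event_id, s.ts, s.user_id, s.value);
```

Matching on event_id also gives you idempotent re-runs: replaying an overlap window does not duplicate rows.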

Looker / dbt for Data Modeling

For clean semantic layers.

5. Step 3: Model Data in BigQuery

Recommended table architecture:

A. Raw Streaming Table

raw_events

- event_id

- event_timestamp

- source

- data (JSON or flattened)

- ingestion_time (partition)
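A sketch of the corresponding DDL (project and dataset names are placeholders), partitioned on ingestion time so dashboard queries only scan recent partitions:

```sql
-- Illustrative DDL for the raw streaming table,
-- partitioned by ingestion date to keep scans cheap.
CREATE TABLE `project.dataset.raw_events` (
  event_id STRING,
  event_timestamp TIMESTAMP,
  source STRING,
  data JSON,
  ingestion_time TIMESTAMP
)
PARTITION BY DATE(ingestion_time);
```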

B. Cleaned Table (Transformed)

events_clean

- event_id

- ts (timestamp)

- user_id

- value

- category

- status
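As a sketch in DDL form (names are illustrative), partitioning by event time and clustering by a common filter column keeps dashboard queries fast and cheap:

```sql
-- Illustrative DDL for the cleaned table: partitioned by event date,
-- clustered by category for fast filtered dashboard queries.
CREATE TABLE `project.dataset.events_clean` (
  event_id STRING,
  ts TIMESTAMP,
  user_id STRING,
  value FLOAT64,
  category STRING,
  status STRING
)
PARTITION BY DATE(ts)
CLUSTER BY category;
```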

C. Aggregated or Derived Tables

Materialized view example:

CREATE MATERIALIZED VIEW events_mv AS
SELECT
  TIMESTAMP_TRUNC(ts, MINUTE) AS minute_bucket,
  COUNT(*) AS total_events,
  AVG(value) AS avg_value
FROM events_clean
GROUP BY minute_bucket;

6. Step 4: Connect Grafana to BigQuery

Grafana provides a native Google BigQuery data source.

Steps:

In Grafana, open Configuration → Data Sources.

Add BigQuery.

Authenticate via:

Service Account JSON

Google OAuth

Select:

Project ID

Dataset

Billing Project ID

Save & test.
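If you manage Grafana as code, the same data source can be provisioned from a YAML file instead of the UI. A sketch, assuming Grafana's official BigQuery plugin with service-account (JWT) authentication; verify the plugin id and field names against your Grafana version:

```yaml
# Illustrative Grafana data source provisioning file.
apiVersion: 1
datasources:
  - name: BigQuery
    type: grafana-bigquery-datasource   # plugin id (verify for your version)
    jsonData:
      authenticationType: jwt           # service-account key auth
      defaultProject: my-gcp-project    # hypothetical project id
      clientEmail: etl-dashboard@my-gcp-project.iam.gserviceaccount.com
      tokenUri: https://oauth2.googleapis.com/token
    secureJsonData:
      privateKey: |
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----
```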

Now Grafana can run SQL queries directly on BigQuery.

7. Step 5: Build Real-Time Dashboards

Panel Types

For real-time ETL visibility:

Time-series (events/minute, record counts)

Gauge (pipeline health, error rate)

Stat cards (latest value)

Logs views (raw events)

Tables (aggregated ETL metrics)

Example Query for Time-Series

SELECT
  TIMESTAMP_TRUNC(ts, MINUTE) AS minute,
  COUNT(*) AS events_count
FROM `project.dataset.events_clean`
WHERE ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY minute
ORDER BY minute;
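The same table can also feed a gauge panel. As a sketch, assuming a status column where 'error' marks failed records (as in events_clean above):

```sql
-- Error rate (%) over the last 5 minutes, for a gauge panel.
-- SAFE_DIVIDE returns NULL instead of erroring when no rows arrived.
SELECT
  SAFE_DIVIDE(COUNTIF(status = 'error'), COUNT(*)) * 100 AS error_rate_pct
FROM `project.dataset.events_clean`
WHERE ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE);
```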

Refresh Interval

Grafana can refresh panels every:

5s

10s

30s

1m

Set per-panel or per-dashboard.

8. Step 6: Add Alerts and Monitoring

Grafana allows alerting directly on BigQuery queries.

Alert Examples:

ETL pipeline dropped to 0 events/min

Error rate > 2%

Latency > 5 seconds

Missing partitions in BigQuery
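An alert such as "pipeline dropped to 0 events/min" reduces to a query that Grafana evaluates on a schedule and compares against a threshold. A minimal sketch:

```sql
-- Events received in the last minute; fire the alert when this is 0.
SELECT COUNT(*) AS events_last_minute
FROM `project.dataset.events_clean`
WHERE ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE);
```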

Grafana supports:

Email

Slack

PagerDuty

Webhooks

9. Optional Enhancements

Use a Lakehouse Layer (GCS + BigQuery External Tables)

Store raw data in Cloud Storage and query through BigQuery.

Integrate dbt for transformation modeling

Better versioning, testing, documentation.

Use LookML or a semantic layer

Consistent metrics across teams.

Add ML/AI in BigQuery

Use BigQuery ML to predict ETL anomalies:

Outlier detection

Pipeline failure prediction
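As a sketch of anomaly detection on the per-minute aggregates from the events_mv view defined earlier (model name and probability threshold are illustrative; ARIMA_PLUS is one of BigQuery ML's time-series model types):

```sql
-- Train a time-series model on per-minute event counts...
CREATE OR REPLACE MODEL `project.dataset.events_anomaly_model`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'minute_bucket',
  time_series_data_col = 'total_events'
) AS
SELECT minute_bucket, total_events
FROM `project.dataset.events_mv`;

-- ...then flag minutes whose event counts look anomalous.
SELECT *
FROM ML.DETECT_ANOMALIES(
  MODEL `project.dataset.events_anomaly_model`,
  STRUCT(0.95 AS anomaly_prob_threshold)
);
```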

10. Example Mini Architecture

Streaming Events System

App sends events → Pub/Sub

Dataflow cleans + enriches → BigQuery

Materialized views → real-time aggregates

Grafana runs SQL queries → displays KPIs

KPIs to track:

Total events per minute

Failed ETL events

Latency of ingestion

Partition backlog

Transformation error counts

Final Summary

To build a real-time ETL dashboard using Grafana + BigQuery:

Ingest data using Pub/Sub, Dataflow, or streaming inserts

Transform data using Dataflow or BigQuery SQL (materialized views / scheduled queries)

Store cleaned results in partitioned BigQuery tables

Connect Grafana to BigQuery using a native connector

Build dashboards with live refresh intervals

Add alerts for ETL health and pipeline performance

Optionally layer dbt, GCS, or ML for enhanced capabilities
