1. Overview: What You Are Building
A real-time ETL dashboard continuously shows data from streaming or frequently updated sources.
Using BigQuery as the analytical engine and Grafana as the visualization layer, you can build dashboards with:
Live ingestion (latency measured in seconds)
Transformation (cleaning, deduplication, aggregation)
Visualization (time-series charts, KPIs, alerts)
2. High-Level Architecture
Source Systems
(Apps, APIs, IoT, Events)
|
v
Real-Time Ingestion
(Pub/Sub, Kafka, Dataflow, Fivetran)
|
v
BigQuery
- Streaming tables
- Materialized views
- Partitioned tables
|
v
Grafana
- BigQuery connector
- Real-time panels
- Alerts
3. Step 1 — Set Up Real-Time Data Ingestion
You need to ingest data into BigQuery with low latency.
Option A: Google Pub/Sub (best for streaming)
Flow:
Events → Pub/Sub topic
Pub/Sub → Dataflow Streaming job
Dataflow → BigQuery Streaming Inserts
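A minimal producer sketch for the flow above, assuming the `google-cloud-pubsub` client library, an existing topic, and Application Default Credentials; the event fields are illustrative:

```python
import json
import time

def encode_event(event: dict) -> bytes:
    """Serialize an event to the JSON bytes Pub/Sub carries,
    stamping a publish-time field so downstream stages can track latency."""
    event = {**event, "publish_ts": time.time()}
    return json.dumps(event).encode("utf-8")

def publish_event(project_id: str, topic_id: str, event: dict) -> None:
    """Publish one event; requires google-cloud-pubsub and credentials."""
    from google.cloud import pubsub_v1  # deferred so the sketch stays importable
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, encode_event(event))
    future.result()  # block until the broker acknowledges
```

In production you would batch publishes and handle `future` errors; this sketch keeps one synchronous publish per event for clarity.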
Option B: Direct Streaming Inserts to BigQuery
Send JSON records directly via API:
BigQuery insertAll API (legacy streaming), or the newer Storage Write API that Google now recommends
Latency is typically 1–2 seconds
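As a sketch of Option B, assuming the `google-cloud-bigquery` library (whose `insert_rows_json` wraps the insertAll API) and an existing table whose schema matches; `build_row` is a hypothetical helper:

```python
import datetime
import uuid

def build_row(payload: dict) -> dict:
    """Shape one record for streaming insert: add an id for later
    deduplication and an ingestion timestamp for partitioning."""
    return {
        "event_id": str(uuid.uuid4()),
        "ingestion_time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        **payload,
    }

def stream_rows(table_id: str, payloads: list) -> list:
    """Stream rows into BigQuery; needs google-cloud-bigquery and credentials.
    table_id has the form 'project.dataset.table'."""
    from google.cloud import bigquery  # deferred so the sketch stays importable
    client = bigquery.Client()
    rows = [build_row(p) for p in payloads]
    errors = client.insert_rows_json(table_id, rows)  # wraps insertAll
    return errors  # an empty list means every row was accepted
```

Checking the returned error list is important: insertAll accepts partial batches, so a non-empty list means some rows were rejected.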
Option C: Managed Connectors
Fivetran real-time connectors
Stitch / Airbyte
Google Datastream (for CDC / databases)
4. Step 2 — Build Your ETL Logic
Your ETL must run continuously. You can choose among:
4.1 Real-Time Transformation with Dataflow (Recommended)
Dataflow (Apache Beam) lets you:
Extract from Pub/Sub
Transform (clean, validate, enrich)
Load into BigQuery
Example operations:
Parse JSON
Add timestamps
Remove duplicates
Compute fields (e.g., total_price = qty × price)
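The operations above can be sketched as a plain Python function of the kind you would wrap in `beam.Map`/`beam.Filter` steps inside a Dataflow job. Note that real pipelines deduplicate with Beam's stateful processing; the process-local set below is only there to keep the sketch self-contained:

```python
import json
import time

_seen_ids = set()  # stand-in for Beam stateful dedup; per-process only

def transform(raw: bytes):
    """Parse, validate, deduplicate, and enrich one event.
    Returns the enriched record, or None if the event is dropped."""
    try:
        record = json.loads(raw)          # parse JSON
    except json.JSONDecodeError:
        return None                       # drop malformed payloads
    event_id = record.get("event_id")
    if event_id is None or event_id in _seen_ids:
        return None                       # drop id-less records and duplicates
    _seen_ids.add(event_id)
    record["processing_ts"] = time.time() # add timestamp
    record["total_price"] = record.get("qty", 0) * record.get("price", 0.0)
    return record
```

Returning `None` for drops maps naturally onto a filter step following the map in the pipeline.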
4.2 Using BigQuery Itself for Transformations
BigQuery supports:
✔ Materialized Views
Refreshed automatically in the background; queries also combine the materialized results with recent base-table changes, so reads stay fresh between refreshes
Great for:
aggregations
time-series windows
rollups (SUM, COUNT, AVG)
✔ Scheduled Queries (for near real-time)
Minimum interval: every 15 minutes
Good for:
incremental ETL
complex joins
daily/weekly aggregations
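An incremental scheduled query is often written as a MERGE that upserts only recently ingested rows. The sketch below builds that statement's text (table and column names are illustrative); you would paste it into the scheduled-query editor or submit it from a client:

```python
import textwrap

def incremental_merge_sql(target: str, source: str, lookback_minutes: int = 20) -> str:
    """Build a MERGE that copies rows ingested in the last N minutes from a
    raw streaming table into a cleaned table, keyed on event_id.
    The lookback should exceed the schedule interval to avoid gaps."""
    return textwrap.dedent(f"""\
        MERGE `{target}` AS t
        USING (
          SELECT * FROM `{source}`
          WHERE ingestion_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(),
                                               INTERVAL {lookback_minutes} MINUTE)
        ) AS s
        ON t.event_id = s.event_id
        WHEN NOT MATCHED THEN
          INSERT ROW""")
```

Keying the MERGE on `event_id` also cleans up duplicates that slipped through the streaming layer.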
✔ Looker / dbt for Data Modeling
For clean semantic layers.
5. Step 3 — Model Data in BigQuery
Recommended table architecture:
A. Raw Streaming Table
raw_events
- event_id
- event_timestamp
- source
- data (JSON or flattened)
- ingestion_time (partition)
B. Cleaned Table (Transformed)
events_clean
- event_id
- ts (timestamp)
- user_id
- value
- category
- status
C. Aggregated or Derived Tables
Materialized view example:
CREATE MATERIALIZED VIEW events_mv AS
SELECT
TIMESTAMP_TRUNC(ts, MINUTE) AS minute_bucket,
COUNT(*) AS total_events,
AVG(value) AS avg_value
FROM events_clean
GROUP BY minute_bucket;
6. Step 4 — Connect Grafana to BigQuery
Grafana provides a native Google BigQuery data source.
Steps:
In Grafana → Connections → Data sources (Configuration → Data Sources in older versions).
Add BigQuery.
Authenticate via:
Service Account JSON
Google OAuth
Select:
Project ID
Dataset
Billing Project ID
Save & test.
Now Grafana can run SQL queries directly on BigQuery.
7. Step 5 — Build Real-Time Dashboards
Panel Types
For real-time ETL visibility:
Time-series (events/minute, record counts)
Gauge (pipeline health, error rate)
Stat cards (latest value)
Logs views (raw events)
Tables (aggregated ETL metrics)
Example Query for Time-Series
SELECT
TIMESTAMP_TRUNC(ts, MINUTE) AS minute,
COUNT(*) AS events_count
FROM
`project.dataset.events_clean`
WHERE
ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY minute
ORDER BY minute;
Refresh Interval
Grafana can refresh panels every:
5s
10s
30s
1m
Set per-panel or per-dashboard. Keep in mind that each refresh runs a billed BigQuery query, so very short intervals increase cost.
8. Step 6 — Add Alerts and Monitoring
Grafana allows alerting directly on BigQuery queries.
Alert Examples:
ETL pipeline dropped to 0 events/min
Error rate > 2%
Latency > 5 seconds
Missing partitions in BigQuery
Grafana supports:
Slack
PagerDuty
Webhooks
9. Optional Enhancements
✔ Use a Lakehouse Layer (GCS + BigQuery External Tables)
Store raw data in Cloud Storage and query through BigQuery.
✔ Integrate dbt for transformation modeling
Better versioning, testing, documentation.
✔ Use LookML or a semantic layer
Consistent metrics across teams.
✔ Add ML/AI in BigQuery
Use BigQuery ML to predict ETL anomalies:
Outlier detection
Pipeline failure prediction
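As a sketch, BigQuery ML's `ML.DETECT_ANOMALIES` can score the per-minute aggregates for outliers, assuming an ARIMA_PLUS time-series model has already been trained on them; the model and table names below are illustrative:

```python
def anomaly_query(model: str, source: str, threshold: float = 0.99) -> str:
    """Query text that flags anomalous minutes in the aggregate table.
    Assumes `model` is an ARIMA_PLUS model trained on
    (minute_bucket, total_events) from the same source."""
    return (
        f"SELECT * FROM ML.DETECT_ANOMALIES("
        f"MODEL `{model}`, "
        f"STRUCT({threshold} AS anomaly_prob_threshold), "
        f"(SELECT minute_bucket, total_events FROM `{source}`))"
    )
```

The resulting rows carry an `is_anomaly` flag, which Grafana can chart or alert on like any other query result.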
10. Example Mini Architecture
Streaming Events System
App sends events → Pub/Sub
Dataflow cleans + enriches → BigQuery
Materialized Views → Real-time aggregates
Grafana uses SQL queries → Displays KPIs
KPIs to track:
Total events per minute
Failed ETL events
Latency of ingestion
Partition backlog
Transformation error counts
Final Summary
To build a real-time ETL dashboard using Grafana + BigQuery:
Ingest data using Pub/Sub, Dataflow, or streaming inserts
Transform data using Dataflow or BigQuery SQL (materialized views / scheduled queries)
Store cleaned results in partitioned BigQuery tables
Connect Grafana to BigQuery using a native connector
Build dashboards with live refresh intervals
Add alerts for ETL health and pipeline performance
Optionally layer dbt, GCS, or ML for enhanced capabilities