YOUR OWN DATA ARCHITECTURE

First-Party Data & Measurement Architecture

sGTM, Conversion API, BigQuery/Snowflake data lake, Consent Mode v2 + TCF 2.2, identity resolution and reverse ETL — the data infrastructure of brands that win in the post-third-party-cookie world, built with first-principles engineering discipline.

This isn't the era when the pixel died; it's the era when data ownership became mandatory — infrastructure is an engineering discipline, not plug-and-play SaaS.

With Consent Mode v2, iOS 14+ ATT, Chrome cookie changes and TCF 2.2, the signal reaching ad platforms has eroded by 40-60% on average. Most brands responded by stitching together parallel data lakes across multiple SaaS tools, each with its own ID, its own consent interpretation and its own event schema. Roibase's first-party data operation removes this fragmentation through six principles; every principle is an engineering standard, not a SaaS product.

Roibase perspective

METHODOLOGY

AUDIT to DESIGN to DEPLOY to VALIDATE to GOVERN to HANDOFF — engineering discipline

Data architecture is not a tag management project; it's a long-lived platform. A six-stage process makes every decision written, testable, and transferable.

01

AUDIT

Audit of current client-side GTM, GA4, pixels, CMP, consent implementation, data flow and cost visibility; signal loss, consent violations and data duplication are quantified.

02

DESIGN

Event taxonomy, identity strategy, consent policy, warehouse architecture and data contracts are designed; stakeholder approval (legal, IT, marketing, data) is secured.

03

DEPLOY

sGTM container, CAPI endpoints, Consent Mode v2 configuration, warehouse streaming and dbt models go live; blue/green deployment reduces risk.

04

VALIDATE

Shadow mode + dual tracking run old and new architecture in parallel; no cutover until event parity reaches 99%+; QA checklist covers 120+ items.
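
As a rough illustration of the parity gate described above, the sketch below compares event counts from the legacy and the new pipeline per event name. The event shape (a dict with a `name` key) and both function names are assumptions made for this example, not part of any existing tooling; the 0.99 threshold mirrors the 99%+ figure in the text.

```python
from collections import Counter

PARITY_THRESHOLD = 0.99  # cutover gate, taken from the 99%+ figure above


def event_parity(legacy_events, new_events):
    """Return a symmetric parity ratio (0.0 to 1.0) per event name.

    Using min/max means both missing events and duplicated events
    in the new pipeline lower the score.
    """
    legacy = Counter(e["name"] for e in legacy_events)
    new = Counter(e["name"] for e in new_events)
    parity = {}
    for name in legacy.keys() | new.keys():
        a, b = legacy[name], new[name]
        parity[name] = min(a, b) / max(a, b)
    return parity


def ready_for_cutover(parity, threshold=PARITY_THRESHOLD):
    """True only if at least one event was seen and every event meets the threshold."""
    return bool(parity) and all(ratio >= threshold for ratio in parity.values())
```

A real check would probably also compare parameter values and revenue totals, not only counts; that part is left out here.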

05

GOVERN

Schema registry, PII tagging, retention, RBAC, audit log and compliance reports go live; a data governance council convenes on a monthly cadence.

06

HANDOFF

Three weeks of hands-on training for your team + runbook + 6 months of async support; critical alert rotation and SLA contract handed over in writing.

— COMPARISON

In-house vs SaaS-dependent agency vs Roibase data engineering

The concrete difference three approaches make on data ownership, consent compliance, engineering depth and total cost.

Dimension | In-house minimal | SaaS-dependent agency | Roibase engineering
Data ownership | Fragmented (every tool its own DB) | With the SaaS vendor | In your own warehouse
sGTM + CAPI | Partial (client-only) | None or vendor-managed | On your own infrastructure, full ownership
Consent Mode v2 + TCF 2.2 | Basic integration | Pre-set CMP, no customization | Written policy + legal review + tests
Identity resolution | None or email-only | Vendor black-box | Deterministic + probabilistic, open model
PII governance + audit log | Ad-hoc | Contractual, not operational | Runbook + monthly compliance report
Data contracts + schema registry | None | Bound to SaaS schema | Versioned, testable, owned
Reverse ETL + activation | Manual CSV | SaaS-locked | Warehouse-native, free choice
Total annual cost | €50-120k (fragmented SaaS) | €120-250k (agency + licenses) | €80-180k (setup + warehouse)

PROOF

Outcomes, measured

+45%
Signal recovery

Recovering unattributed conversions after iOS 14+/ATT through sGTM + CAPI.

94%
Consent compliance rate

Acceptable consent state distribution after TCF 2.2 + Consent Mode v2.

12
Tool consolidation

Typical number of separate data/analytics SaaS tools consolidated per customer.

€0
Monthly data license cost

In your own warehouse — only query + storage cost; no SaaS per-seat fee.

8
Weeks to deploy

Typical mid-market timeline from audit to live shadow mode.

99.8%
Event delivery rate

Average event delivery success after sGTM + CAPI dual-path.

WHAT WE DO

Engagement scope

Every offering is an outcome-based work package. Roibase blends strategy and execution inside a single team — no hand-offs.

01 / 10

Server-side GTM (sGTM)

Your own sGTM container on Google Cloud Run / AWS Fargate: data ownership is yours, no vendor lock-in, client load drops; PII redaction happens on the server.
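
To make "PII redaction happens on the server" concrete, here is a minimal sketch of the logic. It assumes the only PII of interest is email addresses leaking into event parameters such as URLs; the regex is a simple heuristic and the `redact` name is hypothetical. In an actual sGTM container this logic would live in a sandboxed JavaScript template; Python is used here only to show the idea.

```python
import re

# Heuristic email pattern; an assumption for this sketch, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(value):
    """Recursively replace email-like substrings in strings, dicts and lists."""
    if isinstance(value, str):
        return EMAIL_RE.sub("[redacted-email]", value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value  # numbers, booleans and None pass through unchanged


incoming = {
    "event_name": "purchase",
    "page_location": "https://shop.example/thanks?email=jane.doe@example.com&x=1",
    "value": 49.9,
}
clean = redact(incoming)
```

Running redaction before any fan-out means no downstream destination ever receives the raw value.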

02 / 10

Consent Mode v2 + TCF 2.2

IAB TCF 2.2 compliant CMP integration, dynamic propagation of ad_user_data + ad_personalization signals based on consent state; KVKK/GDPR 'legal basis' separation backed by written policy.
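
The propagation of consent state into Consent Mode v2 signals can be sketched as a small mapping. The four signal names and the `granted`/`denied` values are the ones Consent Mode v2 defines; the CMP purpose flags (`advertising`, `personalization`, `analytics`) and the function name are hypothetical stand-ins for whatever a given CMP exposes. A real integration would derive them from the TC string, which is not shown.

```python
GRANTED, DENIED = "granted", "denied"


def consent_signals(purposes):
    """Map hypothetical CMP purpose flags to Consent Mode v2 signals.

    Anything not explicitly consented to defaults to 'denied'.
    """
    ads = purposes.get("advertising", False)
    personalization = ads and purposes.get("personalization", False)
    analytics = purposes.get("analytics", False)
    return {
        "ad_storage": GRANTED if ads else DENIED,
        "ad_user_data": GRANTED if ads else DENIED,
        "ad_personalization": GRANTED if personalization else DENIED,
        "analytics_storage": GRANTED if analytics else DENIED,
    }


# A visitor who accepted advertising but not personalization:
signals = consent_signals({"advertising": True, "analytics": True})
```

Defaulting to `denied` keeps the mapping safe when a purpose flag is missing, which is the conservative reading a written consent policy would normally require.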

03 / 10

Conversion API (CAPI)

Server-side conversion events for Meta, Google, TikTok, Pinterest; hashed PII + event deduplication; 30-50% signal recovery and iOS 14+/ATT compliance.
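
The two mechanics named above, hashed PII and event deduplication, can be sketched as follows. This is a simplified illustration under stated assumptions: the event dict layout and helper names are hypothetical, the `em` key mirrors the key Meta's Conversions API uses for a hashed email, and the `dedupe` helper shows own-side deduplication only (platforms also deduplicate browser and server copies on their side).

```python
import hashlib


def normalize_email(email):
    """Trim whitespace and lowercase before hashing so equal emails hash equally."""
    return email.strip().lower()


def hash_pii(value):
    """Return the SHA-256 hex digest of an already-normalized value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_event(event_name, event_id, email):
    """Build a server-side event carrying only the hashed email."""
    return {
        "event_name": event_name,
        "event_id": event_id,
        "user_data": {"em": hash_pii(normalize_email(email))},
    }


def dedupe(events):
    """Keep the first event per (event_name, event_id) pair, preserving order."""
    seen, unique = set(), []
    for event in events:
        key = (event["event_name"], event["event_id"])
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique
```

Normalizing before hashing matters: without it, ` Jane@Example.com ` and `jane@example.com` would produce different digests and fail to match.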

04 / 10

BigQuery / Snowflake data lake

Raw event streaming + dbt models + semantic layer + Looker Studio/Metabase/Looker visualization; partition + clustering + cost optimization included.

05 / 10

Identity resolution

Deterministic (login, email hash) + probabilistic (device fingerprint, household) identity graph; a single user identity for cross-device journeys and cross-channel attribution.
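
The deterministic half of such an identity graph can be sketched with a union-find structure: identifiers observed together (for example a device ID and an email hash at login) are merged into one cluster. The identifier strings and the function name are assumptions for this example; probabilistic matching is not shown.

```python
def resolve_identities(links):
    """Group identifiers into clusters given (identifier_a, identifier_b) pairs.

    Each pair means the two identifiers were observed together,
    e.g. a device ID and an email hash at login.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


observed = [
    ("device:laptop", "email:h1"),
    ("device:phone", "email:h1"),
    ("device:tablet", "email:h2"),
]
identities = resolve_identities(observed)
```

Here the laptop and the phone end up in the same cluster because both were seen with the same email hash, while the tablet stays separate.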

06 / 10

CDP readiness

Segment / RudderStack / mParticle integration, or warehouse-native CDP (Census, Hightouch) reverse ETL pipelines; CDP selection made through independent evaluation.

07 / 10

Reverse ETL & activation

Automated push of computed segments (churn risk, LTV tier, product affinity) to Meta Custom Audience, Google Customer Match, Klaviyo, HubSpot, Braze.

08 / 10

Customer Match rebuild

Lookalike + retargeting rebuilt with hashed PII + CAPI; infrastructure that preserves ad platform performance in a pixel-less world.

09 / 10

Schema registry + PII governance

Event schema is versioned and testable; PII fields are tagged, retention + masking policy enforced; schema drift alerts for data quality monitoring.
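
A versioned, testable event schema with drift detection might look like the sketch below. The in-memory `SCHEMAS` dict stands in for a real schema registry, and the event layout (`name`, `schema_version`, `properties`) and the `purchase` fields are assumptions made for illustration.

```python
# Hypothetical registry: (event name, schema version) -> field name -> expected type
SCHEMAS = {
    ("purchase", 1): {"transaction_id": str, "value": float, "currency": str},
}


def validate(event):
    """Return a list of contract violations; an empty list means the event conforms."""
    key = (event["name"], event["schema_version"])
    schema = SCHEMAS.get(key)
    if schema is None:
        return [f"unknown schema: {key[0]} v{key[1]}"]

    errors = []
    props = event.get("properties", {})
    for field, expected in schema.items():
        if field not in props:
            errors.append(f"missing field: {field}")
        elif not isinstance(props[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    # Fields the contract does not know about are reported as drift.
    for field in sorted(props.keys() - schema.keys()):
        errors.append(f"unexpected field (schema drift): {field}")
    return errors
```

Feeding the returned list into an alerting channel is one way the schema drift alerts mentioned above could be wired up.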

10 / 10

Audit log + access monitoring

Every data access is logged — who, when, why; role-based access control (RBAC), data contracts, and automated monthly compliance reports.
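
The "who, when, why" requirement combined with RBAC can be sketched as a single access function that refuses to run without a stated reason and logs every attempt, allowed or not. The role names, resource names and the `request_access` function are hypothetical; a production setup would rely on the warehouse's own access controls and audit logs rather than application code like this.

```python
import json
from datetime import datetime, timezone

# Hypothetical role-to-resource grants for illustration only.
ROLE_GRANTS = {
    "analyst": {"marts.orders"},
    "engineer": {"marts.orders", "raw.events"},
}


def request_access(user, role, resource, reason, audit_log):
    """Check RBAC and append a who/when/why record for every attempt."""
    if not reason.strip():
        raise ValueError("a reason is required for every data access")
    allowed = resource in ROLE_GRANTS.get(role, set())
    audit_log.append(json.dumps({
        "who": user,
        "role": role,
        "resource": resource,
        "why": reason.strip(),
        "when": datetime.now(timezone.utc).isoformat(),
        "allowed": allowed,
    }))
    return allowed


log = []
granted = request_access("ayse", "analyst", "raw.events", "debugging a parity gap", log)
```

Logging denied attempts as well as granted ones is what makes the monthly compliance report useful as evidence.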

— BENEFIT

The tangible, measurable return on data ownership

First-party data architecture isn't just compliance; it's direct leverage on ad performance, customer understanding and team velocity.

+45% signal

Ad signal recovery

30-50% signal recovery with Meta/Google/TikTok CAPI; ad platforms learn faster and optimize better.

-52% SaaS spend

Tool costs drop

Fragmented SaaS stack is consolidated into a single warehouse + dbt layer; annual license spend falls 40-60%.

+38% decision speed

Your team moves faster

A self-serve semantic layer lets business units answer their own questions; the data team shifts from bottleneck to enabler.

100% audit-ready

Consent compliance, written

TCF 2.2 + Consent Mode v2 + KVKK policy is audited and testable; the evidence file is ready for regulators.

+28% attribution accuracy

Cross-channel journey visible

Identity resolution reveals user journeys across devices and channels; attribution models and cohort analysis run on unified data.

Runbook + RACI

Data governance is sustainable

Schema registry, PII tagging, retention, RBAC, audit log — handed over to your team with a runbook and monthly compliance report.

DELIVERABLES

Concrete, written deliverables for every first-party project

Architecture, code, configuration, documentation and training — every artifact is versioned and handed over to your team.

  • Signal audit report

    Quantitative assessment of existing signal loss, consent violations and tool duplication, 40-60 pages.

  • Event taxonomy & data contracts

    Every event's name, properties, owner, schema version and backward compatibility rules.

  • sGTM container setup

    Live sGTM on Google Cloud Run / AWS Fargate, blue/green deployment + CI/CD pipeline + rollback plan.

  • CAPI integrations

    Server-side conversion events for Meta, Google, TikTok, Pinterest; event deduplication + hashed PII + error handling.

  • Consent Mode v2 + CMP policy

    IAB TCF 2.2 compliant CMP configuration, dynamic ad_user_data/ad_personalization signals, written consent policy + legal review.

  • BigQuery/Snowflake warehouse

    Raw event streaming pipeline, partition + clustering, cost optimization, monitoring + alerting.

  • dbt models + semantic layer

    Staging to intermediate to marts layers, dbt tests, exposures, lineage graph + documentation site.

  • Identity resolution pipeline

    Deterministic + probabilistic matching rules, household detection, cross-device journey table.

  • Reverse ETL pipelines

    Segment syncs to Meta CA, Google CM, Klaviyo, HubSpot, Braze via Census/Hightouch; schedule + monitoring.

  • Schema registry & PII governance

    Versioned schema records, PII tagging, retention + masking policy, schema drift alerts.

  • Audit log + compliance report

    RBAC configuration, data access log, automated monthly compliance report (KVKK/GDPR + ad platform policy).

  • Runbook + 3-week training

    Operational runbook, on-call rotation, SLA contract + 3 weeks of hands-on training for your team.

— SCOPE

What we do, what we don't — clear boundaries

First-party architecture is an engineering discipline; defining scope precisely prevents surprises and unexpected downstream billing.

We do

  • Signal audit + consent health assessment
  • Event taxonomy + data contracts design
  • sGTM container setup + CI/CD + monitoring
  • Meta/Google/TikTok/Pinterest CAPI integrations
  • Consent Mode v2 + TCF 2.2 + CMP configuration
  • BigQuery/Snowflake warehouse + streaming pipeline
  • dbt models + semantic layer + tests
  • Identity resolution (deterministic + probabilistic)
  • Reverse ETL pipelines (Census/Hightouch)
  • Schema registry + PII governance + audit log
  • Legal/compliance review coordination
  • Runbook + 3-week hands-on training

We don't

  • Legal counsel (coordinated via partner lawyer + policy review)
  • CDP license resale (we give vendor-agnostic recommendations, no commission)
  • Maintaining fragmented SaaS stacks (consolidation is recommended)
  • Raw analytics agency retainers (engineering sprints, not packages)
  • Guaranteed 'pre-pixel' signal recovery (we give a realistic range)
  • Warehouse licenses / cloud invoices (stay on the customer's account)
  • Ad account management (separate scope with PPC/Growth teams)
  • Plug-and-play SaaS deployment (every customer gets a custom architecture)

HOW WE WORK

First 12 weeks of rollout to 6-month operation: who does what and when, in writing

01

Weeks 1-2: audit + discovery

Current GTM/GA4/CMP/pixel audit, consent health check, stakeholder interviews, architecture requirements document.

02

Weeks 3-4: design + data contracts

Event taxonomy, identity strategy, warehouse schema, consent policy, data contracts — approved by legal + IT + marketing.

03

Weeks 5-6: sGTM + CAPI deploy

Cloud Run/Fargate container goes live; Meta/Google/TikTok CAPI integration; shadow mode starts.

04

Weeks 7-8: warehouse + dbt

BigQuery/Snowflake streaming pipeline, dbt staging + intermediate + marts, first version of semantic layer.

05

Weeks 9-10: validate + cutover

Event parity testing, QA checklist, blue/green cutover; decommission plan for the old architecture.

06

Weeks 11-12: govern + handoff

Schema registry, PII tagging, audit log, RBAC; hands-on training begins, runbook delivered.

07

Months 4-5: activation + optimization

Reverse ETL pipelines, first segment activations, MMM/attribution data preparation, cost optimization.

08

Month 6+: steady state + audit

Monthly compliance report, quarterly data governance council, schema drift monitoring, SLA + on-call rotation.

— TOOLKIT

The tools we use — vendor-agnostic but decisive choices

We pick what fits each customer; we protect independence by taking no commissions.

SERVER-SIDE TRACKING

Google Tag Manager Server, Stape.io, Google Cloud Run, AWS Fargate, Meta Conversion API, Google Ads Enhanced Conversions, TikTok Events API, Pinterest CAPI

CMP & CONSENT

OneTrust, Cookiebot, Didomi, Usercentrics, Google Consent Mode v2, IAB TCF 2.2

WAREHOUSE & CDP

BigQuery, Snowflake, Redshift, dbt Core/Cloud, Segment, RudderStack, mParticle, Amplitude

REVERSE ETL & ACTIVATION

Census, Hightouch, Polytomic, Fivetran, Airbyte, Stitch, Meta Custom Audience API, Google Customer Match API

QUESTIONS

Frequently asked

Three concrete benefits: (1) 30-50% signal recovery by bypassing ad-blockers + ITP, (2) Data ownership — PII redaction happens on the server, (3) Faster page loads — client-side script load drops. On top of that, vendor lock-in breaks; all tag logic lives in your cloud.

— GLOSSARY

First-party data engineering terminology

Twenty-two critical terms that give your team and stakeholders a shared language.

01
sGTM
Server-side Google Tag Manager — a proxy that takes the browser GTM payload, sanitises and enriches it, then fans out to multiple destinations (GA4, Meta CAPI, TikTok, etc.). Extends cookie lifetime, resists ad-blockers and is the backbone of server-side conversion APIs.
Related: CAPI, Consent Mode v2
02
CAPI
Meta's server-to-server event API running in parallel to the Pixel. Recovers the 20-40% of conversion signal lost in the browser due to ITP and ad-blockers; deduplication relies on the browser and server copies of an event sharing the same event_id and event_name. A foundation of any modern paid-social stack.
Related: sGTM, Customer Match
03
Consent Mode v2
Google's TCF 2.2 compliant consent signal mechanism; ad_user_data + ad_personalization states.
Related: TCF 2.2, PII
04
TCF 2.2
The IAB Europe Transparency & Consent Framework version mandatory since 2024. Standardises the consent signal between publisher, vendor and user; CMPs (OneTrust, Cookiebot, Didomi) deliver mandatory compliance together with Google Consent Mode v2.
Related: Consent Mode v2
05
Identity resolution
Linking user activity across devices and channels to a single identity; deterministic + probabilistic.
Related: CDP, Customer Match
06
CDP
Customer Data Platform; the system that unifies user profiles and exposes them to activation channels (Segment, mParticle, warehouse-native).
Related: Reverse ETL, Data warehouse
07
Reverse ETL
Pushing data from the warehouse to operational tools (Meta, Google, Klaviyo); Census, Hightouch are typical vendors.
Related: CDP, Data warehouse
08
Customer Match
Using a hashed first-party list (email, phone, mailing address) as a targeting/exclusion audience across Google Search, YouTube and Display. The base for lookalike seeds and win-back; the minimum match rate to be useful is typically 30%+.
Related: CAPI, PII
09
Data warehouse
The cloud data store where raw and modelled event data live (BigQuery, Snowflake, Redshift, Databricks).
Related: Event schema, Data governance
10
Event schema
Written, versioned definition of event names, properties, data types and owners; stored in the schema registry.
Related: Data governance
11
PII
Personally Identifiable Information; data that identifies a person (email, phone, IP, device ID). Managed under tagging + retention.
Related: Data governance, Consent Mode v2
12
Data governance
The combined disciplines of data quality, access, stewardship and compliance; RBAC + audit log + data contracts are standard.
Related: PII, Event schema
13
GA4 Measurement Protocol
A server-to-server protocol that sends events directly to GA4 over HTTP. Generates conversion signal from environments without a web pixel (CRM, IoT, app server); authenticates with api_secret + measurement_id and is wired to respect Consent Mode.
14
Enhanced Conversions
A measurement layer in Google Ads that ties a conversion to a user via hashed first-party data (email, phone). Recovers 3-15% of attribution lost to ITP and cookie decay; ships in web and lead-form variants.
15
Offline Conversions
The process of feeding back conversions that happen in CRM (lead-to-sale, call closure, store visit) to the ad platform via the click ID (gclid/wbraid/fbclid). The most reliable way to feed tROAS with real revenue.
16
First-party Data
Data the brand collects directly from its own properties (web, app, CRM, call centre, email, membership) under user consent. The most defensible fuel for performance marketing post-third-party-cookie; hashed and activated into ad platforms.
17
Data Clean Room
A secure compute environment where two parties (e.g. brand + media platform) can match and aggregate without exposing each other's raw PII. Google Ads Data Hub, Amazon AMC, Snowflake/Databricks clean rooms — used for overlap analysis, attribution and audience building.
18
Identity Graph
A relational graph that links one person across their devices, email, phone, payment identifier and hashed IDs. Foundation for cross-device attribution, retention modelling and LAL seed quality — the heart of any CDP.
19
First-party Cookies
Cookies set by the site's own domain and only sent on its own page requests. After third-party cookies were blocked, ITP further capped this category — server-side cookie setting + 1y+ rotation policy is now essential.
20
Server-side Events
Conversion events sent to the ad platform via API from your own server (sGTM, own backend) rather than from the browser. Immune to ad-blocker and browser caps; works with specs like CAPI (Meta), GA4 MP, TikTok Events API.
21
Hashed PII
A personally identifiable value (email, phone, name) that has been normalized and passed through a one-way cryptographic hash function (usually SHA-256). Mandatory for matching, custom-audience upload and Enhanced Conversions on ad platforms; a privacy and compliance requirement.
22
Privacy Sandbox
Google's suite of Chrome APIs designed to enable ad measurement, retargeting and fraud detection without third-party cookies: Topics, Protected Audience (FLEDGE), Attribution Reporting. The Google side of the cookieless future.

— DECISION TREE

Is a first-party data operation right for you?

Answer 4 questions Yes/No; get a clear recommendation.

01 / 04

Is your monthly ad budget above 30k USD?

The threshold for signal recovery to be economically meaningful.

— LET'S BEGIN

How much do you trust your pixels?

In a 2-hour signal audit we surface lost conversions, consent issues and warehouse opportunities.