YOUR OWN DATA ARCHITECTURE
First-Party Data & Measurement Architecture
sGTM, Conversions API, BigQuery/Snowflake data lake, Consent Mode v2 + TCF 2.2, identity resolution and reverse ETL: the data infrastructure of brands that win in the post-third-party-cookie world, built with first-principles engineering discipline.
This isn't the era when the pixel died; it's the era when data ownership became mandatory. Infrastructure is an engineering discipline, not plug-and-play SaaS.
With Consent Mode v2, iOS 14.5+ ATT, Chrome cookie changes and TCF 2.2, the signal reaching ad platforms has eroded by 40-60% on average. Most brands responded by stitching together parallel data lakes across multiple SaaS tools, each with its own ID, its own consent interpretation, its own event schema. Roibase's first-party data operation removes this fragmentation through six principles; every principle is an engineering standard, not a SaaS product.
METHODOLOGY
AUDIT to DESIGN to DEPLOY to VALIDATE to GOVERN to HANDOFF — engineering discipline
Data architecture is not a tag management project; it's a long-lived platform. A six-stage process makes every decision written, testable, and transferable.
01
AUDIT
Audit of current client-side GTM, GA4, pixels, CMP, consent implementation, data flow and cost visibility; signal loss, consent violations and data duplication are quantified.
02
DESIGN
Event taxonomy, identity strategy, consent policy, warehouse architecture and data contracts are designed; stakeholder approval (legal, IT, marketing, data) is secured.
03
DEPLOY
sGTM container, CAPI endpoints, Consent Mode v2 configuration, warehouse streaming and dbt models go live; blue/green deployment reduces risk.
04
VALIDATE
Shadow mode + dual tracking run old and new architecture in parallel; no cutover until event parity reaches 99%+; QA checklist covers 120+ items.
05
GOVERN
Schema registry, PII tagging, retention, RBAC, audit log and compliance reports go live; a data governance council convenes on a monthly cadence.
06
HANDOFF
Three weeks of hands-on training for your team + runbook + 6 months of async support; critical alert rotation and SLA contract handed over in writing.
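The parity gate in the VALIDATE stage can be pictured with a minimal sketch. It assumes both the legacy and the shadow pipeline can export a flat list of event names for the same time window; `event_parity` and `ready_for_cutover` are hypothetical helper names, not part of any tool named on this page.

```python
from collections import Counter

PARITY_THRESHOLD = 0.99  # the 99%+ gate named in the VALIDATE stage


def event_parity(legacy_events, new_events):
    """Compare event counts per event name between two pipelines.

    Parity is new_count / legacy_count, capped at 1.0 so that surplus
    events (possible duplicates) are reported separately.
    """
    legacy, new = Counter(legacy_events), Counter(new_events)
    report = {}
    for name, legacy_count in legacy.items():
        new_count = new.get(name, 0)
        report[name] = {
            "parity": min(new_count / legacy_count, 1.0),
            "surplus": max(new_count - legacy_count, 0),
        }
    return report


def ready_for_cutover(report, threshold=PARITY_THRESHOLD):
    """Allow cutover only if every event name meets the threshold."""
    return all(row["parity"] >= threshold for row in report.values())


# Illustrative counts only, not measured data.
legacy = ["purchase"] * 200 + ["add_to_cart"] * 500
shadow = ["purchase"] * 199 + ["add_to_cart"] * 480
report = event_parity(legacy, shadow)
print(ready_for_cutover(report))  # False: add_to_cart sits at 480/500
```

Checking parity per event name rather than on the total keeps a healthy high-volume event from masking a broken low-volume one.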
— COMPARISON
In-house vs SaaS-dependent agency vs Roibase data engineering
The concrete difference three approaches make on data ownership, consent compliance, engineering depth and total cost.
| Dimension | In-house minimal | SaaS-dependent agency | Roibase engineering |
|---|---|---|---|
| Data ownership | Fragmented (every tool its own DB) | With the SaaS vendor | In your own warehouse |
| sGTM + CAPI | Partial (client-only) | None or vendor-managed | On your own infrastructure, full ownership |
| Consent Mode v2 + TCF 2.2 | Basic integration | Pre-set CMP, no customization | Written policy + legal review + tests |
| Identity resolution | None or email-only | Vendor black-box | Deterministic + probabilistic, open model |
| PII governance + audit log | Ad-hoc | Contractual, not operational | Runbook + monthly compliance report |
| Data contracts + schema registry | None | Bound to SaaS schema | Versioned, testable, owned |
| Reverse ETL + activation | Manual CSV | SaaS-locked | Warehouse-native, free choice |
| Total annual cost | 50-120k€ (fragmented SaaS) | 120-250k€ (agency + licenses) | 80-180k€ (setup + warehouse) |
PROOF
Outcomes, measured
Recovering unattributed conversions after iOS 14+/ATT through sGTM + CAPI.
Acceptable consent state distribution after TCF 2.2 + Consent Mode v2.
Typical number of separate data/analytics SaaS tools consolidated per customer.
In your own warehouse — only query + storage cost; no SaaS per-seat fee.
Typical mid-market timeline from audit to live shadow mode.
Average event delivery success after sGTM + CAPI dual-path.
WHAT WE DO
Engagement scope
Every offering is an outcome-based work package. Roibase blends strategy and execution inside a single team — no hand-offs.
Server-side GTM (sGTM)
Your own sGTM container on Google Cloud Run / AWS Fargate: data ownership is yours, no vendor lock-in, client load drops; PII redaction happens on the server.
Consent Mode v2 + TCF 2.2
IAB TCF 2.2 compliant CMP integration, dynamic propagation of ad_user_data + ad_personalization signals based on consent state; KVKK/GDPR 'legal basis' separation backed by written policy.
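As an illustration of what "dynamic propagation based on consent state" means in practice, here is a minimal sketch. The four signal names and the `granted` / `denied` values are Consent Mode v2's own; the CMP category names (`marketing`, `personalization`, `analytics`) and the mapping rule are assumptions made for this example, since the real mapping comes from the written consent policy.

```python
GRANTED, DENIED = "granted", "denied"


def consent_mode_signals(cmp_choices):
    """Map CMP category choices to Consent Mode v2 signal states.

    `cmp_choices` is a dict of hypothetical category names to booleans;
    a missing category is treated as not consented.
    """
    marketing = cmp_choices.get("marketing", False)
    personalization = cmp_choices.get("personalization", False)
    analytics = cmp_choices.get("analytics", False)

    def state(flag):
        return GRANTED if flag else DENIED

    return {
        "ad_storage": state(marketing),
        "ad_user_data": state(marketing),
        # Assumed policy: personalised ads need both choices.
        "ad_personalization": state(marketing and personalization),
        "analytics_storage": state(analytics),
    }


signals = consent_mode_signals({"marketing": True, "analytics": True})
print(signals["ad_user_data"], signals["ad_personalization"])  # granted denied
```

Defaulting every missing choice to `denied` means a CMP outage or an unknown category can only reduce the signal sent, never widen it.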
Conversions API (CAPI)
Server-side conversion events for Meta, Google, TikTok, Pinterest; hashed PII + event deduplication; 30-50% signal recovery and iOS 14+/ATT compliance.
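A minimal sketch of "hashed PII + event deduplication", using the event shape of Meta's Conversions API as the example. The helper names, the sample email and the URL are hypothetical; only email normalisation is shown, and other identifiers such as phone numbers have their own normalisation rules. No network call is made here.

```python
import hashlib
import time
import uuid


def hash_pii(value):
    """Normalise (trim + lowercase), then SHA-256 hash an identifier."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


def new_event_id():
    """One ID per conversion; the browser and the server both send it."""
    return str(uuid.uuid4())


def build_server_event(event_name, event_id, email, value, currency, source_url):
    """Build one server-side event dict for the request's `data` array."""
    return {
        "event_name": event_name,
        "event_time": int(time.time()),
        "event_id": event_id,  # must equal the ID sent by the browser pixel
        "action_source": "website",
        "event_source_url": source_url,
        "user_data": {"em": [hash_pii(email)]},
        "custom_data": {"value": value, "currency": currency},
    }


eid = new_event_id()
event = build_server_event(
    "Purchase", eid, " Jane@Example.com ", 49.9, "EUR",
    "https://shop.example/checkout",  # placeholder URL
)
payload = {"data": [event]}
```

Because the browser and the server send the same `event_name` and `event_id`, the platform can count the conversion once even when both paths deliver it.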
BigQuery / Snowflake data lake
Raw event streaming + dbt models + semantic layer + Looker Studio/Metabase/Looker visualization; partition + clustering + cost optimization included.
Identity resolution
Deterministic (login, email hash) + probabilistic (device fingerprint, household) identity graph; a single user identity for cross-device journeys and cross-channel attribution.
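The deterministic half of such an identity graph can be sketched with a union-find structure. This is an illustration under simplifying assumptions, not the production model: identifiers are plain strings with made-up prefixes, every link is treated as certain, and probabilistic matching is left out.

```python
from collections import defaultdict


class IdentityGraph:
    """Deterministic identity stitching with a union-find structure."""

    def __init__(self):
        self.parent = {}

    def _find(self, node):
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            # Path halving keeps lookups fast as the graph grows.
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def link(self, id_a, id_b):
        """Record that two identifiers were observed for the same user."""
        root_a, root_b = self._find(id_a), self._find(id_b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def profiles(self):
        """Group all known identifiers by resolved identity."""
        groups = defaultdict(set)
        for node in list(self.parent):
            groups[self._find(node)].add(node)
        return list(groups.values())


graph = IdentityGraph()
graph.link("device:laptop-1", "email_hash:ab12")  # login on laptop
graph.link("device:phone-7", "email_hash:ab12")   # login on phone
graph.link("device:tablet-3", "email_hash:cd34")  # a different user
print(len(graph.profiles()))  # 2
```

The shared email hash is what joins the laptop and the phone into one profile; the tablet stays separate until a deterministic link ties it to an existing identity.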
CDP readiness
Segment / RudderStack / mParticle integration, or warehouse-native CDP (Census, Hightouch) reverse ETL pipelines; CDP selection made through independent evaluation.
Reverse ETL & activation
Automated push of computed segments (churn risk, LTV tier, product affinity) to Meta Custom Audience, Google Customer Match, Klaviyo, HubSpot, Braze.
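A minimal sketch of the sync step behind such a push, assuming the computed segment and the destination audience are both available as collections of hashed identifiers. `plan_audience_sync` is a hypothetical helper; the actual upload would go through the destination's own API or a reverse ETL tool.

```python
def plan_audience_sync(segment_members, audience_members):
    """Diff the warehouse segment against the destination audience.

    Returns the identifiers to add and to remove so that only changes
    are pushed instead of re-uploading the whole list.
    """
    segment, audience = set(segment_members), set(audience_members)
    return {
        "add": sorted(segment - audience),
        "remove": sorted(audience - segment),
    }


# Placeholder hashed IDs, for illustration only.
plan = plan_audience_sync({"h_a", "h_b", "h_c"}, {"h_b", "h_d"})
print(plan)  # {'add': ['h_a', 'h_c'], 'remove': ['h_d']}
```

Pushing removals matters as much as additions: a user who leaves the churn-risk segment, or withdraws consent, should also leave the audience.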
Customer Match rebuild
Lookalike + retargeting rebuilt with hashed PII + CAPI; infrastructure that preserves ad platform performance in a pixel-less world.
Schema registry + PII governance
Event schema is versioned and testable; PII fields are tagged, retention + masking policy enforced; schema drift alerts for data quality monitoring.
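To make "versioned and testable" concrete, here is a small sketch of a schema check with drift detection and PII masking. The in-memory `SCHEMA_REGISTRY`, the `purchase` v2 schema and the helper names are assumptions for illustration; a real registry would live in version control or a dedicated service.

```python
# Hypothetical registry: (event name, schema version) -> definition.
SCHEMA_REGISTRY = {
    ("purchase", 2): {
        "fields": {"order_id": str, "value": float, "currency": str, "email": str},
        "pii": {"email"},
    },
}


def check_event(name, version, payload, registry=SCHEMA_REGISTRY):
    """Return a list of schema problems; an empty list means it conforms."""
    schema = registry.get((name, version))
    if schema is None:
        return [f"unregistered schema: {name} v{version}"]
    problems = []
    for field, expected in schema["fields"].items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    # Fields the registry does not know about are reported as drift.
    for field in sorted(payload.keys() - schema["fields"].keys()):
        problems.append(f"schema drift, unexpected field: {field}")
    return problems


def mask_pii(name, version, payload, registry=SCHEMA_REGISTRY):
    """Replace values of PII-tagged fields before the event is stored."""
    pii = registry[(name, version)]["pii"]
    return {k: ("***" if k in pii else v) for k, v in payload.items()}


event = {"order_id": "A-1", "value": "49.90", "currency": "EUR",
         "email": "jane@example.com", "coupon": "X"}
problems = check_event("purchase", 2, event)
masked = mask_pii("purchase", 2, event)
```

Here `problems` flags the string-typed `value` and the unregistered `coupon` field, which is the kind of finding a schema drift alert would carry.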
Audit log + access monitoring
Every data access is logged — who, when, why; role-based access control (RBAC), data contracts, and automated monthly compliance reports.
— BENEFIT
The tangible, measurable return on data ownership
First-party data architecture isn't just compliance; it's direct leverage on ad performance, customer understanding and team velocity.
Ad signal recovery
30-50% signal recovery with Meta/Google/TikTok CAPI; ad platforms learn faster and optimize better.
Tool costs drop
Fragmented SaaS stack is consolidated into a single warehouse + dbt layer; annual license spend falls 40-60%.
Your team moves faster
A self-serve semantic layer lets business units answer their own questions; the data team shifts from bottleneck to enabler.
Consent compliance, written
TCF 2.2 + Consent Mode v2 + KVKK policy is audited and testable; the evidence file is ready for regulators.
Cross-channel journey visible
Identity resolution reveals user journeys across devices and channels; attribution models and cohort analysis run on unified data.
Data governance is sustainable
Schema registry, PII tagging, retention, RBAC, audit log — handed over to your team with a runbook and monthly compliance report.
DELIVERABLES
Concrete, written deliverables for every first-party project
Architecture, code, configuration, documentation and training — every artifact is versioned and handed over to your team.
Signal audit report
Quantitative assessment of existing signal loss, consent violations and tool duplication, 40-60 pages.
Event taxonomy & data contracts
Every event's name, properties, owner, schema version and backward compatibility rules.
sGTM container setup
Live sGTM on Google Cloud Run / AWS Fargate, blue/green deployment + CI/CD pipeline + rollback plan.
CAPI integrations
Server-side conversion events for Meta, Google, TikTok, Pinterest; event deduplication + hashed PII + error handling.
Consent Mode v2 + CMP policy
IAB TCF 2.2 compliant CMP configuration, dynamic ad_user_data/ad_personalization signals, written consent policy + legal review.
BigQuery/Snowflake warehouse
Raw event streaming pipeline, partition + clustering, cost optimization, monitoring + alerting.
dbt models + semantic layer
Staging to intermediate to marts layers, dbt tests, exposures, lineage graph + documentation site.
Identity resolution pipeline
Deterministic + probabilistic matching rules, household detection, cross-device journey table.
Reverse ETL pipelines
Segment syncs to Meta CA, Google CM, Klaviyo, HubSpot, Braze via Census/Hightouch; schedule + monitoring.
Schema registry & PII governance
Versioned schema records, PII tagging, retention + masking policy, schema drift alerts.
Audit log + compliance report
RBAC configuration, data access log, automated monthly compliance report (KVKK/GDPR + ad platform policy).
Runbook + 3-week training
Operational runbook, on-call rotation, SLA contract + 3 weeks of hands-on training for your team.
— SCOPE
What we do, what we don't — clear boundaries
First-party architecture is an engineering discipline; defining scope precisely prevents surprises and unexpected downstream billing.
We do
- Signal audit + consent health assessment
- Event taxonomy + data contracts design
- sGTM container setup + CI/CD + monitoring
- Meta/Google/TikTok/Pinterest CAPI integrations
- Consent Mode v2 + TCF 2.2 + CMP configuration
- BigQuery/Snowflake warehouse + streaming pipeline
- dbt models + semantic layer + tests
- Identity resolution (deterministic + probabilistic)
- Reverse ETL pipelines (Census/Hightouch)
- Schema registry + PII governance + audit log
- Legal/compliance review coordination
- Runbook + 3-week hands-on training
We don't
- Legal counsel (coordinated via partner lawyer + policy review)
- CDP license resale (we give vendor-agnostic recommendations, no commission)
- Maintaining fragmented SaaS stacks (consolidation is recommended)
- Raw analytics agency retainers (engineering sprints, not packages)
- Guaranteed 'pre-pixel' signal recovery (we give a realistic range)
- Warehouse licenses / cloud invoices (stay on the customer's account)
- Ad account management (separate scope with PPC/Growth teams)
- Plug-and-play SaaS deployment (every customer gets a custom architecture)
HOW WE WORK
From a 12-week rollout to 6-month operation: who does what and when, in writing
Weeks 1-2: audit + discovery
Current GTM/GA4/CMP/pixel audit, consent health check, stakeholder interviews, architecture requirements document.
Weeks 3-4: design + data contracts
Event taxonomy, identity strategy, warehouse schema, consent policy, data contracts — approved by legal + IT + marketing.
Weeks 5-6: sGTM + CAPI deploy
Cloud Run/Fargate container goes live; Meta/Google/TikTok CAPI integration; shadow mode starts.
Weeks 7-8: warehouse + dbt
BigQuery/Snowflake streaming pipeline, dbt staging + intermediate + marts, first version of semantic layer.
Weeks 9-10: validate + cutover
Event parity testing, QA checklist, blue/green cutover; decommission plan for the old architecture.
Weeks 11-12: govern + handoff
Schema registry, PII tagging, audit log, RBAC; hands-on training begins, runbook delivered.
Months 4-5: activation + optimization
Reverse ETL pipelines, first segment activations, MMM/attribution data preparation, cost optimization.
Month 6+: steady state + audit
Monthly compliance report, quarterly data governance council, schema drift monitoring, SLA + on-call rotation.
— TOOLKIT
The tools we use — vendor-agnostic but decisive choices
We pick what fits each customer; we protect independence by taking no commissions.
SERVER-SIDE TRACKING
CMP & CONSENT
WAREHOUSE & CDP
REVERSE ETL & ACTIVATION
— GLOSSARY
First-party data engineering terminology
Twenty-two critical terms that give your team and stakeholders a shared language.
- sGTM
- Server-side Google Tag Manager — a proxy that takes the browser GTM payload, sanitises and enriches it, then fans out to multiple destinations (GA4, Meta CAPI, TikTok, etc.). Extends cookie lifetime, resists ad-blockers and is the backbone of server-side conversion APIs.
- CAPI
- Conversions API: Meta's server-to-server event API running in parallel to the Pixel. Recovers the 20-40% of conversion signal lost in the browser due to ITP and ad-blockers; deduplication requires the browser and server copies of an event to carry the same event_name and event_id. A foundation of any modern paid-social stack.
- Consent Mode v2
- Google's consent signalling mechanism, used alongside a TCF 2.2 compliant CMP; v2 adds the ad_user_data and ad_personalization states to the existing ad_storage and analytics_storage.
- TCF 2.2
- The IAB Europe Transparency & Consent Framework version mandatory since 2024. Standardises the consent signal between publisher, vendor and user; CMPs (OneTrust, Cookiebot, Didomi) deliver mandatory compliance together with Google Consent Mode v2.
- Identity resolution
- Linking user activity across devices and channels to a single identity; deterministic + probabilistic.
- CDP
- Customer Data Platform; the system that unifies user profiles and exposes them to activation channels (Segment, mParticle, warehouse-native).
- Reverse ETL
- Pushing data from the warehouse to operational tools (Meta, Google, Klaviyo); Census, Hightouch are typical vendors.
- Customer Match
- Using a hashed first-party list (email, phone, mailing address) as a targeting/exclusion audience across Google Search, YouTube and Display. The base for lookalike seeds and win-back; the minimum match rate to be useful is typically 30%+.
- Data warehouse
- The cloud data store where raw and modelled event data live (BigQuery, Snowflake, Redshift, Databricks).
- Event schema
- Written, versioned definition of event names, properties, data types and owners; stored in the schema registry.
- PII
- Personally Identifiable Information; data that identifies a person (email, phone, IP, device ID). Managed under tagging + retention.
- Data governance
- The combined disciplines of data quality, access, stewardship and compliance; RBAC + audit log + data contracts are standard.
- GA4 Measurement Protocol
- A server-to-server protocol that sends events directly to GA4 over HTTP. Generates conversion signal from environments without a web pixel (CRM, IoT, app server); authenticates with api_secret + measurement_id and is wired to respect Consent Mode.
- Enhanced Conversions
- A measurement layer in Google Ads that ties a conversion to a user via hashed first-party data (email, phone). Recovers 3-15% of attribution lost to ITP and cookie decay; ships in web and lead-form variants.
- Offline Conversions
- The process of feeding back conversions that happen in CRM (lead-to-sale, call closure, store visit) to the ad platform via the click ID (gclid/wbraid/fbclid). The most reliable way to feed tROAS with real revenue.
- First-party Data
- Data the brand collects directly from its own properties (web, app, CRM, call centre, email, membership) under user consent. The most defensible fuel for performance marketing post-third-party-cookie; hashed and activated into ad platforms.
- Data Clean Room
- A secure compute environment where two parties (e.g. brand + media platform) can match and aggregate without exposing each other's raw PII. Google Ads Data Hub, Amazon AMC, Snowflake/Databricks clean rooms — used for overlap analysis, attribution and audience building.
- Identity Graph
- A relational graph that links one person across their devices, email, phone, payment identifier and hashed IDs. Foundation for cross-device attribution, retention modelling and LAL seed quality — the heart of any CDP.
- First-party Cookies
- Cookies set by the site's own domain and only sent on its own page requests. After third-party cookies were blocked, Safari's ITP also capped this category by limiting cookies set from JavaScript to 7 days; server-side cookie setting + 1y+ rotation policy is now essential.
- Server-side Events
- Conversion events sent to the ad platform via API from your own server (sGTM, own backend) rather than from the browser. Far less exposed to ad-blockers and browser caps; works with specs like CAPI (Meta), GA4 MP, TikTok Events API.
- Hashed PII
- A personally identifiable value (email, phone, name) that is normalised and then transformed via a one-way cryptographic hash function (usually SHA-256). Mandatory for matching, custom-audience upload and Enhanced Conversions on ad platforms; it is a privacy and compliance requirement.
- Privacy Sandbox
- Google's suite of Chrome APIs designed to enable ad measurement, retargeting and fraud detection without third-party cookies: Topics, Protected Audience (FLEDGE), Attribution Reporting. The Google side of the cookieless future.
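
To make the GA4 Measurement Protocol entry above concrete, here is a minimal sketch that only builds the request and does not send it. The endpoint and the `measurement_id` / `api_secret` query parameters follow Google's published protocol; the credential values, client ID and event parameters are placeholders, and consent gating is assumed to happen before this function is called.

```python
import json
from urllib.parse import urlencode

MP_ENDPOINT = "https://www.google-analytics.com/mp/collect"


def build_mp_request(measurement_id, api_secret, client_id, event_name, params):
    """Return (url, body) for one GA4 Measurement Protocol event."""
    query = urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    body = json.dumps({
        "client_id": client_id,
        "events": [{"name": event_name, "params": params}],
    })
    return f"{MP_ENDPOINT}?{query}", body


# Placeholder credentials and values, for illustration only.
url, body = build_mp_request(
    "G-XXXXXXX", "placeholder-secret", "555.123",
    "purchase", {"value": 49.9, "currency": "EUR", "transaction_id": "A-1"},
)
```

The resulting `url` and `body` can be sent as an HTTP POST with any client, which is how a CRM or app server can report a conversion without a web pixel.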
— DECISION TREE
Is a first-party data operation right for you?
Answer 4 questions Yes/No; get a clear recommendation.
01 / 04
Is your monthly ad budget above 30k USD?
The threshold for signal recovery to be economically meaningful.
— LET'S BEGIN
How much do you trust your pixels?
In a 2-hour signal audit we surface lost conversions, consent issues and warehouse opportunities.