---
title: "What Is Data, Really? — A Plain-Language Primer on Rules and Compliance"
author: "DCC Editorial"
published: 2025-08-28T01:00:00.000Z
url: https://datacompliancechina.com/posts/qinglan-what-is-data-rules-and-compliance-primer/
description: "What does it actually mean to call something 'data,' and what turns raw recordings into a data asset? Wang Qinglan uses a toy storage room metaphor to walk through the foundational concept overseas readers often skip: data is not just 'records' — it's records made under rules. Master data, metadata, ontology, the three-tier compliance taxonomy (legal / ethical / promised), and the three-step compliance workflow (select / allocate / execute) — all anchored in a concrete example a non-specialist can follow."
tags: ["data-fundamentals", "data-governance", "compliance-architecture", "commentary"]
laws_cited: ["dsl", "data-foundation-system-opinions"]
domains: ["data-economy", "data-security"]
account: "qinglan-data"
original_title: "数据的奇妙真相：从生活实例看它的真面目"
original_author: "王青兰 (Wang Qinglan)"
original_publication: "青兰数据观察"
original_url: "https://mp.weixin.qq.com/s/Dn4hlPZUHJOuUkLYzoaGLA"
source_language: "zh"
---
> *Editor's Note — DCC.*
>
> A surprising number of overseas data-compliance discussions skip the
> foundational question — *what is data*? — and jump straight into
> classification regimes, lawful bases, and cross-border paths. Wang
> Qinglan's primer fills the gap with a toy storage room metaphor that
> overseas readers will find unusually accessible. The piece is a sequel
> to her [data governance / management / compliance disambiguation](/posts/qinglan-data-governance-management-compliance-disambiguation/),
> and reads cleanly as a stand-alone primer too. DCC's framing
> emphasizes where the conceptual building blocks anchor to the formal
> Chinese regime.

## Data isn't "records" — it's records made under rules

Wang opens with an exercise. Imagine you're cataloguing the toy cars in your home storage room and someone hands you this string:

> *"3+, mom, cherry red, 3-6, square, red, 2023, ages 3 to 6, plastic, ef555, 250, Shenzhen, 239,85,82, pre-school..."*

That's a raw recording: observations captured in arbitrary form. If you tried to put this into Excel, you'd be unable to count anything. *"Red," "cherry red," "ef555," "239,85,82"* all describe color, in incompatible formats. *"3+," "3-6," "pre-school"* all describe age, in incompatible formats.
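A minimal Python sketch of what one color rule could look like, assuming the rule says "store every color as a six-digit hex code". The names `NAMED_COLORS` and `normalize_color` are hypothetical, and the hex value assigned to "red" is an illustrative choice, not something from Wang's article.

```python
import re
from typing import Optional

# Hypothetical lookup for named colors; the hex value is an illustrative choice.
NAMED_COLORS = {"red": "#ff0000"}

def normalize_color(raw: str) -> Optional[str]:
    """Return '#rrggbb', or None if the recording cannot be interpreted."""
    text = raw.strip().lower()
    if text in NAMED_COLORS:
        return NAMED_COLORS[text]
    # "239,85,82" style: three decimal channels, each in 0-255
    parts = [p.strip() for p in text.split(",")]
    if len(parts) == 3 and all(
        re.fullmatch(r"[0-9]{1,3}", p) and int(p) <= 255 for p in parts
    ):
        return "#" + "".join(f"{int(p):02x}" for p in parts)
    # Hex style: exactly six hex digits, no guessing at shorter strings
    if re.fullmatch(r"[0-9a-f]{6}", text):
        return "#" + text
    return None

recordings = ["red", "cherry red", "ef555", "239,85,82"]
normalized = {r: normalize_color(r) for r in recordings}
```

Under these assumptions, `"239,85,82"` normalizes to `"#ef5552"`, while `"cherry red"` and `"ef555"` come back as `None`: the first is not in the lookup and the second is one digit short of a six-digit code, so the sketch flags it rather than guessing. Without a rule, four recordings of color yield four different values; with one, the unreadable entries at least become visible.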

So Wang's first move: a working definition. *Data is the objective recording — under rules — of phenomena relevant to the business.* The rules are what separate **data garbage** from data that can be turned into a **data resource**, and ultimately a **data asset**.

The Chinese regulatory regime's three-tier vocabulary (per the NDA *Common Data Terms (First Batch)*) maps onto this:

- **Raw data** (原始数据) — first-collected recordings, unprocessed.
- **Data resources** (数据资源) — raw data that has undergone preliminary processing and has potential for value creation.
- **Data assets** (数据资产) — data resources that are lawfully held or controlled, can be measured in monetary terms, and can produce economic or social benefit.

The progression *raw → resource → asset* requires rules at every step.

## What rules look like, concretely

To turn the cluttered toy-car notebook into something useful, Wang prescribes four kinds of rule. Each maps onto a formal compliance vocabulary overseas readers will recognize.

### Rule 1 — "Required dropdowns": master data and metadata

You don't let people type "big car" or "excavator-thing" in the type field. You constrain the field to a fixed enumeration: *engineering vehicle / car / racecar / motorcycle / other.* Same for color, age range, weight, etc.

This is **master data management** + **metadata management**. The fields are typed; the values are constrained; the recording is consistent across users. Wang's example is Taobao's typed inputs (quantity, color, size are dropdowns, not free text) — the architecture is identical.
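The "required dropdown" idea can be sketched in Python with an enumeration. This is an illustration only: `ToyType`, `ToyRecord`, and `make_record` are hypothetical names, and the allowed values are the five listed above.

```python
from dataclasses import dataclass
from enum import Enum

class ToyType(Enum):
    ENGINEERING_VEHICLE = "engineering vehicle"
    CAR = "car"
    RACECAR = "racecar"
    MOTORCYCLE = "motorcycle"
    OTHER = "other"

@dataclass(frozen=True)
class ToyRecord:
    name: str
    toy_type: ToyType

def make_record(name: str, toy_type: str) -> ToyRecord:
    # Looking an Enum up by value raises ValueError for anything off the list,
    # which plays the role of a dropdown that has no free-text option.
    return ToyRecord(name=name, toy_type=ToyType(toy_type))

accepted = make_record("yellow digger", "engineering vehicle")

try:
    make_record("yellow digger", "excavator-thing")
    rejected = False
except ValueError:
    rejected = True
```

The design choice worth noting is that rejection happens at entry time. A free-text value never reaches storage, so nobody has to clean it up later.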

### Rule 2 — Unified standards: ontology

"Battery capacity 6000mAh" / "2 hours charging gives 1 hour of play" / "excellent battery life" — three ways to describe the same thing. None of them comparable. None of them queryable.

The fix: define an ontology of measurable attributes. Battery capacity is measured in `mAh`. Playtime is measured in `hours`. Now the data is comparable and the records support analysis.
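As a rough sketch of a unit rule in Python: each attribute gets exactly one unit, and a recording is accepted only if it is a number in that unit. `ONTOLOGY` and `parse_measurement` are hypothetical names, and the attribute keys are assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical attribute dictionary: attribute name -> the one unit it is recorded in
ONTOLOGY = {"battery_capacity": "mAh", "playtime": "hours"}

def parse_measurement(attribute: str, raw: str) -> Optional[float]:
    """Return the numeric value if `raw` is '<number> <unit>' in the attribute's unit."""
    unit = ONTOLOGY[attribute]
    pattern = r"\s*([0-9]+(?:\.[0-9]+)?)\s*" + re.escape(unit) + r"\s*"
    match = re.fullmatch(pattern, raw)
    return float(match.group(1)) if match else None

capacity = parse_measurement("battery_capacity", "6000mAh")
vague = parse_measurement("battery_capacity", "excellent battery life")
playtime = parse_measurement("playtime", "1.5 hours")
```

Here `capacity` is `6000.0` and `playtime` is `1.5`, while the free-text description returns `None`. Only the values that parse can be sorted, averaged, or compared across toys.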

### Rule 3 — Automated capture: digital business process

Install a simple sensor in the storage cabinet. Take a toy out — clock starts. Put it back — clock stops. The "playtime" attribute is captured *automatically*, with no manual error.

In enterprise data-compliance vocabulary: **digitalize the business process**. Don't capture data from human attestation; capture it from instrumented systems. This is what the NDR's *risk assessment* and *security incident response* obligations assume — that the underlying business processes are digitalized and observable.
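A minimal sketch of the cabinet sensor in Python, assuming the sensor emits timestamped `"out"` and `"in"` events for a single toy. The event format, the function name `total_playtime`, and the sample timestamps are all illustrative assumptions.

```python
from datetime import datetime, timedelta

def total_playtime(events):
    """Sum the time between each 'out' event and the next 'in' event for one toy."""
    total = timedelta()
    taken_out_at = None
    for timestamp, kind in sorted(events):
        if kind == "out" and taken_out_at is None:
            taken_out_at = timestamp          # clock starts
        elif kind == "in" and taken_out_at is not None:
            total += timestamp - taken_out_at  # clock stops
            taken_out_at = None
    return total

# Illustrative sensor log: two play sessions on the same day
events = [
    (datetime(2025, 1, 1, 9, 0), "out"),
    (datetime(2025, 1, 1, 9, 45), "in"),
    (datetime(2025, 1, 1, 15, 30), "out"),
    (datetime(2025, 1, 1, 16, 0), "in"),
]
playtime = total_playtime(events)
```

With this log, `playtime` is 75 minutes. The value is derived from events rather than typed in by anyone, and a toy that is still out when the log ends contributes no time until its `"in"` event arrives.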

### Rule 4 — Hard requirements: the law

"This data must be stored within China." This is not a design choice — it's a hard requirement that overrides everything else. It must be in the rulebook.

For the storage room, this might be: "Receipts and bills tied to toys must be retained as records for tax purposes."

For an enterprise: "Important data must be stored in the PRC." "Sensitive personal information requires separate consent." "Cross-border transfer of PI above the threshold requires CAC security assessment." These are the **legal floor** rules — they bound everything the rulebook can authorize.
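One way to picture "bounds everything the rulebook can authorize" is a check that runs before any business approval is considered. The Python sketch below encodes only the first example rule, with hypothetical names (`Dataset`, `legal_floor_violations`, `may_proceed`) and a made-up region label. It assumes the "important data" classification has already been made elsewhere and is not a statement of how any regulator tests compliance.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Dataset:
    name: str
    is_important_data: bool  # assumed to be classified elsewhere
    storage_region: str      # hypothetical label: "cn" means stored in the PRC

def legal_floor_violations(dataset: Dataset) -> List[str]:
    """Return hard-requirement breaches for one dataset (empty list means none)."""
    violations = []
    if dataset.is_important_data and dataset.storage_region != "cn":
        violations.append("important data must be stored in the PRC")
    return violations

def may_proceed(dataset: Dataset, business_approved: bool) -> bool:
    # The legal floor is checked first; no business approval can override it.
    return not legal_floor_violations(dataset) and business_approved

offshore = Dataset("customer-master", is_important_data=True, storage_region="sg")
onshore = Dataset("customer-master", is_important_data=True, storage_region="cn")
```

`may_proceed(offshore, business_approved=True)` is `False` while `may_proceed(onshore, business_approved=True)` is `True`: approval matters only once the floor is satisfied.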

When all four rule types are combined, the storage room has a **Family Toy Car Data Pact** — a written record-keeping standard that turns raw observations into a usable data resource. Wang's metaphor: an enterprise's data governance framework is the same pact, scaled up.

## What compliance actually means

With the Pact in place, the question shifts: *am I following it?* This is compliance. Wang's three-tier taxonomy (introduced in her [previous primer](/posts/qinglan-data-governance-management-compliance-disambiguation/)) reappears:

- **Legal rules** (法规) — what the law mandates. "Important data must stay in country."
- **Ethical rules** (德规) — what the enterprise voluntarily commits to. "Don't sloppily fill in records to make our reports look good."
- **Promised rules** (诺规) — what the enterprise publicly promised. "Toy usage times accurate to the minute."

All three end up in the Pact. All three must be followed.

The compliance workflow Wang describes — *"three steps, in plain language"* — is the operational discipline:

### Step 1 — Select the rules

Decide which rules apply. Two inputs:

- *What is the storage room's situation?* — i.e., the enterprise's internal and external compliance environment.
- *Who interacts with the storage room and what do they want?* — i.e., stakeholder requirements.

But you cannot select *every* rule that might apply. Wang cites Professor **Chen Ruihua** (陈瑞华)'s **risk-oriented compliance model** — focus first on the highest-risk *mandatory* rules. PIPL Article 29's separate-consent requirement for sensitive PI is the storage-room equivalent of "don't leave sharp toys in reach of toddlers." Miss it once and the consequence is a regulatory or reputational injury.

Beyond the legal floor, there are *optional* rules — annual data security assessments, industry ethical standards, public commitments to customers. These aren't mandatory, but they earn trust from regulators, partners, and customers.

Critically, rule-selection is **not a once-and-done exercise**. New business lines, new jurisdictions, new regulations all trigger re-selection. The discipline is "accurate *and* dynamic."

### Step 2 — Allocate the responsibility

The selected rules become a **compliance obligation register**. Each obligation gets:

- An *owner* — whose job is it?
- A *process* — what concrete workflow embodies the obligation? ("PI processing requires 3-tier approval.")
- A *control* — how does the owner verify the process worked?

Wang's storage-room version: "Daddy collects engineering vehicles; Mommy collects regular cars; child collects blocks." The rule has names attached.
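An obligation register with owner, process, and control can be sketched as a small data structure. The Python below is an illustration under assumptions: the class and field names are hypothetical, and the owners and the control text are invented for the example. Only the rule wording and the "3-tier approval" process come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Obligation:
    rule: str
    tier: str                      # "legal", "ethical", or "promised"
    owner: Optional[str] = None    # whose job is it?
    process: Optional[str] = None  # which workflow embodies the obligation?
    control: Optional[str] = None  # how is the process verified?

    def missing_fields(self) -> List[str]:
        return [f for f in ("owner", "process", "control") if not getattr(self, f)]

register = [
    Obligation(
        rule="Sensitive personal information requires separate consent",
        tier="legal",
        owner="privacy lead",                        # hypothetical owner
        process="PI processing requires 3-tier approval",
        control="monthly sample check of approvals",  # hypothetical control
    ),
    Obligation(
        rule="Toy usage times accurate to the minute",
        tier="promised",
        owner="operations",                          # hypothetical owner
    ),
]

# Obligations that have a name attached but no workflow or verification yet
gaps = {o.rule: o.missing_fields() for o in register if o.missing_fields()}
```

Here `gaps` contains only the second obligation, with `["process", "control"]` missing. An entry like that is what a paper-only program looks like in the register: the rule is written down, but nothing yet makes it happen or checks that it did.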

This is also the moment where external rules become **internalized institutional culture**. Without internalization, the rule lives only in the obligation register — a paper compliance program. With internalization, it becomes how the organization actually behaves.

### Step 3 — Execute

This is the simplest step in concept and the hardest in practice. *Do the things on the obligation register.* If you don't do them, you have a compliance failure — possibly a compliance risk event.

Wang's risk taxonomy:

- **Inherent risk** — the risk before any controls. Storage room with no lock and no rules: theft is just a matter of time.
- **Residual risk** — the risk after controls are in place. Lock installed, rules written, but someone occasionally forgets the lock. Risk reduced but not zero.

Wang's blunt observation: *"It's impossible to be 100% compliant — humans are uncertain, business is dynamic, there's always something to adjust."* What matters is the framework — risk-allocated obligations, written process, executable controls.

## Two organizational shapes for the compliance system

Wang's practical advice on building the compliance system:

- **By position (job role).** "Customer-facing staff protect user info; operations record data sources." Each role has a defined set of obligations.
- **By business process.** "From data collection → storage → use, each step has its own controls." Each step has a defined set of obligations.

Both work. Pick whichever organizational shape fits the enterprise. Either way, *clear logic* matters more than an *absolute zero-error target*.

## Why this matters for overseas compliance teams

Three operational takeaways from Wang's primer:

- **Don't skip the "what is data" question.** Many overseas counsel jump from PIPL provisions straight to lawful-basis analysis, missing that the enterprise has not yet *operationalized* what counts as data, what attributes it carries, and where the records are. The PIPL framework only works once the underlying data is well-formed. *Build the master data + metadata layer first.*
- **The three-tier compliance taxonomy is not just academic.** A compliance team that conflates *legal floor* with *ethical commitment* either over-burdens itself (treating optional commitments with mandatory rigor) or under-protects (treating mandatory rules with optional flexibility). Wang's three-tier model is the practical sorting mechanism.
- **Inherent and residual risk are the diagnostic axes.** When something goes wrong, the first question is which one applies: was the inherent risk uncontrolled (no rule for this scenario), or was a control bypassed (a rule existed but was not followed)? Different diagnoses, different fixes.
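The two-way diagnosis in the last takeaway can be written as a short decision function. This Python sketch uses hypothetical names and inputs: a mapping from scenarios to their controls, and a set of scenarios where the control was actually followed. A third case, where a control existed and was followed but the incident happened anyway, falls outside the split described above, so the sketch returns `None` for it.

```python
from enum import Enum
from typing import Dict, Optional, Set

class Diagnosis(Enum):
    UNCONTROLLED_INHERENT_RISK = "no rule covered this scenario"
    CONTROL_BYPASSED = "a rule existed but was not followed"

def diagnose(
    scenario: str, controls: Dict[str, str], followed: Set[str]
) -> Optional[Diagnosis]:
    if scenario not in controls:
        return Diagnosis.UNCONTROLLED_INHERENT_RISK  # fix: write a rule
    if scenario not in followed:
        return Diagnosis.CONTROL_BYPASSED            # fix: enforce the rule
    return None  # control existed and was followed: not covered by this split

# Illustrative storage-room inputs
controls = {"storage room left open": "lock the door after use"}
followed: Set[str] = set()

forgot_lock = diagnose("storage room left open", controls, followed)
lent_out = diagnose("toy lent to a neighbor", controls, followed)
```

`forgot_lock` is `CONTROL_BYPASSED` and `lent_out` is `UNCONTROLLED_INHERENT_RISK`. The first points to enforcement or training; the second points back to rule selection.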

The deeper point in Wang's piece is that **data compliance starts before the law**. The law constrains what an enterprise can do with data; but the enterprise's *data-handling discipline* — what counts as data, what rules govern it, who owns each rule — determines whether compliance is achievable at all. Without the discipline, no amount of legal review will produce a compliant operation.

---

— Wang Qinglan (王青兰), *数据的奇妙真相：从生活实例看它的真面目* (The Magical Truth About Data — Seeing Its Real Face Through Everyday Examples), 青兰数据观察 WeChat Official Account, August 28, 2025. [Original article (Chinese).](https://mp.weixin.qq.com/s/Dn4hlPZUHJOuUkLYzoaGLA)

*Not legal advice. The above is DCC's structured summary of Wang's commentary; not a verbatim translation. The author's views are her own and do not represent her employer.*
