---
title: "Tang Linyao — Data-Broker Derivative Harms and the 'Data Integration Analysis Framework'"
author: "DCC Editorial"
published: 2026-05-28T06:00:00.000Z
url: https://datacompliancechina.com/posts/tang-linyao-data-broker-derivative-harms/
description: "Tang Linyao (Chinese Academy of Social Sciences) maps the regulatory gap for data-broker derivative harms — the harms that arise not from direct PI leakage but from the integration and aggregation activity that data brokers themselves perform. The analytical core: a vertical / horizontal data-relations framework that explains why existing PIPL-style protection (vertical-relationship-focused) systematically fails to address horizontal-relationship harms; and the 'abstract risk substantialization' doctrine borrowed from US precedent and EU GDPR to bring data-broker risk into ex-ante regulatory scope. Operationally, Tang proposes a 'Data Integration Analysis Framework' with concrete tiering (三高 / 双高 / 单高 / 三低) that translates academic doctrine into compliance-program-grade controls. Applied to a real Shenzhen Data Exchange listing as worked example."
tags: ["data-economy", "data-broker", "data-exchange", "derivative-harm", "privacy", "commentary"]
laws_cited: ["data-foundation-system-opinions", "pipl", "dsl", "personal-info-audit-measures", "network-data-security-regulations"]
domains: ["data-economy", "data-security", "personal-information"]
account: "dejyfz"
original_title: "学术｜唐林垚：数据经纪的衍生风险与法律应对"
original_author: "唐林垚 (Tang Linyao), Chinese Academy of Social Sciences Law Institute"
original_publication: "《法学家》(The Jurist), Issue 2, 2026; reposted via 数字经济与法治 WeChat Official Account"
original_url: "https://mp.weixin.qq.com/s/L4A6N26tXnN05iSxqMNe3w"
source_language: "zh"
---
> *Editor's Note — DCC.*
>
> Tang Linyao's piece in 《法学家》(*The Jurist*, the flagship Chinese
> law journal of Renmin University) takes on a structural problem
> Chinese — and global — data-broker regulation has not yet solved:
> the harms that arise from the *integration activity itself*, not
> from the integrated data being misused. The analytical move — a
> vertical-vs-horizontal data-relations framework that explains why
> PI-protection rules systematically miss this — is theoretically
> ambitious. But the operational payoff is what makes the piece useful
> for compliance teams: a four-tier "Data Integration Analysis
> Framework" (三高 / 双高 / 单高 / 三低) that translates the doctrine
> into concrete compliance gating, applied as a worked example to a
> real Shenzhen Data Exchange listing. DCC's brief focuses on the
> framework and its operational implications for overseas counsel
> working with Chinese data exchanges, data brokers, and
> data-broker-intermediated supply chains.

## What "data broker" means here

Tang uses "data brokery" (数据经纪) in a deliberately broad sense, drawing on three reference points: the FTC definition (collecting from multiple sources, aggregating, analyzing, on-selling), the California CCPA definition (no direct business relationship with the individual, sells to third parties), and the EU Data Governance Act's "data intermediation services" concept. Mapped to Chinese practice, the term covers Shanghai Data Exchange's "data service providers" and the broader category of intermediaries facilitating data collection, aggregation, and trading.

Why this matters: the *Data 20 Articles* explicitly call for cultivating data brokery as a class of third-party professional service. As of 2025, the major Chinese data exchanges added more than 2,600 new supply / demand participants. Data brokery is now structural infrastructure for the Chinese data-element market — not a marginal activity.

## The structural problem — vertical and horizontal data relations

Tang's analytical pivot: data relations come in two distinct types.

**Vertical relations (垂直数据关系)** — the direct interaction between the *data subject* and the *data processor*. Classic example: a depositor authorizes a bank to access spending data in exchange for instant credit scoring. PIPL is built around vertical relations: the data subject controls (via consent, access right, deletion right) what the processor does with the subject's data.

**Horizontal relations (水平数据关系)** — the *indirect* relationship among data subjects formed when shared group features become the basis for processor decisions. Classic example: a depositor is labeled "low-income" by the bank's loan-pricing algorithm because the depositor shares the "browse-by-price-low-to-high" feature with other depositors classified as low-income. The depositor never interacted with the people who created that group classification — but is now subject to its consequences.

Traditional data-processing activity maintained a tight coupling between vertical and horizontal relations: the processor's services in the vertical relationship were strictly limited by what its horizontal-relationship insights could justify. The depositor agreed to the bank seeing transaction data *for credit scoring*; the bank's horizontal grouping was constrained by that purpose.

**Data brokery decouples vertical and horizontal relations.** Once a processor can buy data from a broker, it no longer has to maintain a vertical relationship with the source — and thus is no longer constrained by the "minimum necessity" principle that vertical relationships impose. The processor can construct *entirely new* horizontal relationships using purchased data, with no vertical-relationship subject ever having consented to the resulting categorization.

This is the structural break PIPL doesn't address. PIPL is a vertical-relationship instrument; it cannot regulate horizontal-relationship construction that bypasses the vertical channel.

## Derivative harms — what gets missed

Tang identifies two types of harm the existing regulatory framework fails to address.

### 1. Privacy erosion (隐私侵蚀)

The construction of horizontal relationships using broker-acquired data exposes individuals to inferences about themselves they never authorized. Tang's example: data aggregated through normal market trading and re-processed reveals individual-level behavioral insights that the original disclosure context did not anticipate. The individual loses control over their *external social construction* — without any specific PIPL provision being violated.

Importantly, Tang frames this as a *group privacy* (群体隐私) harm. The damage is collective: the categorization affects every individual in the group, but no single individual can bring a successful PIPL claim because no specific direct harm to them is provable.

### 2. Downstream harm (下游损害)

Tang draws on US scholarship (the "downstream harm" and "data information harm" concepts) to describe individual injuries (privacy, dignity, social discrimination, lost opportunities, manipulation) that occur *because of* third-party action enabled by broker-supplied data. Tang's flagged case is Remsburg v. Docusearch, in which a perpetrator purchased a victim's data from a broker, used it to track her, and killed her. The US court held that the broker owed the victim a duty of reasonable care, exposing it to negligence liability.

The structural problem: in Chinese tort doctrine, the broker's contribution to downstream harm is usually absorbed by the principal tortfeasor's act and not separately evaluated. The 酷车易美 case (a Chinese precedent on automotive-data integration risk) illustrates the resulting under-protection: the court rejected the plaintiff's claim on grounds that the harm was prospective rather than realized.

## The "abstract risk substantialization" doctrine

Tang's regulatory move is to import a concept already developing in EU and US doctrine: **abstract risk substantialization** (抽象风险损害化). The claim: where data-broker activity creates a *substantial* probability of derivative harm, the risk itself should be treated as cognizable harm for ex-ante regulatory purposes — even before any concrete injury materializes.

Two judicial standards Tang draws on:

- **"Certainly impending"** — risk must be imminent and real, not speculative.
- **"Particularly targeting"** — risk must single out the specific plaintiff, not vaguely affect everyone.

Combined: a data-broker risk is regulable when (a) it's actually likely to materialize and (b) it specifically threatens identifiable parties or groups.

The framework lets regulators move from "wait until someone gets hurt" to "prevent the risk from materializing in the first place" — but only when the risk meets the substantiality threshold.

## The Data Integration Analysis Framework

This is where Tang's piece becomes operationally useful for compliance teams. The framework provides two parallel analytical structures — one for the data broker itself (used in **Data Protection Impact Assessment**, DPIA) and one for the regulator (used in **Fair Data Brokery Practice**, FDBP).

### Framework A — For the data broker (DPIA)

Four factors per dataset:

**(1) Anonymization level.** Apply the UK ICO's "motivated intruder test": assume a motivated, reasonably competent adversary and check whether the data, combined with other available data, becomes re-identifiable. Combine generalization (e.g., age 45 → 40-50 bucket) with randomization (noise injection, differential privacy).
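As a rough illustration of the two technique families named above, the sketch below generalizes an exact age into a ten-year bucket and adds Laplace noise to a count, which is the basic differential-privacy mechanism for counting queries. The function names, the bucket width, and the `epsilon` parameter are illustrative assumptions, not anything specified in Tang's article or the ICO guidance.

```python
import random


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with a coarse bucket label.

    The label is half-open, so "40-50" covers ages 40 through 49.
    """
    low = (age // width) * width
    return f"{low}-{low + width}"


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Randomization: Laplace mechanism for a counting query (sensitivity 1).

    Smaller epsilon means more noise and stronger protection (assumed knob).
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(7)  # fixed seed so the illustration is repeatable
print(generalize_age(45))          # "40-50"
print(noisy_count(1200, 0.5, rng))  # true count plus random noise
```

Neither step settles the motivated intruder test on its own; the test asks whether the released data, after both steps, can still be linked back to individuals using other available data.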

**(2) Sensitivity.** What harm could result if the data is integrated with which other data? Particular attention to data that, when combined with PI subjects' personal attributes, would cause unfair algorithmic outcomes.

**(3) Dataset volume — four sub-factors:**
- Number of data subjects (larger → broader potential harm footprint)
- Number of attribute categories (more → more identifiers for de-anonymization, more nodes for harmful relation construction)
- Time span (longer → more precise insights, stronger surveillance/influence potential)
- Cross-group migration potential (datasets that reveal common features of large external groups, even with limited direct-subject scope)

**(4) Inferential data ratio.** Inferential data (probability-derived inferences vs. raw original data) is more likely to encode subjective bias and to produce "certainly impending" harm.
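One way a DPIA template might make the fourth factor measurable is to tag each column with its provenance and compute the inferred share. This is a minimal sketch under that assumption; the `"raw"` / `"inferred"` tags and the column names are hypothetical, and Tang's article (as summarized here) does not prescribe a particular way of computing the ratio.

```python
def inferential_ratio(provenance: dict[str, str]) -> float:
    """Share of columns tagged "inferred" (model- or probability-derived).

    `provenance` maps column name -> "raw" or "inferred" (assumed tagging scheme).
    """
    if not provenance:
        return 0.0
    inferred = sum(1 for kind in provenance.values() if kind == "inferred")
    return inferred / len(provenance)


# Hypothetical dataset: two collected fields, two derived labels.
columns = {
    "age_band": "raw",
    "diagnosis_code": "raw",
    "credit_risk_score": "inferred",
    "income_label": "inferred",
}
print(inferential_ratio(columns))  # 0.5
```

A column-count ratio is deliberately crude; a template could equally weight by record count or by how consequential each inferred field is.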

### Framework B — For the regulator (FDBP)

Focused on combinations of dataset-with-other-datasets rather than dataset-in-itself. Four factors:

**(1) Subject overlap (主体重合度).** When the same data subjects appear in multiple datasets, integration risk for re-identification and harmful targeting rises sharply.

**(2) Attribute overlap (属性重合度).** When datasets cover the same attribute categories, cross-comparison identifiers multiply.

**(3) Original-processing-purpose overlap (原初处理目的重合度).** Both high overlap (purpose-related harm concentrated in one place) and low overlap (broadening of effective collection scope) increase risk; the regulator should examine both directions.

**(4) Time overlap (时间重合度).** Similar dual analysis — high time-overlap heightens re-identification risk; low time-overlap may produce inaccurate but consequential horizontal-relationship inferences.
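The four overlap factors can be approximated with simple set and interval arithmetic. The sketch below is an assumption-laden illustration: Jaccard similarity stands in for subject and attribute overlap, a day-count ratio stands in for time overlap, and the `low` / `high` thresholds in `needs_review` are placeholders chosen only to show the two-directional check described for purpose and time overlap. None of these measures or thresholds come from Tang's article.

```python
from datetime import date


def jaccard(a: set, b: set) -> float:
    """Shared elements as a fraction of all distinct elements (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def time_overlap(a_start: date, a_end: date, b_start: date, b_end: date) -> float:
    """Days covered by both periods, as a fraction of the shorter period."""
    shared_days = (min(a_end, b_end) - max(a_start, b_start)).days
    shorter_days = min((a_end - a_start).days, (b_end - b_start).days)
    if shorter_days <= 0:
        return 0.0
    return max(shared_days, 0) / shorter_days


def needs_review(score: float, low: float = 0.2, high: float = 0.8) -> bool:
    """Two-directional flag: very low and very high overlap both get a look."""
    return score <= low or score >= high


# Hypothetical pseudonymous subject tokens from two datasets.
subjects_a = {"p1", "p2", "p3"}
subjects_b = {"p2", "p3", "p4"}
print(jaccard(subjects_a, subjects_b))  # 0.5
```

In practice, comparing subject identifiers across datasets is itself a sensitive operation, so an exchange would likely need a privacy-preserving way of computing the intersection rather than a plain set comparison.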

### Risk tiering (三高 / 双高 / 单高 / 三低)

The framework's output is a per-dataset risk classification using the "potential victim count × harm probability × harm degree" formula. Each of the three dimensions is rated high or low, and the tier is set by how many are rated high. The four tiers:

- **三高 (triple-high)**: high on all three. Data should not be brokered; the regulator should suspend the transaction.
- **双高 (double-high)**: high on two. Data should be remediated into compliance, brokered only within a privacy-computing framework, or made subject to regulator pre-approval.
- **单高 (single-high)**: high on one. Stricter purpose limitation plus enhanced risk disclosure, with ongoing regulator attention.
- **三低 (triple-low)**: high on none. No special action needed beyond baseline.
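The tiering logic reduces to counting how many of the three dimensions are rated high. A minimal sketch follows, assuming each dimension has already been judged high or low by the reviewer; how that judgment is made is not specified in this summary, so the booleans are inputs rather than computed values. The `Tier`, `classify`, and `ACTIONS` names are illustrative.

```python
from enum import Enum


class Tier(Enum):
    TRIPLE_HIGH = "三高"
    DOUBLE_HIGH = "双高"
    SINGLE_HIGH = "单高"
    TRIPLE_LOW = "三低"


# Control attached to each tier, paraphrasing the list above.
ACTIONS = {
    Tier.TRIPLE_HIGH: "do not broker; regulator suspends the transaction",
    Tier.DOUBLE_HIGH: "remediate, use privacy computing, or seek pre-approval",
    Tier.SINGLE_HIGH: "stricter purpose limitation and enhanced risk disclosure",
    Tier.TRIPLE_LOW: "baseline controls only",
}


def classify(victims_high: bool, probability_high: bool, degree_high: bool) -> Tier:
    """Map the three high/low ratings to a tier by counting the highs."""
    highs = sum([victims_high, probability_high, degree_high])
    by_count = [Tier.TRIPLE_LOW, Tier.SINGLE_HIGH, Tier.DOUBLE_HIGH, Tier.TRIPLE_HIGH]
    return by_count[highs]


# A dataset with many potential victims but low harm probability and degree.
tier = classify(victims_high=True, probability_high=False, degree_high=False)
print(tier.value, "->", ACTIONS[tier])
```

Keeping the ratings as explicit inputs makes the reviewer's judgment auditable: the gating decision can be traced to three recorded calls rather than to an opaque score.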

## The worked example: Shenzhen Data Exchange listing

Tang applies the framework to a real listing: Shenzhen Maternal & Child Health Hospital's anonymized 2018-2023 dataset of confirmed pregnancy-induced hypertension patients, listed on Shenzhen Data Exchange on May 19, 2025.

**Framework A (DPIA) analysis:**

- *Anonymization*: high (generalization + perturbation applied).
- *Sensitivity*: *originally* sensitive (medical data), but de-sensitized by the time gap (subjects are no longer pregnant, and the condition typically resolves post-partum). The exception is an insurance underwriting context, where group-feature inference could lead to discriminatory pricing. The hospital's restriction prohibiting use for AI algorithm development on pregnancy-induced hypertension addresses the high-sensitivity vector.
- *Volume*: large (5-year span, major tertiary hospital, large potential cross-group inference base).
- *Inferential data ratio*: zero.

Verdict: many potential subjects, but low harm probability and low harm degree → a "二低一高" (two-low-one-high) profile, i.e. high on only one of the three dimensions (单高), treated here as low-risk. The hospital's use restriction is sufficient; no additional gating is needed.

**Framework B (FDBP) analysis at the exchange:**

- *Subject overlap* — geographic concentration (Shenzhen-area); exchange should scrutinize same-geography dataset merging.
- *Attribute overlap* — moderate; flag if buyer has already acquired commercial-insurance datasets that could trigger inference.
- *Processing-purpose overlap* — purpose limited to research / teaching; flag commercial-use attempts.
- *Time overlap* — flag both same-period merging risk and non-period inherent-attribute merging risk.

This level of operational granularity is what makes the piece useful for compliance program build-out.

## Liability allocation: applying Civil Code Article 1170

Where derivative harm actually materializes, Tang argues for applying **Civil Code Article 1170** (共同危险行为, joint dangerous conduct) to data-brokery cases. Under Tang's framework:

- **For 三高 brokery action** — broker bears primary liability regardless of downstream actor's posture; downstream actor's joint liability depends on subjective intent.
- **For 双高 or 单高 brokery** — broker bears joint and several liability with downstream actor.
- **For 三低 brokery** — broker does not bear downstream-harm liability.
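For teams encoding this allocation into a vendor-agreement checklist, the mapping above can be held as a small lookup table. This is only a restatement of Tang's proposal as summarized here, not a statement of current Chinese law; the dictionary keys and posture labels are illustrative names.

```python
# Broker's liability posture per tier, as proposed in Tang's framework.
BROKER_LIABILITY = {
    "三高": "primary",            # regardless of the downstream actor's posture
    "双高": "joint_and_several",  # shared with the downstream actor
    "单高": "joint_and_several",  # shared with the downstream actor
    "三低": "none",               # no downstream-harm liability
}


def broker_liability(tier: str) -> str:
    """Look up the proposed liability posture; unknown tiers raise KeyError."""
    return BROKER_LIABILITY[tier]


print(broker_liability("双高"))  # joint_and_several
```

For 三高 transactions the table covers only the broker's side; the downstream actor's joint liability turns on subjective intent and would need separate handling.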

This is doctrinally significant: it pulls data-brokery into the multi-actor joint-liability framework rather than treating it as a separate single-tort question. Chinese courts are likely to find the framing useful as a structural anchor.

## What this tells overseas compliance teams

- **The vertical / horizontal distinction is the analytical key.** Multinationals using or supplying data through Chinese data brokers should map their data flows in terms of which horizontal relationships their broker activity is constructing, not just which vertical relationships their PI processing creates. The vertical-only analytical posture is now structurally inadequate.

- **The 三高 / 双高 / 单高 / 三低 tiering is portable as a compliance-program control.** Adapt it as the data-broker-input and data-broker-output screening framework in your Chinese operations. Build the four-factor analysis (Framework A) into your DPIA template; build the four-factor analysis (Framework B) into your vendor-acquisition and customer-disclosure-control templates.

- **The Shenzhen Data Exchange example is the operating template.** Where a Chinese counterparty (especially a state or quasi-state institution) lists data through an exchange, expect the kind of multi-factor pre-listing screening Tang describes. Provide the source-data documentation that supports the analysis — particularly anonymization technique documentation and use-restriction language.

- **The Civil Code Article 1170 framing is a forward signal on liability allocation.** The Chinese tort doctrine on data-brokery liability is being articulated *now* in the law-journal layer; expect courts to begin adopting the framework over the next 12-24 months. Multinationals should pre-position vendor agreements and indemnity allocations against the contemplated joint-liability framework.

- **Data brokery is structural; the regulation is catching up.** Treat the regulatory gap Tang identifies as a forward indicator: the gap will close, probably through some combination of (a) sectoral rulemaking applying Tang-style frameworks, (b) data-exchange self-regulatory rule articulation, and (c) judicial precedent applying joint-dangerous-conduct doctrine. Compliance programs designed against the *prior* (PIPL-only) framing will be inadequate when the new framing crystallizes.

The deeper structural point: data brokery is the *infrastructure layer* of the Chinese data-element market, and the law has not yet caught up to its risk profile. Tang's piece is the doctrinal preparation for the next round of regulation. Overseas counsel watching this space should treat the 法学家 publication as the *upstream* of rulemaking, not a reaction to it.

---

— *唐林垚, 数据经纪的衍生风险与法律应对 (Data Brokery's Derivative Risks and Legal Response), 《法学家》(*The Jurist*), Issue 2, 2026; reposted via 数字经济与法治 WeChat Official Account, May 27, 2026. [Original article (Chinese).](https://mp.weixin.qq.com/s/L4A6N26tXnN05iSxqMNe3w)*

*Not legal advice. The above is DCC's structured summary of Tang's analysis, with framing for overseas counsel; the vertical / horizontal data-relations framework, the Data Integration Analysis Framework, and the Shenzhen Data Exchange worked example are Tang's.*
