Why Upstream Won't Operate Its Data — Control Degradation, Derivative Data, and Irreducible Uncertainty

Editor’s Note — DCC.

This is DCC’s summary and analysis — not a translation — of 《上游为何不愿对外经营数据？控制降级、衍生数据与不确定性下的经营决策》, the third study note by Hong Yanqing (洪延青) on his 网安寻路人 channel in his series on China’s “separation of three rights” (三权分置) data-property framework. It follows Two Paths for the “Right to Hold Data” (part one) and When the “Right to Use Data” Goes External (part two). Where part two asked what externalising a use right transfers, this note asks the prior, more practical question: will the upstream provide the data at all? The original is linked at the foot; the framing for overseas counsel is ours.

From “what operation transfers” to “whether it happens”

In the “Data Twenty Articles” (数据二十条) structure, the Right to Operate Data (数据经营权) is the right to provide data externally — by transfer, licence, capital contribution, or pledge — the analogue of disposing of tangible property, meant to push data property out into the market. Part two showed that what operation usefully hands over is licensed use, and that once the downstream produces derivative data (衍生数据) a new object forms in its hands and the upstream’s control changes.

Hong’s question here is one step earlier: under real conditions, will an upstream exercise its operation right and provide data outward at all? His judgment: it will, but quite narrowly. The upstreams that rely on data to sustain a continuing relationship — the “control-dependent” type from part two (platforms, holders of core user data, owners of high-value industrial or irreplaceable training data) — tend not to provide open, raw, autonomous access. They turn to controlled use, or decline. Not because they undervalue the data, but because external operation forces them to face a cluster of irreducible, mostly structural uncertainties.

From licensed use to control degradation

The upstream’s control over raw data has erga omnes (对世) effect: the data sits with the upstream, a downstream must be authorised to use it lawfully, and that control binds the world without needing a contract with any particular person. The derivative data the downstream then produces, however, is a new object on which the downstream — not the upstream — stands as creator. On the prevailing view, derivative data’s chain of succession from the source is severed, and the downstream independently holds, uses, and operates it. So the upstream’s erga omnes control over the raw data does not automatically extend to derivative data; against derivative data, the upstream has at most a contractual claim that binds one counterparty.

This degradation holds even if the contract is perfectly drafted and fully enforced, because it concerns the nature of the upstream’s claim, not its enforceability: erga omnes control reaches the raw data, not the derivative; over the derivative, the upstream is at most a claimant against a specific party, not a right-holder against the world. The upstream’s position drops from an automatic, world-binding right to a per-item, counterparty-only contractual claim.

And the degradation is uneven. For abstract derivatives (models, scores, indices) it is most complete: the raw data no longer exists inside them, the value has been extracted, and there is neither erga omnes control nor a way to recover the extracted value. For derivatives that still contain the raw data, or are fused from several parties’ data, the upstream may keep a co-holding position good against others (the official line on fusion is that each party may co-hold and that external circulation in principle needs the other participants’ consent). Degradation is thus most complete at the abstract end and least complete at the fusion end.

Two scholarly fixes — both tilt downstream

The hard case is not where the parties agreed, but where the contract is silent, unclear, or breached: who owns the derivative data, and can the upstream get it back? Two representative approaches both target this gap, and Hong notes they converge where it matters.

A law-and-economics approach treats it as a conflict-allocation problem, using the Calabresi–Melamed property-rule / liability-rule framework to switch between rules as transaction costs and courts’ valuation error change. Its baseline favours the processor: where transaction costs are low, a good-faith processor takes the derivative-data interest outright without compensation; it resists co-ownership (to avoid an anticommons), counts the processor’s own input — and even the value of other parties’ fused-in data — as processing value-add, and pushes the upstream’s protection onto the liability side (an IP-style compulsory licence fee in place of unjust enrichment).
A doctrinal approach reasons by analogy to the Civil Code’s accession (添附) rules, treating derivative data as a new object independent of the raw data, identified by a cumulative test of substantial change + marked value increase + irreversibility. Ownership follows agreement; absent agreement, it vests in the processor on the basis of contribution and “putting data to fullest use” (数尽其用). The processor takes the derivative right even if it holds no use right in the raw data, so that even illegally scraped source data affects only liability, not the attribution of the new product. The upstream’s protection splits into personality interests (always retained by the individual) and property remedies via unjust enrichment or tort.

Methodologically far apart, the two agree on two points: each reduces the upstream’s protection over derivative data from erga omnes control to a counterparty-only claim (unjust enrichment, tort, or a compulsory fee); and each defaults the residual interest, absent agreement, to the downstream processor, justified by data’s non-rivalry, the survival of the source data, and the incentive to innovate. That is exactly the control degradation above, accepted as a premise.

Ex-post allocation vs. ex-ante participation

Both schemes solve the same thing — the data has been provided, a dispute has arisen, who gets the derivative and how much compensation — and they do it finely. But both presuppose the data was shared. The prior question is: under such a regime, will the upstream provide the data in the first place?

Solving ex-post allocation does not solve ex-ante participation — and a downstream-favouring default can make participation worse. The more the default tilts to the processor, the larger the upstream’s expected loss from operating; the larger that loss, the more it declines to share; the less it shares, the less source data the processor has. The rule incentivises downstream utilisation while suppressing upstream supply. (The law-and-economics camp half-sees this — it warns that over-discounting source-data interests causes under-investment in data — but the more decisive margin is the upstream simply refusing to share data it already holds.)
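The feedback loop in the paragraph above can be made concrete with a toy participation model. This is a minimal sketch of ours, not a model from Hong's note: the `Deal` fields, the `tilt` parameter (the chance that the default rule vests the derivative in the processor), and every number are hypothetical, chosen only to show the direction of the effect.

```python
from dataclasses import dataclass


@dataclass
class Deal:
    """Hypothetical inputs to an upstream's decision to provide data."""
    licence_fee: float    # price the upstream receives for providing the data
    control_loss: float   # value lost if the derivative vests in the processor
    recovery_rate: float  # share of that loss recovered via a counterparty-only claim


def expected_upstream_payoff(deal: Deal, tilt: float) -> float:
    """Expected payoff from providing the data.

    `tilt` is an assumed probability (0..1) that the default rule
    allocates the derivative data to the downstream processor.
    """
    if not 0.0 <= tilt <= 1.0:
        raise ValueError("tilt must be between 0 and 1")
    expected_loss = tilt * deal.control_loss * (1.0 - deal.recovery_rate)
    return deal.licence_fee - expected_loss


def upstream_shares(deal: Deal, tilt: float) -> bool:
    """The upstream provides the data only if its expected payoff is positive."""
    return expected_upstream_payoff(deal, tilt) > 0


# Illustrative numbers only: none of these values come from the source.
deal = Deal(licence_fee=100.0, control_loss=400.0, recovery_rate=0.25)
for tilt in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    payoff = expected_upstream_payoff(deal, tilt)
    print(f"tilt={tilt:.1f}  payoff={payoff:+.1f}  shares={upstream_shares(deal, tilt)}")
```

With these assumed numbers the upstream stops providing once the tilt passes roughly one third. A rule that improves the processor's ex-post position can, on that margin, remove the supply the processor depends on.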

The uncertainties an upstream faces ex ante

Hong sorts them into two classes. Class one — attribution rules can engage, but cannot eliminate them before the fact:

Qualification. Will the processed output count as derivative data (independent, vesting in the processor, leaving the upstream a compensation claim at most) or as still the original data (upstream interest intact)? It is binary and decisive — yet the test for derivative data is unsettled (one view requires substantial change + marked value increase + irreversibility; another makes marked value increase the core and demotes irreversibility to evidence), so ex ante no one can predict which side an output lands on.
Default ownership. Contracts are never complete; the gaps fall to default rules that are doctrinally divided and, in their firmer parts, tilt to the processor. Predictability does not cure unfavourable content.
Subjective state. Whether the processor acquired the source data in good or bad faith may or may not affect its ownership of the derivative, depending on the approach. And the broader the licence scope, the harder it is to show that the processor exceeded its authorisation, so the more readily it is found to be in good faith and takes the full derivative right.
Remedy measurement. Even a win yields a claim of uncertain amount, floating between a licence fee, a profit share, and full disgorgement, benchmarked against IP licensing ratios. The upstream trades a definite position good against the world for an indefinite, counterparty-only claim.

Class two — no attribution rule can reach these, and they are the main deterrent:

Foreseeability and drafting. Data value is combinatorially emergent — the most valuable use is often the downstream recombining the source with other data and models, unforeseeable at signing. You cannot pre-limit what you cannot foresee; and derivatives stack (second- and third-order derivatives sit beyond a clause that bound the first). The contract is necessarily incomplete, and its gap falls exactly where value and risk are highest.
Discovery and tracing. Whether the downstream trained a model, exceeded scope, or re-licensed is often unknowable to the upstream — derivative data is intangible, internal to the downstream, fusible, and can pass de-identification off as anonymisation. Hard to detect; hard to prove or trace after fusion and abstraction.
Privity, chain, and payment. A contract binds only the counterparty. If the counterparty passes the data on to a third party in breach, or a third party simply scrapes the downstream’s data product, the upstream has no hold on that third party (and, on the doctrinal view, the scraper acquires a full derivative right). The counterparty may also go bankrupt or be acquired, leaving the upstream’s claim worthless.
Fusion and co-ownership. Once the source is fused with others’ data, whether the upstream keeps a position good against the world is unsettled — one view rejects co-ownership and vests in the processor; another excludes such products from “derivative data” via the irreversibility test.
Abstraction leakage. Even a contractual duty to delete the derivative dataset cannot recover the parameters a model has already learned or the skills the downstream’s people have absorbed — that value has changed form, beyond any attribution rule or damages.
Compliance and personal information. If personal information is involved, “provision” extends compliance duties and joint exposure back to the upstream; and the upstream often cannot tell whether the downstream’s derivative is truly anonymised — most “anonymisation” is de-identification, still personal information — so its exposure does not necessarily end on delivery.
Counterparty strategy. Once data is delivered, incentives shift: post-possession delay and renegotiation (hold-up); information asymmetry hides intent and capability ex ante; worst of all, the counterparty may use the capability built on the source data to compete with the upstream.

Across both classes: the deterrent uncertainties cluster in class two, which no attribution-or-compensation scheme can touch; and where class-one rules could engage, their tests are contested and tilt against the upstream. So no allocation scheme can eliminate, ex ante, the uncertainty that actually drives whether the upstream provides the data.

The operation right contracts — within limits

Hence the operation right tends, in practice, to contract: data-dependent upstreams avoid open, raw, autonomous provision. Hong adds three boundaries:

Contraction is not cessation. Upstreams respond without needing omniscience — data sandboxes, privacy computing, “data does not leave the domain,” federated modelling, strict purpose limits, and grant-back audits all bound the uncertainty with technology and contract, substituting controlled use for raw delivery. They do not stop providing use; they stop providing use detached from control.
Monetisation upstreams are excluded. A one-off seller — data broker, dataset sale — bears no consequence from the buyer’s loss of control over derivatives; it has already realised the value in the price. The thesis targets only upstreams that mean to keep a continuing relationship and control.
Not sharing has a cost too. Data depreciates; competitors may move first. So this is a marginal, directional claim — uncertainty raises the upstream’s reservation price and shrinks the deals it will do, pushing it toward controlled forms, not a blanket refusal.
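The three boundaries above can be read together as one participation condition. The formalisation below is an illustrative sketch of ours, not Hong's: the symbols $p$ (the price offered), $L$ (the unrecoverable loss of control), $D$ (the cost of not sharing, such as depreciation or a competitor moving first), and $r$ (the upstream's reservation price) are all assumptions introduced for the example.

```latex
\[
  \underbrace{p - \mathbb{E}[L]}_{\text{provide}} \;\ge\; \underbrace{-D}_{\text{decline}}
  \quad\Longleftrightarrow\quad
  p \;\ge\; r, \qquad r = \mathbb{E}[L] - D .
\]
```

On this reading, uncertainty raises $\mathbb{E}[L]$ and therefore $r$, so fewer deals clear. Controlled forms such as sandboxes or federated modelling lower $\mathbb{E}[L]$. A one-off seller has $L \approx 0$ and so falls outside the thesis. A positive $D$ keeps $r$ from rising without limit, which is why the claim is marginal and directional and does not predict blanket refusal.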

This is also why “but isn’t the default rule there to reduce uncertainty?” doesn’t rebut the point. A default rule at most trims some ex-post allocation uncertainty; its content is contested (so still unpredictable ex ante), and its predictable part tilts downstream (so foreseeable loss of control does not raise the upstream’s willingness to provide). The real deterrents — foresight, detection, privity, fusion, abstraction leakage, compliance, strategy — sit outside the rules’ range. Default rules govern how to allocate after the fact, not whether to act beforehand.

Establishing a right is not guaranteeing its exercise

Hong’s close ties the series together. The operation right is conceptually clear: it is the right to provide data externally and move data property into the market. But a clearly defined right and an exercised one are two different things. Licensing use forms a new object the upstream cannot reach, dropping its control from erga omnes to a personal claim; placed in real conditions, the upstream then faces a layer of irreducible, mostly structural uncertainty that the two scholarly fixes, however fine on ex-post allocation, neither reach nor relieve, and that their downstream-tilting defaults can worsen. So the operation right contracts: relationship-keeping upstreams move to controlled operation, or decline.

The argument forms one thread with the first two notes. Holding is thin, its boundary supplied by behavioural norms; the use right is real but not self-sufficient, its boundary supplied by contract and technology; the operation right’s exercise depends on the surrounding allocation of risk, which the three-rights modules cannot themselves create or arrange. A framework can establish the type of a right; it cannot supply the conditions to exercise it, and here those conditions, in contract, technology, and public law, are not yet adequately supplied.

Why overseas counsel should care

This is the rigorous answer to “why is Chinese data supply so thin?” When a data exchange listing, a sourcing pitch, or an AI-training-data deal stalls on the supplier side, the cause is usually not price but structural loss of control: the upstream cannot recover the data’s value once the data leaves its hands, and no contract fully fixes that.
Controlled access is the equilibrium, not a quirk. Sandboxes, privacy computing, “data does not leave the domain,” and federated modelling are the rational upstream response — design your China data projects to consume outputs and model results, not raw datasets (the same pattern across part one and part two).
If you are the upstream/licensor, price and bound the loss you cannot reverse. Use grant-back, no-train/no-fusion, sub-licensing bans, output review, and audit — but assume detection and tracing will be costly, and put a price on the control you will lose rather than relying on recovery.
If you are the downstream/processor, your derivative work is comparatively well-positioned — but document it. China’s defaults tend to vest models, scores, and labels in the builder; still, record your value-add and lawful sourcing, because good-faith and scope-of-authorisation will decide marginal cases.
Don’t read a clean three-rights label as a clean deal. Defining holding, use, and operation does not, by itself, make data tradeable; the risk-allocation plumbing around the modules — contract, technology, PIPL/DSL compliance — is what determines whether a transaction actually happens.

DCC sources

Original: Hong Yanqing (洪延青), 《上游为何不愿对外经营数据？控制降级、衍生数据与不确定性下的经营决策》, on the 网安寻路人 channel — mp.weixin.qq.com.
Series on DCC: part one — Two Paths for the “Right to Hold Data”; part two — When the “Right to Use Data” Goes External; part four — Data “Parallel Property Rights”.
Cross-references on DCC: the Data Twenty Articles (source of the three-rights structure) · the Common Data Terms, Batch 2 (official definitions of the operation right and derivative data) · PIPL · the Data Security Law · the Network Data Security Regulation · the draft Data Property Rights Registration Guidelines.
Part of the data-economy domain on DCC.

This is an editorial summary and analysis of Hong Yanqing’s commentary, written in DCC’s own words for overseas readers — not a translation of his article, and not a reproduction of it. Quoted phrases are short and attributed; the full argument is his, at the link above. Not legal advice.