---
title: "Derivative Data Products and Public Data Opening — Legal Challenges and Compliance Points"
author: "DCC Editorial"
published: 2026-05-16T03:00:00.000Z
url: https://datacompliancechina.com/posts/derivative-data-products-public-data-opening/
description: "As China opens public-sector datasets for commercial exploitation, companies building derivative data products (衍生数据产品) face a layered compliance problem: the definition of 'derivative data' in the National Data Administration's 2025 glossary is deliberately high-threshold (substantial transformation, significant value uplift); provincial rules on automated collection, source-labelling, and sensitive-data assessment are inconsistent; and a three-way collision between the open-data rules, third-party platform terms, and the 2025 Anti-Unfair Competition Law amendments has no clean resolution. Wang Yi and Yu Hao (both DEXCO-certified partners at Global Law Office Shenzhen) map the definitional landscape, five categories of operational red lines, and four protective strategies — including the new data-specific provision in the revised Anti-Unfair Competition Law — for practitioners building or advising on derivative-data businesses."
tags: ["derivative-data", "public-data", "data-property-rights", "data-product", "anti-unfair-competition", "authorized-operation", "data-economy", "data-registration"]
laws_cited: ["public-data-authorized-operation-specifications", "data-foundation-system-opinions"]
domains: ["data-economy"]
account: "shenzhen-data-exchange"
original_title: "DEXC+专栏 | 公共数据开放背景下衍生数据产品开发利用的法律挑战与合规要点"
original_author: "王艺，余灏 (Wang Yi, Yu Hao)"
original_publication: "深圳数据交易所 DEXC+ 专栏 WeChat Official Account"
original_url: "https://mp.weixin.qq.com/s/nNJVdLFSs65EsCR99TmXIw"
source_language: "zh"
---
> *Editor's Note — DCC.*
>
> This brief summarises 《公共数据开放背景下衍生数据产品开发利用的法律挑战与合规要点》
> by Wang Yi (王艺) and Yu Hao (余灏), both DEXCO-certified partners at the
> Shenzhen office of Global Law Office, writing for the Shenzhen Data Exchange
> DEXC+ column. The piece sits at the intersection of China's nascent public-data
> opening regime and the still-unsettled question of how "derivative data" is
> defined, owned, and protected. DCC runs it because the source is authoritative
> — DEXC+ is the practitioner commentary arm of one of China's principal
> state-backed data trading venues — and because the definitional and IP
> questions the piece addresses are live problems for any overseas counsel
> advising a client that processes Chinese public datasets.
>
> Readers should note that the article explicitly characterises itself as
> academic-practitioner opinion and not formal legal advice from Shenzhen Data
> Exchange. Several of the provincial regulations cited are implementing rules
> rather than national law; their application to a specific transaction will
> depend on jurisdiction and contract.

## What "derivative data" means — and why the definition is contested

The starting point for any compliance analysis is whether a data product
qualifies as **derivative data (衍生数据)** at all. Two official sources now
provide partial answers, but neither fully closes the interpretive gap.

The National Data Administration (国家数据局) published its second batch of
sector terminology definitions on 29 March 2025. Item 6 defines derivative
data as: data produced by a data processor that holds use-rights to the
underlying dataset, which, using professional knowledge through processing,
modelling, and key-information extraction, achieves a *substantial change*
(实质改变) in the content, form, or structure of the source data, thereby
*significantly increasing* (显著提升) data value.

The authors highlight two unresolved tensions in that definition. First, the
phrase "substantial change" requires a qualitative judgment that the official
guidance does not operationalise — practitioners and academics have not yet
agreed on where the threshold sits. Second, "significantly increasing" sets a
higher bar than a mere "valuable" standard; incremental cleansing or
de-identification alone is unlikely to meet it.

The national standard GB/T 43697-2024 (Data Security Technology — Data
Classification and Grading Rules) takes a broader, more enumerated approach:
derivative data is produced through statistical analysis, correlation,
mining, aggregation, or de-identification processing. It explicitly classifies
de-identified data, labelled data, statistical data, and fused data as
subtypes of derivative data.

The authors draw out two practical open questions that compliance counsel
should monitor: (1) whether the National Data Administration will publish
further interpretive criteria analogous to the Ministry of Finance's published
list of seven circumstances in which data assets should not be recognised on
balance sheets; and (2) whether the forthcoming unified data-property-rights
registration rules will address how derivative data from different source
channels is treated differently — a question already live in the
[public-data authorized-operation specifications](/laws/public-data-authorized-operation-specifications/).

## The legislative landscape for public-data opening

China's approach to public-data opening (公共数据开放) operates at two levels.
At the national level, the December 2022 joint opinion of the CPC Central
Committee and the State Council on building a data foundation system (the
"Data Twenty Articles," which DCC covers in the
[data foundation system opinions](/laws/data-foundation-system-opinions/))
established the principal framework: public data used for public-governance
and public-welfare purposes should be made conditionally available free of
charge; public data used for industrial and commercial development may be
subject to a conditional paid-use model.

At the provincial and local level, a patchwork of management measures has
followed. The authors survey rules from Shandong, Guangdong, Inner Mongolia,
Shanghai, Chongqing, Zhejiang, Yunnan, and Anhui, each of which adds its own
prohibitions and conditions on derivative use. The common thread is an
encouraging posture toward commercial development of opened public data,
paired with explicit prohibitions on a largely consistent set of conduct.

## Five operational compliance points for developers of derivative products

The authors identify five areas where operators using opened public data need
active compliance work.

**1. Automated collection is not prohibited but carries its own legal exposure.**
Provincial rules permit opened public data to be obtained by download, API
access, or algorithmic delivery of result data. None of the surveyed
provincial measures expressly prohibits automated collection (爬取). However,
operators remain subject to the Data Security Law (DSL), the Network Data
Security Management Regulations (网络数据安全管理条例), and the Criminal Law
provisions that govern automated collection. In practice this means completing
pre-collection
security self-assessments, controlling access frequency to avoid causing
service disruption to the data source, and never circumventing or breaking
technical protective measures or exceeding authorised access scope.
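
The rate-control and access-scope points above can be sketched in Python. This is a minimal illustration under stated assumptions, not a compliance tool: the class name `CollectionGuard`, the host `data.example.gov.cn`, the `/open/` path prefix, and any interval value are hypothetical, and the appropriate limits depend on the data source's terms and the applicable rules.

```python
import time
from urllib.parse import urlparse


class CollectionGuard:
    """Hypothetical helper: throttles requests and rejects out-of-scope URLs."""

    def __init__(self, allowed_hosts, allowed_path_prefixes, min_interval_s,
                 clock=time.monotonic, sleep=time.sleep):
        self.allowed_hosts = set(allowed_hosts)
        self.allowed_path_prefixes = tuple(allowed_path_prefixes)
        self.min_interval_s = min_interval_s
        # clock and sleep are injectable so the throttle can be tested offline
        self._clock = clock
        self._sleep = sleep
        self._last_request_at = None

    def in_scope(self, url):
        """True only for HTTPS URLs on an authorised host and path prefix."""
        parts = urlparse(url)
        return (parts.scheme == "https"
                and parts.hostname in self.allowed_hosts
                and parts.path.startswith(self.allowed_path_prefixes))

    def wait_turn(self):
        """Block until min_interval_s has elapsed since the previous request."""
        now = self._clock()
        if self._last_request_at is not None:
            remaining = self.min_interval_s - (now - self._last_request_at)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request_at = now

    def fetch(self, url, do_request):
        """Call do_request(url) only for in-scope URLs, after throttling."""
        if not self.in_scope(url):
            raise PermissionError(f"outside authorised scope: {url}")
        self.wait_turn()
        return do_request(url)
```

A pipeline would pass its own HTTP function as `do_request`, so the guard sits in front of every call. The guard only spaces requests and refuses URLs outside the configured scope; it does not replace the pre-collection security self-assessment.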

**2. Use must not damage rights or breach platform terms.**
A composite picture of prohibited conduct drawn from multiple provincial
measures includes: using public data to obtain illegal benefits; abusing the
rights obtained or harming national, public, or third-party interests;
violating the terms of any data-use agreement; and failing to implement
required security safeguards. The Chongqing measure adds a specific
prohibition that is worth noting for its national-security dimension: operators
may not aggregate public data so as to produce information touching on state
secrets, national security, or other important sensitive content.

**3. Source-labelling is mandatory in some jurisdictions.**
Shandong and Yunnan provincial rules both require any data product,
research report, or academic paper derived from opened public data to
identify the data source and the acquisition date. This obligation
currently applies in only some of the surveyed jurisdictions, but the trend is
toward wider adoption, and the labelling requirement is easy to build into
product design early.
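
One way to build the labelling requirement into product design is to make the source and acquisition date part of the product's metadata from the start. The sketch below is illustrative only: `SourceLabel`, `attach_label`, and all field names are hypothetical, and the provincial rules summarised here do not prescribe any particular format.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class SourceLabel:
    """Hypothetical record of where and when opened public data was obtained."""
    source_name: str      # publishing body or open-data platform
    dataset_title: str
    acquired_on: date

    def to_metadata(self):
        """Return a JSON-friendly dict with the date in ISO 8601 form."""
        record = asdict(self)
        record["acquired_on"] = self.acquired_on.isoformat()
        return record

    def to_citation(self):
        """Return a one-line attribution for reports or papers."""
        return (f'Source: {self.source_name}, "{self.dataset_title}", '
                f"acquired {self.acquired_on.isoformat()}")


def attach_label(product, labels):
    """Return a copy of the product dict with labels embedded.

    Refuses to emit a derived product that carries no source label.
    """
    if not labels:
        raise ValueError("derived product has no source label")
    return {**product, "sources": [label.to_metadata() for label in labels]}
```

Failing loudly on an unlabelled product is a design choice: it turns a documentation duty into a build-time check, which is cheaper than retrofitting labels after release.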

**4. Sensitive-data identification and security assessment after bulk collection.**
This obligation is currently uncommon but emerging. Chongqing expressly
prohibits aggregating collected public data into information touching on
state secrets or national security. Anhui's draft public data management measures
go further: where aggregation or correlation analysis of public data could
produce classified or sensitive data, both the data-opening party and the
data-user must conduct a security assessment and implement corresponding
security measures. The authors note that academic commentary on the risks of
public-data aggregation is beginning to appear, signalling that regulators are
likely to treat this as a priority area.

**5. Rights-conflict analysis is unavoidable for complex products.**
In practice, derivative data products frequently encounter conflicts among
individual personal-information rights, third-party commercial-secret rights,
copyright interests, and public-interest considerations. The authors provide
a worked scenario: a social-media platform prohibits third-party automated
collection of its content, but some of that content consists of government
information (政府信息) published on the platform and subject to the Government
Information Disclosure Regulations (政府信息公开条例). A third party
commissioned by government to collect and analyse that data is operating in
a collision zone between the platform's terms, the government's disclosure
obligations, and the derivative-data rights of the commissioned party. The
authors' framework for resolving this: first, assess whether the platform
holds any legitimate legal interest in the data concerned; if it does, analyse
the priority of competing interests; if it does not, analyse why not. Where
public interest and commercial interest genuinely conflict, public interest
should in principle prevail.

## Property-rights registration — channel matters

One of the most practically significant points in the article concerns the
interaction between the *channel* through which public data was obtained and
the *scope* of property-rights registration available to a derivative-data
producer.

Under current data-property-rights registration practice, public data products
developed through a **public-data authorized-operation (公共数据授权运营)**
arrangement can be registered only for *use rights (使用权)* and *operational
rights (经营权)*; the holder cannot register *holding rights (持有权)*. By
contrast, derivative data produced from **unconditionally opened public data**
(i.e., freely available open data) can currently be registered for all three
rights (三权) — holding, use, and operational.
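
For due-diligence tooling, the channel asymmetry described above can be encoded as a simple lookup. This is a sketch of current practice as the article reports it, not a statement of settled law; the names `Channel`, `Right`, `registrable_rights`, and `registration_gap` are hypothetical, and the mapping would need revisiting once unified national registration rules are published.

```python
from enum import Enum


class Channel(Enum):
    AUTHORIZED_OPERATION = "public-data authorized operation"
    UNCONDITIONAL_OPEN = "unconditionally opened public data"


class Right(Enum):
    HOLDING = "持有权"
    USE = "使用权"
    OPERATION = "经营权"


# Mapping taken from the registration practice summarised in the text.
REGISTRABLE = {
    Channel.AUTHORIZED_OPERATION: frozenset({Right.USE, Right.OPERATION}),
    Channel.UNCONDITIONAL_OPEN: frozenset(
        {Right.HOLDING, Right.USE, Right.OPERATION}
    ),
}


def registrable_rights(channel):
    """Rights currently registrable for a product sourced via this channel."""
    return REGISTRABLE[channel]


def registration_gap(channel):
    """Rights that cannot currently be registered for this channel."""
    return frozenset(Right) - REGISTRABLE[channel]
```

Calling `registration_gap(Channel.AUTHORIZED_OPERATION)` returns only `Right.HOLDING`, which is the gap a deal team would flag.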

This asymmetry has direct implications for investment value, securitisation,
and dispute resolution. The authors flag that a unified national
data-property-rights registration framework has not yet been published. It
also remains open whether a derivative data product can carry both a
data-property-rights registration and a data-intellectual-property
registration, and whether doing so creates redundancy or genuine layered
protection. The [Datatang v. Yinmu data-IP registration case](/posts/datatang-v-yinmu-data-ip-registration-case/)
is a useful reference point for how courts and registration bodies are already
navigating the boundary between these two tracks.

## Four protection strategies for derivative-data rights holders

The article closes with four strategies for protecting derivative-data product
rights against infringement. These matter most to companies concerned less
about compliance risk than about enforcing rights in their own data assets.

**Strategy 1 — Accurately characterise your product and your obligations.**
A product with some public-data attributes does not necessarily carry an
obligation to make it freely available. Drawing on a recent Beijing Internet
Court ruling (described only as "the GX v. WX case"), the authors note that a
product with partial public-data character is not automatically a public-data
product. Investment in developing a derivative product should attract
Anti-Unfair Competition Law protection; the product owner cannot be required
to tolerate scraping by competitors.

**Strategy 2 — Use the new data-specific provision in the Anti-Unfair Competition Law.**
The Anti-Unfair Competition Law (2025 revision) added a "data-specific clause"
(数据专条) within its internet-specific article. Article 13(3) prohibits operators from
obtaining or using data lawfully held by another operator through deception,
coercion, circumventing or breaking technical protective measures, or other
improper means, where doing so harms the other operator's legitimate interests
and disrupts market competition. The authors identify four elements that must
be established: (i) a competitive relationship between the parties; (ii)
acquisition or use of the other party's lawfully held data through improper
means; (iii) the affected party holds a legitimate interest (including a
competitive interest) in the data; and (iv) damage to that interest and
disruption to market order.
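
For case intake, the four elements can be tracked as a plain checklist. The sketch below only records which elements a claimant believes it can evidence; it is not a legal test. The class name `DataClauseAssessment` and its field names are hypothetical labels for the authors' elements (i) to (iv).

```python
from dataclasses import dataclass, fields


@dataclass
class DataClauseAssessment:
    """Hypothetical intake checklist mirroring the authors' four elements."""
    competitive_relationship: bool        # (i)
    improper_acquisition_or_use: bool     # (ii)
    legitimate_interest_in_data: bool     # (iii)
    harm_and_market_disruption: bool      # (iv)

    def unmet_elements(self):
        """Names of elements not yet supported by evidence."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def all_elements_asserted(self):
        """True only when every element has been marked as supported."""
        return not self.unmet_elements()
```

Listing the unmet elements by name, rather than returning a single yes or no, points counsel to where evidence gathering should focus.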

**Strategy 3 — Pursue trade-secret protection.**
Citing recent case law and the Criminal Law Amendment (XIII) tightening
sanctions for trade-secret misappropriation, the authors suggest that a
derivative-data rights holder should consider classifying its product as a
trade secret — both for civil litigation purposes and as a deterrent to
employee-facilitated data leakage — provided the operator implements
appropriate technical and management controls to establish and maintain secrecy.

**Strategy 4 — Explore data-rights infringement and contractual liability.**
Under the Data Twenty Articles, the Civil Code, and the DSL, data is a
protected civil interest. Where a counterparty in a commercial arrangement
misappropriates data-product rights, or where a third party infringes the
data-product holder's rights, tort and contractual liability are both
available. The authors note that the Supreme People's Court has recently
issued guiding cases on data-rights protection, and the range of enforcement
strategies is becoming more diverse — including arbitration as an alternative
to litigation.

## Why overseas counsel should care

- **The definition of "derivative data" is the gating question for every data-product transaction.** Until the National Data Administration publishes clearer criteria, due diligence on a Chinese data-product acquisition must include a fact-specific analysis of whether the product genuinely achieves the "substantial change" (实质改变) and "significant increase" (显著提升) in value that the definition requires. It must also establish whether the source data was obtained through authorized-operation or unconditional-open channels, since that determines what property rights can be registered and traded.

- **Automated collection of opened public data is structurally risky even when not expressly prohibited.** Foreign operators running data ingestion pipelines against Chinese public datasets need pre-deployment security self-assessments, rate controls, and, critically, an aggregation analysis: Chongqing's measure and Anhui's draft already treat bulk aggregation as a trigger for prohibitions or security-assessment obligations, which can apply even where the individual source records are innocuous, and regulators are likely to treat aggregation as a priority area.

- **The Anti-Unfair Competition Law (2025) data clause is a new offensive and defensive tool.** Article 13(3) is likely to generate litigation over the next two to three years as rights holders test it. For overseas companies whose Chinese partners or competitors are building derivative data products from public datasets, this provision — together with trade-secret doctrine — is the primary legal backstop if a product is misappropriated.

- **The holding-rights gap in authorized-operation products has deal-structure implications.** Where a client's Chinese data-product business is built on public-data authorized-operation contracts rather than freely opened data, the inability to register holding rights constrains collateral value, affects how IP can be licensed, and could create complications in an M&A context. Structuring advice should account for this asymmetry now, before the unified registration rules are published and potentially lock in current practice.

## DCC sources

- Original: 王艺、余灏 (Wang Yi, Yu Hao), 《公共数据开放背景下衍生数据产品开发利用的法律挑战与合规要点》, 深圳数据交易所 DEXC+ 专栏 WeChat Official Account ([source](https://mp.weixin.qq.com/s/nNJVdLFSs65EsCR99TmXIw)).
- National Data Administration, 《数据领域常用名词解释（第二批）》(Second Batch of Common Terminology Definitions for the Data Sector), 29 March 2025.
- GB/T 43697-2024, Data Security Technology — Data Classification and Grading Rules (数据安全技术 数据分类分级规则), §3.10 and Annex I.
- CPC Central Committee and State Council, [Opinions on Building a Data Foundation System to Better Leverage the Role of Data as a Factor of Production](/laws/data-foundation-system-opinions/) (数据二十条), 2 December 2022.
- Anti-Unfair Competition Law (反不正当竞争法) (2025 revision), Art. 13(3).
- [Public-data authorized-operation specifications](/laws/public-data-authorized-operation-specifications/).
- Provincial public-data management measures cited: Shandong (2022), Guangdong (2021), Inner Mongolia (暂行办法), Shanghai (暂行办法), Chongqing (暂行办法), Zhejiang (条例), Yunnan (试行), Anhui (征求意见稿).

> This is an editorial summary, not a translation of the original DEXC+ column
> article. The authors' arguments and examples are attributed throughout;
> any simplification, emphasis, or operational extrapolation is DCC's. The
> original article represents the academic and professional views of Wang Yi
> and Yu Hao personally, and does not represent the position of Shenzhen Data
> Exchange. **Not legal advice.**
