Derivative Data Products and Public Data Opening — Legal Challenges and Compliance Points

Editor’s Note — DCC.

This brief summarises 《公共数据开放背景下衍生数据产品开发利用的法律挑战与合规要点》 by Wang Yi (王艺) and Yu Hao (余灏), both DEXCO-certified partners at the Shenzhen office of Global Law Office, writing for the Shenzhen Data Exchange DEXC+ column. The piece sits at the intersection of China’s nascent public-data opening regime and the still-unsettled question of how “derivative data” is defined, owned, and protected. DCC runs it because the source is authoritative — DEXC+ is the practitioner commentary arm of one of China’s principal state-backed data trading venues — and because the definitional and IP questions the piece addresses are live problems for any overseas counsel advising a client that processes Chinese public datasets.

Readers should note that the article explicitly characterises itself as academic-practitioner opinion and not formal legal advice from Shenzhen Data Exchange. Several of the provincial regulations cited are implementing rules rather than national law; their application to a specific transaction will depend on jurisdiction and contract.

What “derivative data” means — and why the definition is contested

The starting point for any compliance analysis is whether a data product qualifies as derivative data (衍生数据) at all. Two official sources now provide partial answers, but neither fully closes the interpretive gap.

The National Data Administration (国家数据局) published its second batch of sector terminology definitions on 29 March 2025. Item 6 defines derivative data as: data produced by a data processor that holds use-rights to the underlying dataset, which, using professional knowledge through processing, modelling, and key-information extraction, achieves a substantial change (实质改变) in the content, form, or structure of the source data, thereby significantly increasing (显著提升) data value.

The authors highlight two unresolved tensions in that definition. First, the phrase “substantial change” requires a qualitative judgment that the official guidance does not operationalise — practitioners and academics have not yet agreed on where the threshold sits. Second, “significantly increasing” sets a higher bar than a mere “valuable” standard; incremental cleansing or de-identification alone is unlikely to meet it.

The national standard GB/T 43697-2024 (Data Security Technology — Data Classification and Grading Rules) takes a broader, more enumerated approach: derivative data is produced through statistical analysis, correlation, mining, aggregation, or de-identification processing. It explicitly classifies de-identified data, labelled data, statistical data, and fused data as subtypes of derivative data.

The authors draw out two practical open questions that compliance counsel should monitor: (1) whether the National Data Administration will publish further interpretive criteria analogous to the Ministry of Finance’s published list of seven circumstances in which data assets should not be recognised on balance sheets; and (2) whether the forthcoming unified data-property-rights registration rules will address how derivative data from different source channels is treated differently — a question already live in the public-data authorized-operation specifications.

The legislative landscape for public-data opening

China’s approach to public-data opening (公共数据开放) operates at two levels. At the national level, the December 2022 joint opinion of the CPC Central Committee and the State Council on building a data foundation system (the “Data Twenty Articles,” covered separately in DCC’s brief on the data foundation system opinions) established the principal framework: public data used for public-governance and public-welfare purposes should be made conditionally available free of charge, while public data used for industrial and commercial development may be subject to a conditional paid-use model.

At the provincial and local level, a patchwork of management measures has followed. The authors survey rules from Shandong, Guangdong, Inner Mongolia, Shanghai, Chongqing, Zhejiang, Yunnan, and Anhui, each of which adds its own prohibitions and conditions on derivative use. The common thread is an encouraging posture toward commercial development of opened public data, paired with explicit prohibitions on a largely consistent set of conduct.

Five operational compliance points for developers of derivative products

The authors identify five areas where operators using opened public data need active compliance work.

1. Automated collection is not prohibited but carries its own legal exposure. Provincial rules permit opened public data to be obtained by download, API access, or algorithmic delivery of result data. None of the surveyed provincial measures expressly prohibits automated collection (爬取). However, operators remain subject to the Data Security Law (DSL), the Network Data Security Management Regulations (网络数据安全管理条例), and the Criminal Law provisions that govern automated collection. In practice this means completing pre-collection security self-assessments, controlling access frequency so that collection does not disrupt the data source’s service, and never circumventing or breaking technical protective measures or exceeding the authorised access scope.

2. Use must not damage rights or breach platform terms. A composite picture of prohibited conduct drawn from multiple provincial measures includes: using public data to obtain illegal benefits; abusing the rights obtained or harming national, public, or third-party interests; violating the terms of any data-use agreement; and failing to implement required security safeguards. The Chongqing measure adds a specific prohibition that is worth noting for its intelligence-law resonance: operators may not aggregate public data so as to produce information touching on state secrets, national security, or other important sensitive content.

3. Source-labelling is mandatory in some jurisdictions. The Shandong and Yunnan provincial rules both require that any data product, research report, or academic paper derived from opened public data identify the data source and the acquisition date. Although this obligation currently applies in only some jurisdictions, the trend is toward wider adoption, and the labelling requirement is easy to build into product design early.

4. Sensitive-data identification and security assessment after bulk collection. This obligation is currently uncommon but emerging. Chongqing expressly prohibits aggregating collected public data into information touching on state secrets or national security. Anhui’s draft public data management measures go further: where aggregation or correlation analysis of public data could produce classified or sensitive data, both the data-opening party and the data user must conduct a security assessment and implement corresponding security measures. The authors note that academic commentary on the risks of public-data aggregation is beginning to appear, signalling that regulators are likely to treat this as a priority area.

5. Rights-conflict analysis is unavoidable for complex products. In practice, derivative data products frequently encounter conflicts among individual personal-information rights, third-party commercial-secret rights, copyright interests, and public-interest considerations. The authors provide a worked scenario: a social-media platform prohibits third-party automated collection of its content, but some of that content consists of government information (政府信息) published on the platform and subject to the Government Information Disclosure Regulations (政府信息公开条例). A third party commissioned by government to collect and analyse that data is operating in a collision zone between the platform’s terms, the government’s disclosure obligations, and the derivative-data rights of the commissioned party. The authors’ framework for resolving this: first, assess whether the platform holds any legitimate legal interest in the data concerned; if it does, analyse the priority of competing interests; if it does not, analyse why not. Where public interest and commercial interest genuinely conflict, public interest should in principle prevail.
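Two of the points above, controlling access frequency (point 1) and labelling the data source and acquisition date (point 3), lend themselves to a short illustration. The following is a minimal sketch under stated assumptions, not a compliance implementation: the class and function names, the two-second interval, and the label wording are all hypothetical. The article does not report any numeric frequency threshold or prescribed label format in the provincial rules, and a rate limiter does not replace the security self-assessment or the access-scope limits described in point 1.

```python
import time
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


class MinIntervalLimiter:
    """Enforce a minimum gap between successive requests to one data source.

    The interval is an operator-chosen value (hypothetical here); it is not
    a figure taken from any regulation.
    """

    def __init__(
        self,
        min_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if min_interval_s < 0:
            raise ValueError("min_interval_s must be non-negative")
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


@dataclass(frozen=True)
class SourceLabel:
    """Source and acquisition date to attach to a derived product or report."""

    source_name: str
    acquired_on: date

    def render(self) -> str:
        # Hypothetical wording; the label format is an assumption.
        return (
            f"Data source: {self.source_name}; "
            f"acquired {self.acquired_on.isoformat()}"
        )
```

In use, a pipeline would call `limiter.wait()` before each request to the same portal and store a `SourceLabel` alongside every downloaded dataset, so the label can be carried into any product or report built from it.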

Property-rights registration — channel matters

One of the most practically significant points in the article concerns the interaction between the channel through which public data was obtained and the scope of property-rights registration available to a derivative-data producer.

Under current data-property-rights registration practice, public data products developed through a public-data authorized-operation (公共数据授权运营) arrangement can be registered only for use rights (使用权) and operational rights (经营权); the holder cannot register holding rights (持有权). By contrast, derivative data produced from unconditionally opened public data (i.e., freely available open data) can currently be registered for all three rights (三权) — holding, use, and operational.
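The asymmetry described above can be written down as a simple lookup, which may be useful in a due-diligence checklist. This is a sketch of current registration practice as the article reports it, not a statutory rule, and it could change once unified registration rules are published. The enum and function names are hypothetical.

```python
from enum import Enum
from typing import FrozenSet


class Channel(Enum):
    AUTHORIZED_OPERATION = "authorized_operation"  # 公共数据授权运营
    UNCONDITIONAL_OPEN = "unconditional_open"      # unconditionally opened public data


class Right(Enum):
    HOLDING = "holding"          # 持有权
    USE = "use"                  # 使用权
    OPERATIONAL = "operational"  # 经营权


# Rights that can currently be registered, per acquisition channel,
# as summarised in the paragraph above.
REGISTRABLE = {
    Channel.AUTHORIZED_OPERATION: frozenset({Right.USE, Right.OPERATIONAL}),
    Channel.UNCONDITIONAL_OPEN: frozenset(
        {Right.HOLDING, Right.USE, Right.OPERATIONAL}
    ),
}


def registrable_rights(channel: Channel) -> FrozenSet[Right]:
    """Return the rights that can be registered for a given channel."""
    return REGISTRABLE[channel]


def registration_gap(channel: Channel) -> FrozenSet[Right]:
    """Return the rights that cannot be registered for a given channel."""
    return frozenset(Right) - REGISTRABLE[channel]
```

For an authorized-operation product, `registration_gap` returns only the holding right; for a product built on unconditionally opened data, it returns an empty set.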

This asymmetry has direct implications for investment value, securitisation, and dispute resolution. The authors flag that a unified national data-property-rights registration framework has not yet been published, and the question of whether derivative data can simultaneously hold both data-property-rights registration and data-intellectual-property registration — and whether that creates redundancy or genuine layered protection — remains open. The Datatang v. Yinmu data-IP registration case is a useful reference point for how courts and registration bodies are already navigating the boundary between these two tracks.

Four protection strategies for derivative-data rights holders

The article closes with four strategies for protecting derivative-data product rights against infringement — important context for companies concerned less about compliance risk and more about enforcing their own data assets.

Strategy 1 — Accurately characterise your product and your obligations. A product with some public-data attributes does not necessarily carry an obligation to make it freely available. Drawing on a recent Beijing Internet Court ruling (described only as “the GX v. WX case”), the authors note that a product with partial public-data character is not automatically a public-data product. Investment in developing a derivative product should attract Anti-Unfair Competition Law protection; the product owner cannot be required to tolerate scraping by competitors.

Strategy 2 — Use the new data-specific provision in the Anti-Unfair Competition Law. The Anti-Unfair Competition Law (2025 revision) added a “data-specific clause” (数据专条) within its internet-specific provision, Article 13. Article 13(3) prohibits operators from obtaining or using data lawfully held by another operator through deception, coercion, circumventing or breaking technical protective measures, or other improper means, where doing so harms the other operator’s legitimate interests and disrupts market competition. The authors identify four elements that must be established: (i) a competitive relationship between the parties; (ii) acquisition or use of the other party’s lawfully held data through improper means; (iii) a legitimate interest (including a competitive interest) held by the affected party in the data; and (iv) damage to that interest and disruption to market order.

Strategy 3 — Pursue trade-secret protection. Citing recent case law and the Criminal Law Amendment (XI), which tightened sanctions for trade-secret misappropriation, the authors suggest that a derivative-data rights holder should consider classifying its product as a trade secret, both for civil litigation purposes and as a deterrent to employee-facilitated data leakage. This route is available only if the operator implements appropriate technical and management controls to establish and maintain secrecy.

Strategy 4 — Explore data-rights infringement and contractual liability. Under the Data Twenty Articles, the Civil Code, and the Data Security Law, data is a protected civil interest. Where a counterparty in a commercial arrangement misappropriates data-product rights, or where a third party infringes the data-product holder’s rights, tort and contractual liability are both available. The authors note that the Supreme People’s Court has recently issued guiding cases on data-rights protection, and that the range of enforcement strategies is becoming more diverse, with arbitration available as an alternative to litigation.

Why overseas counsel should care

The definition of “derivative data” is the gating question for every data-product transaction. Until the National Data Administration publishes clearer criteria, due diligence on a Chinese data-product acquisition must include a fact-specific analysis of whether the product genuinely satisfies the “substantial change” and “significantly increasing” data value tests. It must also establish whether the source data was obtained through authorized-operation or unconditional-open channels, since that determines what property rights can be registered and traded.

Automated collection of opened public data is structurally risky even when not expressly prohibited. Foreign operators running data ingestion pipelines against Chinese public datasets need pre-deployment security self-assessments, rate controls, and, critically, an aggregation analysis. Chongqing’s measure and Anhui’s draft already treat bulk aggregation as a trigger for prohibitions or sensitive-data assessment obligations, and the authors expect regulators to prioritise the issue. These obligations can apply even where the individual source records are innocuous.

The Anti-Unfair Competition Law (2025) data clause is a new offensive and defensive tool. Article 13(3) is likely to generate litigation over the next two to three years as rights holders test it. For overseas companies whose Chinese partners or competitors are building derivative data products from public datasets, this provision, together with trade-secret doctrine, is the primary legal backstop if a product is misappropriated.

The holding-rights gap in authorized-operation products has deal-structure implications. Where a client’s Chinese data-product business is built on public-data authorized-operation contracts rather than freely opened data, the inability to register holding rights constrains collateral value, affects how IP can be licensed, and could create complications in an M&A context. Structuring advice should account for this asymmetry now, before the unified registration rules are published and potentially lock in current practice.

DCC sources

Original: 王艺、余灏 (Wang Yi, Yu Hao), 《公共数据开放背景下衍生数据产品开发利用的法律挑战与合规要点》, 深圳数据交易所 DEXC+ 专栏 WeChat Official Account.
National Data Administration, 《数据领域常用名词解释（第二批）》(Second Batch of Common Terminology Definitions for the Data Sector), 29 March 2025.
GB/T 43697-2024, Data Security Technology — Data Classification and Grading Rules (数据安全技术 数据分类分级规则), §3.10 and Annex I.
CPC Central Committee and State Council, Opinions on Building a Data Foundation System to Better Leverage the Role of Data as a Factor of Production (数据二十条), 2 December 2022.
Anti-Unfair Competition Law (反不正当竞争法) (2025 revision), Art. 13(3).
Public-data authorized-operation specifications.
Provincial public-data management measures cited: Shandong (2022), Guangdong (2021), Inner Mongolia (暂行办法), Shanghai (暂行办法), Chongqing (暂行办法), Zhejiang (条例), Yunnan (试行), Anhui (征求意见稿).

This is an editorial summary, not a translation of the original DEXC+ column article. The authors’ arguments and examples are attributed throughout; any simplification, emphasis, or operational extrapolation is DCC’s. The original article represents the academic and professional views of Wang Yi and Yu Hao personally, and does not represent the position of Shenzhen Data Exchange. Not legal advice.