---
title: "Datatang v. Yinmu — China's First Ruling on a Data-IP Registration Certificate, and Why Open-Sourced Data Is Still Protected"
author: "DCC Editorial"
published: 2026-05-29T01:00:00.000Z
url: https://datacompliancechina.com/posts/datatang-v-yinmu-data-ip-registration-case/
description: "A consolidated case study of 数据堂诉隐木科技 (Datatang v. Yinmu) — the Beijing IP Court's June 2024 appeal ruling, widely called China's first case on the evidentiary effect of a data-IP registration certificate. The dispute: Datatang built voice datasets for AI training, open-sourced some under a license; Yinmu took and redistributed them in the same data-services market. DCC synthesizes four commentaries (the case report, a Tsinghua analysis, and two Shenzhen Data Exchange DEXC+ deep-dives) into the four holdings that matter for overseas counsel: (1) a data-IP registration certificate is prima facie evidence of property-type interests and lawful sourcing — but not an absolute property right (property-rights-statutism); (2) open-sourced data, though neither trade secret nor copyrightable compilation, is protectable under the Anti-Unfair Competition Law's general clause; (3) the protection hierarchy (compilation work → trade secret → AUCL Art. 2); and (4) whether the taker honored the open-source license is the hinge for 'improper conduct.'"
tags: ["judicial", "data-property-rights", "data-registration", "anti-unfair-competition", "ai-training-data", "open-source", "case"]
laws_cited: ["data-foundation-system-opinions", "data-property-rights-registration-guide-draft", "dsl"]
domains: ["data-economy", "data-security"]
account: "shenzhen-data-exchange"
original_title: "数据堂诉隐木公司 AI 训练数据源案 — 全国首例涉数据知识产权登记证书效力案 (consolidated)"
original_author: "Beijing IP Court (2024)京73民终546号; commentary by 法律与新经济, 清华大学智能法治研究院, 深圳数据交易所 DEXC+"
original_publication: "Multiple — see sources below"
original_url: "https://mp.weixin.qq.com/s/RRsiqVpVcL6eXG077JCjvQ"
source_language: "zh"
---
> *Editor's Note — DCC.*
>
> This is a consolidated case study, not a translation of any single
> piece. 数据堂诉隐木科技 (Datatang v. Yinmu) is the most-cited Chinese
> data-market judgment of the past two years — popularly tagged "China's
> first case on the evidentiary effect of a data-IP registration
> certificate" (全国首例涉数据知识产权登记证书效力案). DCC synthesizes
> four commentaries — the case report (法律与新经济 / 知产宝), a Tsinghua
> Institute for AI & Rule of Law analysis, and two Shenzhen Data Exchange
> DEXC+ deep-dives — into the holdings that matter for overseas counsel.
> The case sits at the intersection of three things DCC has covered
> separately: the [Data 20 Articles three-rights framework](/posts/nda-three-rights-structural-separation/),
> the [data-property-rights registration regime](/laws/data-property-rights-registration-guide-draft/),
> and [open-source AI training-data compliance](/posts/open-source-ai-training-data-compliance/).
> Here a court actually decides how they interact.

## The case

| | |
|---|---|
| **Parties** | 数据堂(北京)科技股份有限公司 (Datatang, plaintiff) v. 隐木(上海)科技有限公司 (Yinmu, defendant) |
| **First instance** | Beijing Internet Court — (2021)京0491民初45708号 |
| **Appeal** | Beijing IP Court — (2024)京73民终546号 (affirmed, June 28, 2024) |
| **Cause of action** | Unfair competition (不正当竞争纠纷) |
| **Result** | Yinmu pays Datatang ¥100,000 in economic loss + ¥2,300 in reasonable enforcement costs; appeal dismissed, first-instance judgment upheld |

The facts: Datatang is a data company that built **voice datasets for AI model training** — collecting and processing a substantial volume of voice-data entries through its own technical, capital, and labor investment. It **open-sourced** some of these datasets under a license. Yinmu, a competitor in the AI-training-data-source market, **obtained the datasets and redistributed / used them** in a way the courts found did not honor the terms on which the data was made available. Datatang sued for unfair competition. Crucially, Datatang held a **Data-IP Registration Certificate (《数据知识产权登记证》)** for the dataset.

The case is doctrinally important because the dataset fell into the gap the Chinese data-property debate keeps circling: it was *public* (so not a trade secret), it lacked originality in selection/arrangement (so not a copyrightable compilation), and "data" is not yet a typed civil property right in statute. So what, exactly, protects it?

## Holding 1 — A data-IP registration certificate is prima facie evidence, not a property right

This is the headline. The Beijing IP Court held that Datatang's **Data-IP Registration Certificate can serve as prima facie evidence** of two things:

- that Datatang **holds property-type interests** in the dataset; and
- that the **collection conduct / data source was lawful**.

Absent contrary evidence, the court could find those facts on the strength of the certificate. The ruling is widely described as the first Chinese judgment to give a data-registration certificate concrete evidentiary force, which is why the "data registration" community treated it as a watershed.

**But — and this is the nuance overseas counsel must hold onto — the certificate is *not* proof of an absolute property right.** The Shenzhen Data Exchange DEXC+ analysis draws out the appeal court's reasoning: under the **property-rights-statutism principle (财产权法定原则)**, a property-type legal interest that has *not* been confirmed by statute as an absolute property right cannot be analogized to other absolute property rights for judicial protection. Civil Code Article 127 ("where laws provide for the protection of data … such provisions apply") is, the court said, a **referential / declaratory clause** — it has *not* made "data" a typed civil right with defined content. The "data three rights" (hold / use / operate) from the Data 20 Articles remain **policy-level and economic concepts**, not statutory absolute rights, because under the Legislation Law the creation of basic civil rights is reserved to NPC statute — administrative regulations, departmental rules, local rules, and policy documents cannot create them.

So Datatang **could not** invoke Article 127 to demand that its dataset be treated as an absolute property right. The registration certificate shifts the **burden of evidence**; it does not conjure a **property right**. For overseas counsel: registering data in China (the data-IP pilots, the data-exchange registration certificates) is now genuinely worth doing for its evidentiary value — but do not mistake a certificate for title.

## Holding 2 — Open-sourced data is still protected, via the Anti-Unfair Competition Law

If the dataset is not a property right, not a trade secret, and not a copyrightable work, what protects it? The court's answer: the **Anti-Unfair Competition Law (AUCL) general clause, Article 2**.

The reasoning: even though the dataset was public (failing the trade-secret secrecy requirement) and lacked originality in selection/arrangement (failing the compilation-work requirement), Datatang had made **substantial technical, capital, and labor investment** to lawfully collect a substantial volume of voice-data entries. That investment added commercial value to the raw data, met AI developers' needs, and generated traffic, transaction opportunities, and competitive advantage. The resulting commercial benefit is, in substance, a **competitive interest (竞争性权益)**, and competitive interests are legitimate interests the AUCL protects.

## Holding 3 — The protection hierarchy

The Tsinghua analysis distills the court's framework into a clean **three-tier hierarchy** for protecting a dataset — useful as an operating checklist:

1. **Public + original selection/arrangement → copyright (compilation work).** If the dataset's structure is original, protect it as a compilation work.
2. **Not easily obtainable by people in the field → trade secret.** If the dataset is genuinely non-public, protect it as a trade secret.
3. **Public + no originality → Anti-Unfair Competition Law Article 2.** If the dataset is public and unoriginal (the residual case, and the most common one for bulk training data), there is no exclusive IP right and no trade-secret basis, so protection runs through the AUCL general clause.

Datatang's voice dataset fell into tier 3 — which is exactly why this case matters: it confirms that the **residual category of "public, unoriginal, substantial-investment" datasets is not unprotected**. The AUCL general clause is the backstop.
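The three tiers read naturally as a small decision procedure. The following is a minimal Python sketch of that triage, not a statement of the legal test: the `Dataset` fields and the `likely_basis` function are hypothetical names introduced here for illustration, and the two booleans deliberately collapse elements a court would examine separately (for example, confidentiality measures for trade secrets, or substantial investment and improper conduct under AUCL Article 2).

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    """Protection routes named in the three-tier hierarchy."""
    COMPILATION_COPYRIGHT = "copyright (compilation work)"
    TRADE_SECRET = "trade secret"
    AUCL_ARTICLE_2 = "AUCL Article 2 (general clause)"


@dataclass(frozen=True)
class Dataset:
    # Hypothetical, simplified attributes; a real assessment is fact-specific.
    is_public: bool
    original_selection_or_arrangement: bool


def likely_basis(ds: Dataset) -> Basis:
    """Map a dataset to the tier the hierarchy points to.

    This only identifies the route to argue, not whether the claim succeeds.
    """
    if not ds.is_public:
        # Tier 2: genuinely non-public data -> trade secret.
        return Basis.TRADE_SECRET
    if ds.original_selection_or_arrangement:
        # Tier 1: public data with an original structure -> compilation work.
        return Basis.COMPILATION_COPYRIGHT
    # Tier 3: public and unoriginal -> the AUCL general clause as backstop.
    return Basis.AUCL_ARTICLE_2


# An open-sourced, unoriginal corpus like the one in this case lands in tier 3.
open_voice_corpus = Dataset(is_public=True, original_selection_or_arrangement=False)
print(likely_basis(open_voice_corpus).value)
```

The sketch checks publicity first because, in the hierarchy as summarized above, the trade-secret tier is the only one open to non-public data; the other two tiers both assume the dataset is public and differ only on originality.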

## Holding 4 — The open-source license is the hinge

The most operationally important holding for anyone building or using AI training data: **open-sourcing data does not abandon rights in it.** The court held that, absent the holder's permission, no one may publicly disseminate a dataset that the holder lawfully collected through substantial investment. And when the holder *does* open-source the dataset, **whether the acquirer follows the open-source license** is an important factor in judging whether the use violates commercial ethics in the data-services field.

In other words: "it was open-sourced" is not a defense. The license terms travel with the data. A competitor who takes open-sourced data and uses or redistributes it *outside the license* risks a finding of improper conduct, because compliance with the open-source license becomes a key yardstick of commercial ethics under the AUCL.

The case also features a **doctrinal breakthrough on the "substantial substitution" question.** Chinese data-unfair-competition cases have often asked whether the defendant's product *substantially substitutes* for the plaintiff's (a market-harm element). Here the court reasoned at a higher level of generality: if data obtained from open-source channels could be re-shared with third parties at will and at no charge, that would **impair data circulation, hinder data innovation, and obstruct the construction of the unified national data market**. Such re-sharing is therefore improper *regardless* of whether classic market substitution is shown. The court tied the impropriety analysis directly to the national data-market-building policy.

## The registration-system context

The Shenzhen Data Exchange DEXC+ pieces situate the case in the fast-growing data-registration landscape overseas counsel should know:

- The **Data 20 Articles** (December 2022) introduced the three-rights structural-separation framework; localities then began experimenting with "three-rights" registration.
- Since 2022 the **China National Intellectual Property Administration (CNIPA)** has run **data-IP pilots** in 8 localities (Beijing, Shanghai, Jiangsu, Zhejiang, and others), adding 9 more in 2024. Across the pilots, **2,000+ data-IP registration certificates** have been issued, supporting **¥1.1 billion+ in pledge financing**.
- Registration objects generally must be **lawfully obtained, processed by some rule or algorithm, and possess commercial value and intellectual-achievement attributes.**
- DCC's caution, reinforced by the DEXC+ analysis: registration ≠ rights-confirmation (确权). Registration records and provides evidence; it does not, on current law, create a property right. (See [DCC's brief on what data registration actually confirms](/posts/qinglan-what-data-registration-actually-confirms/).)

## What this tells overseas compliance teams

- **Register your data in China for evidentiary value — but don't treat a certificate as title.** A data-IP registration certificate (or a data-exchange registration certificate) now carries real prima-facie weight on both *property-type interest* and *lawful sourcing*. That shifts the burden to a challenger. But it is not an absolute property right, and a Chinese court will say so — your substantive protection still runs through the AUCL (or trade secret / copyright where those fit).

- **Treat the AUCL general clause as the real protector of bulk datasets.** For the common case — public, unoriginal, substantial-investment datasets (most training corpora) — neither copyright nor trade secret applies. AUCL Article 2 is the backstop. Build your data-misappropriation claims (and your defensive posture) around competitive-interest and commercial-ethics reasoning, not around a claimed property right.

- **Open-source ≠ free-for-all. The license travels with the data.** This is the single most important operational takeaway for AI builders. If you ingest open-sourced Chinese datasets, **honor the open-source license**: the court treats license compliance as an important factor in assessing commercial ethics, and using open data outside its license can be found improper even without classic market substitution. Conversely, if you open-source your own data, you retain an AUCL-backed claim against those who use it outside the license. (Pair this with [Zhang Ping's open-source training-data analysis](/posts/open-source-ai-training-data-compliance/): "open-source does not mean open data.")

- **Document substantial investment.** The court's protection of Datatang turned on its demonstrated technical/capital/labor investment in lawfully collecting and adding value to the data. Maintain provenance, collection-method, and investment documentation for any dataset you may need to defend — it is the factual core of an AUCL competitive-interest claim. (This is the same documentation logic that runs through [the data-source-rights debate](/posts/wang-nian-data-source-rights-as-fair-use/) and [Tang Linyao's data-broker analysis](/posts/tang-linyao-data-broker-derivative-harms/).)

- **The national-data-market policy is now a litigation argument.** The court framed impropriety partly in terms of *building the unified national data market*. Expect Chinese courts to keep reading the data-element-market policy goals into AUCL analysis — which cuts both ways: hoarding/blocking and free-riding can each be cast as market-impairing depending on the facts.

The deeper significance: Datatang v. Yinmu is the case where the abstract Chinese data-property architecture — three rights, registration, the unified market — met an actual commercial dispute and produced operating doctrine. The synthesis it leaves: **in China you register data for evidence, protect it through unfair-competition law, and the open-source license is the line between legitimate reuse and misappropriation.** For overseas counsel structuring AI-data sourcing or data-trading arrangements touching China, that three-part rule is the practical state of the law.

---

**Sources (consolidated):**

- *【案例速递】开源数据亦可受到反法保护，扰乱数据服务市场的行为具有不当性 — 数据堂诉隐木公司AI训练数据源案*, 法律与新经济 (case report, via 知产宝). [Link.](https://mp.weixin.qq.com/s/RRsiqVpVcL6eXG077JCjvQ)
- *首例涉数据知识产权登记效力案，处于公开状态的数据不属于商业秘密，但可依据反法保护*, 清华大学智能法治研究院 (Tsinghua University Institute for AI & Rule of Law). [Link.](https://mp.weixin.qq.com/s/VKoeCVplU639bjJDX-qIug)
- *DEXC+专栏 | 证据法视角中的数据产权登记——兼论我国数据产权登记制度的构建*, 深圳数据交易所 DEXC+. [Link.](https://mp.weixin.qq.com/s/BiA_J_aH7UMpO0V5f_usaA)
- *DEXC+专栏 | 数据产权登记，思路打开，合规先行——从"全国首例涉数据知识产权登记证书效力案"说起*, 深圳数据交易所 DEXC+. [Link.](https://mp.weixin.qq.com/s/2dVHs3L1I6NJ2eyBkhrkAg)

*Not legal advice. The above is DCC's consolidated structured summary of a public judgment and four commentaries, with framing for overseas counsel; the holdings, the protection hierarchy, and the property-rights-statutism reasoning are the court's and the commentators'.*
