Editor’s Note — DCC.
This is a consolidated case study, not a translation of any single piece. 数据堂诉隐木科技 (Datatang v. Yinmu) is the most-cited Chinese data-market judgment of the past two years — popularly tagged “China’s first case on the evidentiary effect of a data-IP registration certificate” (全国首例涉数据知识产权登记证书效力案). DCC synthesizes four commentaries — the case report (法律与新经济 / 知产宝), a Tsinghua Institute for AI & Rule of Law analysis, and two Shenzhen Data Exchange DEXC+ deep-dives — into the holdings that matter for overseas counsel. The case sits at the intersection of three things DCC has covered separately: the Data 20 Articles three-rights framework, the data-property-rights registration regime, and open-source AI training-data compliance. Here a court actually decides how they interact.
The case
| Item | Detail |
| --- | --- |
| Parties | 数据堂(北京)科技股份有限公司 (Datatang, plaintiff) v. 隐木(上海)科技有限公司 (Yinmu, defendant) |
| First instance | Beijing Internet Court — (2021)京0491民初45708号 |
| Appeal | Beijing IP Court — (2024)京73民终546号 (affirmed, June 28, 2024) |
| Cause of action | Unfair competition (不正当竞争纠纷) |
| Result | Yinmu pays Datatang ¥100,000 in economic loss + ¥2,300 in reasonable enforcement costs; appeal dismissed, first-instance judgment upheld |
The facts: Datatang is a data company that built voice datasets for AI model training — collecting and processing a substantial volume of voice-data entries through its own technical, capital, and labor investment. It open-sourced some of these datasets under a license. Yinmu, a competitor in the AI-training-data-source market, obtained the datasets and redistributed / used them in a way the courts found did not honor the terms on which the data was made available. Datatang sued for unfair competition. Crucially, Datatang held a Data-IP Registration Certificate (《数据知识产权登记证》) for the dataset.
The case is doctrinally important because the dataset fell into the gap the Chinese data-property debate keeps circling: it was public (so not a trade secret), it lacked originality in selection/arrangement (so not a copyrightable compilation), and “data” is not yet a typed civil property right in statute. So what, exactly, protects it?
Holding 1 — A data-IP registration certificate is prima facie evidence, not a property right
This is the headline. The Beijing IP Court held that Datatang’s Data-IP Registration Certificate can serve as prima facie evidence of two things:
- that Datatang holds property-type interests in the dataset; and
- that the collection conduct / data source was lawful.
Absent contrary evidence, the court could find those facts on the strength of the certificate. This is the first Chinese judgment to give a data-registration certificate concrete evidentiary force — which is why the “data registration” community treated it as a watershed.
But — and this is the nuance overseas counsel must hold onto — the certificate is not proof of an absolute property right. The Shenzhen Data Exchange DEXC+ analysis draws out the appeal court’s reasoning: under the property-rights-statutism principle (财产权法定原则), a property-type legal interest that has not been confirmed by statute as an absolute property right cannot be analogized to other absolute property rights for judicial protection. Civil Code Article 127 (“where laws provide for the protection of data … such provisions apply”) is, the court said, a referential / declaratory clause — it has not made “data” a typed civil right with defined content. The “data three rights” (hold / use / operate) from the Data 20 Articles remain policy-level and economic concepts, not statutory absolute rights, because under the Legislation Law the creation of basic civil rights is reserved to NPC statute — administrative regulations, departmental rules, local rules, and policy documents cannot create them.
So Datatang could not invoke Article 127 to demand that its dataset be treated as an absolute property right. The registration certificate shifts the burden of evidence; it does not conjure a property right. For overseas counsel: registering data in China (the data-IP pilots, the data-exchange registration certificates) is now genuinely worth doing for its evidentiary value — but do not mistake a certificate for title.
Holding 2 — Open-sourced data is still protected, via the Anti-Unfair Competition Law
If the dataset is not a property right, not a trade secret, and not a copyrightable work, what protects it? The court’s answer: the Anti-Unfair Competition Law (AUCL) general clause, Article 2.
The reasoning: even though the dataset was public (failing the trade-secret secrecy requirement) and lacked originality in selection/arrangement (failing the compilation-work requirement), Datatang had made substantial technical, capital, and labor investment to lawfully collect a substantial volume of voice-data entries, adding commercial value to the raw data, meeting AI-developers’ needs, and generating traffic, transaction opportunities, and competitive advantage. That commercial benefit is, in substance, a competitive interest (竞争性权益) — and competitive interests are legitimate interests the AUCL protects.
Holding 3 — The protection hierarchy
The Tsinghua analysis distills the court’s framework into a clean three-tier hierarchy for protecting a dataset — useful as an operating checklist:
- Public + original selection/arrangement → copyright (compilation work). If the dataset’s structure is original, protect it as a compilation work.
- Not easily obtainable by people in the field → trade secret. If the dataset is genuinely non-public, protect it as a trade secret.
- Public + no originality → Anti-Unfair Competition Law Article 2. This is the residual case, and the most common one for bulk training data: with no IP exclusive right and no trade-secret basis, protection runs through the AUCL general clause where the facts support it.
Datatang’s voice dataset fell into tier 3 — which is exactly why this case matters: it confirms that the residual category of “public, unoriginal, substantial-investment” datasets is not unprotected. The AUCL general clause is the backstop.
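Read as an operating checklist, the three tiers above can be sketched as a small decision function. This is an illustrative simplification, not the court's test: the `Dataset` fields, the `Basis` enum, and `candidate_basis` are hypothetical names, and each boolean stands in for a fact-intensive legal question that counsel would have to assess.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Basis(Enum):
    COPYRIGHT_COMPILATION = "copyright (compilation work)"
    TRADE_SECRET = "trade secret"
    AUCL_ARTICLE_2 = "AUCL Article 2 (general clause)"


@dataclass(frozen=True)
class Dataset:
    # Hypothetical flags; each compresses a legal judgment into a yes/no.
    is_public: bool
    original_selection_or_arrangement: bool
    substantial_lawful_investment: bool


def candidate_basis(ds: Dataset) -> Optional[Basis]:
    """Return the tier a dataset would fall into under the summarized hierarchy."""
    # Tier 1: public and original in selection/arrangement.
    if ds.is_public and ds.original_selection_or_arrangement:
        return Basis.COPYRIGHT_COMPILATION
    # Tier 2: not public.
    if not ds.is_public:
        return Basis.TRADE_SECRET
    # Tier 3: public and unoriginal, backed by substantial lawful investment.
    if ds.substantial_lawful_investment:
        return Basis.AUCL_ARTICLE_2
    # Assumption: the commentaries do not address a public, unoriginal
    # dataset with no substantial investment, so no basis is suggested.
    return None


# Datatang's voice dataset as described: public, unoriginal, substantial investment.
datatang = Dataset(
    is_public=True,
    original_selection_or_arrangement=False,
    substantial_lawful_investment=True,
)
print(candidate_basis(datatang).value)
```

The function returns a single tier only to mirror the hierarchy as summarized; it does not model overlapping claims or the evidentiary questions behind each flag.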
Holding 4 — The open-source license is the hinge
The most operationally important holding for anyone building or using AI training data: open-sourcing data does not abandon rights in it. The court held that, absent the holder’s permission, no one may publicly disseminate a dataset that the holder lawfully collected through substantial investment. And when the holder does open-source the dataset, whether the acquirer follows the open-source license is an important factor in judging whether the use violates commercial ethics in the data-services field.
In other words: “it was open-sourced” is not a defense. The license terms travel with the data. A competitor who takes open-sourced data and uses or redistributes it outside the license is acting improperly, and compliance with the open-source license becomes a key yardstick of commercial ethics under the AUCL.
The case also features a doctrinal breakthrough on the “substantial substitution” question. Chinese data-unfair-competition cases have often asked whether the defendant’s product substantially substitutes for the plaintiff’s (a market-harm element). Here the court reasoned at a higher level of generality: if data obtained from open-source channels could be re-shared with third parties at no charge and without restriction, that would impair data circulation, hinder data innovation, and obstruct the construction of the unified national data market. Such re-sharing is therefore improper regardless of whether classic market substitution is shown. The court tied the impropriety analysis directly to the national data-market-building policy.
The registration-system context
The Shenzhen Data Exchange DEXC+ pieces situate the case in the fast-growing data-registration landscape overseas counsel should know:
- The Data 20 Articles (December 2022) introduced the three-rights structural-separation framework; localities then began experimenting with “three-rights” registration.
- Since 2022 the National Intellectual Property Administration has run data-IP pilots in 8 localities (Beijing, Shanghai, Jiangsu, Zhejiang, and others), adding 9 more in 2024. Across the pilots, 2,000+ data-IP registration certificates have been issued, supporting ¥1.1 billion+ in pledge financing.
- Registration objects generally must be lawfully obtained, processed by some rule or algorithm, and possess commercial value and intellectual-achievement attributes.
- DCC’s caution, reinforced by the DEXC+ analysis: registration ≠ rights-confirmation (确权). Registration records and provides evidence; it does not, on current law, create a property right. (See DCC’s brief on what data registration actually confirms.)
What this tells overseas compliance teams
- Register your data in China for evidentiary value, but don’t treat a certificate as title. A data-IP registration certificate (or a data-exchange registration certificate) now carries real prima-facie weight on both property-type interest and lawful sourcing. That shifts the burden to a challenger. But it is not an absolute property right, and a Chinese court will say so: your substantive protection still runs through the AUCL (or trade secret / copyright where those fit).
- Treat the AUCL general clause as the real protector of bulk datasets. For the common case of public, unoriginal, substantial-investment datasets (most training corpora), neither copyright nor trade secret applies. AUCL Article 2 is the backstop. Build your data-misappropriation claims (and your defensive posture) around competitive-interest and commercial-ethics reasoning, not around a claimed property right.
- Open-source ≠ free-for-all. The license travels with the data. This is the single most important operational takeaway for AI builders. If you ingest open-sourced Chinese datasets, honor the open-source license: the court treats license compliance as an important factor in judging commercial ethics, and using open data outside its license can be improper conduct even without classic market substitution. Conversely, if you open-source your own data, you retain an AUCL-backed claim against those who use it outside the license. (Pair this with Zhang Ping’s open-source training-data analysis: “open-source does not mean open data.”)
- Document substantial investment. The court’s protection of Datatang turned on its demonstrated technical/capital/labor investment in lawfully collecting and adding value to the data. Maintain provenance, collection-method, and investment documentation for any dataset you may need to defend: it is the factual core of an AUCL competitive-interest claim. (This is the same documentation logic that runs through the data-source-rights debate and Tang Linyao’s data-broker analysis.)
- The national-data-market policy is now a litigation argument. The court framed impropriety partly in terms of building the unified national data market. Expect Chinese courts to keep reading the data-element-market policy goals into AUCL analysis, which cuts both ways: hoarding/blocking and free-riding can each be cast as market-impairing depending on the facts.
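For engineering teams, the license and documentation takeaways above can be turned into a pre-ingestion check. The sketch below is one hypothetical way to do that: `DatasetRecord`, `uses_outside_license`, and the use labels (`"train_model"`, `"redistribute"`) are illustrative names, not terms from the judgment or from any particular license, and the permitted-use set is assumed to come from counsel's reading of the actual license text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical provenance record kept alongside an ingested dataset."""
    name: str
    source: str                       # where and how the data was obtained
    license_permits: FrozenSet[str]   # uses the license was read as allowing
    investment_notes: Tuple[str, ...] = ()  # collection and processing effort


def uses_outside_license(record: DatasetRecord,
                         planned_uses: Iterable[str]) -> List[str]:
    """Return the planned uses the recorded license does not cover, sorted."""
    return sorted(set(planned_uses) - record.license_permits)


# Example with made-up values: a license read as allowing training only.
record = DatasetRecord(
    name="example-voice-corpus",
    source="hypothetical open-source release",
    license_permits=frozenset({"train_model"}),
)
flagged = uses_outside_license(record, ["train_model", "redistribute"])
if flagged:
    print("Escalate to counsel before proceeding:", flagged)
```

A check like this only flags mismatches against whatever someone has recorded; deciding what a given license actually permits remains a legal question.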
The deeper significance: Datatang v. Yinmu is the case where the abstract Chinese data-property architecture — three rights, registration, the unified market — met an actual commercial dispute and produced operating doctrine. The synthesis it leaves: in China you register data for evidence, protect it through unfair-competition law, and the open-source license is the line between legitimate reuse and misappropriation. For overseas counsel structuring AI-data sourcing or data-trading arrangements touching China, that three-part rule is the practical state of the law.
Sources (consolidated):
- 【案例速递】开源数据亦可受到反法保护,扰乱数据服务市场的行为具有不当性 — 数据堂诉隐木公司AI训练数据源案, 法律与新经济 (case report, via 知产宝). Link.
- 首例涉数据知识产权登记效力案,处于公开状态的数据不属于商业秘密,但可依据反法保护, 清华大学智能法治研究院 (Tsinghua University Institute for AI & Rule of Law). Link.
- DEXC+专栏 | 证据法视角中的数据产权登记——兼论我国数据产权登记制度的构建, 深圳数据交易所 DEXC+. Link.
- DEXC+专栏 | 数据产权登记,思路打开,合规先行——从“全国首例涉数据知识产权登记证书效力案”说起, 深圳数据交易所 DEXC+. Link.
Not legal advice. The above is DCC’s consolidated structured summary of a public judgment and four commentaries, with framing for overseas counsel; the holdings, the protection hierarchy, and the property-rights-statutism reasoning are the court’s and the commentators’.