Open data license

How the data is licensed

The Tendril Foundation makes one thing: judged local-language data and evaluation sets. That work is meant to be a public good, so it is released openly, stamped with its provenance, and co-owned with the community that judged it.

The published sample

CC-BY-4.0

The sample dataset on this site is released under the Creative Commons Attribution 4.0 International license (CC BY 4.0). You are free to use, share, and build on it, including commercially, as long as you give appropriate credit to the Tendril network, link to the license, and indicate whether you made changes.

The principles

Open by default. Datasets, evaluation sets, and tooling the Foundation builds for a cause are released under an open license so anyone, including open-source AI labs and foundations, can use them.
Provenance-stamped. Each record carries where it came from and how it was produced (for example, opt-in contribution, PII-redacted before export), so it can be trusted and audited.
Co-owned. The community that speaks the language and does the judging is credited as a co-creator of the data made for its cause.
No personal data. Personal information is redacted from the data before it is published. The data is about language and judgment, not about individuals.
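
To make the provenance principle concrete, here is a minimal sketch of what a provenance-stamped record might look like. It is an illustration under assumptions, not the Foundation's actual schema: the class names, field names, the "example-community" value, and the is_publishable check are all hypothetical. Only the ideas of opt-in contribution, PII redaction before export, community credit, and the CC-BY-4.0 license come from this page.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Provenance:
    # All field names here are hypothetical, chosen for illustration.
    source: str              # how the record was collected
    processing: tuple        # steps applied before export
    license: str             # license identifier for the release
    credited_community: str  # co-creator credited for the data


@dataclass(frozen=True)
class Record:
    text: str
    judgment: str
    provenance: Provenance


def is_publishable(record: Record) -> bool:
    """Hypothetical pre-release check mirroring the principles above."""
    p = record.provenance
    return (
        p.source == "opt-in contribution"
        and "pii-redacted" in p.processing
        and bool(p.credited_community)
    )


def to_json(record: Record) -> str:
    """Serialize a record with its provenance so it can be audited."""
    return json.dumps(asdict(record), ensure_ascii=False)


# Placeholder values only; this is not real Tendril data.
example = Record(
    text="(sample sentence)",
    judgment="acceptable",
    provenance=Provenance(
        source="opt-in contribution",
        processing=("pii-redacted",),
        license="CC-BY-4.0",
        credited_community="example-community",
    ),
)
```

Keeping the provenance on each record, rather than only at the dataset level, means a downstream user could audit or filter individual records without consulting a separate manifest.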

Per-cause terms

Larger datasets built with a specific partner community may carry their own open license and attribution terms, agreed with that community. Those terms are published alongside each dataset when it is released. Until a dataset exists, it is not claimed here.

Contact

Questions about licensing or reuse: [email protected].
