How to read a datacard
Last updated
Knowing how to read a datacard is essential for using the Valyu Exchange effectively. This guide explains each element of a datacard, using the OpenAI Summarization Datacard as a practical example. Let's look at the key components that make up a datacard.
At the top of the datacard, you will find the title, a distinctive image and the datacard score. Together, these elements give a quick snapshot of the dataset:
Dataset name - IMPORTANT: some datasets have confusing or uninformative names. In that case, read the description and the dataset characteristics to understand what the dataset is for.
Datacard score - as described in the datacard score section, the datacard score indicates how well documented, trustworthy and fresh the dataset is.
Datacard image - these funky gradients are computed based on the dataset name and a random seed.
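This guide does not say how the gradient is derived from the name and seed. As a purely hypothetical sketch, one way to make such an image deterministic is to hash the dataset name together with the seed and turn bytes of the digest into colour stops. The function name `gradient_colours`, the use of SHA-256, and the fixed lightness and saturation are all assumptions for illustration, not the Exchange's implementation.

```python
import colorsys
import hashlib


def gradient_colours(dataset_name: str, seed: int, n_stops: int = 3) -> list[str]:
    """Hypothetical: derive deterministic hex colour stops from a name and a seed."""
    # Hash name and seed together so the same inputs always yield the same colours.
    digest = hashlib.sha256(f"{dataset_name}:{seed}".encode("utf-8")).digest()
    if not 1 <= n_stops <= len(digest):
        raise ValueError("n_stops must be between 1 and 32")
    colours = []
    for i in range(n_stops):
        # One digest byte per stop picks the hue; lightness and saturation are fixed.
        hue = digest[i] / 255
        r, g, b = colorsys.hls_to_rgb(hue, 0.6, 0.8)
        colours.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colours
```

Because the output depends only on the name and the seed, calling `gradient_colours("my-dataset", 42)` twice returns the same list of colours.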
The main section of the datacard is divided into several parts, which together describe what the dataset is and what it can be used for:
DID - the identifier at the top is the dataset's decentralised identifier (DID): a globally unique, cryptographically verifiable, resolvable, decentralised identifier. For more detail, see the W3C DID specification: https://www.w3.org/TR/did-core/
Description - a brief description of what the dataset is and its intended use cases.
Tasks - a list of tasks the dataset is suited for. In the example above, the dataset is suitable for fine-tuning models for summarisation tasks.
Languages - the language or languages that the dataset includes.
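A DID has a fixed textual shape defined in the W3C DID Core specification: the prefix `did:`, a lowercase method name, a colon, and a method-specific identifier. The sketch below checks only that syntax; it does not resolve the DID or verify anything cryptographically. The helper `parse_did` is hypothetical, and `did:example:123456789abcdefghi` is the placeholder DID used in the W3C specification, not a real Valyu identifier. Which DID method the Exchange uses is not stated in this guide.

```python
import re

# DID syntax from W3C DID Core:
#   did                = "did:" method-name ":" method-specific-id
#   method-name        = 1*method-char        ; lowercase letters and digits
#   method-specific-id = *( *idchar ":" ) 1*idchar
#   idchar             = ALPHA / DIGIT / "." / "-" / "_" / pct-encoded
_IDCHAR = r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})"
DID_PATTERN = re.compile(rf"did:(?P<method>[a-z0-9]+):(?P<msid>(?:{_IDCHAR}*:)*{_IDCHAR}+)")


def parse_did(did: str) -> tuple[str, str]:
    """Hypothetical helper: split a DID into (method, method-specific-id) after a syntax check."""
    match = DID_PATTERN.fullmatch(did)
    if match is None:
        raise ValueError(f"not a syntactically valid DID: {did!r}")
    return match.group("method"), match.group("msid")
```

For example, `parse_did("did:example:123456789abcdefghi")` returns `("example", "123456789abcdefghi")`, while an uppercase method name such as `did:Example:abc` is rejected.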
The bottom of the datacard displays the names of the licenses that apply to the dataset. For a detailed description of each license, what you may use the dataset for, and any possible license conflicts, click on the datacard and navigate to the licenses page.
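The fields described so far can be summarised as a simple record. This is a hypothetical model for illustration only: the class and field names are assumptions, and the Exchange's actual data schema is not described in this guide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Datacard:
    """Hypothetical model of the fields shown on a datacard; not Valyu's actual schema."""
    name: str                       # dataset name shown as the title
    did: str                        # decentralised identifier
    description: str                # what the dataset is and its intended use cases
    score: Optional[float] = None   # datacard score; its scale is not given in this guide
    tasks: list[str] = field(default_factory=list)      # e.g. "summarisation"
    languages: list[str] = field(default_factory=list)  # languages the dataset includes
    licenses: list[str] = field(default_factory=list)   # license names shown at the bottom
```

A card built with placeholder values, such as `Datacard(name="example-dataset", did="did:example:123456789abcdefghi", description="placeholder", tasks=["summarisation"])`, starts with empty `languages` and `licenses` lists; `default_factory` gives each card its own list rather than one shared between instances.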
The observant among you may have noticed that datacards have different coloured outlines; for example, the OpenAI Summarization Datacard has a green outline. The outline colour tells you how open the dataset's licenses are. Below is a key:
Green outline - OPEN license
Pink outline - CLOSED license
Blue outline - PROPRIETARY license, e.g. OPENAI
Yellow outline - UNSPECIFIED
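The key above is a fixed mapping from license openness to outline colour, which can be sketched as an enumeration. The names `LicenseOpenness` and `outline_colour` are hypothetical; only the category and colour pairs come from the key.

```python
from enum import Enum


class LicenseOpenness(Enum):
    """Hypothetical encoding of the outline key: license category -> outline colour."""
    OPEN = "green"
    CLOSED = "pink"
    PROPRIETARY = "blue"
    UNSPECIFIED = "yellow"


def outline_colour(openness: str) -> str:
    """Return the outline colour for a category name; raises KeyError for unknown names."""
    return LicenseOpenness[openness.upper()].value
```

Lookup also works in reverse: `LicenseOpenness("blue")` gives `LicenseOpenness.PROPRIETARY`, so a colour seen on a datacard can be mapped back to its license category.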