How to navigate the exchange
Last updated
Navigating a new interface can be daunting, so we're here to make the Valyu exchange easier to understand. Our goal is to break it down for you and provide practical tips on how to use it efficiently.
Here is what you will see when you first land on the exchange website:
This is the open data view of the exchange, where you can browse the thousands of open datasets we have available and find the ones suited to your use case, whether that is training, fine-tuning, or RAG. Let's break down the key components:
Side-bar - Here you can filter the datasets based on your needs. Make use of the AI-powered search for filters to find exactly what you are after.
Top-bar:
Navigation - At the top left we have the navigation to other (currently in beta) sections of the exchange.
Search - This AI-powered search bar searches across the full datacard of every dataset, so you can search by category, name, language, etc.
Sign in - Click here to request beta access to unreleased parts of the exchange.
Datacards - All of the (filtered) datacards are displayed in the centre. For more details on how to make sense of these datacards, click here.
Once you find a dataset you like, click on it to bring up the provenance view:
Once you have selected a dataset, you will be brought to the provenance dashboard. Let's break down what you're seeing:
Provenance view - In the centre we have the provenance view. This shows the full lifecycle of the data. To the left of the dataset we have its origins:
Parent datasets (available on the exchange, or an identifier for datasets that aren't on the exchange)
Sources, e.g. Wikipedia, Reddit, ...
Models, where the data has been generated in part by a machine learning model
For full documentation on the datacards, click here.
To the right of the dataset we have its children:
Models trained on the dataset
Datasets derived from the current one
Side-bar - Here we have navigation to the characteristics and license pages of the dataset. More on these below.
Top-bar - This shows the dataset name and, beside it, the datacard score. If you have found that the dataset is right for your use case, you can open the download dropdown menu and download the dataset.
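To make the structure of the provenance view more concrete, here is a minimal Python sketch of how a dataset's origins and children could be modelled. It is illustrative only: the class name `ProvenanceNode`, its fields, and the example dataset and model names are assumptions made for this sketch, not the exchange's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceNode:
    """Hypothetical record mirroring the provenance view of one dataset."""

    name: str
    # Origins, shown to the left of the dataset in the provenance view.
    parent_datasets: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)
    generating_models: list[str] = field(default_factory=list)
    # Children, shown to the right of the dataset in the provenance view.
    trained_models: list[str] = field(default_factory=list)
    derived_datasets: list[str] = field(default_factory=list)

    def origins(self) -> list[str]:
        """Everything the dataset was built from."""
        return self.parent_datasets + self.sources + self.generating_models

    def children(self) -> list[str]:
        """Everything that was built from the dataset."""
        return self.trained_models + self.derived_datasets


# Placeholder names, not real entries on the exchange.
example = ProvenanceNode(
    name="example-dataset",
    parent_datasets=["example-parent"],
    sources=["wikipedia", "reddit"],
    trained_models=["example-model"],
)

print(example.origins())
print(example.children())
```

Reading the view this way, checking a dataset's history amounts to walking its origins, while its children show how the dataset has already been used.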
So, you've verified the history of the dataset and now want to take a closer look at the data itself. Navigate to characteristics:
The characteristics view gives you an overview of the topics included in the dataset, the languages used, text metrics (or video/audio/time series for other modalities), freshness, and more.
Let's break down what we are seeing:
Dataset Quality - This star diagram has the following properties:
Freshness
Documentation
Openness (with respect to licenses)
Downloads (Hugging Face)
Properties (currently undefined, please give us feedback on what you'd like to see)
Other metrics:
Description
Topics
Metrics
Metadata
Identifier
If this dataset is still ticking all the boxes, it is time to move to the licenses page:
Here we see all of the licenses that apply to the dataset. Each license has its own license card that gives an overview of:
What the license means you can do with the dataset
What the license prohibits you from doing
Tips on staying compliant
A summary of the license
It will also highlight any conflicts between licenses.
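As a rough illustration of what a license conflict can look like, the Python sketch below flags pairs of licenses where one permits a use that the other prohibits. The `LicenseCard` class, its `permits` and `prohibits` fields, the placeholder license names, and the conflict rule itself are all assumptions made for this example; how the exchange actually detects conflicts is not described here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class LicenseCard:
    """Hypothetical, simplified stand-in for a license card."""

    name: str
    permits: frozenset[str]
    prohibits: frozenset[str]


def find_conflicts(cards: list[LicenseCard]) -> list[tuple[str, str, str]]:
    """Return (license_a, license_b, use) for every use that one
    license permits while the other prohibits it."""
    conflicts = []
    for a, b in combinations(cards, 2):
        clash = (a.permits & b.prohibits) | (b.permits & a.prohibits)
        for use in sorted(clash):
            conflicts.append((a.name, b.name, use))
    return conflicts


# Placeholder licenses, not real license terms.
cards = [
    LicenseCard("license-a", frozenset({"commercial-use", "modification"}), frozenset()),
    LicenseCard("license-b", frozenset({"modification"}), frozenset({"commercial-use"})),
]

print(find_conflicts(cards))
```

In this toy example the two licenses disagree on commercial use, which is the kind of mismatch worth resolving before downloading a dataset that carries both.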