Overview

Valyu can be used as a custom retriever in LlamaIndex to enhance your RAG applications with high-quality context from both proprietary datasets and web sources.

Installation

First, install both the Valyu SDK and LlamaIndex:

pip install valyu llama-index

Usage

Basic Integration

Here’s a simple example of using Valyu as a retriever in LlamaIndex:

from typing import List, Optional

from valyu import Valyu
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle, TextNode

class ValyuRetriever(BaseRetriever):
    def __init__(self, valyu_client: Valyu):
        self.valyu = valyu_client
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Get context from Valyu
        response = self.valyu.context(
            query=query_bundle.query_str,
            search_type="all",
            num_results=5
        )

        # Convert Valyu results to LlamaIndex nodes
        nodes = []
        for result in response.results:
            node = TextNode(
                text=result.content,
                metadata={
                    "source": result.url,
                    "score": result.relevance_score
                }
            )
            nodes.append(NodeWithScore(node=node, score=result.relevance_score))

        return nodes

# Initialize Valyu and the custom retriever
# (assumes your Valyu API key is already configured, e.g. via environment variable)
valyu = Valyu()
retriever = ValyuRetriever(valyu)

# Use in your LlamaIndex pipeline
response = retriever.retrieve("What is quantum computing?")
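
The retriever returns standard LlamaIndex NodeWithScore objects, so you can inspect the retrieved text and metadata directly. A quick sketch:

# Each node carries the Valyu content plus the metadata set above
for node in response:
    print(f"{node.score:.2f} {node.node.metadata['source']}")
    print(node.get_content()[:200])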

Advanced Usage

Query Engine Integration

You can use the Valyu retriever with LlamaIndex’s query engine for more advanced RAG applications:

from llama_index.core import Settings, get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.openai import OpenAI

# Configure LlamaIndex
Settings.llm = OpenAI(model="gpt-4")
Settings.chunk_size = 512

# Create a query engine with Valyu retriever
query_engine = RetrieverQueryEngine.from_args(
    retriever=ValyuRetriever(valyu),
    response_synthesizer=get_response_synthesizer(
        response_mode="compact"
    )
)

# Query with context from Valyu
response = query_engine.query(
    "Explain the significance of quantum entanglement"
)
print(response)
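
Because the synthesizer runs over the nodes your retriever returned, the retrieved context is also available on the response object for citation or debugging, via LlamaIndex's standard source_nodes attribute:

# Inspect which Valyu results the answer was synthesized from
for source in response.source_nodes:
    print(source.node.metadata["source"], source.score)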

Customizing Results

You can customize how Valyu retrieves and processes context:

class CustomValyuRetriever(ValyuRetriever):
    def __init__(
        self,
        valyu_client: Valyu,
        search_type: str = "all",
        num_results: int = 5,
        min_score: float = 0.7,
        max_price: Optional[int] = None
    ):
        super().__init__(valyu_client)
        self.search_type = search_type
        self.num_results = num_results
        self.min_score = min_score
        self.max_price = max_price

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Forward max_price only when set, so the API default applies otherwise
        params = {}
        if self.max_price is not None:
            params["max_price"] = self.max_price

        response = self.valyu.context(
            query=query_bundle.query_str,
            search_type=self.search_type,
            num_results=self.num_results,
            **params
        )

        # Keep only results that clear the relevance threshold
        filtered_results = [
            result for result in response.results
            if result.relevance_score >= self.min_score
        ]

        return [
            NodeWithScore(
                node=TextNode(
                    text=result.content,
                    metadata={
                        "source": result.url,
                        "score": result.relevance_score,
                        "source_type": result.source_type
                    }
                ),
                score=result.relevance_score
            )
            for result in filtered_results
        ]

# Use the custom retriever
retriever = CustomValyuRetriever(
    valyu,
    search_type="proprietary",
    num_results=10,
    min_score=0.8
)

Best Practices

  1. Cache Results: Consider implementing caching for frequently used queries to reduce API calls:
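
A minimal in-memory caching sketch (the CachedValyuRetriever name is illustrative; production code might prefer an LRU or TTL cache):

# Cache Valyu responses keyed on the raw query string
class CachedValyuRetriever(ValyuRetriever):
    def __init__(self, valyu_client: Valyu):
        super().__init__(valyu_client)
        self._cache = {}

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        key = query_bundle.query_str
        if key not in self._cache:
            self._cache[key] = super()._retrieve(query_bundle)
        return self._cache[key]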

  2. Error Handling: Always implement proper error handling:

from valyu.exceptions import ValyuAPIError

try:
    response = retriever.retrieve("Your query")
except ValyuAPIError as e:
    print(f"Valyu API error: {e}")
    # Fallback to alternative retrieval method

  3. Cost Management: Monitor your usage and implement cost controls:

retriever = CustomValyuRetriever(
    valyu,
    num_results=5,
    max_price=5  # Maximum price in credits per query
)

Example Applications

Research Assistant

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever

# Combine local documents with Valyu context
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

# Fuse the local index retriever with the Valyu retriever
hybrid_retriever = QueryFusionRetriever(
    [
        index.as_retriever(similarity_top_k=2),
        ValyuRetriever(valyu)
    ],
    similarity_top_k=5,
    num_queries=1  # use the original query as-is (no LLM query rewriting)
)

# Create query engine
query_engine = RetrieverQueryEngine.from_args(
    retriever=hybrid_retriever,
    response_synthesizer=get_response_synthesizer(
        response_mode="tree_summarize"
    )
)

# Query with both local and Valyu context
response = query_engine.query(
    "Summarize the latest developments in quantum computing"
)

Additional Resources