Overview

Valyu integrates seamlessly with LlamaIndex as a search tool, allowing you to enhance your AI agents and RAG applications with real-time web search and proprietary data sources. The integration provides LLM-ready context from multiple sources including web pages, academic journals, financial data, and more.

Installation

Install the official LlamaIndex Valyu package:

pip install llama-index-tools-valyu

You’ll also need to set your Valyu API key as an environment variable:

export VALYU_API_KEY="your-api-key-here"

Free Credits

Get your API key with $10 credit from the Valyu Platform.

Basic Usage

Using ValyuToolSpec Directly

import os
from llama_index.tools.valyu import ValyuToolSpec

# Set your API key
os.environ["VALYU_API_KEY"] = "your-api-key-here"

# Initialize the tool
valyu_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    verbose=True,
    max_price=20.0
)

# Perform a search
search_results = valyu_tool.context(
    query="What are agentic search-enhanced large reasoning models?",
    search_type="all",  # "all", "web", or "proprietary"
    max_num_results=5,
    max_price=20.0,
    relevance_threshold=0.5
)

print("Search Results:", search_results)

Using with LlamaIndex Agents

The most powerful way to use Valyu is within LlamaIndex agents, where the AI can dynamically decide when and how to search:

import os
import asyncio
from llama_index.tools.valyu import ValyuToolSpec
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.openai import OpenAI

# Set API keys
os.environ["VALYU_API_KEY"] = "your-valyu-api-key"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

# Initialize components
llm = OpenAI(model="gpt-4o-mini")
valyu_tool_spec = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    verbose=True,
    max_price=25.0
)

# Create agent workflow with Valyu search capability
agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=valyu_tool_spec.to_tool_list(),
    llm=llm,
    system_prompt="You are a helpful research assistant with access to real-time web search and academic databases through Valyu."
)

# Use the agent
async def main():
    response = await agent.run(
        user_msg="What are the key factors driving recent stock market volatility, "
                "and how do macroeconomic indicators influence equity prices across different sectors?"
    )
    print(response)

# Run the async function
asyncio.run(main())

Advanced Configuration

Search Parameters

The ValyuToolSpec supports all v2 API parameters for fine-grained control:

from llama_index.tools.valyu import ValyuToolSpec

# Initialize with custom settings
valyu_tool = ValyuToolSpec(
    api_key="your-api-key",
    verbose=True,
    max_price=30.0  # Default maximum price per search
)

# Advanced search with specific parameters
results = valyu_tool.context(
    query="quantum computing breakthroughs 2024",
    search_type="proprietary",  # Focus on academic sources
    max_num_results=10,
    max_price=25.0,  # Override default for this search
    relevance_threshold=0.7,  # Higher relevance threshold
    is_tool_call=True,  # Optimized for agent usage
    start_date="2024-01-01",  # Time-filtered search
    end_date="2024-12-31"
)

Multi-Agent Workflows

Use Valyu in complex multi-agent systems with AgentWorkflow:

import os
import asyncio
from llama_index.tools.valyu import ValyuToolSpec
from llama_index.core.agent.workflow import AgentWorkflow, FunctionAgent
from llama_index.llms.openai import OpenAI

# Create specialized research agent
research_llm = OpenAI(model="gpt-4o-mini", temperature=0.1)
research_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    max_price=20.0
)

research_agent = FunctionAgent(
    name="ResearchAgent",
    description="Specialist in finding and analyzing academic and scientific sources",
    tools=research_tool.to_tool_list(),
    llm=research_llm,
    system_prompt="You are a research specialist. Use Valyu to find authoritative sources and provide well-cited answers."
)

# Create analysis agent
analysis_llm = OpenAI(model="gpt-4o-mini", temperature=0.3)
analysis_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    max_price=30.0
)

analysis_agent = FunctionAgent(
    name="AnalysisAgent",
    description="Specialist in analyzing data and providing insights",
    tools=analysis_tool.to_tool_list(),
    llm=analysis_llm,
    system_prompt="You are an analyst. Use current data to provide insights and recommendations."
)

# Create multi-agent workflow; root_agent names the agent that receives
# the user message and can hand off to the other agents
workflow = AgentWorkflow(
    agents=[research_agent, analysis_agent],
    root_agent="ResearchAgent"
)

# Coordinate agents for complex queries
async def main():
    response = await workflow.run(
        user_msg="Find recent papers on transformer architecture improvements "
                 "and analyze market trends in AI chip demand"
    )
    print("Workflow Results:", response)

asyncio.run(main())

Example Applications

Financial Research Assistant

import os
import asyncio
from llama_index.tools.valyu import ValyuToolSpec
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.openai import OpenAI

# Create financial research agent
financial_llm = OpenAI(model="gpt-4o-mini")
valyu_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    max_price=25.0
)

financial_agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=valyu_tool.to_tool_list(),
    llm=financial_llm,
    system_prompt="""You are a financial research assistant. Use Valyu to search for:
    - Real-time market data and news
    - Academic research on financial models
    - Economic indicators and analysis

    Always cite your sources and provide context about data recency."""
)

# Query financial markets
async def main():
    response = await financial_agent.run(
        user_msg="What are the latest developments in cryptocurrency regulation "
                "and their impact on institutional adoption?"
    )
    print(response)

asyncio.run(main())

Academic Research Agent

import os
from llama_index.tools.valyu import ValyuToolSpec

# Configure for academic research
academic_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    max_price=20.0
)

# Search academic sources specifically
academic_results = academic_tool.context(
    query="CRISPR gene editing safety protocols",
    search_type="proprietary",  # Focus on academic datasets
    max_num_results=8,
    relevance_threshold=0.6,
)

print("Academic Sources Found:", len(academic_results))

Best Practices

1. Cost Optimization

# Set appropriate price limits based on use case
valyu_tool = ValyuToolSpec(
    api_key="your-api-key",
    max_price=15.0  # Conservative default
)

# For quick lookups
quick_search = valyu_tool.context(
    query="current bitcoin price",
    max_price=5.0,  # Lower cost for simple queries
    max_num_results=3,
    search_type="web"
)

# For comprehensive research
detailed_search = valyu_tool.context(
    query="comprehensive analysis of renewable energy trends",
    max_price=40.0,  # Higher budget for complex queries
    max_num_results=15,
    search_type="all"
)
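
The quick-versus-comprehensive split above can be centralized in one place. The helper below is a hypothetical sketch (search_budget is not part of the package); the numbers simply mirror the two calls above:

```python
# Hypothetical helper: pick context() keyword arguments by budget tier.
# Values mirror the "quick lookup" and "comprehensive research" calls above.
def search_budget(comprehensive: bool) -> dict:
    if comprehensive:
        # Higher budget for multi-source research queries
        return {"max_price": 40.0, "max_num_results": 15, "search_type": "all"}
    # Lower cost for simple factual lookups
    return {"max_price": 5.0, "max_num_results": 3, "search_type": "web"}
```

You can then call `valyu_tool.context(query="current bitcoin price", **search_budget(comprehensive=False))`.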

2. Search Type Selection

# Web search for current events
web_results = valyu_tool.context(
    query="latest AI policy developments",
    search_type="web",
    max_num_results=5
)

# Proprietary search for academic research
academic_results = valyu_tool.context(
    query="machine learning interpretability methods",
    search_type="proprietary",
    max_num_results=8
)

# Combined search for comprehensive coverage
all_results = valyu_tool.context(
    query="climate change economic impact",
    search_type="all",
    max_num_results=10
)
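
The three patterns above amount to a routing decision. If your application categorizes queries up front, the choice can be a simple lookup; pick_search_type is a hypothetical helper, not a package API:

```python
# Hypothetical routing helper: map a coarse query category to a search_type,
# following the guidelines above. Unknown categories fall back to "all".
def pick_search_type(category: str) -> str:
    routing = {
        "news": "web",              # current events
        "academic": "proprietary",  # papers and curated datasets
    }
    return routing.get(category, "all")
```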

3. Error Handling and Fallbacks

import os
from llama_index.tools.valyu import ValyuToolSpec

def robust_search(query: str, fallback_query: str | None = None):
    tool = ValyuToolSpec(
        api_key=os.environ["VALYU_API_KEY"],
        max_price=20.0
    )

    try:
        # Primary search
        results = tool.context(
            query=query,
            max_price=20.0,
            max_num_results=5
        )
        return results
    except Exception as e:
        print(f"Primary search failed: {e}")

        if fallback_query:
            try:
                # Fallback with simpler query
                results = tool.context(
                    query=fallback_query,
                    max_price=10.0,
                    max_num_results=3,
                    search_type="web"
                )
                return results
            except Exception as e2:
                print(f"Fallback search also failed: {e2}")
                return []

        return []

# Usage
results = robust_search(
    "complex quantum entanglement applications",
    "quantum entanglement basics"
)

4. Agent System Messages

from llama_index.core.agent.workflow import AgentWorkflow

# Optimize agent behavior with good system messages
system_message = """You are an AI research assistant with access to Valyu search.

SEARCH GUIDELINES:
- Use search_type="proprietary" for academic/scientific queries
- Use search_type="web" for current events and news
- Use search_type="all" for comprehensive research
- Set higher relevance_threshold (0.6+) for precise results
- Use async/await patterns with AgentWorkflow for better performance
- Always cite sources from search results

RESPONSE FORMAT:
- Provide direct answers based on search results
- Include source citations with URLs when available
- Mention publication dates for time-sensitive information
- Indicate if information might be outdated"""

agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=valyu_tool.to_tool_list(),
    llm=llm,
    system_prompt=system_message
)

Integration with Other LlamaIndex Components

Custom Query Engines

import os
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.tools.valyu import ValyuToolSpec

# CustomQueryEngine is a Pydantic model, so declare the tool as a field
# rather than assigning it in __init__
class ValyuQueryEngine(CustomQueryEngine):
    valyu_tool: ValyuToolSpec

    def custom_query(self, query_str: str):
        results = self.valyu_tool.context(
            query=query_str,
            search_type="all",
            max_num_results=5
        )

        # Process results into response
        response_text = "\n\n".join([
            f"**{doc.metadata.get('title', 'Source')}**\n{doc.text}"
            for doc in results
        ])

        return response_text

# Use custom query engine
valyu_tool = ValyuToolSpec(api_key=os.environ["VALYU_API_KEY"])
query_engine = ValyuQueryEngine(valyu_tool=valyu_tool)

response = query_engine.query("What is LlamaIndex?")
print(response)

Integration with Retrievers

import os
from llama_index.core.retrievers import BaseRetriever
from llama_index.tools.valyu import ValyuToolSpec
from llama_index.core.schema import NodeWithScore, QueryBundle

class ValyuRetriever(BaseRetriever):
    def __init__(self, valyu_tool: ValyuToolSpec, search_type="all", max_results=5):
        self.valyu_tool = valyu_tool
        self.search_type = search_type
        self.max_results = max_results
        super().__init__()  # initializes the callback manager BaseRetriever needs

    def _retrieve(self, query_bundle: QueryBundle):
        results = self.valyu_tool.context(
            query=query_bundle.query_str,
            search_type=self.search_type,
            max_num_results=self.max_results
        )

        # Convert to NodeWithScore objects
        nodes = []
        for doc in results:
            node = NodeWithScore(
                node=doc,
                score=doc.metadata.get('relevance_score', 0.5)
            )
            nodes.append(node)

        return nodes

# Use custom retriever
valyu_tool = ValyuToolSpec(api_key=os.environ["VALYU_API_KEY"])
retriever = ValyuRetriever(valyu_tool, search_type="proprietary", max_results=8)

nodes = retriever.retrieve("machine learning safety")
print(f"Retrieved {len(nodes)} nodes")

API Reference

For complete parameter documentation, see the Valyu API Reference.

Key Parameters

  • query (required): Natural language search query
  • search_type: "all", "web", or "proprietary" (default: "all")
  • max_num_results: 1-20 results (default: 5)
  • max_price: Maximum cost in dollars per thousand retrievals (default: 20.0)
  • relevance_threshold: 0.0-1.0 relevance threshold (default: 0.5)
  • is_tool_call: Optimize for agent usage (default: true)
  • start_date/end_date: Time filtering (YYYY-MM-DD format, optional)
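
The documented ranges can be checked client-side before spending a request. validate_search_params below is a hypothetical convenience (the Valyu API performs its own validation server-side):

```python
# Hypothetical client-side check mirroring the documented parameter ranges.
def validate_search_params(search_type="all", max_num_results=5,
                           relevance_threshold=0.5):
    if search_type not in ("all", "web", "proprietary"):
        raise ValueError(f"invalid search_type: {search_type!r}")
    if not 1 <= max_num_results <= 20:
        raise ValueError("max_num_results must be between 1 and 20")
    if not 0.0 <= relevance_threshold <= 1.0:
        raise ValueError("relevance_threshold must be between 0.0 and 1.0")
    return {"search_type": search_type, "max_num_results": max_num_results,
            "relevance_threshold": relevance_threshold}
```

The returned dict can be passed straight to `valyu_tool.context(query=..., **params)`.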

Additional Resources