This page offers guidance on integrating and using the Valyu DeepSearch API effectively.

Building Agentic Search Workflows

Valyu delivers optimal performance when integrated into agentic search workflows rather than single-shot queries. The API is engineered for precision-driven searches in which your AI system can target exactly the knowledge it needs. Recommended architecture:
# Multi-step agentic search workflow
# (decompose_query, adapt_strategy, identify_knowledge_gaps, and
# synthesize_multi_source_findings are application-specific helpers;
# `valyu` is an initialized client)
def research_agent(query: str):
    # Step 1: Break down complex query into focused searches
    sub_queries = decompose_query(query)

    results = {}
    for i, sub_query in enumerate(sub_queries):
        # Step 2: Adapt search strategy based on previous findings
        strategy = adapt_strategy(sub_query, results)

        search_result = valyu.search(
            query=sub_query,
            included_sources=strategy.sources,
            max_price=strategy.budget,
            relevance_threshold=0.65
        )
        results[f"step_{i}"] = search_result

        # Step 3: Identify gaps and refine next search
        gaps = identify_knowledge_gaps(search_result, query)
        if gaps:
            gap_result = valyu.search(
                query=gaps[0].refined_query,
                included_sources=gaps[0].target_sources,
                max_price=50.0
            )
            results[f"gap_fill_{i}"] = gap_result

    # Step 4: Cross-validate and synthesize findings
    return synthesize_multi_source_findings(results)

Agentic advantage: Technical domains like research, finance, and medicine benefit most from multi-step search workflows that leverage Valyu’s DeepSearch API.

Human vs. AI Search Optimization

Valyu’s search algorithms are optimised for LLM tool calls and agent workflows, not human search patterns. By default, tool_call_mode=true optimises results for AI consumption; set it to false for human-facing searches. For AI-driven searches (recommended):
# Optimized for LLM consumption
response = valyu.search(
    "quantum error correction surface codes LDPC performance benchmarks",
    tool_call_mode=True,  # Default: AI-optimized results
)
For human-facing searches:
# Adjusted for human readability
response = valyu.search(
    "quantum computing error correction methods",
    tool_call_mode=False,  # Human-optimized formatting
)

Maximizing Valyu’s Search Parameters

Use optimised prompts for the best results and guardrail your searches using query parameters:
response = valyu.search(
    "GPT-4 vs GPT-3 architectural innovations: training efficiency, inference optimization, and benchmark comparisons",
    search_type="proprietary",
    max_num_results=10,
    relevance_threshold=0.6,
    included_sources=["valyu/valyu-arxiv"],
    max_price=50.0,
    category="machine learning",
    start_date="2024-01-01",
    end_date="2024-12-31"
)
Pro tip: Leverage Valyu’s beyond-the-web capabilities with included_sources like valyu/valyu-arxiv for academic content, financial market data, or specialised datasets that other APIs can’t access.

Quality and Budget Optimization

Scaling Search Quality with Budget

Not getting sufficient results? Increase your max_price parameter to access higher-quality sources.
# Budget progression for increasing quality
search_configs = [
    {"max_price": 20.0, "use_case": "Quick fact-checking"},
    {"max_price": 50.0, "use_case": "Standard research queries"},
    {"max_price": 100.0, "use_case": "Comprehensive analysis"},
]
Budget Impact on Results:
  • $20 CPM: Basic web sources + academic content
  • $50 CPM: Full web coverage + most research databases + financial data
  • $100 CPM: Premium sources + financial data + specialised datasets
Cost optimization: Higher budgets unlock authoritative sources that other APIs can’t access, including exclusive academic journals, financial data streams, and curated research databases.
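The budget tiers above can be sketched as a simple escalation loop: retry with a larger max_price only when the cheaper tier returns too few results. This is a minimal sketch, assuming a search callable with the same keyword interface as valyu.search that returns a dict containing a "results" list; escalate_search, budgets, and min_results are illustrative names, not part of the SDK.

```python
def escalate_search(search_fn, query, budgets=(20.0, 50.0, 100.0), min_results=5):
    """Retry a search with progressively larger budgets until enough results come back."""
    response = None
    for max_price in budgets:
        response = search_fn(query=query, max_price=max_price)
        if len(response.get("results", [])) >= min_results:
            break  # this tier returned enough results; stop paying for more
    return response
```

In practice you would pass a small wrapper around valyu.search as search_fn, so each retry only changes the budget while the rest of your parameters stay fixed.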

Context Window Management

Worried about token consumption? DeepSearch provides granular controls for managing LLM context usage. Use the max_num_results and results_length parameters to control how many results are returned and how long each one is.
# Optimize for different context requirements
lightweight_search = valyu.search(
    "transformer architecture innovations",
    max_num_results=3,        # Fewer results
    results_length="short",   # Condensed content
    max_price=50.0
)

comprehensive_search = valyu.search(
    "transformer architecture innovations",
    max_num_results=15,       # More coverage
    results_length="max",     # Full content
    max_price=100.0
)

Token Estimation Guide:
  • Short results: Max ~6k tokens per result (25k chars)
  • Medium results: Max ~12k tokens per result (50k chars)
  • Long results: Max ~24k tokens per result (100k chars)
  • Rule of thumb: 4 characters ≈ 1 token
Context strategy: Start with max_num_results=10 and results_length="short" for most use cases, then adjust based on your LLM’s context window and your application’s requirements.
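Under the rule of thumb above (4 characters ≈ 1 token), you can budget context before calling the API. A minimal sketch; the per-result character caps and the mapping of results_length values to those caps are taken from the guide above and should be treated as estimates, not API guarantees.

```python
# Rough context-budget arithmetic using the 4-chars-per-token rule of thumb.
CHARS_PER_TOKEN = 4
LENGTH_CAPS = {"short": 25_000, "medium": 50_000, "max": 100_000}  # chars per result

def estimate_max_tokens(max_num_results: int, results_length: str) -> int:
    """Upper bound on tokens a search response could add to the context."""
    return max_num_results * LENGTH_CAPS[results_length] // CHARS_PER_TOKEN

def fit_max_num_results(token_budget: int, results_length: str = "short") -> int:
    """Largest max_num_results that keeps the response under token_budget."""
    per_result = LENGTH_CAPS[results_length] // CHARS_PER_TOKEN
    return max(1, token_budget // per_result)
```

For example, three "short" results can consume up to roughly 18,750 tokens, so a 50k-token budget comfortably fits about eight of them.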

Discovering Specialised Datasets

Access curated, high-quality datasets beyond standard web search. Visit: Valyu Platform Datasets. Dataset categories:
  • Academic: ArXiv, PubMed, academic publisher content
  • Financial: SEC filings, earnings reports, market data
  • Medical: Clinical trials, FDA drug labels, medical literature
  • Technical: Patents, specifications, implementation guides
  • Books & Literature: Digitized texts, reference materials
Each dataset provides specific return schemas optimised for different use cases:
# Target specific datasets for specialised searches
academic_search = valyu.search(
    "CRISPR gene editing clinical trials safety outcomes",
    included_sources=["valyu/valyu-pubmed", "valyu/valyu-US-clinical-trials"],
    max_price=30.0
)

financial_search = valyu.search(
    "Tesla Q3 2024 earnings revenue breakdown",
    included_sources=["valyu/valyu-US-sec-filings", "valyu/valyu-US-earnings"],
    max_price=60.0
)

Data advantage: Proprietary datasets behind the DeepSearch API often contain information unavailable through standard web APIs, giving your AI system access to authoritative, structured knowledge that improves factual accuracy.

Avoid Common Integration Mistakes

  1. Token waste: Use max_num_results and results_length parameters to manage LLM context consumption
  2. Missing filters: Always use DeepSearch’s relevance thresholds and source controls for precision
  3. Ignoring cost optimisation: Balance max_price with result quality needs based on your use case
  4. Wrong source expectations: Match dataset selection to your specific domain needs, whether academic, financial, medical, or web sources
  5. Inefficient workflows: Implement agentic search patterns rather than single-shot queries for complex research tasks
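The checklist above can be folded into a thin wrapper that applies guardrail defaults to every call. A hypothetical sketch: guarded_search and DEFAULTS are illustrative names, client stands in for an initialized Valyu client, and the keyword parameters mirror those shown earlier on this page.

```python
# Guardrail defaults that address the common mistakes listed above.
DEFAULTS = {
    "relevance_threshold": 0.6,  # filter weak matches (mistake 2)
    "max_num_results": 10,       # cap context usage (mistake 1)
    "results_length": "short",   # condensed content (mistake 1)
    "max_price": 50.0,           # explicit cost ceiling (mistake 3)
}

def guarded_search(client, query, **overrides):
    """Run a search with guardrail defaults; any override wins."""
    params = {**DEFAULTS, **overrides}
    return client.search(query, **params)
```

Per-call overrides (for example a higher max_price for a comprehensive query) replace only the defaults they name, so every call stays bounded in cost and context.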

Start Building with Valyu

Ready to integrate production-grade search into your AI stack?

Developer Support

Building something ambitious? Our team helps optimize search strategies for mission-critical AI applications: