LangChain / LlamaIndex Integration
Build AI applications with LangChain and LlamaIndex using Amazon Bedrock models through stdapi.ai. Keep your existing code: just point it to stdapi.ai and access Claude, Amazon Nova, and all Bedrock models with zero code changes.
About LangChain & LlamaIndex
LangChain: Website | Python Docs | JS/TS Docs | GitHub | Discord
LlamaIndex: Website | Documentation | GitHub | Discord
Popular frameworks for building AI applications:
- LangChain - 90,000+ GitHub stars, comprehensive framework for chains, agents, and RAG
- LlamaIndex - 35,000+ GitHub stars, specialized for data indexing and retrieval
- Both support Python, JavaScript/TypeScript, and multiple LLM providers
- Production-ready - Used by companies building AI products
Why LangChain/LlamaIndex + stdapi.ai?
Drop-In Compatibility
Both frameworks are designed to work with OpenAI's API, making stdapi.ai a drop-in replacement for Amazon Bedrock models without changing your application code.
Key Benefits:
- Zero code changes - Change one parameter (base_url) to switch providers, as the sketch after this list shows
- Keep your codebase - All existing LangChain/LlamaIndex code works as-is
- Access Claude, Amazon Nova, and all Bedrock models
- Use chains, agents, retrievers, and all framework capabilities
- AWS pricing instead of OpenAI rates
- Your data stays in your AWS environment
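To see how small the change is, here is a minimal before/after sketch for LangChain's ChatOpenAI. The URL and key are placeholders for your own deployment; only the base URL, API key, and model ID differ between the two configurations.
Python - Before and After
from langchain_openai import ChatOpenAI

# Before: pointing at OpenAI
llm = ChatOpenAI(model="gpt-4o", openai_api_key="sk-...")

# After: pointing at stdapi.ai serving Amazon Bedrock (placeholder URL and key)
llm = ChatOpenAI(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    openai_api_key="your_stdapi_key_here",
    openai_api_base="https://YOUR_SERVER_URL/v1",
)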
Work in Progress
This integration guide is actively being developed and refined. While the configuration examples are based on documented APIs and best practices, they are pending practical validation. Complete end-to-end deployment examples will be added once testing is finalized.
Prerequisites
What You'll Need
Before you begin, make sure you have:
- Python 3.8+ or Node.js 18+ (depending on your language)
- LangChain or LlamaIndex installed
- Your stdapi.ai server URL (e.g., https://api.example.com)
- An API key (if authentication is enabled)
- AWS Bedrock access configured with desired models
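Before wiring up a framework, you can verify your server URL and key with a quick check, assuming your stdapi.ai deployment exposes the standard OpenAI-compatible model listing endpoint (requires the openai package; URL and key below are placeholders):
Python - Connectivity Check
from openai import OpenAI

# Point the plain OpenAI client at your stdapi.ai deployment
client = OpenAI(
    api_key="your_stdapi_key_here",
    base_url="https://YOUR_SERVER_URL/v1",
)

# List the Bedrock models the server exposes
for model in client.models.list():
    print(model.id)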
LangChain Python Integration
Installation
Install LangChain
# Install LangChain with OpenAI support
pip install langchain langchain-openai
# For RAG applications, also install:
pip install langchain-community chromadb
Basic Chat Model Setup
Replace OpenAI with stdapi.ai in your LangChain applications.
Python - Basic Chat
from langchain_openai import ChatOpenAI
# Configure chat model to use stdapi.ai
llm = ChatOpenAI(
model="anthropic.claude-sonnet-4-5-20250929-v1:0",
openai_api_key="your_stdapi_key_here",
openai_api_base="https://YOUR_SERVER_URL/v1",
temperature=0.7
)
# Use it like any LangChain model
response = llm.invoke("Explain quantum computing in simple terms")
print(response.content)
Available Models:
Use any Amazon Bedrock chat model by specifying its model ID. Popular choices include Claude Sonnet, Claude Haiku, Amazon Nova Pro, Amazon Nova Lite, Amazon Nova Micro, and any other Bedrock models available in your region.
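Because switching models is just a string change, one convenient pattern is to configure a fast model and a stronger model side by side and route per request. A small sketch (model choices are illustrative; URL and key are placeholders):
Python - Model Routing
from langchain_openai import ChatOpenAI

common = dict(
    openai_api_key="your_stdapi_key_here",
    openai_api_base="https://YOUR_SERVER_URL/v1",
)

fast_llm = ChatOpenAI(model="amazon.nova-micro-v1:0", **common)
smart_llm = ChatOpenAI(model="anthropic.claude-sonnet-4-5-20250929-v1:0", **common)

def answer(question: str, complex_task: bool = False) -> str:
    # Route simple questions to the cheaper, faster model
    llm = smart_llm if complex_task else fast_llm
    return llm.invoke(question).content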
Embeddings for RAG
Use Amazon Bedrock embeddings for semantic search and RAG applications.
Python - Embeddings
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Configure embeddings to use stdapi.ai
embeddings = OpenAIEmbeddings(
model="amazon.titan-embed-text-v2:0",
openai_api_key="your_stdapi_key_here",
openai_api_base="https://YOUR_SERVER_URL/v1"
)
# Example: Create vector store from documents
documents = ["Your document text here...", "Another document..."]
# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
chunks = text_splitter.create_documents(documents)
# Create vector store
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings
)
# Query the vector store
results = vectorstore.similarity_search("your query here", k=3)
Available Embedding Models:
- Amazon Titan Embed Text v2 - amazon.titan-embed-text-v2:0 (recommended; 8192-token input, configurable output dimensions of 256, 512, or 1024)
- Amazon Titan Embed Text v1 - amazon.titan-embed-text-v1 (legacy, 1536 dimensions)
- Cohere Embed - if enabled in your AWS region
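A quick way to confirm which output dimensionality your configuration actually returns, reusing the embeddings object configured above:
Python - Check Embedding Dimensions
# Embed a test string and inspect the vector length
vector = embeddings.embed_query("hello world")
print(len(vector))  # e.g. 1024 for amazon.titan-embed-text-v2:0 defaults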
Complete RAG Pipeline
Build a full RAG application with stdapi.ai.
Python - RAG Application
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
# 1. Configure LLM
llm = ChatOpenAI(
model="anthropic.claude-sonnet-4-5-20250929-v1:0",
openai_api_key="your_stdapi_key_here",
openai_api_base="https://YOUR_SERVER_URL/v1",
temperature=0
)
# 2. Configure embeddings
embeddings = OpenAIEmbeddings(
model="amazon.titan-embed-text-v2:0",
openai_api_key="your_stdapi_key_here",
openai_api_base="https://YOUR_SERVER_URL/v1"
)
# 3. Load and process documents
loader = TextLoader("path/to/your/documents.txt")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
chunks = text_splitter.split_documents(documents)
# 4. Create vector store
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory="./chroma_db"
)
# 5. Create retrieval chain
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
return_source_documents=True
)
# 6. Query your documents
query = "What are the key points in the document?"
result = qa_chain.invoke({"query": query})
print("Answer:", result["result"])
print("\nSources:")
for doc in result["source_documents"]:
    print(f"- {doc.page_content[:100]}...")
Agents and Tools
Build LangChain agents powered by Bedrock models.
Python - Agent with Tools
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain.tools import Tool
from langchain import hub
# Configure LLM
llm = ChatOpenAI(
model="anthropic.claude-sonnet-4-5-20250929-v1:0",
openai_api_key="your_stdapi_key_here",
openai_api_base="https://YOUR_SERVER_URL/v1",
temperature=0
)
# Define custom tools
def search_documentation(query: str) -> str:
    """Search internal documentation."""
    # Your search logic here
    return f"Search results for: {query}"

def calculate(expression: str) -> str:
    """Perform calculations."""
    # Note: eval() on untrusted input is unsafe; use a proper math parser in production
    try:
        return str(eval(expression))
    except Exception:
        return "Invalid expression"
tools = [
Tool(
name="SearchDocs",
func=search_documentation,
description="Search internal documentation for information"
),
Tool(
name="Calculator",
func=calculate,
description="Perform mathematical calculations"
)
]
# Get the prompt template
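# Note: hub.pull may require an extra dependency on older LangChain
# versions (pip install langchainhub)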
prompt = hub.pull("hwchase17/openai-functions-agent")
# Create agent
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# Run the agent
result = agent_executor.invoke({
"input": "Search our docs for API authentication and calculate 15 * 23"
})
print(result["output"])
LlamaIndex Python Integration
Installation
Install LlamaIndex
# Install LlamaIndex with OpenAI support
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
Basic Setup
Configure LlamaIndex to use stdapi.ai.
Python - LlamaIndex Setup
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
# Configure LLM
llm = OpenAI(
model="anthropic.claude-sonnet-4-5-20250929-v1:0",
api_key="your_stdapi_key_here",
api_base="https://YOUR_SERVER_URL/v1",
temperature=0.7
)
# Configure embeddings
embed_model = OpenAIEmbedding(
model="amazon.titan-embed-text-v2:0",
api_key="your_stdapi_key_here",
api_base="https://YOUR_SERVER_URL/v1"
)
# Set global defaults
Settings.llm = llm
Settings.embed_model = embed_model
# Use the LLM
response = llm.complete("Explain the benefits of vector databases")
print(response)
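Note: some LlamaIndex versions validate model names against OpenAI's catalog inside the OpenAI class and may reject Bedrock model IDs. If you hit that, the OpenAILike wrapper from the llama-index-llms-openai-like package accepts arbitrary model names; a sketch under that assumption:
Python - OpenAILike Fallback
# pip install llama-index-llms-openai-like
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    api_key="your_stdapi_key_here",
    api_base="https://YOUR_SERVER_URL/v1",
    is_chat_model=True,  # route requests to the chat completions endpoint
)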
Document Indexing and Query
Build a searchable knowledge base with LlamaIndex.
Python - LlamaIndex RAG
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import (
VectorStoreIndex,
SimpleDirectoryReader,
Settings
)
# Configure models
Settings.llm = OpenAI(
model="anthropic.claude-sonnet-4-5-20250929-v1:0",
api_key="your_stdapi_key_here",
api_base="https://YOUR_SERVER_URL/v1"
)
Settings.embed_model = OpenAIEmbedding(
model="amazon.titan-embed-text-v2:0",
api_key="your_stdapi_key_here",
api_base="https://YOUR_SERVER_URL/v1"
)
# Load documents
documents = SimpleDirectoryReader("./data").load_data()
# Create index
index = VectorStoreIndex.from_documents(documents)
# Query the index
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What are the main topics in these documents?")
print(response)
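Rebuilding the index re-embeds every document on each run. LlamaIndex can persist the index to disk and reload it later; a minimal sketch using the standard storage APIs (the directory name is a placeholder):
Python - Persist and Reload the Index
from llama_index.core import StorageContext, load_index_from_storage

# Save the index built above
index.storage_context.persist(persist_dir="./index_storage")

# Later: reload it without re-embedding the documents
storage_context = StorageContext.from_defaults(persist_dir="./index_storage")
index = load_index_from_storage(storage_context)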
LangChain JavaScript/TypeScript Integration
Installation
Install LangChain.js
# Using npm
npm install langchain @langchain/openai
# Using yarn
yarn add langchain @langchain/openai
Basic Chat Model Setup
TypeScript - Basic Chat
import { ChatOpenAI } from "@langchain/openai";
// Configure chat model to use stdapi.ai
const llm = new ChatOpenAI({
modelName: "anthropic.claude-sonnet-4-5-20250929-v1:0",
openAIApiKey: "your_stdapi_key_here",
configuration: {
baseURL: "https://YOUR_SERVER_URL/v1",
},
temperature: 0.7,
});
// Use it
const response = await llm.invoke("Explain machine learning in simple terms");
console.log(response.content);
RAG with LangChain.js
TypeScript - RAG Application
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { RetrievalQAChain } from "langchain/chains";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { Document } from "langchain/document";
// Configure models
const llm = new ChatOpenAI({
modelName: "anthropic.claude-sonnet-4-5-20250929-v1:0",
openAIApiKey: "your_stdapi_key_here",
configuration: {
baseURL: "https://YOUR_SERVER_URL/v1",
},
temperature: 0,
});
const embeddings = new OpenAIEmbeddings({
modelName: "amazon.titan-embed-text-v2:0",
openAIApiKey: "your_stdapi_key_here",
configuration: {
baseURL: "https://YOUR_SERVER_URL/v1",
},
});
// Load and split documents
const documents = [
new Document({ pageContent: "Your document text here..." }),
new Document({ pageContent: "Another document..." }),
];
const textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 200,
});
const chunks = await textSplitter.splitDocuments(documents);
// Create vector store
const vectorStore = await MemoryVectorStore.fromDocuments(
chunks,
embeddings
);
// Create retrieval chain
const chain = RetrievalQAChain.fromLLM(
llm,
vectorStore.asRetriever(3)
);
// Query
const result = await chain.call({
query: "What are the key points in the document?",
});
console.log(result.text);
Common Patterns and Best Practices
Environment Variables
Store credentials securely using environment variables.
Python - Environment Variables
import os
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="anthropic.claude-sonnet-4-5-20250929-v1:0",
openai_api_key=os.getenv("STDAPI_KEY"),
openai_api_base=os.getenv("STDAPI_BASE_URL"),
temperature=0.7
)
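For local development, a .env file keeps these values out of your shell history. A sketch using the python-dotenv package (an optional dependency; the variable names match the examples above):
Python - Loading a .env File
# pip install python-dotenv
# .env contents:
#   STDAPI_KEY=your_stdapi_key_here
#   STDAPI_BASE_URL=https://YOUR_SERVER_URL/v1
from dotenv import load_dotenv

load_dotenv()  # populates os.environ from the .env file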
TypeScript - Environment Variables
import { ChatOpenAI } from "@langchain/openai";
const llm = new ChatOpenAI({
modelName: "anthropic.claude-sonnet-4-5-20250929-v1:0",
openAIApiKey: process.env.STDAPI_KEY,
configuration: {
baseURL: process.env.STDAPI_BASE_URL,
},
temperature: 0.7,
});
Streaming Responses
Enable streaming for better user experience in interactive applications.
Python - Streaming
from langchain_openai import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
llm = ChatOpenAI(
model="anthropic.claude-sonnet-4-5-20250929-v1:0",
openai_api_key="your_stdapi_key_here",
openai_api_base="https://YOUR_SERVER_URL/v1",
streaming=True,
callbacks=[StreamingStdOutCallbackHandler()]
)
# Response will stream to stdout as it's generated
llm.invoke("Write a long story about artificial intelligence")
Retry Logic and Error Handling
Implement robust error handling for production applications.
Python - Error Handling
from langchain_openai import ChatOpenAI
from langchain.globals import set_llm_cache
from langchain_core.caches import InMemoryCache
import time
# Enable caching to reduce redundant calls
set_llm_cache(InMemoryCache())
llm = ChatOpenAI(
model="anthropic.claude-sonnet-4-5-20250929-v1:0",
openai_api_key="your_stdapi_key_here",
openai_api_base="https://YOUR_SERVER_URL/v1",
max_retries=3,
request_timeout=60
)
def query_with_retry(prompt: str, max_attempts: int = 3):
    for attempt in range(max_attempts):
        try:
            response = llm.invoke(prompt)
            return response.content
        except Exception as e:
            if attempt < max_attempts - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Attempt {attempt + 1} failed: {e}. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise

result = query_with_retry("Your prompt here")
Model Recommendations
Choose the right model for your application needs. These are examples; all Bedrock models are available.
| Use Case | Example Model | Why |
|---|---|---|
| Complex RAG | anthropic.claude-sonnet-4-5-20250929-v1:0 | Superior reasoning and context understanding |
| Fast Queries | amazon.nova-micro-v1:0 | Quick responses, cost-effective |
| Long Documents | amazon.nova-pro-v1:0 | Large context window (300K tokens) |
| Balanced | amazon.nova-lite-v1:0 | Good performance at reasonable cost |
| Code Generation | anthropic.claude-sonnet-4-5-20250929-v1:0 | Excellent coding capabilities |
| Embeddings | amazon.titan-embed-text-v2:0 | 8192-token input, optimized for RAG |
Migration from OpenAI
Migrating existing LangChain/LlamaIndex applications is straightforward.
Migration Steps
1. Update Configuration:
- Change openai_api_base to your stdapi.ai URL
- Change openai_api_key to your stdapi.ai key
- Change model names to Bedrock model IDs (one way to centralize this is sketched below)
2. Test Thoroughly:
- Start with non-critical workflows
- Compare outputs between OpenAI and Bedrock models
- Monitor performance and costs
3. Gradual Rollout:
- Deploy to staging/dev environments first
- Run A/B tests if possible
- Gather team feedback
4. Monitor and Optimize:
- Track token usage and costs
- Adjust model selection based on performance
- Fine-tune prompts for Bedrock models
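A low-risk way to handle the model-name change is a single mapping table at the edge of your codebase, so the rest of the application keeps using the names it already knows. A hypothetical sketch; the pairings are illustrative choices, not official equivalences:
Python - Model Name Mapping
# Hypothetical mapping from OpenAI model names to Bedrock model IDs;
# pick equivalents that fit your own quality/cost needs.
MODEL_MAP = {
    "gpt-4o": "anthropic.claude-sonnet-4-5-20250929-v1:0",
    "gpt-4o-mini": "amazon.nova-lite-v1:0",
    "text-embedding-3-small": "amazon.titan-embed-text-v2:0",
}

def resolve_model(openai_name: str) -> str:
    # Fall back to the original name if no mapping exists
    return MODEL_MAP.get(openai_name, openai_name)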
Next Steps & Resources
Getting Started
- Try Examples: Run the code examples above with your stdapi.ai credentials
- Build a Prototype: Start with a simple RAG application
- Experiment with Models: Test different Bedrock models for your use case
- Scale Up: Move to production with proper error handling and monitoring
- Optimize: Fine-tune model selection and parameters for cost/performance
Learn More
Additional Resources
- LangChain Documentation - Official LangChain guides
- LlamaIndex Documentation - Official LlamaIndex guides
- API Overview - Complete list of available Bedrock models
- Chat Completions API - Detailed API documentation
- Configuration Guide - Advanced stdapi.ai configuration
Community & Support
Need Help?
- Join the LangChain Discord for framework questions
- Join the LlamaIndex Discord for framework questions
- Review Amazon Bedrock documentation for model-specific details
- Report issues on the GitHub repository
Important Considerations
Model Availability
Regional Differences: Not all Amazon Bedrock models are available in every AWS region. Verify model availability before deploying production applications.
Check availability: See the API Overview for supported models by region.
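You can also verify availability directly against Bedrock, assuming boto3 and AWS credentials with the bedrock:ListFoundationModels permission (the region is a placeholder):
Python - Check Regional Model Availability
import boto3

# List the foundation models Bedrock offers in a specific region
bedrock = boto3.client("bedrock", region_name="us-east-1")
for summary in bedrock.list_foundation_models()["modelSummaries"]:
    print(summary["modelId"])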
API Compatibility
OpenAI Compatibility: stdapi.ai implements the OpenAI API specification. Most features work seamlessly, but some advanced OpenAI-specific features may have differences.
Function Calling: Function calling support varies by model; test thoroughly if your application relies on it.
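A quick smoke test before depending on tool calling, using LangChain's bind_tools; the weather tool is a hypothetical stand-in:
Python - Tool Calling Smoke Test
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    return f"Sunny in {city}"  # stub implementation for testing

llm = ChatOpenAI(
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",
    openai_api_key="your_stdapi_key_here",
    openai_api_base="https://YOUR_SERVER_URL/v1",
)
response = llm.bind_tools([get_weather]).invoke("What's the weather in Paris?")
print(response.tool_calls)  # non-empty list if the model emitted a tool call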
Cost Optimization
- Cache Frequently Used Results: Use LangChain's caching to reduce API calls
- Right-Size Context: Only include necessary context in prompts to reduce token usage
- Choose Efficient Models: Use Nova Micro/Lite for simple tasks, premium models for complex reasoning
- Batch Processing: Process multiple items in batches when possible
- Monitor Token Usage: Track consumption and set up alerts for unexpected spikes
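For per-call token counts, LangChain's get_openai_callback reads the usage data returned with each response. The token counts should be reliable; ignore its dollar-cost estimate, which assumes OpenAI pricing rather than AWS rates:
Python - Tracking Token Usage
from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="amazon.nova-lite-v1:0",
    openai_api_key="your_stdapi_key_here",
    openai_api_base="https://YOUR_SERVER_URL/v1",
)

with get_openai_callback() as cb:
    llm.invoke("Summarize the benefits of response caching in two sentences.")

# Usage totals accumulated across all calls inside the block
print(f"Total tokens: {cb.total_tokens} "
      f"(prompt {cb.prompt_tokens}, completion {cb.completion_tokens})")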