Skip to main content

PGVector

To enable vector search in a generic PostgreSQL database, LangChain.js supports using the pgvector Postgres extension.

Setup

To work with PGVector, you need to install the pg package:

npm install pg

Setup a pgvector self hosted instance with docker-compose

npm install @langchain/openai @langchain/community

pgvector provides a prebuilt Docker image that can be used to quickly setup a self-hosted Postgres instance. Create a file below named docker-compose.yml:

# Run this command to start the database:
# docker-compose up --build
version: "3"
services:
db:
hostname: 127.0.0.1
image: ankane/pgvector
ports:
- 5432:5432
restart: always
environment:
- POSTGRES_DB=api
- POSTGRES_USER=myuser
- POSTGRES_PASSWORD=ChangeMe
volumes:
- ./init.sql:/docker-entrypoint-initdb.d/init.sql

And then in the same directory, run docker compose up to start the container.

You can find more information on how to setup pgvector in the official repository.

Usage

Security

User-generated data such as usernames should not be used as input for table and column names.
This may lead to SQL Injection!

One complete example of using PGVectorStore is the following:

import { OpenAIEmbeddings } from "@langchain/openai";
import {
DistanceStrategy,
PGVectorStore,
} from "@langchain/community/vectorstores/pgvector";
import { PoolConfig } from "pg";

// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/pgvector

const config = {
postgresConnectionOptions: {
type: "postgres",
host: "127.0.0.1",
port: 5433,
user: "myuser",
password: "ChangeMe",
database: "api",
} as PoolConfig,
tableName: "testlangchain",
columns: {
idColumnName: "id",
vectorColumnName: "vector",
contentColumnName: "content",
metadataColumnName: "metadata",
},
// supported distance strategies: cosine (default), innerProduct, or euclidean
distanceStrategy: "cosine" as DistanceStrategy,
};

const pgvectorStore = await PGVectorStore.initialize(
new OpenAIEmbeddings(),
config
);

await pgvectorStore.addDocuments([
{ pageContent: "what's this", metadata: { a: 2, b: ["tag1", "tag2"] } },
{ pageContent: "Cat drinks milk", metadata: { a: 1, b: ["tag2"] } },
]);

const results = await pgvectorStore.similaritySearch("water", 1);

console.log(results);

/*
[ Document { pageContent: 'Cat drinks milk', metadata: { a: 1 } } ]
*/

// Filtering is supported
const results2 = await pgvectorStore.similaritySearch("water", 1, {
a: 2,
});

console.log(results2);

/*
[ Document { pageContent: 'what's this', metadata: { a: 2 } } ]
*/

// Filtering on multiple values using "in" is supported too
const results3 = await pgvectorStore.similaritySearch("water", 1, {
a: {
in: [2],
},
});

console.log(results3);

/*
[ Document { pageContent: 'what's this', metadata: { a: 2 } } ]
*/

await pgvectorStore.delete({
filter: {
a: 1,
},
});

const results4 = await pgvectorStore.similaritySearch("water", 1);

console.log(results4);

/*
[ Document { pageContent: 'what's this', metadata: { a: 2 } } ]
*/

// Filtering using arrayContains (?|) is supported
const results5 = await pgvectorStore.similaritySearch("water", 1, {
b: {
arrayContains: ["tag1"],
},
});

console.log(results5);

/*
[ Document { pageContent: "what's this", metadata: { a: 2, b: ['tag1', 'tag2'] } } } ]
*/

await pgvectorStore.end();

API Reference:

You can also specify a collectionTableName and a collectionName to partition vectors between multiple users or namespaces.

Advanced: reusing connections

You can reuse connections by creating a pool, then creating new PGVectorStore instances directly via the constructor.

Note that you should call .initialize() to set up your database at least once to set up your tables properly before using the constructor.

import { OpenAIEmbeddings } from "@langchain/openai";
import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";
import pg from "pg";

// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/pgvector

const reusablePool = new pg.Pool({
host: "127.0.0.1",
port: 5433,
user: "myuser",
password: "ChangeMe",
database: "api",
});

const originalConfig = {
pool: reusablePool,
tableName: "testlangchain",
collectionName: "sample",
collectionTableName: "collections",
columns: {
idColumnName: "id",
vectorColumnName: "vector",
contentColumnName: "content",
metadataColumnName: "metadata",
},
};

// Set up the DB.
// Can skip this step if you've already initialized the DB.
// await PGVectorStore.initialize(new OpenAIEmbeddings(), originalConfig);

const pgvectorStore = new PGVectorStore(new OpenAIEmbeddings(), originalConfig);

await pgvectorStore.addDocuments([
{ pageContent: "what's this", metadata: { a: 2 } },
{ pageContent: "Cat drinks milk", metadata: { a: 1 } },
]);

const results = await pgvectorStore.similaritySearch("water", 1);

console.log(results);

/*
[ Document { pageContent: 'Cat drinks milk', metadata: { a: 1 } } ]
*/

const pgvectorStore2 = new PGVectorStore(new OpenAIEmbeddings(), {
pool: reusablePool,
tableName: "testlangchain",
collectionTableName: "collections",
collectionName: "some_other_collection",
columns: {
idColumnName: "id",
vectorColumnName: "vector",
contentColumnName: "content",
metadataColumnName: "metadata",
},
});

const results2 = await pgvectorStore2.similaritySearch("water", 1);

console.log(results2);

/*
[]
*/

await reusablePool.end();

API Reference:

Create HNSW Index

By default, the extension performs a sequential scan search, with 100% recall. You might consider creating an HNSW index for approximate nearest neighbor (ANN) search to speed up similaritySearchVectorWithScore execution time. To create the HNSW index on your vector column, use the createHnswIndex() method:

The optional method parameters include:

dims: Defines the number of dimensions in your vector data, max: 2000. For example, use 1536 for OpenAI's text-embedding-ada-002 model and 1024 for amazon.titan-embed-text-v2:0

m: The max number of connections per layer (16 by default)

efConstruction: The size of the dynamic candidate list for constructing the graph (64 by default)

distanceFunction: The distance function name you want to use, is automatically selected based on the distanceStrategy.

More info at the pgvector github project

import { OpenAIEmbeddings } from "@langchain/openai";
import {
DistanceStrategy,
PGVectorStore,
} from "@langchain/community/vectorstores/pgvector";
import { PoolConfig } from "pg";

// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/pgvector

const config = {
postgresConnectionOptions: {
type: "postgres",
host: "127.0.0.1",
port: 5433,
user: "myuser",
password: "ChangeMe",
database: "api",
} as PoolConfig,
tableName: "testlangchain",
columns: {
idColumnName: "id",
vectorColumnName: "vector",
contentColumnName: "content",
metadataColumnName: "metadata",
},
// supported distance strategies: cosine (default), innerProduct, or euclidean
distanceStrategy: "cosine" as DistanceStrategy,
};

const pgvectorStore = await PGVectorStore.initialize(
new OpenAIEmbeddings(),
config
);

// create the index
await pgvectorStore.createHnswIndex({
dims: 1536,
efConstruction: 64,
m: 16,
});

await pgvectorStore.addDocuments([
{ pageContent: "what's this", metadata: { a: 2, b: ["tag1", "tag2"] } },
{ pageContent: "Cat drinks milk", metadata: { a: 1, b: ["tag2"] } },
]);

const model = new OpenAIEmbeddings();
const query = await model.embedQuery("water");
const results = await pgvectorStore.similaritySearchVectorWithScore(query, 1);

console.log(results);

await pgvectorStore.end();

API Reference:


Was this page helpful?


You can leave detailed feedback on GitHub.