Troubleshooting: LangChain.js + gemini-embedding-001 returns all-empty vectors, causing the pgvector error "vector must have at least 1 dimension"
Symptom
While using the index function provided by LangChain.js (@langchain/core/indexing) to embed documents and write them into pgvector, we hit this error:
error: vector must have at least 1 dimension
...
routine: 'vector_in'
Here is a simplified version of the failing code:
import { PostgresRecordManager } from "@langchain/community/indexes/postgres";
import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";
import type { Document } from "@langchain/core/documents";
import { index as langchainIndex } from "@langchain/core/indexing";
import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";

const embeddings = new GoogleGenerativeAIEmbeddings({
  model: "gemini-embedding-001",
});

async function contentIndex(
  documents: Document[],
  myMetadata?: Record<string, unknown>
) {
  // One record-manager namespace per knowledge base (falling back to the source file).
  const namespace = myMetadata?.knowledgeId
    ? `knowledgeId-${myMetadata.knowledgeId as string}`
    : documents[0].metadata.source;
  const recordManager = new PostgresRecordManager(namespace, {
    postgresConnectionOptions: {
      connectionString: process.env.DATABASE_URL,
    },
  });
  await recordManager.createSchema();
  const vectorStore = await PGVectorStore.initialize(embeddings, {
    postgresConnectionOptions: {
      connectionString: process.env.DATABASE_URL,
    },
    tableName: "documents",
    columns: {
      contentColumnName: "content",
    },
    dimensions: Number(process.env.EMBEDDING_DIMENSIONS), // 3072 for gemini-embedding-001
  });
  const result = await langchainIndex({
    docsSource: documents,
    recordManager,
    vectorStore,
    options: {
      cleanup: "full",
      sourceIdKey: "source",
    },
  });
  await vectorStore.end();
  await recordManager.end();
  return result;
}
The error is thrown from LangChain.js's index function, with no further detail. Worse, no matter how many times we retried, pgvector kept throwing the same error.
On the surface this is a database error, but the stack trace gives no hint of which document/chunk is the culprit.
Debugging steps
First, bypass the index function and test embeddings.embedDocuments() directly:
const vectors = await embeddings.embedDocuments(
  documents.map((doc) => doc.pageContent)
);
console.log(vectors);
The printed output looked like this:
[
[], [], [], [], [], [], [], [],
... // many more empty vectors
]
So some unknown failure was already happening at the Gemini embedding API call. Digging into the GoogleGenerativeAIEmbeddings source, its batch-embedding logic looks like this:
const batchEmbedRequests = batchEmbedChunks.map((chunk) => ({
  requests: chunk.map((doc) => this._convertToContent(doc)),
}));
const responses = await Promise.allSettled(
  batchEmbedRequests.map((req) => this.client.batchEmbedContents(req))
);
const embeddings = responses.flatMap((res, idx) => {
  if (res.status === "fulfilled") {
    return res.value.embeddings.map((e) => e.values || []);
  } else {
    return Array(batchEmbedChunks[idx].length).fill([]);
  }
});
return embeddings;
This code fires the API requests in parallel via Promise.allSettled, but never handles the rejected ones: the else branch pads the result with empty vectors via fill, so the underlying exception is silently swallowed, and the empty vectors later blow up the database write.
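To see the failure mode in isolation, here is a minimal stand-alone reproduction of that pattern (with fabricated responses, no real API calls):
// Promise.allSettled never rejects; a failed batch silently becomes
// an array of empty vectors instead of a visible error.
const fakeBatches = [
  Promise.resolve({ embeddings: [{ values: [0.1, 0.2, 0.3] }] }),
  Promise.reject(new Error("429 Too Many Requests")), // never surfaces
];
const settled = await Promise.allSettled(fakeBatches);
const vectors = settled.flatMap((res) =>
  res.status === "fulfilled"
    ? res.value.embeddings.map((e) => e.values ?? [])
    : [[]] // the swallowed failure, padded as an empty vector
);
console.log(vectors); // [ [ 0.1, 0.2, 0.3 ], [] ]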
To find out what the underlying error actually was, we added a "monkey patch" to log the exceptions:
// Wrap the underlying GenAI client methods so every call logs its request
// size, plus either the returned vector dimensions or the real error.
function attachGenAIDebug(embeddings: any) {
  const client = embeddings?.client;
  if (!client) return;
  for (const method of ["batchEmbedContents", "embedContent"] as const) {
    if (typeof client[method] !== "function") continue;
    const orig = client[method].bind(client);
    client[method] = async (req: any) => {
      const meta =
        method === "batchEmbedContents"
          ? {
              requests: req?.requests?.length,
              totalChars: (req?.requests ?? []).reduce(
                (s: number, r: any) =>
                  s + (r?.content?.parts?.[0]?.text?.length ?? 0),
                0
              ),
            }
          : { chars: req?.content?.parts?.[0]?.text?.length ?? 0 };
      try {
        const res = await orig(req);
        const dims =
          res?.embedding?.values?.length ??
          res?.embeddings?.[0]?.values?.length ??
          0;
        console.error(`[genai] ${method} OK`, meta, { dims });
        return res;
      } catch (err) {
        console.error(`[genai] ${method} FAIL`, meta, err);
        throw err;
      }
    };
  }
}
Then we used the patch to intercept the error:
const embeddings = new GoogleGenerativeAIEmbeddings({
  model: "gemini-embedding-001",
});
attachGenAIDebug(embeddings);
const vectors = await embeddings.embedDocuments(
  documents.map((doc) => doc.pageContent)
);
// ...
This finally surfaced the real error hiding behind the symptom:
[genai] batchEmbedContents FAIL { requests: 100, totalChars: 49929 } GoogleGenerativeAIFetchError: [GoogleGenerativeAI Error]: Error fetching from https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:batchEmbedContents: [429 Too Many Requests] You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit.
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 100, model: gemini-embedding-1.0
Please retry in 20.088593563s. [{"@type":"type.googleapis.com/google.rpc.Help","links":[{"description":"Learn more about Gemini API quotas","url":"https://ai.google.dev/gemini-api/docs/rate-limits"}]},{"@type":"type.googleapis.com/google.rpc.QuotaFailure","violations":[{"quotaMetric":"generativelanguage.googleapis.com/embed_content_free_tier_requests","quotaId":"EmbedContentRequestsPerMinutePerUserPerProjectPerModel-FreeTier","quotaDimensions":{"location":"global","model":"gemini-embedding-1.0"},"quotaValue":"100"}]},{"@type":"type.googleapis.com/google.rpc.RetryInfo","retryDelay":"20s"}]
at handleResponseNotOk
{
status: 429,
statusText: 'Too Many Requests',
errorDetails: [
{ '@type': 'type.googleapis.com/google.rpc.Help', links: [Array] },
{
'@type': 'type.googleapis.com/google.rpc.QuotaFailure',
violations: [Array]
},
{
'@type': 'type.googleapis.com/google.rpc.RetryInfo',
retryDelay: '20s'
}
]
}
[genai] batchEmbedContents OK { requests: 42, totalChars: 25856 } { dims: 3072 }
[
[], [], [], [], [], [], [], [], [], [], [], [],
[], [], [], [], [], [], [], [], [], [], [], [],
[], [], [], [], [], [], [], [], [], [], [], [],
[], [], [], [], [], [], [], [], [], [], [], [],
[], [], [], [], [], [], [], [], [], [], [], [],
[], [], [], [], [], [], [], [], [], [], [], [],
[], [], [], [], [], [], [], [], [], [], [], [],
[], [], [], [], [], [], [], [], [], [], [], [],
[], [], [], [],
... 42 more items
]
Mystery solved: the Gemini embedding API free tier allows only 100 requests per minute, so anything beyond 100 requests/min triggers a 429 rate-limit error.
Solution
Knowing the cause, we can fix it deliberately. First, the maxRetries parameter is of no help here: as we saw, the wrapper swallows every exception internally, nothing propagates up, and so the AsyncCaller retry mechanism is never triggered.
That leaves only one lever: shrink the batch size of each embedding call.
The GoogleGenerativeAIEmbeddings source defines a maxBatchSize of 100, but it is hard-coded with no way to override it, so the only thing left under our control is how many documents we hand to embedDocuments at a time.
When calling embeddings.embedDocuments(documents) directly, we can pre-split the documents ourselves and invoke embedDocuments batch by batch.
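Here is a minimal sketch of that manual batching (the helper name embedInBatches, the batch size, the delay, and the empty-vector check are all our own choices, not anything the library provides):
// Hypothetical helper: embed texts in small batches, pausing between batches
// to stay under the 100 requests/min free-tier quota, and failing loudly when
// the wrapper returns swallowed (empty) vectors so an outer retry can kick in.
async function embedInBatches(
  embeddings: GoogleGenerativeAIEmbeddings,
  texts: string[],
  batchSize = 10,
  delayMs = 10_000
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    const result = await embeddings.embedDocuments(batch);
    // The wrapper pads failed batches with [] instead of throwing,
    // so turn that back into a real error here.
    if (result.some((v) => v.length === 0)) {
      throw new Error(`empty vectors in batch starting at index ${i}`);
    }
    vectors.push(...result);
    if (i + batchSize < texts.length) {
      await new Promise((r) => setTimeout(r, delayMs)); // crude throttle
    }
  }
  return vectors;
}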
As for the index function, it turns out to provide a batchSize option that controls the document batch size internally:
import { index as langchainIndex } from "@langchain/core/indexing";
const result = await langchainIndex({
  docsSource: documents,
  recordManager,
  vectorStore,
  options: {
    cleanup: "full",
    sourceIdKey: "source",
    batchSize: 10,
  },
});
Calling too fast can still produce error: vector must have at least 1 dimension, but smaller batches at least make recovery possible: with enough external retries, every document eventually gets embedded, instead of the pipeline being stuck forever.
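For example, a simple outer retry loop around contentIndex (our own sketch; the attempt count and backoff are arbitrary). Since the record manager deduplicates documents that were already indexed, each retry should only re-embed what failed before:
// Hypothetical wrapper: re-run contentIndex until it succeeds or we give up.
async function indexWithRetry(documents: Document[], maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await contentIndex(documents);
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      // Back off; the 429 above suggested retrying after ~20s.
      const waitMs = 30_000 * attempt;
      console.warn(`indexing attempt ${attempt} failed, retrying in ${waitMs / 1000}s`, err);
      await new Promise((r) => setTimeout(r, waitMs));
    }
  }
}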
Summary
LangChain.js's wrapper around the Gemini embedding API is fairly thin, and its rate-limit handling is far from complete. For now, you have to control batch sizes yourself and set up sensible retry rules to get Gemini embeddings working reliably.