I’m using LangChain’s `load_summarize_chain` function with the `map_reduce` chain type and `gpt-3.5-turbo-instruct` to summarize multiple documents within a specific folder. Each folder has a unique ID and contains documents related to a single topic. For example, a folder with the ID ‘xyz123’ contains only documents related to cooking, and I expect the summary output to reflect only cooking-related content.
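For reference, here is a minimal sketch of the per-folder isolation described above. It is not the actual code from this project, only an illustration under some assumptions: the documents are plain `.txt` files, each sub-folder of a root directory is one topic, and the helper names (`load_folder_texts`, `summarize_folders`, `make_langchain_summarizer`) are hypothetical. The point it shows is that the document list is rebuilt from scratch for every folder, so nothing from an earlier folder can be passed to the chain.

```python
from pathlib import Path
from typing import Callable, Dict, List


def load_folder_texts(folder: Path) -> List[str]:
    """Read every .txt file directly inside `folder` into a brand-new list."""
    return [p.read_text(encoding="utf-8") for p in sorted(folder.glob("*.txt"))]


def summarize_folders(
    root: Path, summarize: Callable[[List[str]], str]
) -> Dict[str, str]:
    """Summarize each sub-folder of `root` independently, keyed by folder ID."""
    summaries: Dict[str, str] = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # Fresh list per folder: nothing is carried over from earlier folders.
        texts = load_folder_texts(folder)
        if texts:
            summaries[folder.name] = summarize(texts)
    return summaries


def make_langchain_summarizer() -> Callable[[List[str]], str]:
    """Wrap load_summarize_chain (map_reduce).

    Assumes the langchain and langchain-openai packages are installed
    and OPENAI_API_KEY is set in the environment.
    """
    from langchain.chains.summarize import load_summarize_chain
    from langchain_core.documents import Document
    from langchain_openai import OpenAI

    llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
    chain = load_summarize_chain(llm, chain_type="map_reduce")

    def summarize(texts: List[str]) -> str:
        # Build the Document list inside the call so it is never shared.
        docs = [Document(page_content=t) for t in texts]
        return chain.invoke({"input_documents": docs})["output_text"]

    return summarize
```

With this shape, `summarize_folders(Path("data"), make_langchain_summarizer())` would return one summary per folder ID (`"data"` is a placeholder path). If a list of documents is instead created once outside the loop and appended to for each folder, every later summary would include the earlier folders' content, which is one possible cause of the kind of mixing shown here.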
However, the summaries I’m getting include content from previously summarized documents from different topics, which should not happen. Here’s an example of what’s happening:
Actual Output:
This document discusses a cell with two end caps, part of a battery pack with efficient thermal management and improved current conductivity. It also includes a method for manufacturing the cell and a new cell design. The document covers various technologies and methods related to independent devices, computing apparatus, and non-transitory computer readable media. It also describes a system for classifying queries based on objects and actions, and assessing query performance, which can be implemented in different networks and can function as a cloud. The document also mentions a shared storage area network, IoT sensors, and a network module for communication between computers.
As seen above, the output correctly begins with a discussion about battery systems and later transitions into database-related topics.
Any suggestions on how to ensure the `load_summarize_chain` function only summarizes the documents in the specified folder without mixing in content from other folders?
(P.S.: If the formatting of this question is incorrect, please let me know and I will fix it. I am close to getting banned because of unnecessary downvotes and reports.)