👋
I’m currently developing a Slack bot using Retrieval-Augmented Generation (RAG) to answer HR and company-related queries. Here’s the tech stack I’m using:
- LLM: AWS Bedrock (Mixtral 8x7B)
- Embeddings: OpenAI (text-embedding-3-small)
- Vector Store: Zilliz (serverless?) or Qdrant
- Document Storage: AWS S3
The bot will serve multiple users in our Slack workspace, who may interact with it simultaneously. It also needs to store conversation history per user, which the LLM will use to provide contextually relevant responses. I’m trying to decide between AWS Lambda, EC2, and ECS for hosting the backend, and I’m unsure which option best fits these requirements.
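For context, here’s roughly how I imagine each request flowing once the chunks are retrieved from the vector store: combine the retrieved context with the user’s history into a single prompt for the model. This is just a sketch; the function and variable names are mine, not from any existing codebase, and the actual Bedrock/embedding calls are omitted:

```python
def build_prompt(question, retrieved_chunks, history):
    """Assemble the final LLM prompt from retrieved document chunks
    and the user's prior conversation turns.

    `history` is a list of (role, text) tuples, e.g. ("user", "...").
    """
    context = "\n\n".join(retrieved_chunks)
    past_turns = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        "Answer the HR question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{past_turns}\n\n"
        f"User: {question}\nAssistant:"
    )
```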
I’d love to hear your experiences or recommendations for similar scenarios. What factors should I consider most, and are there best practices for these services? How do you handle storing conversation history in a scalable manner, especially when it’s used by the LLM?
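To make the history question concrete, here’s the kind of per-user sliding-window store I have in mind (a minimal in-memory sketch; the class name and window size are my own placeholders, and in production I’d expect to back this with something like DynamoDB or Redis keyed on the Slack user ID):

```python
from collections import defaultdict, deque

class ConversationStore:
    """Keeps the last `max_turns` (role, text) turns per user.

    In-memory for illustration only; a durable key-value store
    keyed on the Slack user ID would replace the dict in production.
    """

    def __init__(self, max_turns=10):
        self.max_turns = max_turns
        self._store = defaultdict(lambda: deque(maxlen=self.max_turns))

    def append(self, user_id, role, text):
        # Oldest turns are dropped automatically once the window is full.
        self._store[user_id].append((role, text))

    def history(self, user_id):
        return list(self._store[user_id])
```

The sliding window also caps how much history gets stuffed into the LLM prompt, which matters for token limits and cost.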
Thanks for your insights! 😊
Current Thoughts:
I’m inclined towards AWS Lambda for its ease and cost-effectiveness, but I’m wary of its limitations (cold starts and the 15-minute execution timeout in particular). EC2 gives full control over the instance for performance tuning, while ECS offers container orchestration without managing servers yourself, especially with Fargate.