We are looking for high-quality, complex open-source datasets that we can use to train our Large Language Models (LLMs). The domains we are interested in are:
- Maths
- Physics/Chemistry/Biology
- English
- Logical Reasoning
- Data Interpretation
- Economics
- Business Studies
We are looking not just for questions but also for their corresponding answers.
We have already researched this online, and so far the only high-quality open source we have found is huggingface.co datasets. We are looking for more.
Can anyone recommend other similar resources? We would really appreciate it.