I want to generate realistic test data. It should support customizable fields, so that the user can define the structure of the data by specifying field names and types. It should support a wide range of data types, including names, addresses, email addresses, electrical products, household products, etc.
I plan to use a small language model, because I don't need full sentences, just a word or a few words per field. I plan to convert the language model's response into JSON. I want to implement something similar to https://www.mockaroo.com/ using AI.
Can you recommend some small language models to achieve this?
You should be able to complete the task with any small language model, provided that you give it a clear prompt. You will probably also need to post-process the output of the SLM (small language model) to fit your requirements. Common examples include Gemma 2 9B, Gemma 7B, and Llama. You can either use them through an API or download the model weights and run them locally.
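As a rough illustration of what a "clear prompt" might look like, the sketch below builds one from a list of field names and types, similar to how Mockaroo lets you define a schema. The `Field` class, the `build_prompt` function, and the prompt wording are all hypothetical, not part of any model's API; you would pass the resulting string to whichever SLM you pick, either through its API or a local runtime.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """One column in the generated data set (hypothetical schema type)."""
    name: str
    kind: str  # free-text type hint for the model, e.g. "email address"


def build_prompt(fields: list[Field], rows: int) -> str:
    """Build a prompt asking for bare comma-separated rows.

    The wording is an assumption; small models often ignore some of these
    instructions, so the output still has to be post-processed.
    """
    columns = ", ".join(f"{f.name} ({f.kind})" for f in fields)
    return (
        f"Generate {rows} rows of realistic test data.\n"
        f"Columns, in this order: {columns}.\n"
        "Output one row per line, values separated by commas.\n"
        "Do not number the rows and do not add any other text."
    )


if __name__ == "__main__":
    schema = [
        Field("name", "full person name"),
        Field("ip", "IPv4 address"),
        Field("company", "company name"),
        Field("country", "country name"),
    ]
    print(build_prompt(schema, 10))
```

Listing the columns in a fixed order makes the response easier to map back onto field names later, at the cost of relying on the model to respect that order.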
Here is an example of using Gemma (taken from Hugging Face). You will need to do some string processing to extract the data in the required format.
Prompt: give 10 rows of data with name, IP address, company, and country.
Response:
Step 1/10
1. John Smith, 123.45.67.89, Google, United States
Step 2/10
2. Jane Doe, 98.76.54.32, Facebook, United States
Step 3/10
3. Mark Zuckerberg, 1.1.1.1, Meta, United States
Step 4/10
4. Bill Gates, 2.2.2.2, Microsoft, United States
Step 5/10
5. Steve Jobs, 3.3.3.3, Apple, United States
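The string processing for a response like the one above could look roughly like this: drop the "Step i/N" lines, strip the leading row number, split each row on commas, and pair the values with the field names before serializing to JSON. This is a minimal sketch that assumes the model keeps the values in the requested column order and that no value itself contains a comma; `parse_rows` is a hypothetical helper name.

```python
import json
import re

# Lines such as "Step 3/10" are progress chatter, not data.
STEP_LINE = re.compile(r"^\s*Step\s+\d+\s*/\s*\d+\s*$", re.IGNORECASE)
# Leading row numbers such as "3. " or "3) " (whitespace required after the
# punctuation so an IP address like "123.45..." at line start is not cut).
ROW_NUMBER = re.compile(r"^\s*\d+[.)]\s+")


def parse_rows(response: str, field_names: list[str]) -> list[dict[str, str]]:
    """Turn a raw model response into a list of records keyed by field name."""
    records = []
    for line in response.splitlines():
        if not line.strip() or STEP_LINE.match(line):
            continue  # skip blank lines and "Step i/N" markers
        line = ROW_NUMBER.sub("", line, count=1)
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(field_names):
            continue  # drop malformed rows instead of guessing
        records.append(dict(zip(field_names, values)))
    return records


if __name__ == "__main__":
    raw = (
        "Step 1/10\n"
        "1. John Smith, 123.45.67.89, Google, United States\n"
        "Step 2/10\n"
        "2. Jane Doe, 98.76.54.32, Facebook, United States\n"
    )
    rows = parse_rows(raw, ["name", "ip", "company", "country"])
    print(json.dumps(rows, indent=2))
```

Rows that don't have the expected number of values are simply discarded here; depending on your needs you could instead re-prompt the model, or validate each value against its declared type (for example with Python's `ipaddress` module for IP fields).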