I am trying to deduplicate the data below using an LLM prompt:
Product Line:DTVs, Sales Channel:INDIRECT, gross adds = 51
Product Line:BYOD, Sales Channel:ONLINE2, gross adds = 100
Product Line:BYOD, Sales Channel:ONLINE1, gross adds = 200
Product Line:BYOD, Sales Channel:ONLINE3, gross adds = 400
Product Line:BYOD, Sales Channel:ONLINE4, gross adds = 500
Product Line:BYOD, Sales Channel:null, gross adds = 300
Dedup criteria:
1. Identify all records where Sales Channel is null and check whether each one is a duplicate.
2. A null record is a duplicate if the gross adds of any combination of the other sales channels in the same product line sum to the null record's gross adds.
========================================================================
Example:
For Product Line:BYOD, Sales Channel:null, gross adds = 300:
Sales Channel:null gross adds = (Product Line:BYOD, Sales Channel:ONLINE2) + (Product Line:BYOD, Sales Channel:ONLINE1) = 100 + 200, where 300 == 300.
So I need to identify Product Line:BYOD, Sales Channel:null, gross adds = 300 as a duplicate.
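To make the rule unambiguous, here is a minimal sketch of the check in plain Python, run on the sample rows from this question. The function name find_duplicate_nulls and the tuple layout are hypothetical, chosen only for illustration, and None stands in for Sales Channel:null. It assumes "any other sales channels" means any subset of the non-null channels in the same product line, as in the example.

```python
from itertools import combinations

# Sample rows from the question: (product_line, sales_channel, gross_adds).
# None stands in for Sales Channel:null.
ROWS = [
    ("DTVs", "INDIRECT", 51),
    ("BYOD", "ONLINE2", 100),
    ("BYOD", "ONLINE1", 200),
    ("BYOD", "ONLINE3", 400),
    ("BYOD", "ONLINE4", 500),
    ("BYOD", None, 300),
]


def find_duplicate_nulls(rows):
    """Return null-channel rows whose gross adds equal the sum of some
    subset of non-null-channel rows in the same product line."""
    duplicates = []
    for product_line, channel, gross_adds in rows:
        if channel is not None:
            continue
        # Candidate rows: same product line, non-null sales channel.
        others = [r for r in rows if r[0] == product_line and r[1] is not None]
        # Brute-force subset search; only practical for a handful of
        # channels per product line.
        matched = any(
            sum(r[2] for r in combo) == gross_adds
            for size in range(1, len(others) + 1)
            for combo in combinations(others, size)
        )
        if matched:
            duplicates.append((product_line, channel, gross_adds))
    return duplicates


print(find_duplicate_nulls(ROWS))  # [('BYOD', None, 300)]
```

A check like this could also serve as ground truth for judging whether a prompt gives the right answer on a data set.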
How can I effectively create LLM prompts for problems like this?
I also need to identify trends in similar data sets. What is an effective approach?
I am planning to use Mistral or DBRX models.