I have a truly massive dataset, and I want to use Vowpal Wabbit to fit several regression models. Here’s the hitch: I need to run these models on different subsets of the data. To make this concrete, suppose my dataset data.vw
looks like this:
3.2 |varw w:1 |other x:0.5
1.1 |varw w:0 |other x:3.1
0.3 |varw w:1 |other x:1.0
5.5 |varw w:0 |other x:3.0
I want to run a regression model using only the examples where w=1, without having to keep different versions of the data saved on disk or do expensive data munging on my huge dataset.
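For reference, the naive baseline is a single streaming pass that drops every line whose `varw` namespace does not have w=1. Nothing is written to disk; the kept lines go straight to an output stream. The sketch below is plain Python, not a Vowpal Wabbit feature. The function names (`parse_namespaces`, `keep_example`, `filter_examples`, `stream_filter`) are hypothetical. It assumes the simple one-example-per-line layout shown above: a label, then `|name feature:value ...` namespaces. A missing feature is treated as 0, since VW ignores zero-valued features.

```python
import io
from typing import Iterable, Iterator, TextIO


def parse_namespaces(line: str) -> dict[str, dict[str, float]]:
    """Split one VW-format line into {namespace: {feature: value}}.

    Assumes the simple layout above; a bare feature name counts as 1.0.
    """
    namespaces: dict[str, dict[str, float]] = {}
    for chunk in line.split("|")[1:]:
        tokens = chunk.split()
        if not chunk or chunk[0].isspace():
            # "| feat:val" (space after the bar) is the unnamed namespace
            name, feature_tokens = "", tokens
        else:
            # drop an optional ":weight" suffix on the namespace name
            name, feature_tokens = tokens[0].partition(":")[0], tokens[1:]
        feats = namespaces.setdefault(name, {})
        for tok in feature_tokens:
            feat, _, val = tok.partition(":")
            feats[feat] = float(val) if val else 1.0
    return namespaces


def keep_example(line: str, namespace: str = "varw",
                 feature: str = "w", value: float = 1.0) -> bool:
    """True if namespace/feature equals value (a missing feature counts as 0)."""
    feats = parse_namespaces(line).get(namespace, {})
    return feats.get(feature, 0.0) == value


def filter_examples(lines: Iterable[str]) -> Iterator[str]:
    """Lazily yield only the lines that pass keep_example, unchanged."""
    return (line for line in lines if keep_example(line))


def stream_filter(src: TextIO, dst: TextIO) -> int:
    """Copy matching lines from src to dst; return how many were kept."""
    kept = 0
    for line in filter_examples(src):
        dst.write(line)
        kept += 1
    return kept


if __name__ == "__main__":
    # Demo on the four example lines from the question
    sample = io.StringIO(
        "3.2 |varw w:1 |other x:0.5\n"
        "1.1 |varw w:0 |other x:3.1\n"
        "0.3 |varw w:1 |other x:1.0\n"
        "5.5 |varw w:0 |other x:3.0\n"
    )
    out = io.StringIO()
    stream_filter(sample, out)
    print(out.getvalue(), end="")
```

Calling `stream_filter(sys.stdin, sys.stdout)` in a small script would let the filtered examples be piped into `vw`, assuming `vw` is reading examples from standard input. This still reads the full file once per subset, so it only avoids extra copies on disk, not the cost of a pass over the data.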