I’m relatively new to SAS, so I might overlook basic features of SAS which I’m not aware of.
I have a large dataset with a numerical variable “A”. I want to write a macro that allows me to create buckets based on an array, but where the array can change in length and values.
Example 1: my array is {1, 2, 6, 10} in that case all observations with values for A
less than 1 should be in bucket 1,
values between 1 and 2 in bucket 2,
values between 2 and 6 in bucket 3 and
values between 6 and 10 in bucket 4
Example 2: my array is {3, 10} in that case all observations with values for A
less than 3 should be in bucket 1 and
values between 3 and 10 in bucket 2
Ideally I would like the array to be an input variable of the macro. Or in case that is not possible, that I just need to change the hard coded array in the macro.
Thank you for your help.
As newby, I haven’t tried anything yet.
Lanox is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
2
I suspect that what you have is data with a numeric variable a, that you process in a data step and in that data step, you have an array that defines the bucket boundaries in your example. And I suspect you want some reusable code, that takes a and the array as arguments to create this bucket value.
In this case, what you want is not a macro, but a function.
See if you can use this as a template. I just made up some sample data.
Feel free to ask.
data have;
do _N_ = 1 to 100;
a = rand('uniform') * 10;
output;
end;
format a 8.2;
run;
proc fcmp outlib = work.f.f;
function buck (x, r{*}) varargs;
if x < r[1] then ret = 1;
else do;
do i = 2 to dim(r);
if r[i-1] <= x <= r[i] then do;
ret = i; leave;
end;
end;
end;
return(ret);
endfunc;
run;
options cmplib = work.f;
data want;
set have;
array arr {4} _temporary_ (1 2 6 10);
bucket = buck(a, arr);
run;
Result:
a bucket
0.65 1
4.10 3
1.43 2
7.47 4
6.00 3
5.37 3
3.89 3
9.43 4
1.59 2
1.59 2
Use a custom format to map values into ranges. Bucketing is often done when performing frequency analysis or groupwise aggregate calculations. You can programmatically create a data set that contains the custom format definition.
Example:
data have ;
do row = 1 to 1e4 ;
a = rand('integer', 100);
output;
end;
run;
%let edges = 11, 17, 42, 96 ;
data format_data ;
length start end 8 label $32 hlo $1 ;
retain fmtname 'bucketeer' ;
hlo = 'L' ;
start = . ;
do v = &edges, 1e9 ;
binSeq + 1 ;
end = v ;
if v = 1e9 then hlo='H' ;
label = 'Bucket #' || cats(binSeq) ;
output ;
start = end ;
hlo = ' ' ;
end ;
run ;
proc format cntlin = format_data ;
run ;
ods html file='freq.html';
proc freq data=have;
format a bucketeer.;
table a ;
run;
ods html close ;
One way you could use an “array” to solve this would be to make a temporary array in a data step and initialize the values to your list of values. Then just look through the array until the value being scored is larger than the current item from the list.
Let’s try it on the data in SASHELP.CLASS and split the AGE into 3 buckets like this:
data class;
set sashelp.class;
array list [2] _temporary_ (12 15);
do bucket=1 to dim(list) until(age <= list[bucket]);
* do nothing ;
end;
run;
Let’s see how well it worked:
proc freq data=class;
tables bucket*age / list ;
run;
Results
The FREQ Procedure
Cumulative Cumulative
bucket Age Frequency Percent Frequency Percent
------------------------------------------------------------------
1 11 2 10.53 2 10.53
1 12 5 26.32 7 36.84
2 13 3 15.79 10 52.63
2 14 4 21.05 14 73.68
2 15 4 21.05 18 94.74
3 16 1 5.26 19 100.00
If you want to make if more flexible you could put the list of cutpoints into a macro variable.
%let cutpoints=12 15;
In that case data step stays the same except for the array statement.
array list [%sysfunc(countw(&cutpoints,%str( )))] _temporary_ (&cutpoints);
A more normal SAS method would be to define a FORMAT and then use the format to display the data into buckets.
proc format ;
value age_bucket low-12="1" 12-15="2" 15-high="3";
run;
You could then use that format with the existing variable.
proc freq data=sashelp.class ;
tables age / list;
format age age_bucket.;
run;
Or use it with the PUT() function (and if you want a numeric variable also the INPUT() function) to make a new variable.
data class;
set sashelp.class;
bucket=input(put(age,age_bucket.),32.);
run;
If you actually want a macro then perhaps you just need one that can convert a list of cutpoints into such a format. Like this:
%macro mkbucket(list,fmtname);
%local i value ;
%let value=low;
proc format ;
value &fmtname
%do i=1 %to %sysfunc(countw(&list,%str( )));
&value -
%let value=%scan(&list,&i,%str( ));
&value = "&i"
%end;
&value - high = "&i"
;
run;
%mend mkbucket;
Which you could then use to make the format for later use.
%mkbucket(12 15,age_bucket);