From inside awk, I want to generate, on demand and quickly, a string of X alphanumeric characters that is reasonably random (i.e., random but not cryptographic).
In Ruby, I could do this:
ruby -e '
def rand_string(len, min=48, max=123, pattern=/[[:alnum:]]/)
rtr=""
while rtr.length<len do
rtr+=(0..len).map { (min + rand(max-min)).chr }.
select{|e| e[pattern] }.join
end # falls out when min length achieved
rtr[0..len] # inclusive range, so this returns up to len+1 characters
end
(0..5).each{|_| puts rand_string(20)}'
Prints:
7Ntz5NF5juUL7tGmYQhsc
kaOzO1aIxkW5rmJ9CaKtD
49SpdFTibXR1WPWV7li6c
PT862YZQd0dOIaFOIY0d1
vYktRXkdsj38iH3s2WKI
3nQZ7cCVEXvoaOZvm6mTR
For comparison, the Ruby version produces 1,000,000 unique strings (no duplicates) in roughly 9 seconds.
Taking that, I tried in awk:
awk -v r=$RANDOM '
# the r value will only be a new seed each invocation -- not each f call
function rand_string(i) {
s=""
min=48
max=123
srand(r)
while (length(s)<i) {
c=sprintf("%c", int(min+rand()*(max-min+1)))
if (c~/[[:alnum:]]/) s=s c
}
return s
}
BEGIN{ for (i=1; i<=5; i++) {print rand_string(20)}}'
That does not work: calling srand(r) on every call resets the generator to the same seed, so every call returns the same string. Prints:
D65CsI55zTsk5otzSoJI
D65CsI55zTsk5otzSoJI
D65CsI55zTsk5otzSoJI
D65CsI55zTsk5otzSoJI
D65CsI55zTsk5otzSoJI
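The repeated output can be reproduced in isolation. The following is a minimal sketch (the function name reseed_demo and the seed 42 are only illustrative) showing that seeding twice with the same value makes rand() start over from the same point:

```shell
reseed_demo() {
  awk 'BEGIN {
      srand(42); a = rand()   # first value after seeding with 42
      srand(42); b = rand()   # seeding again with 42 restarts the sequence
      print (a == b) ? "same" : "different"
  }'
}
reseed_demo   # prints: same
```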
Now try reading /dev/urandom with od:
awk '
function rand_string(i) {
arg=i*4
cmd="od -A n -t u1 -N " arg " /dev/urandom" # this is POSIX
# -t u1: print each byte as an unsigned character (decimal)
# -N arg: read a count of i*4 bytes
s=""
min=48
max=123
while (length(s)<i) {
while((cmd | getline line)>0) {
split(line, la)
for (e in la) {
if (la[e]<min || la[e]>max) continue
c=sprintf("%c", la[e])
if (c~/[[:alnum:]]/) s=s c
}
}
close(cmd)
}
return substr(s,1,i)
}
BEGIN {for(i=1;i<=5;i++) print rand_string(20) }'
This works as desired. Prints:
sYY195x6fFQdYMrOn1OS
9mv7KwtgdUu2DgslQByo
LyVvVauEBZU2Ad6kVY9q
WFsJXvw8YWYmySIP87Nz
AMcZY2hKNzBhN1ByX7LW
But now the problem is that the pipe od -A n -t u1 -N " arg " /dev/urandom is really slow: unusable for anything but a trivial number of strings.
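Most of that cost probably comes from starting a new od process for every string. One possible way around it, sketched here under assumptions rather than measured, is to start od once without -N and keep reading from the same pipe as bytes are needed. The names urandom_strings and next_byte are hypothetical, and the mapping differs from the code above: bytes below 248 (4 * 62) are mapped onto a 62-character alphabet with a modulo, and the rest are discarded.

```shell
urandom_strings() {
  # $1 = number of strings, $2 = length of each (hypothetical parameters)
  awk -v n="$1" -v len="$2" '
  # Return the next random byte (0..255) from one long-lived od process.
  function next_byte() {
      while (pos >= cnt) {                  # buffer used up: read another line
          if ((cmd | getline line) <= 0) exit 1
          cnt = split(line, buf, " ")
          pos = 0
      }
      return buf[++pos]
  }
  function rand_string(want,    s, b) {
      s = ""
      while (length(s) < want) {
          b = next_byte()
          if (b < 248)                      # 248 = 4 * 62, avoids modulo bias
              s = s substr(chars, b % 62 + 1, 1)
      }
      return s
  }
  BEGIN {
      chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
      cmd = "od -A n -t u1 -v /dev/urandom" # no -N: an endless stream of bytes
      for (k = 1; k <= n; k++) print rand_string(len)
      close(cmd)
  }'
}
urandom_strings 5 20
```

The -v flag keeps od from collapsing repeated lines into a single "*", which would otherwise be read as data.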
Any idea how I can modify one of those awks so that it:
- Runs on most platforms (i.e., default POSIX kit);
- Can produce reasonably random strings of X length rapidly.
This question has been asked a few times:
- How can I replace a string with a random alphanumeric string 48 characters long using awk, where the answer is to use external tools (too slow);
- Substitute given pattern with a random one with awk, but that is a random int and does not use srand;
- Execute a command (to generate random strings) inside awk, but again that uses a shell pipe (too slow) and is Linux only.
Take the first awk attempt and call srand only once, in the BEGIN block, rather than on every call:
time awk -v r=$RANDOM '
function rand_string(i) {
s=""
min=48
max=123
# srand(r) here was the bug: reseeding on every call restarts the same sequence
while (length(s)<i) {
c=sprintf("%c", int(min+rand()*(max-min+1)))
if (c~/[[:alnum:]]/) s=s c
}
return s
}
BEGIN{
srand(r) # Use srand ONCE only
for (i=1; i<=1000000; i++) {print rand_string(20)}
}' | uniq -c | awk '$1>1'
# No output, so no adjacent duplicates (uniq only compares neighboring lines; pipe through sort first for a full check)
real 0m9.813s
user 0m10.413s
sys 0m0.074s
VS the Ruby:
time ruby -e '
def rand_string(len, min=48, max=123, pattern=/[[:alnum:]]/)
rtr=""
while rtr.length<len do
rtr+=(0..len).map { (min + rand(max-min)).chr }.
select{|e| e[pattern] }.join
end # falls out when min length achieved
rtr[0..len] # inclusive range, so this returns up to len+1 characters
end
(0..1_000_000).each{|_| puts rand_string(20)}' | uniq -c | awk '$1>1'
# No output, so no adjacent duplicates (uniq only compares neighboring lines; pipe through sort first for a full check)
real 0m12.954s
user 0m13.441s
sys 0m0.217s
The increase in the Ruby time over the earlier 9 seconds is likely due to the rest of the pipeline (uniq and the second awk), which is the same for the awk run. So the awk is a bit faster.
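A variation that may be worth trying, as a sketch and not something timed here: the loop above throws away every rand() value that does not land on an alphanumeric character. Indexing directly into a 62-character alphabet uses every value and needs no regex match. The wrapper name rand_strings and its three parameters are hypothetical.

```shell
rand_strings() {
  # $1 = number of strings, $2 = length of each, $3 = seed (hypothetical parameters)
  awk -v n="$1" -v len="$2" -v seed="$3" '
  function rand_string(want,    s, i) {
      s = ""
      for (i = 1; i <= want; i++)
          s = s substr(chars, int(rand() * 62) + 1, 1)   # index 1..62
      return s
  }
  BEGIN {
      chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
      srand(seed)                                        # seed once only
      for (k = 1; k <= n; k++) print rand_string(len)
  }'
}
rand_strings 5 20 "$RANDOM"
```

Note that $RANDOM is a bash/ksh/zsh feature, not POSIX sh; in a shell without it the seed is empty and every run produces the same strings, so some other per-run value would be needed there.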