I am looking for an efficient way of evaluating an array of strings against a series of rules and then adding them to another array.
For example I have an array of strings:
- something
- a
- 1234
- #gohere
I want to iterate through all the items in the array and evaluate them based on a series of rules. For example:
- Strings greater that 1 in length
- Strings that don’t start with the # character
I was thinking of either:
- Nested loop of if statements evaluating the string for each rule and if they all pass then add to another array
- Evaluating the string against rule 1 then adding to the array then performing another evaluation – I have discounted this since it requires lots of array read/write operations to evaluate the complete ruleset.
Please note that I have kept the number of rules to two for brevity however I could be needing up to 10 rules. Additionally, I do not control the original array since the function I am calling returns an array of strings.
I know I may have answered my own question with the nested loop however I wondered if anyone had any ideas for optimizing this type of situation.
Also, I know I asked this question in the reference of Ruby but any pseudo-code would suffice.
I would use Array#select and pass it a block that &&s together all the tests. If the tests get unwieldy you can abstract them into a method or series of methods.
2
Make each rule a lambda, such as this one that returns true if the string length is greater than 1:
->(s) {s.length > 1}
Make an array of all of the rules that a string must pass for it to match. Unless the list of rules changes, make it a constant:
RULES = [
->(s) {s.length > 1},
->(s) {s =~ /^#/},
]
If any of the rules are complicated, give them their own constant, the name of the constant serving as a comment. As with @sebastiangeiger’s answer, order the rules by probability of failure. You want to fail as early as possible, if speed is essential.
To see if a string matches, use the “any?” predicate:
RULES.all? do |rule|
rule.call(s)
end
Enumerable#all? short-curcuits, so the first rule that returns false will prevent the remaining rules from being evaluated for that string.
Now, to find all strings which match all rules:
matching_strings = strings.select do |string|
RULES.all? do |rule|
rule.call(string)
end
end
1
I would suspect that a for loop
or a select
are equal in terms of efficiency. You can always implement both options and benchmark them.
What you can optimise is the body of the iteration. If you have multiple rules and you join those together using and
ruby will do short-circuit evaluation and return after it found the first false
value. Hence the order of those rules is important.
Now you need to find out:
- Which rule is most likely to return
false
- Which rule will evaluate quickest
Order the rules from fastest to slowest and/or most likely to fail to least likely to fail. Without a specific problem however I can’t give you a more specific answer. And as with everything in optimisation, always measure first.
Totally contrived example
require 'benchmark'
def slow_and_false
sleep 0.1
false
end
def fast_and_false
sleep 0.005
false
end
Benchmark.bm do |x|
x.report("slow first") do
100.times do
slow_and_false and fast_and_false
end
end
x.report("fast first") do
100.times do
fast_and_false and slow_and_false
end
end
end
The result of this:
user system total real
slow first 0.000000 0.000000 0.000000 ( 10.086344)
fast first 0.010000 0.010000 0.020000 ( 0.560307)
1
I gave the answer below to a similar question at Stack Overflow:
module Checks
def _invalid_permission() nil end
def _check_another_error() "oops!" end
def _and_another_one() nil end
end
class Doit
@@checks = Checks.instance_methods(false)
include Checks
def doit
@@checks.each { |m| rv = send(m); return rv if rv }
"test"
end
end
p Doit.new.doit # => "oops!"
If you change
def _check_another_error() "oops!" end
to
def _check_another_error() nil end
then
p Doit.new.doit # => "test"
Obviously, the check methods can be called with arguments.
This approach allows you to add, remove or rename a check method without having to remember to change references to it elsewhere.