Before I’m flamed, I’m just a beginner in programming. I have been learning Python for about 1 week by myself.
I have a real problem to solve:
Goal: The fastest route through a pathway.
Problem:
- Walk forward to get to the goal. The pathway is 128 steps.
- Walking 1 step increases the danger value by 113
- If the danger value is over the limit then there is a fight. The fight takes 15 seconds to win.
- The limit is randomly assigned on each step.
- The danger value resets to 0 after a fight.
Here’s the essence of the problem though:
There are two possible options if there is a fight:
- The fight can be escaped. Escaping takes 2 seconds, but does NOT reset the danger value.
- Accept the fight. This resets the danger value.
Also: the higher the danger value, the higher the chance of getting a fight.
I’ve written the code to take one step. And I’ve written the code for the relevant randomness of the danger value, etc.
I’ve even made a graph that shows exactly the pathway and potential fights, with the respective limits. The graph shows what happens when you accept every fight (never escape).
Being self-taught, I just don’t know how to write a search algorithm for finding the quickest way through a particular route. I believe I need a backtracking algorithm; essentially a kind of brute force.
I just don’t know how to translate it from English into code. It’s been around 4 days of just thinking about this, and I cannot write anything that even attempts to solve the problem.
Any books or advice on how to even start solving this?
As I’ve said, I have all of the code for what happens on each step (here: http://pastebin.com/LnV4h9NV).
I just don’t know how to even begin trying to brute force/search.
This is the problem as far as I can see it:
- Each field has a random, unknown “danger level” ℓ_i that is uniformly distributed over a certain interval [ℓ_min, ℓ_max]. So on average, the level is ℓ = (ℓ_min + ℓ_max)/2.
- Moving to the next field has a fixed cost Δc_m.
- Each move increases the “danger” d by Δd.
- If d > ℓ_i after a move, then one of two options can be taken:
  - The total cost may increase by Δc_r = 2 (running away), or
  - The total cost may increase by Δc_f = 15 (fighting), in which case d is reset to zero.
The probability p(F) that d > ℓ_i, i.e. that a fight occurs, is 0 if d ≤ ℓ_min, 1 if d ≥ ℓ_max, and (d − ℓ_min)/(ℓ_max − ℓ_min) in between.
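As a sketch, this probability can be written as a small Python function. The concrete limits 0xFF and 0xFF**2 are taken from the Perl script further down; the function name is my own:

```python
# Probability that the accumulated danger d exceeds a limit drawn
# uniformly from [L_MIN, L_MAX].
L_MIN = 0xFF       # lower bound of the random limit (value from the script below)
L_MAX = 0xFF ** 2  # upper bound

def probability_of_fight(d):
    if d <= L_MIN:   # danger below every possible limit: no fight
        return 0.0
    if d >= L_MAX:   # danger above every possible limit: certain fight
        return 1.0
    return (d - L_MIN) / (L_MAX - L_MIN)
```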
So our problem has the fixed parameters ℓ_min, ℓ_max, Δd, Δc_m, Δc_r, Δc_f and the state variables c (accumulated cost) and d (accumulated danger). We have the following possible state transitions:
- No fight: with a probability of 1 − p(F), nothing special happens:
  c ← c + Δc_m
  d ← d + Δd
- A fight: with a probability of p(F), we can choose between two options:
  - Run away:
    c ← c + Δc_m + Δc_r
    d ← d + Δd
  - Fight:
    c ← c + Δc_m + Δc_f
    d ← 0
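As an illustration, one step of these transitions could look like this in Python. This is only a sketch: the parameter values are those used in the Perl script further down, and the function name is my own:

```python
import random

# Parameter values as in the answer's script: Δd, Δc_m, Δc_r, Δc_f, ℓ_min, ℓ_max.
D_INC, C_MOVE, C_RUN, C_FIGHT = 113, 1, 2, 15
L_MIN, L_MAX = 0xFF, 0xFF ** 2

def step(c, d, accept_fight):
    """Advance one field; accept_fight decides the response if a fight occurs."""
    c += C_MOVE                            # every move costs Δc_m
    limit = random.uniform(L_MIN, L_MAX)   # this field's random danger level
    if d > limit:                          # a fight occurs
        if accept_fight:
            return c + C_FIGHT, 0          # fight: pay Δc_f, danger resets
        return c + C_RUN, d + D_INC        # run: pay Δc_r, danger keeps growing
    return c, d + D_INC                    # no fight: danger grows by Δd
```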
Your question seems to be how the total cost c can be minimized. In this statistical model, there is no definitive c; instead, we have to minimize the expected value of c. We can brute-force this by examining a decision tree. Each tree can be assigned an expected cost, which can be calculated from the subtrees like
c = min(c_{F,f}, c_{F,r}) · p(F) + c_{¬F} · (1 − p(F))
where c_{F,f} is the expected cost when fighting a fight under these conditions, etc.
The following pseudocode gets the expected cost given a certain state:
function expected_cost(i, c, d) {
    if (i == 0) return c;
    p = probability_of_fight(d);
    c_no_fight = (1 - p) * expected_cost(i - 1, c + Δc_m, d + Δd);
    c_fight_run = c_fight_fight = 0;
    if (p > 0) {
        c_fight_run   = p * expected_cost(i - 1, c + Δc_m + Δc_r, d + Δd);
        c_fight_fight = p * expected_cost(i - 1, c + Δc_m + Δc_f, 0);
    }
    return c_no_fight + min(c_fight_run, c_fight_fight);
}
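The pseudocode can be turned into runnable Python almost verbatim. A sketch with memoization on (i, d) — the accumulated cost c does not influence future decisions, so it can be left out of the state, which keeps the cache small. Parameter values are again the ones from the Perl script below:

```python
from functools import lru_cache

# Parameters as in the answer: ℓ_min, ℓ_max, Δd, Δc_m, Δc_r, Δc_f.
L_MIN, L_MAX = 0xFF, 0xFF ** 2
D_INC, C_MOVE, C_RUN, C_FIGHT = 113, 1, 2, 15

def probability_of_fight(d):
    if d <= L_MIN:
        return 0.0
    if d >= L_MAX:
        return 1.0
    return (d - L_MIN) / (L_MAX - L_MIN)

@lru_cache(maxsize=None)
def expected_cost(i, d):
    """Expected remaining cost with i fields left and accumulated danger d."""
    if i == 0:
        return 0.0
    p = probability_of_fight(d)
    no_fight = expected_cost(i - 1, d + D_INC)
    if p == 0:
        return C_MOVE + no_fight
    run = C_RUN + expected_cost(i - 1, d + D_INC)  # flee: danger keeps growing
    fight = C_FIGHT + expected_cost(i - 1, 0)      # fight: danger resets
    return C_MOVE + (1 - p) * no_fight + p * min(run, fight)
```

`expected_cost(128, 0)` then gives the expected total cost for the full 128-step pathway.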
Now all that has to be added is keeping track of which decisions were made along the winning paths – note that there are multiple winning paths, because at each field a fight may or may not occur. The decision we keep track of is whether the fight is carried out or fled. An interesting analysis might be how often a fight is fled versus fought – in that case, returning further values with the respective counts seems like the thing to do. Otherwise, inspecting a tree with the decision histories might be interesting.
Care should be taken to run that only with very small i, as the tree can get very large (up to 3^i nodes). Memoization will help prevent some unnecessary calculations.
Depending on how you code the above algorithm, runtime can vary extremely. Below is a Perl script I used, which uses extensive caching to achieve a runtime on the order of seconds (if the current cost is a key in the cache as well, the runtime is on the order of minutes). It performs an example analysis of counting how often running away and how often fighting is more beneficial. With those parameters, committing to a fight isn’t a viable strategy before the game is 123 fields long.
use strict;
use warnings;
no warnings qw/recursion/;
use feature qw/state/;
use utf8;
use Math::BigInt; # not necessary; alternatively: no warnings 'imprecision'

# parameters as explained in answer
my ($ℓ_min, $ℓ_max, $Δd, $Δc_m, $Δc_r, $Δc_f) =
   (0xFF,   0xFF**2, 113, 1,     2,     15   );

my ($steps) = @ARGV or die <<USAGE;
usage:
    $0 iterations
parameters:
    iterations -- the number of fields to traverse
USAGE

my ($cost, $fight, $run) = expected_cost($steps, 0);
printf "expected cost for i=%d steps is %g\n", $steps, $cost;
printf "fight won %.3e times\n", $fight;
printf "run won %.3e times\n", $run;
printf "fight:run relation: %s\n",
    ($run > $fight) ? sprintf('1:%.3e', $run/$fight) : sprintf('%.3e:1', $fight/$run);

sub probability_of_fight {
    my ($d) = @_;
    return $d < $ℓ_min ? 0 : ($d - $ℓ_min)/($ℓ_max - $ℓ_min);
}

# Takes the number $i of fields left and the accumulated danger $d.
# Returns the expected cost for that subtree, plus favourable fights and fleeings.
sub expected_cost {
    my ($i, $d) = @_;
    return 0, 0, 0 if $i == 0;
    state $cache = [];
    my $retval = $cache->[$i][$d] //= do {
        my $p = probability_of_fight($d);
        # variable prefixes:
        #   nf -- no fight
        #   f  -- fight
        #   fr -- fight, but run away
        #   ff -- fight, and carry out the fight
        my ($nf_c, $nf_fight, $nf_run) = expected_cost($i - 1, $d + $Δd);
        my $f_c = 0;
        my ($f_fight, $f_run) = (Math::BigInt->new, Math::BigInt->new);
        if ($p > 0) {
            my ($fr_c, $fr_fight, $fr_run) = expected_cost($i - 1, $d + $Δd);
            $fr_c += $Δc_r;
            my ($ff_c, $ff_fight, $ff_run) = expected_cost($i - 1, 0);
            $ff_c += $Δc_f;
            if ($ff_c < $fr_c) {
                # fighting is cheaper
                $f_c     = $ff_c;
                $f_fight = $ff_fight + 1;
                $f_run   = $ff_run;
            }
            elsif ($ff_c > $fr_c) {
                # running away is cheaper
                $f_c     = $fr_c;
                $f_fight = $fr_fight;
                $f_run   = $fr_run + 1;
            }
            else {
                # both are equivalent
                $f_c     = $ff_c;
                $f_fight = $ff_fight + $fr_fight;
                $f_run   = $ff_run + $fr_run;
            }
        }
        my $expected_cost = $Δc_m + (1-$p) * $nf_c + $p * $f_c;
        [$expected_cost, $nf_fight + $f_fight, $nf_run + $f_run];
    };
    return @$retval;
}
Example session:
$ perl decide.pl 128
expected cost for i=128 steps is 154.27
fight won 2.432e+17 times
run won 5.317e+36 times
fight:run relation: 1:2.186e+19
… where “won” means “occurs this often in the best decision tree”. These counts roughly indicate which decision is more likely to be the cheaper one.
I had some time for further analysis. For each field, we can calculate whether, when a fight occurs at that position, it is better to flee or to fight. I plotted the result below:
… which uses the parameters from the above script.
How to read this graph: The x-axis is the distance from the end of the field, i.e. the number of steps (and thus potential fights) that can still occur. The y-axis is the danger level you have accumulated. If a player has to fight, he can look up his distance–danger coordinate. If that coordinate is red, fighting has the least expected cost; if it is blue, fleeing is probably cheaper.
We can see, for example, that during the last 7 steps fighting is never the best choice (the graph is already maxed out there). Otherwise, the preference varies periodically. Fighting is most likely to be worthwhile at around 60–70 steps from the end.
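The red/blue decision in this graph can be reproduced directly from the recursion: when a fight occurs, compare the expected cost of fleeing against that of fighting. Here is a self-contained Python sketch (it repeats the expected-cost recursion with the parameters of the script above; the names are my own):

```python
from functools import lru_cache

# Parameters from the script above: ℓ_min, ℓ_max, Δd, Δc_m, Δc_r, Δc_f.
L_MIN, L_MAX = 0xFF, 0xFF ** 2
D_INC, C_MOVE, C_RUN, C_FIGHT = 113, 1, 2, 15

def probability_of_fight(d):
    if d <= L_MIN:
        return 0.0
    if d >= L_MAX:
        return 1.0
    return (d - L_MIN) / (L_MAX - L_MIN)

@lru_cache(maxsize=None)
def expected_cost(i, d):
    """Expected remaining cost with i fields left and accumulated danger d."""
    if i == 0:
        return 0.0
    p = probability_of_fight(d)
    no_fight = expected_cost(i - 1, d + D_INC)
    if p == 0:
        return C_MOVE + no_fight
    run = C_RUN + expected_cost(i - 1, d + D_INC)
    fight = C_FIGHT + expected_cost(i - 1, 0)
    return C_MOVE + (1 - p) * no_fight + p * min(run, fight)

def best_choice(steps_left, d):
    """Return 'fight' (red) or 'run' (blue) for a fight at this coordinate."""
    run = C_RUN + expected_cost(steps_left - 1, d + D_INC)
    fight = C_FIGHT + expected_cost(steps_left - 1, 0)
    return 'fight' if fight < run else 'run'
```

Evaluating best_choice over a grid of (steps_left, danger) coordinates yields exactly the kind of map plotted above.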