Control of simulated annealing parameters needed in logreg.
logreg.anneal.control(start=0, end=0, iter=0, earlyout=0, update=0)
A list with arguments start, end, iter, earlyout, and update, that can be used as the value of the argument anneal.control of logreg.
start: the upper temperature (on a log10 scale) in the annealing chain. I.e., if start = 3, the annealing chain starts at temperature 1000. The acceptance function is the usual min(1, exp(-diff(scores)/temp)), so any temperature larger than the differences one would expect between any two models pretty much generates a random walk in the beginning, which means that you need to wait longer for results. Too low a starting temperature means that the chain may end up in a locally optimal (rather than globally optimal) solution. If you leave both start and end at the default of 0, the program will attempt to find reasonable numbers itself (though it is known to be only moderately successful at this).
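For intuition (this snippet is only an illustration, not part of the package; the score difference of 5 is a made-up value), one can tabulate the acceptance function at a few temperatures:

diff <- 5                               # hypothetical score difference
for (logtemp in c(3, 1, 0, -2)) {
  temp <- 10^logtemp
  cat("log10(temp):", logtemp, " P(accept):", min(1, exp(-diff / temp)), "\n")
}
# at temperature 1000 nearly every move is accepted (a random walk);
# at temperature 0.01 a worse move is essentially never accepted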
end: the lower temperature (on a log10 scale) in the annealing chain. I.e., if end is -2, the annealing chain ends at temperature 0.01. If this temperature is very low, you can use the early-out possibility described below, as otherwise the chain may run longer than desired!
iter: the total number of iterations in the annealing chain. This is the total over all annealing chains, not the number of iterations of a chain at a given temperature. If this number is too small, the chain may not find a good (let alone the best) solution; if it is too large, the program may take very long.
earlyout: if the end temperature is very low, the simulated annealing algorithm may not move any more, but one still needs to wait for all possible moves to be evaluated (and rejected)! An early-out possibility is therefore offered: if during five consecutive blocks of earlyout iterations ten or fewer moves (for which the score changes) are accepted in each block, the program terminates. This is a desirable option once you are convinced that the program otherwise runs fine; it can be dangerous on the first run.
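As a sketch of that termination rule (this is not the package's internal code, and the acceptance counts per block are made up):

accepted <- c(250, 120, 40, 12, 8, 9, 4, 7, 6, 3)  # made-up counts per block
lowblocks <- 0
for (b in seq_along(accepted)) {
  lowblocks <- if (accepted[b] <= 10) lowblocks + 1 else 0
  if (lowblocks == 5) { cat("early out after block", b, "\n"); break }
}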
update: how many iterations pass between printed updates of the scores. I.e., if update = 1000, a score is printed every 1000 iterations; so with iter = 100000 iterations, there will be 100 updates on your screen. If update = 0, a one-line summary is printed for each fitted model. If update = -1, there is virtually no printed output.
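For example (the start/end/iter values below are arbitrary, chosen only for illustration):

library(LogicReg)
# a score line every 1000 iterations (100 updates in total):
logreg.anneal.control(start = 2, end = -1, iter = 100000, update = 1000)
# a one-line summary per fitted model:
logreg.anneal.control(start = 2, end = -1, iter = 100000, update = 0)
# virtually no printed output:
logreg.anneal.control(start = 2, end = -1, iter = 100000, update = -1)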
Ingo Ruczinski <ingo@jhu.edu> and Charles Kooperberg <clk@fredhutch.org>.
Missing arguments take defaults. If the argument start is itself a list with arguments start, end, iter, earlyout, and update, those values take precedence over directly specified values.
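Assuming that behavior, a previously built control list can thus be reused (anneal1 is a hypothetical name):

library(LogicReg)
anneal1 <- logreg.anneal.control(start = 2, end = -1, iter = 50000)
# the components of anneal1 take precedence, so iter = 1000 is ignored:
anneal2 <- logreg.anneal.control(start = anneal1, iter = 1000)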
This is a rough outline of how the automated simulated annealing works: the algorithm starts running at a very high temperature and decreases the temperature until the acceptance ratio of moves falls below a certain threshold (in the neighborhood of 95%). At this point we run longer chains at fixed temperatures, and stop the search when the last "n" consecutive moves have been rejected. If you think that the search was either not sufficiently long or excessively long (both of which can very well happen, since it is pretty much impossible to specify default values that are appropriate for all sorts of data and models), you can overwrite the default values.
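As a conceptual sketch only (this is not the LogicReg implementation; the toy score, proposal, cooling factors, and stopping constant are all made up), the outline corresponds to a loop of roughly this shape:

f <- function(x) (x - 7)^2              # toy score to minimize
x <- 0; best <- x
temp <- 1000                            # start very hot
repeat {
  acc <- 0; rejrun <- 0
  for (i in 1:1000) {                   # one chain at a fixed temperature
    prop <- x + sample(c(-1, 1), 1)     # propose a move
    if (runif(1) < min(1, exp(-(f(prop) - f(x)) / temp))) {
      x <- prop; acc <- acc + 1; rejrun <- 0
      if (f(x) < f(best)) best <- x
    } else rejrun <- rejrun + 1
  }
  if (rejrun >= 500) break              # last 500 moves all rejected: stop
  # cool fast while the acceptance ratio is above ~95%, slower afterwards
  temp <- temp * if (acc / 1000 > 0.95) 0.5 else 0.9
}
best                                    # settles at the optimum, 7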
If you want more detailed information, continue reading....
These are some more detailed suggestions on how to set the parameters for the beginning temperature, end temperature, and number of iterations for the Logic Regression routine. Note that if the start temperature and end temperature are both zero, the routine uses its default values; the number of iterations iter is irrelevant in this case. In our opinion the default values are OK, but not great, and you can usually do better if you're willing to invest time in learning how to set the parameters.
The argument start is the log10 value of the starting temperature - i.e., if start is 2, iterations start at a temperature of 100. The end temperature is specified on the same log10 scale. The temperatures visited over the iterations are equidistant on this log scale.
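For instance, the schedule of a run with start = 2, end = -2, and blocks of 1000 iterations (as in example (B) below, assuming one temperature per printed block) can be reproduced with:

logtemps <- seq(2, -2, length.out = 50)  # one temperature per block of 1000
round(head(logtemps), 3)                 # 2.000 1.918 1.837 1.755 1.673 1.592
head(10^logtemps)                        # the actual temperatures

These values match the log-temp column in the traces shown further down.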
Considerations in setting these parameters.....
1) start temperature. If this is too high you're "wasting time", as the algorithm is effectively just making a random walk at high temperatures. If the starting temperature is too low, you may already be confined to a (too) localized region of the search space, and never reach a good solution. Typically a starting temperature that gives you 90% or so acceptances (ignoring the rejected attempts, see below) is good. Better a bit too high than too low, but don't waste too much time.
2) end temperature. By the time you reach the end temperature, the number of accepted iterations should be only a few per 1000, and the best score should no longer change. Even zero acceptances is fine. If there are many more acceptances, lower end. If there are zero acceptances for many cycles in a row, raise it a bit. You can set a lower end temperature than needed and use the earlyout test: if in five consecutive cycles of 1000 iterations there are fewer than a specified number of acceptances per cycle, the program terminates.
3) number of iterations. What really counts is the number of iterations during the "crunch time", when the number of acceptances is, say, more than 5% but fewer than 40% of the iterations. If you print summary statistics in blocks of 1000, you want to see as many blocks with such acceptance numbers as possible - obviously within what is reasonable.
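To make this concrete (the acceptance counts per block below are invented), one could count the crunch-time blocks in a trace like this:

acc <- c(950, 900, 700, 450, 300, 150, 60, 20, 5, 1)  # invented counts per 1000
sum(acc / 1000 > 0.05 & acc / 1000 < 0.40)            # blocks in the crunch time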
Here are two examples, with my analysis....
(A) logreg.anneal.control(start = 2, end = 1, iter = 50000, update = 1000)
The first few lines are (cutting off some of the last columns...)
log-temp | current score | best score | acc | rej | sing | current parameters |
2.000 | 1198.785 | 1198.785 | 0 | 0 | 0 | 0.508 -0.368 -0.144 |
1.980 | 1197.962 | 1175.311 | 719(18) | 34 | 229 | 1.273 -0.275 -0.109 |
1.960 | 1197.909 | 1168.159 | 722(11) | 38 | 229 | 0.416 -0.345 -0.173 |
1.940 | 1181.545 | 1168.159 | 715(19) | 35 | 231 | 0.416 -0.345 -0.173 |
...
1.020 | 1198.258 | 1167.578 | 663(16) | 128 | 193 | 1.685 -0.216 -0.024 |
1.000 | 1198.756 | 1167.578 | 641(23) | 104 | 232 | 1.685 -0.216 -0.024 |
1.000 | 1198.756 | 1167.578 | 1( 0) | 0 | 0 | 1.685 -0.216 -0.024 |
Ignore the last line; it just shows a refitting of the best model. Otherwise, this run suggests:
(i) end is ***way*** too high, as there are still more than 600 acceptances in blocks of 1000. It is hard to judge from this run what end should be.
(ii) The initial number of acceptances is really high: (719+18)/(719+18+34) = 95%. But by the time a log-temperature of 1.00 is reached, it is at about 85%. One could change start to 1, or keep it at 2 and play it safe.
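The percentages in (ii) come straight from the acc and rej columns of the trace:

(719 + 18) / (719 + 18 + 34)    # first block:    0.956, i.e. about 95%
(641 + 23) / (641 + 23 + 104)   # at log-temp 1:  0.865, i.e. about 85%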
(B) logreg.anneal.control(start = 2, end = -2, iter = 50000, update = 1000) - a different dataset/problem.
The first few lines are
log-temp | current score | best score | acc | rej | sing | current parameters |
2.000 | 1198.785 | 1198.785 | 0( 0) | 0 | 0 | 0.50847 -0.36814 |
1.918 | 1189.951 | 1172.615 | 634(23) | 22 | 322 | 0.38163 -0.28031 |
1.837 | 1191.542 | 1166.739 | 651(24) | 32 | 293 | 1.75646 -0.22451 |
1.755 | 1191.907 | 1162.902 | 613(30) | 20 | 337 | 1.80210 -0.32276 |
The last few are
log-temp | current score | best score | acc | rej | sing | current parameters |
-1.837 | 1132.731 | 1131.866 | 0(18) | 701 | 281 | 0.00513 -0.45994 |
-1.918 | 1132.731 | 1131.866 | 0(25) | 676 | 299 | 0.00513 -0.45994 |
-2.000 | 1132.731 | 1131.866 | 0(17) | 718 | 265 | 0.00513 -0.45994 |
-2.000 | 1132.731 | 1131.866 | 0( 0) | 0 | 1 | 0.00513 -0.45994 |
But there really hadn't been any acceptances since about log-temperature -0.5:
log-temp | current score | best score | acc | rej | sing | current parameters |
-0.449 | 1133.622 | 1131.866 | 4(21) | 875 | 100 | 0.00513 -0.45994 |
-0.531 | 1133.622 | 1131.866 | 0(19) | 829 | 152 | 0.00513 -0.45994 |
-0.612 | 1133.622 | 1131.866 | 0(33) | 808 | 159 | 0.00513 -0.45994 |
Going down from 400 to fewer than 10 acceptances went pretty fast....
log-temp | current score | best score | acc | rej | sing | current parameters |
0.776 | 1182.156 | 1156.354 | 464(31) | 258 | 247 | 1.00543 -0.26602 |
0.694 | 1168.504 | 1150.931 | 306(17) | 355 | 322 | 1.56695 -0.43351 |
0.612 | 1167.747 | 1150.931 | 230(38) | 383 | 349 | 1.56695 -0.43351 |
0.531 | 1162.085 | 1145.920 | 124(12) | 571 | 293 | 1.15376 -0.15223 |
0.449 | 1143.841 | 1142.321 | 63(20) | 590 | 327 | 2.20150 -0.43795 |
0.367 | 1176.152 | 1142.321 | 106(21) | 649 | 224 | 2.20150 -0.43795 |
0.286 | 1138.384 | 1131.866 | 62(18) | 731 | 189 | 0.00513 -0.45994 |
0.204 | 1138.224 | 1131.866 | 11(27) | 823 | 139 | 0.00513 -0.45994 |
0.122 | 1150.370 | 1131.866 | 15(12) | 722 | 251 | 0.00513 -0.45994 |
0.041 | 1144.536 | 1131.866 | 30(19) | 789 | 162 | 0.00513 -0.45994 |
-0.041 | 1137.898 | 1131.866 | 21(25) | 911 | 43 | 0.00513 -0.45994 |
-0.122 | 1139.403 | 1131.866 | 12(30) | 883 | 75 | 0.00513 -0.45994 |
What does this tell me?
(i) start was probably a bit high - no real harm done.
(ii) end was lower than needed. Since there really weren't any acceptances after log10(T) was about -0.5, an ending log-temperature of -1 would have been fine.
(iii) there were far too few iterations. The crunch time didn't take more than about 10 cycles (10000 iterations). You can see that this is the period during which the "best model" decreased quite a bit - from 1156 to 1131. For a larger problem I would want to spend considerably more than 10000 iterations in this period (how many depends very much on the size of the problem). So I'd pick logreg.anneal.control(start = 2, end = -1, iter = 200000, update = 5000). Since the total range is reduced from 2-(-2)=4 to 2-(-1)=3, over a range of one unit of log10 temperature there will be 200000/3 (about 67000) rather than 50000/4 = 12500 iterations. I would repeat this run a couple of times.
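The arithmetic behind that recommendation, iterations per unit of log10 temperature:

50000  / (2 - (-2))    # old run: 12500 iterations per log10 unit
200000 / (2 - (-1))    # new run: about 67000 iterations per log10 unit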
In general I may sometimes run several models and check the scores of the best models. If those are all the same, I'm very happy; if they're similar but not identical, it's OK, though I may run one or two longer chains. If they're very different, something is wrong. For the permutation test and cross-validation I am usually less picky about convergence.
Ruczinski I, Kooperberg C, LeBlanc ML (2003). Logic Regression, Journal of Computational and Graphical Statistics, 12, 475-511.
Ruczinski I, Kooperberg C, LeBlanc ML (2002). Logic Regression - methods and software. Proceedings of the MSRI workshop on Nonlinear Estimation and Classification (Eds: D. Denison, M. Hansen, C. Holmes, B. Mallick, B. Yu), Springer: New York, 333-344.
Selected chapters from the dissertation of Ingo Ruczinski, available from https://research.fredhutch.org/content/dam/stripe/kooperberg/ingophd-logic.pdf
logreg, logreg.mc.control, logreg.tree.control
myannealcontrol <- logreg.anneal.control(start = 2, end = -2, iter = 50000, update = 1000)