I have distilled the messy code from yesterday into the minimal chunk of code that causes bizarre multi-threaded behavior on SBCL 1.0.33:
(defun compute-thread (thread-num rows-per-proc min max mul)
  ;; a pretty straightforward computation
  (let* ((local-min (+ min (* thread-num rows-per-proc)))
         (local-max (1- (+ local-min rows-per-proc)))
         (local-count 0))
    (loop for i from local-min upto local-max
          do (loop for j from min upto max
                   do (incf local-count
                            (let ((xc (* mul i)))
                              (loop for count from 0 below 100
                                    do (when (>= xc 4)
                                         (return count))
                                       (incf xc)
                                    finally (return 100))))))
    #+nil(format *out* "Thread ~a local-min=~a, local-max=~a local-count=~d~%"
                 thread-num local-min local-max local-count)
    local-count))

(defun main (num-threads)
  ;; spawn some threads and sum the results
  (loop with rows-per-proc = (/ 100 num-threads)
        for thread in
            (loop for thread-num from 0 below num-threads
                  collect
                  (let ((thread-num thread-num)) ; establish private binding of thread-num for closure
                    (sb-thread:make-thread
                     (lambda ()
                       (compute-thread thread-num rows-per-proc -250 250 0.008d0)))))
        summing (sb-thread:join-thread thread)))

(defun test (num-threads num-iterations expected-val)
  ;; this is just a test wrapper which tests that the result of main is consistent
  (loop for i from 0 below num-iterations do
    (format t "Run ~a:" i)
    (let ((result (main num-threads)))
      (format t "result=~a~%" result)
      (assert (= expected-val result)))))
I expect:
CL-USER> (test 10 1000 (main 1))
to complete without any assertion failures, because the result of
(main num-threads) should always be the same as the result of
(main 1).
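That expectation can also be checked with no threads involved at all. The following is a minimal sketch, not part of the original code: sum-chunks-serially and its chunk-fn parameter are hypothetical names introduced here, and the split of 100 rows mirrors what main does. It runs every chunk one after another in the calling thread and sums the results.

```lisp
;; Sketch only: SUM-CHUNKS-SERIALLY and CHUNK-FN are hypothetical names,
;; not part of the original post.
;; CHUNK-FN is called as (funcall chunk-fn chunk-num rows-per-proc).
(defun sum-chunks-serially (num-chunks chunk-fn)
  ;; Same partitioning as MAIN (100 rows split into NUM-CHUNKS pieces),
  ;; but every chunk runs in the calling thread, one after another.
  (loop with rows-per-proc = (/ 100 num-chunks)
        for chunk-num from 0 below num-chunks
        summing (funcall chunk-fn chunk-num rows-per-proc)))
```

With compute-thread loaded, (sum-chunks-serially 10 (lambda (n rows) (compute-thread n rows -250 250 0.008d0))) performs the same work as (main 10) without spawning anything. If that value agrees with (main 1), the partitioning itself is not what changes the result.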
Unfortunately:
CL-USER> (test 10 1000 (main 1)) ;; test 10 threads
Run 0:result=300600
Run 1:result=300600
Run 2:result=300600
Run 3:result=300600
.
.
.
Run 494:result=300600
Run 495:result=300600
Run 496:result=300600
Run 497:result=300600
Run 498:result=300601 ;;<---- Oh no!!!
; Evaluation aborted.
Arghh... ;-(
As a sanity check:
(test 1 1000 (main 1)) completes without problems. In other words, 1 thread always seems to compute the same answer, so it seems to be a multi-thread problem.
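To narrow this down further, it may help to keep the per-thread results separate instead of summing them right away. The sketch below is one possible way to do that, not a diagnosis: chunk-results and first-mismatch are hypothetical helpers, chunk-fn is again a function of (chunk-num rows-per-proc), and the only SBCL calls used are the same sb-thread:make-thread and sb-thread:join-thread that main already relies on.

```lisp
;; Sketch only: CHUNK-RESULTS and FIRST-MISMATCH are hypothetical helpers,
;; not part of the original post.
(defun chunk-results (num-chunks chunk-fn &key threaded)
  ;; Return the list of per-chunk results, computed either in the
  ;; calling thread or with one SBCL thread per chunk.
  (let ((rows-per-proc (/ 100 num-chunks)))
    (if threaded
        (mapcar #'sb-thread:join-thread
                (loop for chunk-num from 0 below num-chunks
                      collect (let ((n chunk-num)) ; private binding for the closure
                                (sb-thread:make-thread
                                 (lambda ()
                                   (funcall chunk-fn n rows-per-proc))))))
        (loop for chunk-num from 0 below num-chunks
              collect (funcall chunk-fn chunk-num rows-per-proc)))))

(defun first-mismatch (num-chunks chunk-fn)
  ;; Return (chunk-num serial-result threaded-result) for the first chunk
  ;; whose threaded result differs from its serial result, or NIL if none.
  (let ((serial (chunk-results num-chunks chunk-fn))
        (threaded (chunk-results num-chunks chunk-fn :threaded t)))
    (loop for n from 0
          for s in serial
          for th in threaded
          unless (= s th)
            return (list n s th))))
```

Calling (first-mismatch 10 (lambda (n rows) (compute-thread n rows -250 250 0.008d0))) in a loop until it returns non-NIL would show which chunk came back wrong and by how much, which is more informative than a total that is off by one.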