Saturday, December 19, 2009

A SBCL multi-threaded computation mystery

Update: 2009/12/20 -- this is obsolete -- I now have a a much cleaner, shorter, piece of code which replicates the bizarre behavior below, in today's post. I ran into this thread-related mystery a while ago, forgot all about it, and then ran into the problem again today on a similar chunk of code and wasted a bunch of time, so I'll put it up here. Maybe somebody with experience in SBCL+threads is reading and could give me a hint ;-) Here is a multithreaded version of the simple Mandelbrot benchmark which I earlier parallelized using CL-MPI:
(defconstant +resolution+ 500)
(defconstant +iterations+ 100)
(defparameter *start-time* nil )  
(defparameter *out* *standard-output*)

(defun iters (max xc yc)
  (let ((x xc)
        (y yc))
    (declare (double-float x y)
             (fixnum max))
    (loop for count from 0 below max
          do (when (>= (+ (* x x) (* y y)) 4.0d0)
               (return-from iters count))
             (let ((tmp (+ (- (* x x) (* y y)) xc)))
               (setf y (+ (* 2.0d0 x y) yc)
                     x tmp)))
    max))

(defun compute-thread (thread-num rows-per-proc min max mul)
  (declare (fixnum thread-num rows-per-proc min max))
  (declare (type double-float  mul))
  (let* ((local-min (+ min (the fixnum (* thread-num rows-per-proc))))
     (local-max (1- (+ local-min rows-per-proc)))
     (local-count 0))
    (declare (fixnum local-min local-max local-count))
    (loop for i fixnum from local-min upto local-max
      do (loop for j fixnum from min upto max
           do (incf local-count
               #+call-func (iters +iterations+ (* mul i) (* mul j))
               #-call-func
               (let* ((xc (* mul i))
                  (yc (* mul j))
                  (x xc)
                  (y yc))
                  (declare (double-float x y))
                  (loop for count from 0 below +iterations+
                          do (cond((>= (+ (* x x) (* y y)) 4.0d0)
                                          (return count))
                                        (t
                                         (let ((tmp (+ (- (* x x) (* y y)) xc)))
                                            (setf y (+ (* 2.0d0 x y) yc)
                                                  x tmp))))
                        finally (return +iterations+)))
               )))
    (format *out* "Thread ~a local-min=~a, local-max=~a local-count=~d time ~,4f~%"
        thread-num local-min local-max local-count
        (float (/ (- (get-internal-run-time) *start-time*) internal-time-units-per-second)))
    local-count))

(defun mandelbrot-main (num-threads)
  "straightforward parallelization which partitions the work into equal regions,
   and each region is assigned to a process."
  (declare (fixnum num-threads))
  (format t "---~%")
  (setf *start-time* (get-internal-run-time))
  (let* ((max (truncate +resolution+ 2))
         (min (- max))
         (mul (/ 2.0d0 max)))
    ;; spawn computation threads
    (loop with rows-per-proc = (/ +resolution+ num-threads)
      for thread in
      (loop for thread-num from 0 below num-threads
        collect
        (let ((thread-num thread-num)); needed to establish private binding for closure
          (sb-thread:make-thread (lambda ()
                       (compute-thread thread-num rows-per-proc min max mul)))))
      summing (sb-thread:join-thread thread))))

Note that in compute-thread, there is a conditionally compiled chunk of code: if *features* contains :call-func, the function iters is called. However, if :call-func is not in *features*, then instead, I compile a chunk of code which performs the exact same computation as the iters function. This code should behave identically regardless of #+call-func or #-call-func, right? Wrong -- apparently... ;-( The mandelbrot-main function returns the number of points which are determined to be in the Mandelbrot set, so here is a test function to verify that the computation is consistent and always computes the same, expected value:
(defun test (num-threads num-iterations expected-val)
  (loop for i from 0 below num-iterations do
    (format t "Run ~a:" i)
    (let ((result (mandelbrot-main num-threads)))
      (format t "result=~a~%" result)
      (assert (=  expected-val result)))))
For +resolutions+ = 500 and +iterations+ = 100, the result should be 2905274. When I compiled with #-call-func, (test 10 1000 2905274) completes without asserting an inconsistenty. However, when compiled with #+call-func, (test 10 1000 2905274) asserts an inconsistency after some non-deterministic number of repetitions. e.g.,
CL-USER> (test 10 1000 2905274)
Run 0:---
Thread 1 local-min=-200, local-max=-151 local-count=103224 time 0.0000
Thread 0 local-min=-250, local-max=-201 local-count=29924 time 0.0000
Thread 3 local-min=-100, local-max=-51 local-count=596048 time 0.0120
Thread 2 local-min=-150, local-max=-101 local-count=364384 time 0.0200
Thread 4 local-min=-50, local-max=-1 local-count=978228 time 0.0400
Thread 6 local-min=50, local-max=99 local-count=52917 time 0.0400
Thread 7 local-min=100, local-max=149 local-count=24513 time 0.0400Thread
8 local-min=150, local-max=199 local-count=17788 time 0.0400
Thread 9 local-min=200, local-max=249 local-count=10354 time 0.0440
Thread 5 local-min=0, local-max=49 local-count=727894 time 0.0480
result=2905274
.
. 
.
Run 146:---
Thread 0 local-min=-250, local-max=-201 local-count=29924 time 0.0000
Thread 1 local-min=-200, local-max=-151 local-count=103224 time 0.0000
Thread 2 local-min=-150, local-max=-101 local-count=364384 time 0.0120
Thread 3 local-min=-100, local-max=-51 local-count=596048 time 0.0200
Thread 4 local-min=-50, local-max=-1 local-count=978228 time 0.0240
Thread 7 local-min=100, local-max=149 local-count=24513 time 0.0240
Thread 6 local-min=50, local-max=99 local-count=52917 time 0.0280
Thread 8 local-min=150, local-max=199 local-count=17788 time 0.0360
Thread 9 local-min=200, local-max=249 local-count=10354 time 0.0360
Thread 5 local-min=0, local-max=49 local-count=727894 time 0.0440
result=2905274
Run 147:---
Thread 0 local-min=-250, local-max=-201 local-count=29924 time 0.0080
Thread 1 local-min=-200, local-max=-151 local-count=103224 time 0.0080
Thread 2 local-min=-150, local-max=-101 local-count=364384 time 0.0120
Thread 8 local-min=150, local-max=199 local-count=17788 time 0.0280
Thread 6 local-min=50, local-max=99 local-count=52917 time 0.0360
Thread 3 local-min=-100, local-max=-51 local-count=596147 time 0.0360
Thread 9 local-min=200, local-max=249 local-count=10354 time 0.0360
Thread 7 local-min=100, local-max=149 local-count=24513 time 0.0360
Thread 5 local-min=0, local-max=49 local-count=727894 time 0.0360
Thread 4 local-min=-50, local-max=-1 local-count=978228 time 0.0400
result=2905373    <--- Uh, oh!!!!!!!!!!!
.
. and SLIME complains:
The assertion (= EXPECTED-VAL RESULT) failed.
Yikes! Looking closer, it looks like Thread 3 has a discrepancy: In the "correct" Runs (i.e., for the first 146 Runs), the output from Thread 3 was:
Thread 3 local-min=-100, local-max=-51 local-count=596048 time 0.0200
but on the 147th Run, the output is:
Thread 3 local-min=-100, local-max=-51 local-count=596147 time 0.0360
I have absolutely no idea what could be causing this. The most baffling thing is that when compiled with #-call-func , there is no problem -- that is, calling the iter function instead of executing an inlined version of the same computation seems to be responsible for causing this problem! Very, very bizarre because the iter function and its caller don't touch any special variables or do anything else which looks unsafe to me. In addition, there's no problem if I compile with #+call-func but running but only use 1 thread
(test 1 1000 2905274)
. If I use more than 1 thread, the inconsistency eventually shows up (the more threads, the earlier the problem shows up..). So, this must be something related to multithreading. Is it possible that this is a bug in SBCL, or am I making some silly thread-related mistake here? (I'm running this with SBCL1.0.33 and SBCL1.0.32, x86-64)

No comments:

Post a Comment