Bug 165586 - pthreads: PR_Interrupt may not interrupt PR_WaitCondVar (the join test hangs)
Status: NEW
: qa
Product: NSPR
Component: NSPR
Version: 4.2
Platform: x86 Linux
Importance: P2 normal
Reported: 2002-08-29 16:08 PDT
Modified: 2007-05-08 17:40 PDT



Attachments
For testing: a patch for NSPR that makes it easy to reproduce the bug with the join test (1.14 KB, patch), 2002-08-29 16:32 PDT

Description 2002-08-29 16:08:33 PDT
This bug affects the pthreads version of NSPR, which
is used on most Unix platforms.

There is a race condition when we use PR_Interrupt to
interrupt PR_WaitCondVar.

Suppose thread A is calling PR_WaitCondVar and thread
B is interrupting thread A.  The following event
sequence is problematic.

Thread A                     Thread B
==========================   ========================
Test its interrupt flag

Set thred->waiting to cvar

                             Set thread A's interrupt
                             flag

                             Call pthread_cond_broadcast
                             on thread A's 'waiting'
                             cvar

Call pthread_cond_wait

=====================================================

Thread A misses the broadcast and blocks in
pthread_cond_wait forever.

This can be reproduced with the 'join' test program,
at least on Red Hat Linux 6.2.
Comment 1 2002-08-29 16:32:05 PDT
Created attachment 97240
For testing: a patch for NSPR that makes it easy to reproduce the bug with the join test

Apply the patch to mozilla/nsprpub.  It inserts
a 2-second delay into PR_WaitCondVar right after it sets
thred->waiting to cvar, and a 1-second delay
at the very beginning of PR_Interrupt.

With the patched NSPR library, run the 'join' test.
The events will then happen at the following times:

Thread A                      Thread B
===========================   ===============================
T0: Test its interrupt flag   T0: Sleep 1 second

T0: Set thred->waiting to
    cvar

T0: Sleep 2 seconds

                              T1: Set thread A's interrupt
                                  flag

                              T1: Call pthread_cond_broadcast
                                  on thread A's 'waiting' cvar

T2: Call pthread_cond_wait
Comment 2 2002-10-05 07:18:37 PDT
This bug can also be reproduced rather easily
with the 'join' test program on Mac OS X.
Comment 3 2003-06-02 14:32:53 PDT
I am seeing a problem which may be related to this bug, though I don't know
enough about threads to be sure.

I am getting frequent hangs in which Mozilla stops responding entirely when I
open several pages in new background tabs in quick succession. I
first noticed this problem with Mozilla 1.4 beta, but I have now reproduced it
with 1.4rc and 1.0.0 under Linux as well.

Here's a copy of the backtrace I get after this hang in 1.4rc:
#0  0x403b6ae2 in sigsuspend () from /lib/libc.so.6
#1  0x400d6f35 in __pthread_wait_for_restart_signal ()
   from /lib/libpthread.so.0
#2  0x400d3f05 in pthread_cond_wait () from /lib/libpthread.so.0
#3  0x400af15e in PR_WaitCondVar () from /usr/local/mozilla/libnspr4.so
#4  0x4057c8ee in nsThreadPool::GetRequest ()
   from /usr/local/mozilla/libxpcom.so
#5  0x4057d060 in nsThreadPoolRunnable::Run ()
   from /usr/local/mozilla/libxpcom.so
#6  0x4057b9bd in nsThread::Main () from /usr/local/mozilla/libxpcom.so
#7  0x400b42b9 in PR_Select () from /usr/local/mozilla/libnspr4.so
#8  0x400d4d53 in pthread_start_thread () from /lib/libpthread.so.0
#9  0x400d4d99 in pthread_start_thread_event () from /lib/libpthread.so.0

The backtrace is essentially the same for 1.0.0.

I am not sure what I have changed on my system that caused this problem
to start only recently, within the past month.  My guess is libc6 or the
kernel.  I am now running Linux kernel 2.5.69 and libc6 2.3.1-16 from Debian
testing.

Since the backtrace involved pthreads and the interrupt and wait conditions
mentioned in this bug, it seems that my problem might be related.  If you don't
think so, let me know and I'll open up a new bug.
Comment 4 2003-06-02 16:53:16 PDT
You should open a new bug.  PR_Interrupt is
only used when shutting down an nsThreadPool.
I am not sure if PR_Interrupt is involved in
the hang you are seeing.
