Greetings,
I have been having some problems with server crashes. On two occasions I was able to have personnel at the co-location facility where my server lives look at the console immediately after a crash. The kernel running was 2.0.32 with SMP support on a dual Pentium Pro box. Each time the server crashed, the following message was displayed continuously on the console (but not in the syslog):

Aiee: scheduling in interrupt: 0012BBD1

A search of the sources found that this condition is tested for in /usr/src/linux/sched.c on line 396 and the message is printed on line 497. It would appear that an interrupt was encountered during the schedule() operation. This would be a bad thing. (It's not nice to re-enter the scheduler via an interrupt.) Since the address being printed is, presumably, the return address after the schedule() call, and is the same each time, I am assuming that the scheduler is being re-entered from within the same ISR while it services some sort of interrupt.

First, are my assumptions even close to reality?

Secondly, is this a "known" issue with the 2.0.32 kernel? I understand there have been some changes in the kernel SMP code between 2.0.32 and 2.0.33, so I am wondering if upgrading the kernel will fix this.

Thirdly, does this indicate some sort of hardware failure, and if so, how can I trace it back to the device in question?

Finally, I am open to suggestions for other ideas and/or options here. As always, any help is appreciated. Most suggestions taken seriously :)

Thanks in advance,
Steve
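
P.S. To make my reading of that test concrete, here is a small user-space sketch, not the kernel code itself. It assumes the check amounts to "refuse to schedule while an interrupt nesting counter is nonzero". The names toy_schedule, toy_irq_handler, intr_depth and bad_calls are made up for illustration, and the real message prints an address where this one prints a caller name.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's interrupt nesting counter. */
int intr_depth = 0;

/* Counts how many times the scheduler was entered from "interrupt context". */
int bad_calls = 0;

/* Toy scheduler entry point: returns 0 if scheduling is allowed,
 * -1 if it was called while an interrupt is being serviced. */
int toy_schedule(const char *caller)
{
    if (intr_depth != 0) {
        /* Assumed equivalent of the "Aiee" path: complain and bail out. */
        fprintf(stderr, "Aiee: scheduling in interrupt: %s\n", caller);
        bad_calls++;
        return -1;
    }
    /* A real scheduler would pick the next task here. */
    return 0;
}

/* Toy ISR that (wrongly) ends up calling the scheduler while the
 * nesting counter is still raised. */
int toy_irq_handler(void)
{
    int rc;

    intr_depth++;                        /* entering interrupt context */
    rc = toy_schedule("toy_irq_handler");
    intr_depth--;                        /* leaving interrupt context */
    return rc;
}
```

In this sketch a call to toy_schedule() from ordinary code succeeds, while the same call made through toy_irq_handler() trips the check. If the real test works this way, the same caller tripping it every time would explain why the printed address never changes.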