2007.06.07 - 08:21:16 PDT
I've spent the last day and a half intermittantly trying to track down a bug I was hitting when restarting apache2. Buckle up, this is going to be a long problem explanation.
Basically, I was seeing incredibly slow startup times of apache. It could be from 30 seconds up to 5 minutes or more. And it wasn't just starting apache; it was slow stopping apache (-k stop), starting apache (-k start), and even testing the config (-t).
That's weird, but it gets weirder. This problem would only randomly (pun intended; read on) appear. Apache would restart just fine, and then I'd restart it again later and it would hang. Even the problem itself didn't remain the same. Some times it would get through configtest ok and hang at stop and start (or graceful). Sometimes it would get through configtest and stop and then hang on start. It also hung two places on starting. First it would block and hang immediately, then it would background and open a listen socket on 80 but block and hang before it started worker processes. You could connect to the server and send requests, but there you'd sit with no workers to service you. Now randomly mash together some of these places where it could hang and that's what I'd see. And not consistently; sometimes it was just fine.
I'm running gentoo and was using the init script /etc/init.d/apache2 to restart the server. I confirmed it wasn't the init scripts hanging by first running apache2ctl directly, then going straight to the apache2 binary. Same deal with all of them. I went through everything that I could think of that could be causing it. I'd never seen this problem before. I ran into the problem after installing mod_perl, but the issue remained after reverting everything to it's original state. I spent a lot of time believing that had caused it.
I'm going to skip all the crazy googling and wild goose chasing I did. I put my configs into all state of disarray even trying to isolate the problem.
Fast forward about 24 hours - in a strange moment of clarity I realized that I had run an strace on the init and apachectl scripts which didn't really show me much, but hadn't done an strace directly on apache2 (which would obviously give me better info). So I traced a configtest (strace /usr/sbin/apache2 -DPHP5 -DPERL -DDEV -d/usr/lib/apache2 -f/etc/apache2/httpd.conf -t) a couple times, until it hung. And there it was right before my eyes:
It was opening /dev/random and blocking! Shit! I confirmed that this was the problem by cat /proc/sys/kernel/random/entropy_avail and it was something like 33. I was short on entropy. Why it wasn't using urandom I didn't know, but armed with my new knowledge I hit google again and found this article [raptorized.com]. It pretty well explains it at the end...I stuck urandom into my USE flags in make.conf (surprised this isn't a default flag I guess), and re-emerged dev-libs/apr. Bam, problem gone.
Phew, that kinda sucked. The scoop is I'm working on my dev server porting the new site to mod_perl. Yeah right, that will help me actually work on it.
Basically, I was seeing incredibly slow startup times of apache. It could be from 30 seconds up to 5 minutes or more. And it wasn't just starting apache; it was slow stopping apache (-k stop), starting apache (-k start), and even testing the config (-t).
That's weird, but it gets weirder. This problem would only randomly (pun intended; read on) appear. Apache would restart just fine, and then I'd restart it again later and it would hang. Even the problem itself didn't remain the same. Some times it would get through configtest ok and hang at stop and start (or graceful). Sometimes it would get through configtest and stop and then hang on start. It also hung two places on starting. First it would block and hang immediately, then it would background and open a listen socket on 80 but block and hang before it started worker processes. You could connect to the server and send requests, but there you'd sit with no workers to service you. Now randomly mash together some of these places where it could hang and that's what I'd see. And not consistently; sometimes it was just fine.
I'm running gentoo and was using the init script /etc/init.d/apache2 to restart the server. I confirmed it wasn't the init scripts hanging by first running apache2ctl directly, then going straight to the apache2 binary. Same deal with all of them. I went through everything that I could think of that could be causing it. I'd never seen this problem before. I ran into the problem after installing mod_perl, but the issue remained after reverting everything to it's original state. I spent a lot of time believing that had caused it.
I'm going to skip all the crazy googling and wild goose chasing I did. I put my configs into all state of disarray even trying to isolate the problem.
Fast forward about 24 hours - in a strange moment of clarity I realized that I had run an strace on the init and apachectl scripts which didn't really show me much, but hadn't done an strace directly on apache2 (which would obviously give me better info). So I traced a configtest (strace /usr/sbin/apache2 -DPHP5 -DPERL -DDEV -d/usr/lib/apache2 -f/etc/apache2/httpd.conf -t) a couple times, until it hung. And there it was right before my eyes:
stat64("/etc/apache2/vhosts.d/50_apollo_vpn_userlame_net.conf", {st_mode=S_IFREG|0644, st_size=954, ...}) = 0
open("/etc/apache2/vhosts.d/50_apollo_vpn_userlame_net.conf", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=954, ...}) = 0
brk(0x816d000) = 0x816d000
read(4, "<VirtualHost *:80>\n\tDocumentRoot"..., 4096) = 954
read(4, "", 4096) = 0
close(4) = 0
read(3, "", 4096) = 0
close(3) = 0
stat64("/usr/sbin/suexec2", {st_mode=S_IFREG|S_ISUID|0710, st_size=10868, ...}) = 0
open("/dev/random", O_RDONLY) = 3
^C
read(3, <unfinished ...>
open("/etc/apache2/vhosts.d/50_apollo_vpn_userlame_net.conf", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=954, ...}) = 0
brk(0x816d000) = 0x816d000
read(4, "<VirtualHost *:80>\n\tDocumentRoot"..., 4096) = 954
read(4, "", 4096) = 0
close(4) = 0
read(3, "", 4096) = 0
close(3) = 0
stat64("/usr/sbin/suexec2", {st_mode=S_IFREG|S_ISUID|0710, st_size=10868, ...}) = 0
open("/dev/random", O_RDONLY) = 3
^C
read(3, <unfinished ...>
It was opening /dev/random and blocking! Shit! I confirmed that this was the problem by cat /proc/sys/kernel/random/entropy_avail and it was something like 33. I was short on entropy. Why it wasn't using urandom I didn't know, but armed with my new knowledge I hit google again and found this article [raptorized.com]. It pretty well explains it at the end...I stuck urandom into my USE flags in make.conf (surprised this isn't a default flag I guess), and re-emerged dev-libs/apr. Bam, problem gone.
Phew, that kinda sucked. The scoop is I'm working on my dev server porting the new site to mod_perl. Yeah right, that will help me actually work on it.



