404 header problems (not "how to" question)


#1

I’ve had no problems setting up a custom 404 page or anything like that. I’m running some Perl with mod_rewrite and all of the 404 pages are working correctly. The problem is that the server isn’t actually showing a 404 code in the log. I’m sending the correct header (Status: 404 Not Found) and the correct error page is showing, but the server logs show a 200 code. It’s as if it’s saying, “Hey look, it’s OK because I loaded your error page.” I would really like it to record the 404 code in the logs so I know it’s being sent to the requester (especially search engines, since they like to see a 404 code). Does Dreamhost do any finagling with codes before they’re sent?


#2

No, the code your script outputs would be the code the client sees - see http://web-sniffer.net/

I gather from what you’ve said you have something like this:

[code]# Not found error document
ErrorDocument 404 /error/notfound.pl

# Rewrite URI to the same document - WON'T WORK
RewriteEngine On
RewriteBase /
RewriteRule .+ /error/notfound.pl[/code]
1. The internal redirect from /xyzzy to /error/notfound.pl is successful, so Apache thinks the status code should be 200 (REDIRECT_STATUS will be 200, and REDIRECT_URL will be /error/notfound.pl).
2. mod_rewrite does not allow you to override the status code except for external redirects (301, 302, etc.); no other range is supported.
3. You’ll also need to use the Redirect directive:

[code]# Not found error document
ErrorDocument 404 /error/notfound.pl

# Force temporary URI to be not found
Redirect 404 /xyzzy

# Rewrite actual URI to temporary URI
RewriteEngine On
RewriteBase /
RewriteRule .+ xyzzy
[/code]
Even then, you’re still going to get 200 in the logs; the above only appears to make REDIRECT_STATUS 404 instead of 200.
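To see which values Apache actually hands to the error script, something like the following could stand in for /error/notfound.pl while testing. It is only a sketch: the helper name not_found_response is made up for illustration, and it assumes the script runs as plain CGI, where REDIRECT_STATUS and REDIRECT_URL arrive as environment variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the whole CGI response for a not-found page from an environment hash.
# REDIRECT_STATUS and REDIRECT_URL are only set after an internal redirect,
# so both may be missing when the script is requested directly.
sub not_found_response {
    my (%env) = @_;
    my $redirect_status = defined $env{REDIRECT_STATUS} ? $env{REDIRECT_STATUS} : 'unset';
    my $redirect_url    = defined $env{REDIRECT_URL}    ? $env{REDIRECT_URL}    : 'unset';

    # The Status header is what the client is sent, whatever the access log says.
    return "Status: 404 Not Found\r\n"
         . "Content-Type: text/plain\r\n\r\n"
         . "Not found\n"
         . "REDIRECT_STATUS=$redirect_status\n"
         . "REDIRECT_URL=$redirect_url\n";
}

print not_found_response(%ENV);
```

Requesting a missing URL with each of the two configurations above should show whether REDIRECT_STATUS changes from 200 to 404.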

:cool: Perl / MySQL / HTML CSS


#3

I didn’t include enough information in the original post. I have a bunch of static pages in my site that I don’t want to use different HTML files for. So I have a static.pl file that takes these pages and chooses the correct template to render them. My rewrites look like this:

RewriteRule ^(.*)$ /static.pl?page=$1&%{QUERY_STRING} [L]

So it just takes http://domain.com/about and turns it into http://domain.com/static.pl?page=about.

Then in static.pl, at the end, I have something like this:

my $fullPath = "$templatePath/$page";
if (stat($fullPath))
{
    print $cgi->header(@_);
    drawPage($page);
}
else
{
    print $cgi->header(-status => '404');
    drawPage('404');
}

So if I can’t stat the template (e.g. it doesn’t exist), I return a 404 status with a custom 404 page.
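The check above could also be pulled into a small helper that decides the template and the status line in one place. This is a rough sketch, not the actual static.pl: resolve_page is a hypothetical name, and the filter on the page name is an extra assumption (the original code has none) meant to keep a request like ../../etc/passwd from reaching the filesystem test.

```perl
use strict;
use warnings;

# Decide which template to draw and which status line to send.
# Returns (template_name, status_line).
sub resolve_page {
    my ($template_path, $page) = @_;

    # Assumption: page names are plain words, so anything with a slash or a
    # dot is treated as not found without touching the filesystem.
    return ('404', '404 Not Found')
        unless defined $page && $page =~ /\A[A-Za-z0-9_-]+\z/;

    return -e "$template_path/$page"
        ? ($page, '200 OK')
        : ('404', '404 Not Found');
}

# Intended use inside the script, with $cgi and drawPage as in the post:
#   my ($name, $status) = resolve_page($templatePath, $page);
#   print $cgi->header(-status => $status);
#   drawPage($name);
```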

From what you’re telling me it looks like I can’t give the server another code after mod_rewrite has already rewritten my page. So I tried:

print $cgi->redirect(-uri => 'http://domain.com/404', -status => '301');

and then putting

ErrorDocument 404 /static.pl?page=404
Redirect 404 /404

in the .htaccess before I turn on the rewrite engine but that doesn’t seem to be working either. I’m still getting a 200 status. I’m going to play around with it some more but has anyone tried this before?

Oh, I just saw you said I’ll still get a 200 status in the log. What a pain. What will the calling client see?

Thanks,
Bob


#4

As I already said, the client will see the status code your script reports, and you can test this at http://web-sniffer.net/. It’s just that by the time your script executes, the web server itself has already settled on a status code for the log. It might be an Apache bug, but then URL rewriting in 1.3 is a bit kludgy to begin with.
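If you would rather check from your own machine than through a web page, a few lines of Perl will do the same job. This is a minimal sketch assuming the HTTP::Tiny module is available; status_of and the script name check_status.pl are made-up names, and the URL in the usage comment is only a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Return "status reason" for a URL without following redirects, so a 301 or
# 302 shows up as itself rather than as the status of the page it points to.
sub status_of {
    my ($url) = @_;
    my $http = HTTP::Tiny->new(max_redirect => 0, timeout => 10);
    my $res  = $http->get($url);

    # HTTP::Tiny reports its own failures (bad URL, refused connection) as 599.
    return "$res->{status} $res->{reason}";
}

# Usage: perl check_status.pl http://domain.com/no-such-page
print status_of($_), "  $_\n" for @ARGV;
```

Whatever this prints is what a search engine would receive, regardless of what ends up in the access log.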

And it should be fairly obvious by now that mod_rewrite was not really meant to do what you’re trying to do, hence its inability to handle status codes other than external redirects. Besides, since all your pages are handled by a script, you can simply have the script keep a log of its own if you want to track situations only the script knows about, such as your pseudo-404s.
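A script-side log does not need to be elaborate. The following is one possible sketch, assuming the script can append to a file of its own; log_not_found, the line format and the log path in the usage comment are all invented for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Append one line per pseudo-404 to a log owned by the script.
# Returns 1 on success and 0 if the file cannot be opened or locked.
sub log_not_found {
    my ($log_file, $page, $referer) = @_;
    $page    = '-' unless defined $page;
    $referer = '-' unless defined $referer;

    open my $fh, '>>', $log_file or return 0;
    flock $fh, LOCK_EX or return 0;    # keep concurrent requests from interleaving
    printf {$fh} "%s 404 page=%s referer=%s\n", scalar(gmtime), $page, $referer;
    close $fh;                         # closing the handle releases the lock
    return 1;
}

# Intended use in the 404 branch (the path is a placeholder):
#   log_not_found('/path/to/static-404.log', $page, $ENV{HTTP_REFERER});
```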



#5

Excellent, thanks for the link. It looks like I’m getting the right code now.