Odd mod_rewrite behaviour


#1

Hello,

I’ve just signed up here, and so far everything has been easy and swift; I’ve managed to transfer most things over from my old host without a problem. I am, however, seeing some odd mod_rewrite behaviour. This is what I use:

RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|css|js|html|htm|txt|php|zip)$
RewriteBase stuff/
RewriteRule .* index.php?what=$1 [L]

With my previous host (and with XAMPP), “what” was simply something like:

dir/subdir/page

where the full path for that would be:

http://mydomain.com/stuff/dir/subdir/page

“what” is then parsed by PHP, using $_GET['what']. Unfortunately, with DreamHost, “what” is being returned as this:

http://mydomain.com/home/raf/mydomain.com/stuff/dir/subdir/page//subdir/page

It’s very weird, especially the repetition at the end. I’ve tried all sorts of things and other more restrictive regular expressions, but I can’t seem to change what’s going on without breaking it entirely.

Any insight is greatly appreciated!
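
For anyone comparing notes, here is a minimal sketch of how a rule set like this is often written. It assumes the .htaccess file lives in the stuff/ directory and that stuff/ is served at /stuff/; neither is confirmed above. Two things differ from what was posted: RewriteBase takes a URL-path, which normally starts with a slash, and $1 only has a value if the pattern contains a capturing group.

```apache
RewriteEngine On

# Assumption: this file is stuff/.htaccess, reachable as /stuff/.
RewriteBase /stuff/

# Leave static assets and PHP files alone; the parentheses in the
# rule below create the group that $1 refers to.
RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|css|js|html|htm|txt|php|zip)$
RewriteRule ^(.*)$ index.php?what=$1 [L]
```

With this in place, a request for /stuff/dir/subdir/page should reach index.php with “what” set to dir/subdir/page. Whether it accounts for the odd value seen on DreamHost is only a guess.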


#2

Are you absolutely sure that in the process of getting everything moved you didn’t inadvertently end up with .htaccess files in multiple directories that are conflicting with each other?

–rlparker


#3

I’ve managed to fix it somehow (mostly by removing the RewriteBase line). It now works, except that something else rather odd is happening.

If I type http://mydomain.com/stuff/misc/ into my browser it works fine. If I type in the same thing without the last /, the address bar displays http://mydomain.com/stuff/misc/?what=misc

“misc” is a valid name for something and the page will display the content related to “misc”. What I find bewildering is that if instead of “misc” I type in any old junk, e.g. “asdf” without the ending /, the query string doesn’t appear. How on earth does the server/browser know which URLs point to real content and which don’t? My script doesn’t even send any headers except for content-type and character encoding.

If you have any suggestions for how I might get rid of the query string that would be great. Cheers!
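
A likely explanation, assuming “misc” also exists as a real directory under stuff/ (and “asdf” does not): when a request maps to a directory but lacks the trailing slash, mod_dir redirects to the slashed URL. Because the request has already been rewritten internally by then, the what=... query string leaks into that redirect. One common workaround, sketched here under that assumption, is to add the slash yourself before the front-controller rule runs:

```apache
RewriteEngine On

# Assumption: "misc" is a real directory, so -d is true for it.
# Redirect to the slashed URL first, before any query string exists.
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule [^/]$ %{REQUEST_URI}/ [R=301,L]

# Then hand non-asset requests to the front controller as before.
RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|css|js|html|htm|txt|php|zip)$
RewriteRule ^(.*)$ index.php?what=$1 [L]
```

Setting DirectorySlash Off is another option, but the Apache documentation warns that it can expose directory listings, so the explicit redirect is usually the gentler choice.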


#4

Go here to find out more about the trailing slash issue. You’re not alone in experiencing it.

Don’t forget about Google either. It’s the most useful tool out there second only to experience.

[quote]Trailing Slash Problem

Description:

Every webmaster can sing a song about the problem of the trailing slash on URLs referencing directories. If they are missing, the server dumps an error, because if you say /~quux/foo instead of /~quux/foo/ then the server searches for a file named foo. And because this file is a directory it complains. Actually it tries to fix it itself in most of the cases, but sometimes this mechanism need to be emulated by you. For instance after you have done a lot of complicated URL rewritings to CGI scripts etc.

Solution:

The solution to this subtle problem is to let the server add the trailing slash automatically. To do this correctly we have to use an external redirect, so the browser correctly requests subsequent images etc. If we only did a internal rewrite, this would only work for the directory page, but would go wrong when any images are included into this page with relative URLs, because the browser would request an in-lined object. For instance, a request for image.gif in /~quux/foo/index.html would become /~quux/image.gif without the external redirect![/quote]


#5

Thanks for replying. Actually, the problem is caused precisely because I was trying to fix the trailing slash issue. The query string gets appended when the trailing slash was added using this rule:

RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|css|js|html|htm|txt|php|zip)$
RewriteRule (.+)/([^/]+)$ $1/$2/ [R]
RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|css|js|html|htm|txt|php|zip)$
RewriteRule (.+)$ index.php?what=$1 [L]

I’ve solved the problem by using this instead:

RewriteEngine on
RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|css|js|html|htm|txt|php|zip)$
RewriteRule (.+)/([^/]+)$ $1/$2/ [R]
RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|css|js|html|htm|txt|php|zip)$
RewriteRule .+ index.php [L]

Then index.php just parses $_SERVER['REQUEST_URI'].

The problem in this case is that http://mydomain.com/1 will get the slash added to it but http://mydomain.com/1/2 will not. I suspect it’s something to do with greediness and I’ve tried making the second bit lazy, hoping it would encourage extra-greediness in the first parentheses, but it doesn’t work:

RewriteRule (.+)/([^/]+?)$ $1/$2/ [R]

This simpler thing doesn’t either:

RewriteRule (.+[^/])$ $1/ [R]

I’m not too hot on regex. Please would you recommend what to do?
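
Greediness is probably not the culprit here. Two other things stand out, offered as a sketch rather than a diagnosis. First, the Apache documentation recommends pairing [R] with [L], since [R] on its own passes the half-built redirect on to the next rule. Second, the rule pattern in a .htaccess file only sees the path relative to that directory, so rebuilding the target from captured groups is fragile. Matching on the last character and taking the target from %{REQUEST_URI} avoids both issues at any depth:

```apache
RewriteEngine On

# 1. Anything that is not a static asset and does not already end
#    in a slash is redirected to the same URL-path plus a slash.
#    [L] stops processing so the redirect is actually sent.
RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|css|js|html|htm|txt|php|zip)$
RewriteRule [^/]$ %{REQUEST_URI}/ [R=301,L]

# 2. Everything else that is not a static asset goes to index.php,
#    which can then parse $_SERVER['REQUEST_URI'] as before.
RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|css|js|html|htm|txt|php|zip)$
RewriteRule .+ index.php [L]
```

The 301 status is an assumption; a plain [R,L] sends a 302, which is easier to undo while testing.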