Mod_rewrite and slashes


#1

Hello,

I’m having trouble trying to block URLs with slashes on my web sites.

Right now I have this in my .htaccess:

RewriteRule .+php.+/.+ http://www.domain.com [R=404]

So I expect every url like post.php?id=234234/post.php?id=16 to be redirected to my home page, but it doesn’t work.

Is there any way to enable logging for mod_rewrite, or maybe a definite solution to this problem?


#2

That won’t work. Everything after the first ? but before the # becomes part of the query string. The query string is not matched in the RewriteRule, it is usually matched using a RewriteCond directive.


#3

You want to redirect any request with a forward slash to your home page? What are you trying to accomplish? Are you trying to prevent penetration probes, or something else?

What you wrote above won’t work. I think .htaccess only accepts basic regular expressions, which means you can’t use +. You might try:

I can’t remember if you need to escape the slash in this case.

There is a way to log .htaccess errors, but the last time I tried it on DH, it seemed that it was disabled (or I was doing it wrong). Details here
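
For what it’s worth, here is a rough sketch of what rewrite logging looks like on Apache 2.2, going by the mod_rewrite documentation. Both directives are only valid in the main server config or a virtual host, not in .htaccess, which would explain why it looks disabled on shared hosting. The log path is a made-up example.

```apache
# Apache 2.2: server config or <VirtualHost> only, not allowed in .htaccess.
# The log file path below is a hypothetical example.
RewriteLog "/var/log/apache2/rewrite.log"
# 0 = off, 9 = log nearly everything; high levels slow the server down.
RewriteLogLevel 3
```

On Apache 2.4 these two directives no longer exist; the equivalent is a per-module LogLevel such as "LogLevel alert rewrite:trace3", which also cannot be set from .htaccess.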


#4

[quote]RewriteRule .+php.+/.+ http://www.domain.com [R=404]

So I expect every url like post.php?id=234234/post.php?id=16 to be redirected to my home page, but it doesn’t work.[/quote]
404 is not a redirect, it’s a “no such page”. If it worked, it would serve up a 404-- which is fine if it’s what you want-- and would take human users to your custom 404 page or to “missing.html”, depending on your existing settings.

A redirect is 301 or 302, where [R] defaults to 302. You want 301.

The rule doesn’t work because RewriteRule by itself looks only at the path part of the requested URL, omitting the query string at one end and the protocol and host at the other. (The # fragment, if any, never reaches the server at all; even #! “hashbangs” only arrive when a crawler rewrites them into a query parameter.)

To make the rule work, you need two lines:

RewriteCond %{QUERY_STRING} .
RewriteRule \.php$ http://www.example.com/? [R=301,L]

This means simply: If there is a query string at all, then get rid of it and redirect to your home page. And then stop. You need the [L] flag, or else mod_rewrite will continue looking for more rules.

For more than you ever wanted to know:

http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html


#5

301 implies the page actually exists and should supply the new location for the requested page.

If the page does not exist, then 404 is the proper code to use.


#6

Oops, misread the original question. You don’t want to throw out all URLs with query strings, just the ones where the query string contains a slash?

RewriteCond %{QUERY_STRING} /

Slashes in mod_rewrite do not need to be escaped. No anchors necessary; all you need to say is “There is a / somewhere in the query”.
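
Put together with a RewriteRule, a minimal sketch might look like this. It assumes a 404 is what you want for these requests and that only .php URLs are affected; both are assumptions, so adjust to taste.

```apache
RewriteEngine On
# Condition: the query string contains a slash somewhere.
RewriteCond %{QUERY_STRING} /
# For any .php request meeting the condition, answer 404.
# "-" means no substitution; with a non-3xx status in [R], rewriting stops here.
RewriteRule \.php$ - [R=404]
```

This only catches a literal slash. As far as I know, a slash sent as %2F stays encoded in QUERY_STRING, so it would need its own alternative in the condition, something like (/|%2F) with the [NC] flag.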

But .htaccess may not even be the best place to do it. If you have URLs with query strings then presumably you already have a php function that deals with them. Send all your requests to that function, and let it issue the 404 for the malformed or nonexistent ones. In fact it should be doing so already-- not just for slashes but for anything else that doesn’t resolve to a page.

You should also spend some time finding out where those malformed queries are coming from-- a typo in a link, a glitch in input processing, could be something very simple-- and, if possible, stop them at the source. Same as you’d do with any recurring bad URL.

301 doesn’t imply anything one way or the other about the requested URL’s existence or non-existence. It’s the HTTP equivalent of “use the other door”.


#7

Actually, the OP is ambiguous on this point, although most responses assume, because of the example, that OP means the query string only. The original post refers to removing slashes from the URL, not the query string. It may indeed be that OP only wants to keep slashes out of query strings, but that hasn’t been confirmed yet. That is why I asked for clarification before proposing a solution that deals with the path: by definition, a URL requires at least two slashes as part of the protocol segment.


#8

301 is a statement that the requested resource has “Moved Permanently”.

Protocols: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html


#9

I know what the code means, fer hevvins sakes. I’m talking about its real-life application.

Simple example: Most people have a preferred domain-name format, either example.com or www.example.com. (### Forums! I don’t want that to auto-link! That’s why I said example.com in the first place.) When you redirect from one to the other, it doesn’t mean “I used to be at example dot com but now I’ve moved to www dot example dot com.” Similarly, when mod_dir issues a directory-slash redirect: it’s obviously got nothing to do with the location of the directory-- in fact it only works if the directory is in the requested location-- only with naming format.
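
For reference, the kind of domain-name redirect I mean usually looks something like the sketch below. The name example.com stands in for your own domain; this is the common pattern, not anything specific to your setup.

```apache
RewriteEngine On
# Only when the request came in on the bare domain.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# Send the same path to the www form; the query string is carried over automatically.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Nothing has moved anywhere; the 301 just tells the client which spelling of the name to use.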

To me the OP says “slashes after the .php extension” which can only mean slashes within the query string. The RewriteRule as written is wrong in several ways, but you can see what the “pattern” side was intended to do.

Sayine, still with us? When you come back, I hope you’ll also clarify what you want to happen in the end. You can either redirect to your home page (probably not a good idea if you care about search engines) or you can issue a 404, but you can’t do both. Well, technically you can if you use your home page as your custom 404 page, but this is a Very Not Good idea ;).


#10

Yeah, but that’s not what OP said, which is why the request needs to be clarified. I would bet that OP doesn’t want rewriting at all but rather help with the template that is creating the malformed URLs in the first place.


#11

That’s precisely what a 301 means.