Apache intercepts only encoded URL components, rewrites to missing.html


#1

I am writing a Node.js-based server and web application on a subdomain of my VPS. I have a proxy set up to forward HTTP requests to the Node.js process, which is listening on an assigned port. Normally, this is a dream host setup (ha ha) and is completely transparent, including custom 404 handling without interference from the main server process (Apache), which has no .htaccess for this subdomain.

While writing an AJAX response today, though, I noticed that when the request includes a path encoded with encodeURIComponent(), something (I assume Apache) rewrites the request URL. For example, the following URL is passed through normally:

URL in browser: "/as2Fdf.js"
URL in Node.js request object: "/as2Fdf.js"

However, the following URL is rewritten:

URL in browser: "/as%2Fdf.js"
URL in Node.js request object: "/missing.html"

Interestingly, some other encoded characters are left untouched:

URL in browser: "/as%3Fdf.js"
URL in Node.js request object: "/as%3Fdf.js"

Still other encoded characters are decoded before they ever reach the Node.js process:

URL in browser: "/as%3Adf.js"
URL in Node.js request object: "/as:df.js"
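
For reference, the encoded test paths above can be produced with encodeURIComponent() like this. This is a minimal sketch: the raw names in `rawNames` are just the illustrative file names from my examples, not anything special in my application.

```javascript
// Illustrative raw names matching the test URLs above (hypothetical values).
const rawNames = ["as/df.js", "as?df.js", "as:df.js"];

// encodeURIComponent percent-encodes "/", "?" and ":" but leaves "." alone.
const encodedPaths = rawNames.map((name) => "/" + encodeURIComponent(name));

console.log(encodedPaths);
// [ '/as%2Fdf.js', '/as%3Fdf.js', '/as%3Adf.js' ]
```

So all three characters are encoded the same way on the client; the difference in treatment happens somewhere between the browser and Node.js.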

Ideally, Node.js would be passed exactly the string that appears in the address bar, without any decoding or rewriting. I’m guessing Apache is tampering with the request, but without an .htaccess I’m not sure where it’s getting its instructions.

I’m guessing Apache is decoding “%2F” and interpreting it as a path separator, which defeats the point of using encodeURIComponent(). This would explain why Apache is ignoring the proxy (which is specific to the root URL of the subdomain). However, the following URL, which contains a literal slash, is (correctly) NOT resolved to a subfolder and is forwarded to Node.js unmodified:

URL in browser: "/ajax/myParam"
URL in Node.js request object: "/ajax/myParam"
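
To make my guess concrete: if something in front of Node.js decoded “%2F” before routing, the path would gain an extra segment. The sketch below only models that guess (the `segments` helper is a made-up name for illustration, and this is not a claim about what Apache actually does internally):

```javascript
// Hypothetical helper: split a URL path into its non-empty segments.
function segments(path) {
  return path.split("/").filter((part) => part.length > 0);
}

const asSent = "/as%2Fdf.js";
// What the path would look like if the encoded slash were decoded early.
const ifDecoded = decodeURIComponent(asSent); // "/as/df.js"

console.log(segments(asSent));          // [ 'as%2Fdf.js' ]
console.log(segments(ifDecoded));       // [ 'as', 'df.js' ]
console.log(segments("/ajax/myParam")); // [ 'ajax', 'myParam' ]
```

By that logic, "/as%2Fdf.js" once decoded and "/ajax/myParam" both have two segments, yet only the first one gets rewritten to "/missing.html", which is why the subfolder explanation doesn't fully add up for me.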

Very confusing. What’s going on here?