Server-Side Includes With Content Negotiation


#1

I’m having a pretty strange problem with server-side includes in conjunction with content negotiation. Specifically, a whole lotta garbage characters appear whenever I try to include a file affected by content negotiation.

It’s worth mentioning that this issue only seems to happen on Dreamhost. If I try using my files with a locally installed Apache Server or my office server, everything functions exactly as I’d expect.

Here’s the simplest way to see my problem.

I have three files: .htaccess, index.shtml.en, and menu.shtml.en.

The .htaccess file simply contains information about language-based content negotiation:

AddLanguage en .en
Options +MultiViews

The index.shtml.en file simply contains some text, then includes the menu:

[code]English Page

[/code]

Note that I’m including menu.shtml and not menu.shtml.en; this is intentional, and should work. (And, indeed does work on my own Apache server—just not Dreamhost.)

The menu.shtml.en simply contains some text:

Now, the problem: when I access the index page on my Dreamhost server, the menu text appears followed by a string of garbage characters. The actual index page text doesn’t appear at all. You can see it for yourself here.

Has anyone experienced this before? What should I do?


#2

First of all, those aren’t valid HTML documents. Secondly, for some reason your page is being rendered using the Windows-1255 Hebrew character set in my browser. I’ve tried all the different Western and Eastern European character encodings, but none of them work either, nor does UTF-8.

I don’t use content negotiation (though I’ve run into problems with type maps in Apache before), so I don’t know how the character encoding is determined. But viewing the traffic from the web server, I don’t even see a character set specified in the HTTP Content-Type header. So I don’t know where Firefox got Windows-1255 from.
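For anyone who wants to repeat that header check, here is a minimal Python sketch (not part of the original exchange). It only parses a Content-Type value you have already captured from the server; the helper name `charset_of` is made up for illustration.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset declared in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None when
    # the header carries no charset parameter at all.
    return msg.get_content_charset()
```

A result of `None` means the server declared no character set, so the browser is left to guess one from the bytes it receives.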

By fetching http://mobi.ornj.net/index.shtml instead of http://mobi.ornj.net/, the content does seem to be transmitted and displayed correctly in Western ISO-8859-1. However, fetching http://mobi.ornj.net/index.shtml.en causes the second half of the page to again be unreadable.

I would try checking the character encoding of your index.shtml.en file using your HTML editor. Also, create another copy of it as index2.shtml and see if it renders properly when accessed without content-negotiation. If it does, you can also see what character encoding is being used to display it. You can also try removing the SSI directive to eliminate it as a possible cause (seems unlikely, but you may want to try anyway). Last, try creating a separate and valid HTML document and save it in a known character encoding (e.g. UTF-8 or ISO-8859-1), and specify the encoding in the document head.

And are these the only files you have on your server? Also, is that SSI code supposed to put the contents of menu.shtml in front of the calling page contents? Because that’s what is being displayed.


#3

You’re right that these aren’t HTML documents; they’re just a very basic demonstration of the problem with as few files and as little text as possible.

The files I mentioned above are, indeed, the only files on the server. As you noticed, the menu text is appearing in front of everything else, which is not at all what should be happening.

The odd character encoding being reported by the browser is its way of interpreting the garbage characters that are being output. When I try this experiment with valid HTML and declared encoding, with plenty of double-checking to make sure, the garbage still appears. And, again, since it works fine on localhost and my office server, I don’t suspect it’s a character-encoding issue.

I’m fascinated that http://mobi.ornj.net/index.shtml works fine, while /index.shtml.en and / do not. I can’t imagine what could cause this inconsistency, considering that all three URIs resolve to exactly the same file. This also seems to indicate that something other than character encoding is going awry, since the file is encoded the same no matter which URI is used to access it.

What could be going on here?

[hr]
Well, I think I figured out what’s happening, but I still don’t know how to fix it.

The garbage characters are actually binary data. Specifically, GZIP data. It seems that the server is configured to GZIP the content before it’s transmitted, but it’s not waiting for the page to be fully retrieved and assembled first. This is a very strange configuration. Is there anything I can do about this?
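A rough way to confirm this from a saved copy of the raw response body: the Python sketch below looks for the gzip magic bytes (1f 8b 08) and tries to inflate everything from that point on. The function name `split_embedded_gzip` is hypothetical, and the sketch assumes the plain include output comes first, followed by a single gzip stream, which matches the symptom described above.

```python
import gzip
import zlib

# Every gzip stream starts with ID1=0x1f, ID2=0x8b, CM=0x08 (deflate).
GZIP_MAGIC = b"\x1f\x8b\x08"


def split_embedded_gzip(body):
    """Split a response body into (plain_prefix, inflated_rest).

    inflated_rest is None when no decodable gzip stream is found.
    """
    start = body.find(GZIP_MAGIC)
    if start == -1:
        return body, None
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    try:
        inflated = decoder.decompress(body[start:])
    except zlib.error:
        # The magic bytes were a coincidence, or the stream is corrupt.
        return body, None
    return body[:start], inflated


if __name__ == "__main__":
    # Simulated broken response: plain menu text, then the gzipped page.
    sample = b"Menu text\n" + gzip.compress(b"English Page\n")
    print(split_embedded_gzip(sample))
```

If the second value comes back as readable page text, the "garbage" was the compressed page all along, and the plain prefix is whatever was sent uncompressed ahead of it.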


#4

I spoke with Dreamhost support; turning off mod_deflate solved the issue.