Bad character encoding & AdSense = question marks?


#1

Hello. I’m having some problems with AdSense ads on a few of my pages, e.g. here: http://hazletonarea.net/hazleton_pa/forum/ (NOTE: sometimes the ads display correctly, and other times I get question marks in place of the actual ad text.)

I’ve contacted AdSense and they seem to think the problem has to do with character encoding. Their response to me was:

“According to our parsing logs, it appears that your HTML may be encoded incorrectly. Specifically, it seems that you are using multi-byte character encoding, which is not a valid method of encoding HTML pages.” … “I recommend you look at your webserver settings or talk to your webserver administrator to make sure that your pages are being returned in a standard encoding.”

I’m not super-familiar with character encoding or how to troubleshoot/diagnose this problem.

Can anyone give any tips or help?

Thanks


#2

Your site is being served as ISO-8859-1, whereas your AdSense is being served as UTF-8. Since your site is running Joomla!, I would expect it to also be served as UTF-8 (see this for details). I’m not sure how a web server setting could make any difference. If you get no joy with DH support, I recommend getting support from the Joomla! folks.
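
If you want to see the mismatch for yourself, compare the charset named in the HTTP Content-Type header with the one named in the page's meta element. Here is a minimal Python sketch of that comparison. The function names are made up for illustration, the regexes are a rough approximation rather than a real HTML parser, and you would feed it the header value and page bytes from whatever tool you use to fetch the page.

```python
import re

# Hypothetical helpers for illustration; not part of Joomla!, AdSense, or any library.

def header_charset(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type, re.I)
    return match.group(1).lower() if match else None


def meta_charset(html_bytes):
    """Return the charset named in a <meta> element near the top of the page, or None."""
    # Charset declarations are ASCII, so a lossy ASCII decode of the head is enough here.
    head = html_bytes[:4096].decode("ascii", errors="replace")
    match = re.search(r'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', head, re.I)
    return match.group(1).lower() if match else None


def charset_mismatch(content_type, html_bytes):
    """True when both declarations exist and name different charsets."""
    from_header = header_charset(content_type)
    from_meta = meta_charset(html_bytes)
    return from_header is not None and from_meta is not None and from_header != from_meta


if __name__ == "__main__":
    # Sample values made up to mirror the situation described in this thread.
    sample_header = "text/html; charset=ISO-8859-1"
    sample_page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head></html>'
    print(header_charset(sample_header), meta_charset(sample_page))
    print("mismatch:", charset_mismatch(sample_header, sample_page))
```

When the two disagree, browsers generally go with the HTTP header, so the header is the one to get right first.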


Simon Jessey | Keystone Websites
Save $97 on yearly plans with promo code SCJESSEY97


#3

This is a fun one, especially on Windows. scjessey is correct. The ads I see look to be Traditional Chinese. These are “double-byte” characters. If someone doesn’t have the language pack installed, they will just get ???. Even if they do, if the character encoding is off, they may still not get the correct characters.
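
For anyone curious how an encoding mix-up turns text into junk, here is a small Python sketch. It is only an illustration of two common failure modes, not a confirmed diagnosis of what is happening on this site, and the sample string is one I picked for the demo.

```python
# Sample Traditional Chinese text, chosen purely for illustration.
ad_text = "繁體中文廣告"

# The same characters as UTF-8 bytes (three bytes per character here).
utf8_bytes = ad_text.encode("utf-8")

# Failure mode 1: UTF-8 bytes read as if they were ISO-8859-1.
# Every byte becomes its own character, so the text turns into gibberish.
mojibake = utf8_bytes.decode("iso-8859-1")

# Failure mode 2: the text is forced into ISO-8859-1, which has no Chinese
# characters, so each one is replaced with a question mark.
question_marks = ad_text.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

if __name__ == "__main__":
    print(len(ad_text), "characters,", len(utf8_bytes), "bytes as UTF-8")
    print(repr(mojibake))
    print(question_marks)
```

The first failure is reversible (re-encode as ISO-8859-1, then decode as UTF-8), but once characters have been replaced with question marks the original text is gone.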


#4

OK, I’ve changed the _ISO definition in the Joomla language file to UTF-8, and my pages are now being served with what I think is the correct meta tag (one declaring UTF-8).

Is that all there is to it? Do I need to make sure all pages are saved with UTF-8 encoding from a text editor as well?

According to the link scjessey provided, “Up to now in order to change from one encoding to another all that was required was to change the _ISO definition in the language file”. I am using Joomla 1.0.8, not 1.5 as is mostly discussed in that link, so I should qualify as “Up to now”.

The ads are still “wrong”; however, Google also stated, “remember that after any changes you make, it might take 1-2 weeks for our crawler to visit your site and reindex your pages”.

Thanks again.


#5

[quote]Is that all there is to it? Do I need to make sure all pages are saved with UTF-8 encoding from a text editor as well?[/quote]

There are four things that you can do to control character encoding:

  1. Add this to your .htaccess file: AddDefaultCharset utf-8

  2. Save your files as UTF-8

  3. Add this to the very top of your PHP pages: header("Content-Type: text/html;charset=UTF-8");

  4. Use the <meta> element as you did

Option 1 is a site-wide sledgehammer, so only use that if you are certain that everything is okay being set to UTF-8 by default. I don’t use that (lots of old ISO-8859-1 pages), but I use all the others.
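
On the "save your files as UTF-8" point, you can check what you actually have before changing anything. This is a rough Python sketch, assuming your old files are ISO-8859-1; the function names and the file name in the usage comment are hypothetical.

```python
# Hypothetical helpers for illustration only.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes assumed to be ISO-8859-1 as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


# Possible usage on a single file (the path is made up; back up the file first):
#
#   from pathlib import Path
#   page = Path("index.php")
#   raw = page.read_bytes()
#   if not is_valid_utf8(raw):
#       page.write_bytes(latin1_to_utf8(raw))
```

Plain ASCII files pass the check unchanged, since ASCII bytes mean the same thing in both encodings. Only files containing accented or other non-ASCII characters need converting.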


#6

Well, if the <meta> element should work, then I guess I’ll wait on it a bit and see if Google just needs to spider again to start serving English ads.

Guess I’ll add “read up on character encoding” to my to-do list. Not understanding this has bitten me a few times already.


#7

No. The <meta> element is definitely not sufficient on its own. You should use as many of those options as you can.


#8

Ok will do.