Greetings. I have a weblog through Blogger that I previously used with another provider. With Dreamhost, Blogger is adding ^Ms at the end of each line in my weblog when it’s uploaded. Has anyone else run into this problem and, if so, do you have a solution? Thanks!
Use dos2unix on the files, or else open them in pico / nano, and save them. This will get rid of the ^Ms.
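For anyone without dos2unix handy, here is a minimal Python sketch of the same idea: replace each CR LF pair with a bare LF. The names `crlf_to_lf` and `convert_file` are made up for illustration, and the real dos2unix does more than this.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CR LF pair with a bare LF; lone CRs are left alone."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite a file in place with Unix line endings (hypothetical helper)."""
    p = Path(path)
    p.write_bytes(crlf_to_lf(p.read_bytes()))
```

Working on bytes rather than text avoids Python's own newline translation getting in the way.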
True, but I’d have to do that every time I post to my weblog…
Oh - I thought the problem was only with existing files. In that case, the solution is to use a different editor to write the files in the first place.
With Blogger you don’t use an editor. You write in a textarea, then the information is added to a web page which is then automatically uploaded via FTP.
Hrm - I’m a bit confused about why it needs to be uploaded via FTP if you’re editing it on the web site itself. Perhaps someone else has an answer for you because I don’t really know.
Will – One of the methods of using Blogger lets you write a weblog via a browser interface on their server, then automatically transfers static HTML files, built from your data and chosen template, to the web server of your choice. It’s designed to allow people who don’t have scripting capability on their server to have a simple blogging CMS that can be updated from the web, but it actually works pretty well if you don’t want to mess with installing server-side software yourself or prefer static HTML to dynamically generated pages.
Junyor – I was recently testing Blogger’s auto FTP upload capability on a site of mine, and I’m not seeing the problems you are. I thought a ^M was a Macintosh linefeed, but I’d think those would be stripped out by their server (and shouldn’t display differently in a web browser anyway; don’t they interpret all line breaks the same?). Out of curiosity, are you on a Mac?
If so, what browser? I’m using OSX, and again, haven’t seen any issues. Perhaps try using a different browser to update your blog, and see if that helps. Maybe somehow your template ended up with Mac linebreaks or something.
@Makosuke: AFAIK, ^Ms are Windows linefeeds. I’m using Windows XP and Opera 7.11. Since the template is also updated via a TEXTAREA, I don’t see how the linefeeds could be a problem that way either. I’m perplexed.
^M (character #13 in the ASCII set) is a carriage return. ^J (character #10) is a linefeed. In Windows, PC-DOS, various ancient mainframes, and traditional Internet standards, a line break is represented by CR followed by LF (#13#10, ^M^J). In Unix, Linux, and similar operating systems, a line break is represented by an LF alone (usually called a “newline” by Unix geeks). In classic Mac OS (before OS X), as well as older Apple systems such as the Apple II series, and also on the Commodore 64 and some other obsolete home computers, line breaks are represented by CR alone. Loads of fun results when a file created in a system with one convention is used on a system with a different one.
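To see which of the three conventions a file actually uses, something like the following rough sketch would do. The function name `count_line_endings` is an invention for this example; it just counts CR LF pairs, bare LFs, and bare CRs in the raw bytes.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR LF pairs, LFs not preceded by CR, and CRs not followed by LF."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                     # Windows / Internet standards
        "LF": data.count(b"\n") - crlf,   # Unix, Linux
        "CR": data.count(b"\r") - crlf,   # CR-only systems
    }


if __name__ == "__main__":
    sample = b"dos line\r\nunix line\ncr-only line\rend"
    print(count_line_endings(sample))
```

A file that reports more than one nonzero count has mixed line endings, which is usually the sign that it passed through systems with different conventions.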
Transferring by FTP in ASCII mode rather than binary mode should cause the line breaks to be converted properly to the destination system’s convention, assuming of course that the FTP program knows correctly what operating systems are in use at either end, and the file in question is in fact stored correctly for the conventions of the originating system. If the file has previously been transferred in the wrong mode, or it’s been sent in a non-FTP manner (e.g., over a LAN where remote systems of different OS are treated as if they’re local drives), then it can get all screwed up anyway.
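As a rough model of what the two modes do (not a real FTP client, and the function names are made up): in ASCII mode the sender turns its local line ends into CR LF for the wire and the receiver turns CR LF into its own convention, while binary mode passes the bytes through untouched.

```python
def ascii_send(data: bytes, sender_eol: bytes) -> bytes:
    """Sender side of an ASCII transfer: local line ends become CR LF."""
    return data.replace(sender_eol, b"\r\n")


def ascii_receive(wire: bytes, receiver_eol: bytes) -> bytes:
    """Receiver side of an ASCII transfer: CR LF becomes the local line end."""
    return wire.replace(b"\r\n", receiver_eol)


def binary_transfer(data: bytes) -> bytes:
    """Binary mode: the bytes arrive exactly as they were sent."""
    return data


windows_file = b"one\r\ntwo\r\n"

# Windows to Unix in ASCII mode: the CRs are dropped on arrival.
clean = ascii_receive(ascii_send(windows_file, b"\r\n"), b"\n")

# Same file in binary mode: the CRs survive and show up as ^M on Unix.
dirty = binary_transfer(windows_file)

# A CRLF file sitting on a Unix sender (stored "wrong" for that system):
# ASCII mode adds a second CR on the wire, so one CR is still left over.
leftover = ascii_receive(ascii_send(windows_file, b"\n"), b"\n")
```

The last case is the "stored incorrectly for the originating system" situation: the conversion runs as designed, but the input was not what it assumed.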
The whole problem is the fault of whoever devised ASCII around 1963 or so (the year I was born); the control codes were designed to control printing Teletype terminals, and they decided that shoving the carriage to the left edge of the device (CR) should be a separate operation from moving it down a line (LF). Back in my college days, I remember occasional fads of doing neat animated effects (e.g., in plan files) by having CRs without LFs. Backspaces sometimes were also used there.
Thank you for the lesson, although I now feel like an ignorant simpleton for mucking up my terminology. I was figuratively correct in that ^M alone would indicate a Macintosh linebreak, which I have seen a couple of times when I accidentally uploaded a file with Mac linebreaks to a DH server in binary.
But of course none of this explains why Blogger would be having problems with that–I’d assume they’re using ASCII transfers, and I doubt their software is generating things with only CRs.
Could you give a URL for a sample of one of the affected pages?
Sure, try http://weblog.timaltman.com/archives/2002_12_01_index.html. You’ll notice that the page doesn’t validate. I don’t know that you can see the characters in the source, though. Before when I was using some PHP, it would complain about parse errors.
Hmm… I’m not seeing any ^M characters showing up in the served HTML, but then I wouldn’t expect to, and they should still be treated as valid whitespace by any browser, so it doesn’t matter. It would cause a problem with PHP scripts, though.
For reference, as far as I can tell, the only reason your document isn’t validating as XHTML 1.0 is that you’re lacking the trailing ? in the initial statement. If you change it to:
<?xml version="1.0" encoding="iso-8859-1"?>
…it seems to validate just fine. I’m far from an expert, but I don’t see why ^Ms would affect the validation anyway, since they’re accepted as whitespace in the XHTML 1.0 spec.
Shows up normal for me.
Well, what do you know. I must have removed that when I was playing with PHP. I am using PHP in this page and it isn’t complaining at the moment, so, even though the ^Ms are still showing in the source on the server viewed via emacs, everything looks fine when you look at the page, which is really what counts. Thanks for the help!
Here’s a handy tool -
Using that, it seems the header and footers have CRLF pairs but the body has only LF. It is probably not an FTP problem. Edit: maybe it is an FTP problem after all, if they are uploading in binary mode instead of ASCII.
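For checking where in a page the CRLF pairs sit, a small stand-in along these lines would work. This is only a sketch, the names `ending_of` and `endings_by_line` are hypothetical, and it is not the tool mentioned above. It reports the line ending found on each line.

```python
def ending_of(line: bytes) -> str:
    """Name the line terminator at the end of one line."""
    if line.endswith(b"\r\n"):
        return "CRLF"
    if line.endswith(b"\n"):
        return "LF"
    if line.endswith(b"\r"):
        return "CR"
    return "none"


def endings_by_line(data: bytes) -> list:
    """Return (line_number, ending) pairs, numbering lines from 1."""
    lines = data.splitlines(keepends=True)
    return [(n, ending_of(line)) for n, line in enumerate(lines, start=1)]


if __name__ == "__main__":
    page = b"<html>\r\n<p>post one</p>\n<p>post two</p>\n</html>\r\n"
    for number, ending in endings_by_line(page):
        print(number, ending)
```

A page whose first and last lines say CRLF while the middle says LF would match the header/footer versus body split described here.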
How does one change the headers and footer templates in Blogger? Is it by an HTML form? If so, perhaps they are not stripping out the CR when the browser submits the form, or perhaps they intentionally store the templates with CRLF pairs.
Yeah, you work on your template in an HTML form and also make posts to your weblog via an HTML form.
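That fits the form theory: browsers send the line breaks in a TEXTAREA as CR LF pairs (`%0D%0A` in a URL-encoded form body), so the CRs stay in unless the receiving script strips them. Here is a minimal sketch of that stripping step. How Blogger actually handles its forms is not known here, and the field name `template` and the function `normalized_field` are made up for illustration.

```python
from urllib.parse import parse_qs


def normalized_field(form_body: str, field: str) -> str:
    """Decode one field of a URL-encoded form body and convert CR LF to LF."""
    values = parse_qs(form_body).get(field, [""])
    return values[0].replace("\r\n", "\n")


if __name__ == "__main__":
    # What a browser might send for a two-line template (hypothetical field name).
    body = "template=%3Chtml%3E%0D%0A%3C%2Fhtml%3E"
    print(repr(normalized_field(body, "template")))
```

If the templates were stored without a step like this, they would keep their CR LF pairs, while any content generated on a Unix server would have bare LFs.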