Checking static website integrity


I wrote a script which checks the integrity of my static websites by checksumming each file and comparing with the checksums of the corresponding files on my home machine. The script is driven by a text file which contains a list of my static websites, together with the appropriate user names. For example, user1 user2 user3

For each line of this file, the script logs in as the specified user and runs

where ‘mydomain’ is the specified domain name.

I have two questions:

(1) Obviously the script could be made simpler if it could set an environment variable (‘$domain’, for example) instead of having to edit itself at run-time to contain the specified domain name. The message which comes up when it tries to set an environment variable is “Server refused to set environment variables”. Does Dreamhost disallow setting environment variables?

(2) I have large video files in some of my websites, so maybe I should worry about how many CPU minutes this script is eating up. I suppose I could simply run the script whenever I feel like it and wait to see if anyone complains, but that does not sound sensible. What would be a better approach?



export DOMAIN="domain.tld"
echo $DOMAIN

or you could have the script take the domain as an argument
sh myscript domain.tld
and refer to it inside the script as $1
find "$1"/ -type f …

You could use ! -name to skip video files
find "$1"/ -type f ! -name '*.mpg' ! -name '*.flv' ! -name '*.avi' …

or set a size limit, for example only files smaller than 20 MB

find $1/ -type f -size -20M …
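
Putting the find filter together with a checksum tool, the listing step might be sketched roughly as below. This assumes GNU find, sort, xargs and sha256sum are available (as on a Debian-based host). The function name make_manifest and the 20 MB cutoff are illustrative choices, not part of the original script.

```shell
#!/bin/bash
# Sketch only: print "checksum  ./relative/path" for every file under a
# site directory, skipping files of 20 MiB or more.
# make_manifest is a hypothetical name chosen for this example.
make_manifest() {
    local site_dir=$1
    # Subshell, so the caller's working directory is left unchanged.
    (
        cd "$site_dir" || exit 1
        # -print0 / -z / -0 keep file names containing spaces intact;
        # sorting gives a stable order so two manifests can be diffed.
        # -r stops xargs from running sha256sum when nothing matched.
        find . -type f -size -20M -print0 | sort -z | xargs -0 -r sha256sum
    )
}
```

Running make_manifest against the server copy and against the home copy, then diffing the two outputs, would show any file whose content differs.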


One approach you might want to consider might be to check the whole site into a Git repository and use “git status” to check if anything has changed.
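
As a rough illustration of the Git idea, assuming the site directory has already been committed to a repository and a reasonably recent git (one that supports -C) is installed, a check could be as small as this. The helper name site_changed is made up for the example.

```shell
#!/bin/bash
# Sketch only: site_changed is a hypothetical helper that succeeds
# (exit status 0) when the working tree at the given path differs from
# the last commit, and fails when the tree is clean.
site_changed() {
    local site_dir=$1
    # --porcelain prints one line per modified, deleted or untracked
    # file, and prints nothing at all for a clean tree.
    [ -n "$(git -C "$site_dir" status --porcelain)" ]
}
```

A cron job could call site_changed for each site and only send mail when it succeeds.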


Thank you sXi … I’ve realized my problem with environment variables may be because of this: (from PuTTY guide)

[quote]4.14.5 Setting environment variables on the server

The Telnet protocol provides a means for the client to pass environment variables to the server. Many Telnet servers have stopped supporting this feature due to security flaws, but PuTTY still supports it for the benefit of any servers which have found other ways around the security problems than just disabling the whole mechanism.

Version 2 of the SSH protocol also provides a similar mechanism, which is easier to implement without security flaws. Newer SSH-2 servers are more likely to support it than older ones.

This configuration data is not used in the SSH-1, rlogin or raw protocols.

To add an environment variable to the list transmitted down the connection, you enter the variable name in the ‘Variable’ box, enter its value in the ‘Value’ box, and press the ‘Add’ button. To remove one from the list, select it in the list box and press ‘Remove’.[/quote]

So, just to check … is it the case that Dreamhost servers don’t allow environment variables to be set in this way? Any chance that they might? It would be really useful, as it would allow PuTTY sessions to be saved with any requisite environment variables pre-specified.

Regarding the CPU minutes: the ideal thing would be a weekly or monthly emailed report saying how much CPU time each of my users has used. I imagine this is a common enough thing that hopefully someone can offer a pre-packaged way of doing it … ?



Our SSHD configuration has the following line in it, which is inherited from Debian’s standard sshd_config:

AcceptEnv LANG LC_*

If you want to set environment variables, you can do so in your .bash_profile or similar.


Yes but I think that’s not what I’m looking for. For example (and this is a digression from the subject of this thread, but just an example to show what I mean), how could I arrange for an environment variable to have the value ‘work’ when I am logged in from work, and ‘home’ when I am logged in from home … and for it to have no value at all when I am logged in from a friend’s machine which is neither at work nor at home?

The natural way to do this (if it were possible) would be for the plink/putty saved session on my work machine to set the environment variable to ‘work’, whereas the plink/putty saved session on my home machine sets it to ‘home’.

Of course, there are work-arounds. For example, when I log in from home, I could run a script which says ‘I am at home’. But (a) I might forget to do that and (b) if this kind of thing can be done automatically by the infrastructure, it is obviously better.

Or, I could use the ‘-m’ parameter in plink/putty to do this. But that would ‘use up’ this parameter, which could otherwise be put to better use.

I realize this is all rather unimportant compared with everything else which is going on here at the moment, but if anyone would like to take a break from weightier affairs and bend their mind to lightweight puzzles, any suggestions would be welcome.



I have set up a cron job that creates a log file of daily CPU time used. It would be trivial to add a weekly or monthly email as you require. I have a daily email that sends the last 10 lines of the log, which works for me.

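#!/bin/bash
# Append yesterday's CPU totals from the resource log to a running log.
# Both paths below are placeholders; set them for your own account.
resource_log="$HOME/path/to/resource.log"
output="$HOME/path/to/cpu_usage.log"
var_date=$(date -d yesterday +"%Y/%b/%d")
# Take fields 2, 4 and 5 from the line that starts with "Total".
total=$(awk '/^Total/ {print $2"\t"$4"\t"$5}' "$resource_log")
echo -e "$var_date\t$total" >> "$output"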

Also, I second andrewf’s suggestion of setting up a git repository. Even if you don’t use git, it’s quite easy to set up and does exactly what you are trying to manually do. And it’s so fast that you’ll be hard pressed to measure any increase in CPU minutes even if you checked hourly. Perhaps even more often.


Great solution.


Thank you … interesting … I will look into it. I googled ‘git for dummies’ and numerous hits came up (and luckily, my Dreamhost server already has git installed), so I am in with a chance.

Still, my (very partial) understanding, so far, is that git does not try to be cryptographically secure, as it is intended to guard against inadvertent garbling, but not against malicious counterfeiting. And that for that reason, it still uses SHA-1 (which may be cryptographically broken).

Shouldn’t an attempt to “check static website integrity” be using SHA-2?
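
For what it is worth, a SHA-2 comparison does not need anything beyond coreutils. A minimal sketch, assuming a manifest in sha256sum's own output format was produced on the home machine with paths relative to the site root (the function name verify_site is hypothetical):

```shell
#!/bin/bash
# Sketch only: check the files under site_dir against a SHA-256
# manifest written in sha256sum's "checksum  path" format.
# verify_site is a hypothetical name chosen for this example.
verify_site() {
    local manifest=$1 site_dir=$2
    # The manifest is fed on stdin, so its own path is resolved before
    # the cd. --quiet suppresses the "OK" line for each good file; the
    # exit status is non-zero if any file is missing or differs.
    ( cd "$site_dir" && sha256sum --check --quiet ) < "$manifest"
}
```

Note that this only checks files named in the manifest; a file that has been added to the site but is absent from the manifest would not be noticed.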



At first I thought that meant that it would only accept ‘LANG’ plus any already-existing-in-the-system environment variables which begin with the characters ‘LC_’

(and I didn’t want to co-opt any of those variables for my own use, for fear of breaking something),

but I’ve now found by experiment that it allows me to invent any environment variable name I like, provided it begins with the characters ‘LC_’.

Thus, (going back to my fictitious but useful example in post #6) if in the plink/putty login configuration I try to set the environment variable ‘WHEREAMI=work’, the system refuses to accept it (it responds, “Server refused to set environment variables”)

but if instead I specify ‘LC_WHEREAMI=work’ then it … works!

That’s great, and sorry I didn’t understand your reply the first time around.
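
For anyone wanting to act on such a variable after login, a fragment along these lines could go in .bash_profile. It is only a sketch: the variable name LC_WHEREAMI comes from the example above, while the function name describe_location and the messages are invented for illustration.

```shell
# Sketch of a ~/.bash_profile fragment. describe_location is a
# hypothetical helper; LC_WHEREAMI is whatever the client sent.
describe_location() {
    # ${LC_WHEREAMI:-} expands to an empty string when the variable is
    # unset, e.g. when logging in from a machine that does not send it.
    case "${LC_WHEREAMI:-}" in
        work) echo "logged in from work" ;;
        home) echo "logged in from home" ;;
        *)    echo "location unknown" ;;
    esac
}
```

With an OpenSSH client instead of plink/PuTTY, the equivalent of the saved-session setting would be something like LC_WHEREAMI=home ssh -o SendEnv=LC_WHEREAMI user@host, where user@host is a placeholder.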