Creating a Backup Script (bash)

software development

#1

I have been working on a backup script I can run as a cron job, specifically for use with the backup account DH gives us.
So far it seems to work; I would just like some feedback on ways to improve it or make it more flexible.
Also, anyone else who would like to use this as a backup tool is welcome to it.

First: this assumes the user is familiar with sftp and ssh, can use both with their hosting account, and can sftp into their backup account.

#### The Goal:
to have a scheduled backup of my databases and website files for each month. This script runs once a month and saves the backups in a folder named for the current month. The following year, the contents of that month's folder are overwritten with any changes. The result is 12 folders, each holding a complete backup, each overwritten once a year.

Any mistakes or hacks on my website, and I can go back in time up to a year.

##### How I did it:
From a terminal, I ssh into my hosting account and create a key: ssh-keygen -t dsa

then sftp into the host and copy the contents of ~/.ssh/id_dsa.pub

sftp into the backup server and create a file called ~/.ssh/authorized_keys (this, not known_hosts, is the file ssh consults for public keys)
paste in the contents I had copied from id_dsa.pub
save and close that file
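The same key setup can be done without the manual copy-and-paste; here is a sketch, assuming a standard OpenSSH client (the backup username and host are placeholders in the style of the script below):

#on the hosting account: generate a key pair
#(leave the passphrase empty so cron can run unattended)
ssh-keygen -t dsa

#install the public key into the backup account's ~/.ssh/authorized_keys
ssh-copy-id -i ~/.ssh/id_dsa.pub b1234567@hanjin.dreamhost.com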
in the backup account, I create some folders:
/website_backups
/website_backups/yearly
/website_backups/yearly/Jan
/website_backups/yearly/Feb
…and so forth, one for every month of the year (12 in all).
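If you would rather not click through sftp a dozen times, the same folders can be created with a generated batch of sftp commands run from the hosting account (a sketch; the backup username and host are placeholders, and the leading dash tells sftp to ignore "already exists" errors):

{
  echo "-mkdir website_backups"
  echo "-mkdir website_backups/yearly"
  for m in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec; do
      echo "-mkdir website_backups/yearly/$m"
  done
} | sftp -b - b1234567@hanjin.dreamhost.com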

sftp back into the hosting account and:
create a folder called: ~/.dbbu/yearly/
create a file called: ~/.scripts/backup_yearly
open the backup_yearly file, add the contents below, and edit them to match your setup:

#!/bin/bash

MONTH=$(date +%b)

#remove db files older than 28 days
find ~/.dbbu/yearly -name '*.sql' -type f -daystart -mtime +28 -exec rm {} \;

#get recent version of databases (one line for each database)

mysqldump --opt --user=YOURDATABASEUSER --password=YOURPASSWORD --host=mysql.YOURDOMAIN.COM YOURDATABASE --lock-tables=false > ~/.dbbu/yearly/NAMEOFYOURDATABASE.$(date +%Y-%m-%d_%H-%M-%S).sql

mysqldump --opt --user=YOURDATABASEUSER --password=YOURPASSWORD --host=mysql.YOURDOMAIN.COM YOUROTHERDATABASE --lock-tables=false > ~/.dbbu/yearly/NAMEOFYOUROTHERDATABASE.$(date +%Y-%m-%d_%H-%M-%S).sql

#send the database dumps to the backup account
rsync -e ssh -av --delete --protocol=29 ~/.dbbu/yearly/ YOURBACKUPUSERNUMBER@hanjin.dreamhost.com:website_backups/yearly/$MONTH/db

#backup website files (one line for each top-level folder)
rsync -e ssh -av --delete --protocol=29 YOURWEBSITEFOLDER/ YOURBACKUPUSERNUMBER@hanjin.dreamhost.com:website_backups/yearly/$MONTH/YOURWEBSITEFOLDER/

save and close.
make that file executable (in a Linux file manager: right-click > Permissions > Allow executing file as program)
make it readable and executable by the owner only, with no access for anyone else.
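From a shell, one chmod covers both steps (owner read and execute, nothing for group or others):

chmod 500 ~/.scripts/backup_yearly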

ssh into the hosting account from a terminal and run the script. The first time, it asks if I am sure I want to connect; I say yes and it goes through the whole routine. If everything is configured properly, I can look in my backup account and find all my website and database files nicely backed up under whatever month it is.

Now I add this script as a cron job using the DreamHost control panel > Goodies > Cron.
Create a new job. The user is whatever user I log into the hosting account with.
The command is: ~/.scripts/backup_yearly
Run once a month.
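For reference, the panel is building an ordinary crontab entry behind the scenes; a hand-written equivalent for 3:00 a.m. on the first of every month would look something like this (the time is an arbitrary choice):

0 3 1 * * ~/.scripts/backup_yearly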

Done.

###### Thoughts
there is probably a way to write this script so that it first creates all of the proper folders if they do not exist (see the note just below).
there is probably a way to write it so that if it is run weekly it creates weekly folders, if daily, folders for the days of the week, and so on.
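For the first idea, mkdir -p turns out to be all that is needed: it creates any missing parent folders in one go and silently succeeds if they already exist, for example:

mkdir -p ~/.dbbu/yearly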


#2

There are a lot of resources for bash scripting, such as references, examples, etc. If you are experienced with programming, it is mostly a matter of translating from Perl/Python/PHP etc. to bash, so you may want to say if you are new to programming. Obviously the script would be more flexible with the addition of variables and logic (if/then/else, loops, etc.).


#3

Hi. Thanks. I am amateurish when it comes to writing scripts. I understand the concepts of variables and loops. I also understand looking up references and examples, which is how I was able to create this script.

I really put it up here for general discussion and to see if anyone else had thoughts. I have read many posts in this forum from people wanting a way to back up on a schedule. This seemed like a solution, and one that could perhaps be made a little simpler.


#4

Okay, after some tinkering I have come up with a new version of my script:

- This one does not require you to create folders: it checks whether they exist and creates them if not.
- You name the script for how frequently you want to run it, and it creates a specific folder for that. For example, if you are going to run this hourly via cron, just name the script hourly. It checks its own name and creates the backups on the backup account under /website_backups/hourly.
- There is now a place at the top of the script to add your information, so you do not need to go through the script editing every line to match your server info. Just edit the lines at the top.

You still need to set up an ssh key from your web host to your backup account first for this to work.
Also, make sure the script is set executable.

Then run the script from the command line to make sure it is working, like this (assuming the script is named hourly and lives in a folder called .scripts):
[nair-al-saif]$ ~/.scripts/hourly

After that, you can add it as a cron job in the DH control panel.

I am open to input on how I can revise this.

Also, use at your own risk: read it before you run it. The code is well commented and does not contain any damaging commands, but earlier, while I was playing with it, a small error had it making copies of copies and I had to exit out very quickly.
It should only take a few minutes to run, unless you have a huge website and databases with GBs of files.

#!/bin/bash

#this script should be named for the frequency you want to run it:
#for example:
#hourly
#daily
#weekly
#monthly
#yearly
#the script will create a folder based on the name of the script
#this way you can just copy and rename the script for each
#time/frequency you want to run it



#Options -start editing here##############
#your database info####
   DBHOST='mysql.yourdomain.com'
   DBUSER='database_user'
   DBPW='dbpassword123'
#your databases (if more than one, separate with spaces)
   DBNAME=( 'wp_db'  'joomla_db' )

#your remote user info ##
   REMUSER='b1234567'
   REMHOST='hanjin.dreamhost.com'

#your folders to backup ## 
    FOLDERS=( 'yourdomain.com'  'yourotherwebsite.com' )

#End Options #stop editing##############




#get the name of the script
SCRIPTNAME=$(basename "$0")
#get the current month
MONTH=$(date +%b)

  if [ "$SCRIPTNAME" != "yearly" ]
  then

       #create the local .website_backups directory if it does not exist
       #(use $HOME in the test: a quoted ~ does not expand)
       if [ ! -d "$HOME/.website_backups/$SCRIPTNAME/db" ]; then
            mkdir -p "$HOME/.website_backups/$SCRIPTNAME/db"
       fi
       
       #remove db dumps from before today (an hourly run keeps up to 24 copies;
       #-name avoids a shell glob error when the folder is empty)
       find "$HOME/.website_backups/$SCRIPTNAME/db" -name '*.sql' -type f -daystart -mtime +0 -exec rm {} \;

       #get recent version of databases from array
       for i in "${DBNAME[@]}"
       do
       mysqldump --opt --user="$DBUSER" --password="$DBPW" --host="$DBHOST" "$i" --lock-tables=false > "$HOME/.website_backups/$SCRIPTNAME/db/$i.$(date +%Y-%m-%d_%H-%M-%S).sql"
       done

       
       #mirror each website folder into the local staging area
       for i in "${FOLDERS[@]}"
       do
            rsync -av --delete ~/"$i" "$HOME/.website_backups/$SCRIPTNAME/"
       done


       #everything is now staged under ~/.website_backups on the webhost
       #send it on to the DH backup account
       rsync -e ssh -av --delete --protocol=29 "$HOME/.website_backups/$SCRIPTNAME/" "$REMUSER@$REMHOST:website_backups/$SCRIPTNAME/"

   else

       #create the local .website_backups directory if it does not exist
       if [ ! -d "$HOME/.website_backups/$SCRIPTNAME/$MONTH/db" ]; then
            mkdir -p "$HOME/.website_backups/$SCRIPTNAME/$MONTH/db"
       fi

       #remove db dumps from before today
       find "$HOME/.website_backups/$SCRIPTNAME/$MONTH/db" -name '*.sql' -type f -daystart -mtime +0 -exec rm {} \;


       #get recent version of databases from array
       for i in "${DBNAME[@]}"
       do
       mysqldump --opt --user="$DBUSER" --password="$DBPW" --host="$DBHOST" "$i" --lock-tables=false > "$HOME/.website_backups/$SCRIPTNAME/$MONTH/db/$i.$(date +%Y-%m-%d_%H-%M-%S).sql"
       done


       #mirror each website folder into the local staging area
       for i in "${FOLDERS[@]}"
       do
            rsync -av --delete ~/"$i" "$HOME/.website_backups/$SCRIPTNAME/$MONTH/"
       done

       #everything is now staged under ~/.website_backups on the webhost
       #send it on to the DH backup account
       rsync -e ssh -av --delete --protocol=29 "$HOME/.website_backups/$SCRIPTNAME/" "$REMUSER@$REMHOST:website_backups/$SCRIPTNAME/"


 fi
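Since the behavior keys off the script's own name, reusing it at another frequency is just a copy under a new name plus a matching cron job (hypothetical paths):

cp ~/.scripts/hourly ~/.scripts/daily
cp ~/.scripts/hourly ~/.scripts/weekly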

#5

I had to remove the verbose (-v) option and add the quiet (-q) option to rsync, because the cron job was emailing me every time it ran.
Now there is no output at all, so if you are testing you may want to switch -q back to -v to check for errors. Otherwise, here is the update:


#!/bin/bash

#ver. 2 

#this script should be named for the frequency you want to run it:
#for example:
#hourly
#daily
#weekly
#monthly
#yearly
#the script will create a folder based on the name of the script
#this way you can just copy and rename the script for each
#time/frequency you want to run it



#Options##############

#your database info####
   DBHOST='mysql.example.com'
   DBUSER='DB_USERNAME_HERE'
   DBPW='DB_PASSWORD_HERE'
#your databases
   DBNAME=( 'mydatabase' )

#your remote user info##
   REMUSER='b12345678'
   REMHOST='hanjin.dreamhost.com'

#your folders to backup 
    FOLDERS=( 'bmc'  '.archives' 'lrt' )

#End Options #stop editing####




#get the name of the script
SCRIPTNAME=$(basename "$0")
#get the current month
MONTH=$(date +%b)

  if [ "$SCRIPTNAME" != "yearly" ]
  then

       #create the local .website_backups directory if it does not exist
       #(use $HOME in the test: a quoted ~ does not expand)
       if [ ! -d "$HOME/.website_backups/$SCRIPTNAME/db" ]; then
            mkdir -p "$HOME/.website_backups/$SCRIPTNAME/db"
       fi
       
       #remove db dumps from before today (an hourly run keeps up to 24 copies;
       #-name avoids a shell glob error when the folder is empty)
       find "$HOME/.website_backups/$SCRIPTNAME/db" -name '*.sql' -type f -daystart -mtime +0 -exec rm {} \;

       #get recent version of databases from array
       for i in "${DBNAME[@]}"
       do
       mysqldump --opt --user="$DBUSER" --password="$DBPW" --host="$DBHOST" "$i" --lock-tables=false > "$HOME/.website_backups/$SCRIPTNAME/db/$i.$(date +%Y-%m-%d_%H-%M-%S).sql"
       done

       
       #mirror each website folder into the local staging area
       for i in "${FOLDERS[@]}"
       do
            rsync -a -q --delete ~/"$i" "$HOME/.website_backups/$SCRIPTNAME/"
       done


       #everything is now staged under ~/.website_backups on the webhost
       #send it on to the DH backup account
       rsync -e ssh -a -q --delete --protocol=29 "$HOME/.website_backups/$SCRIPTNAME/" "$REMUSER@$REMHOST:website_backups/$SCRIPTNAME/"

   else

       #create the local .website_backups directory if it does not exist
       if [ ! -d "$HOME/.website_backups/$SCRIPTNAME/$MONTH/db" ]; then
            mkdir -p "$HOME/.website_backups/$SCRIPTNAME/$MONTH/db"
       fi

       #remove db dumps from before today
       find "$HOME/.website_backups/$SCRIPTNAME/$MONTH/db" -name '*.sql' -type f -daystart -mtime +0 -exec rm {} \;


       #get recent version of databases from array
       for i in "${DBNAME[@]}"
       do
       mysqldump --opt --user="$DBUSER" --password="$DBPW" --host="$DBHOST" "$i" --lock-tables=false > "$HOME/.website_backups/$SCRIPTNAME/$MONTH/db/$i.$(date +%Y-%m-%d_%H-%M-%S).sql"
       done


       #mirror each website folder into the local staging area
       for i in "${FOLDERS[@]}"
       do
            rsync -a -q --delete ~/"$i" "$HOME/.website_backups/$SCRIPTNAME/$MONTH/"
       done

       #everything is now staged under ~/.website_backups on the webhost
       #send it on to the DH backup account
       rsync -e ssh -a -q --delete --protocol=29 "$HOME/.website_backups/$SCRIPTNAME/$MONTH/" "$REMUSER@$REMHOST:website_backups/$SCRIPTNAME/$MONTH/"


 fi

#6

Study up on bash redirection. There are two specific ways to use it in this case:

append “> /dev/null 2>&1” (without the quotes) to your cron command. This throws away both standard output and error output from the script, so cron won’t email you on each run anymore.
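For example, using the hourly script from earlier in the thread, the cron command would become:

~/.scripts/hourly > /dev/null 2>&1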

then, within your script, you can also make use of redirection by truncating (“>”) or appending (“>>”) to a log file, redirecting the output of individual commands, such as:

rsync -e ssh -av --delete --protocol=29 .website_backups/$SCRIPTNAME/ $REMUSER@$REMHOST:website_backups/$SCRIPTNAME/   >> ./logfile.txt 2>&1

It’s always safe to append: if the file doesn’t exist, it will be created. Adding something like “>> ./logfile.txt 2>&1” opens or appends logfile.txt in the current directory.

Of course, then you need to decide what to do with the logfiles and how many to keep. You might, for example, give the logfile a fancier name incorporating the date, or write to a ~/backuplogs/somenameYYYYMMDD.txt file instead.
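A sketch of that dated-logfile variant, reusing the rsync line above (the folder and file names are just examples):

#near the top of the backup script
mkdir -p ~/backuplogs
LOG=~/backuplogs/backup_$(date +%Y%m%d).txt

#then append output from the commands you care about
rsync -e ssh -av --delete --protocol=29 .website_backups/$SCRIPTNAME/ $REMUSER@$REMHOST:website_backups/$SCRIPTNAME/ >> "$LOG" 2>&1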

Anyway, there are many different approaches; this is just a suggestion on how to make use of redirection. Google “bash redirection” for many more examples.
---
footnote: you also don’t have to use /dev/null as the target in the cron command’s redirection; you could point it at a logfile there instead and skip redirecting the individual statements inside the script. That would also stop the emails.


#7

Thanks for the tips. All great stuff; however, in this particular instance I simply wanted to stop cron from sending me emails with the full verbose output of rsync.

I still want emails for any other types of errors (file not found, bad permissions, etc).

Adding the -q option did the trick.


#8

My standard is to nullify the output from cron, write logs to an .htaccess protected and oddly named web directory (for easy viewing), and if I need to be alerted then send an alert email. I used to attach the log to the email, but I don’t even bother anymore, because I’ll have to look via PC anyway if I get one. Others may prefer an email for every run, but I find cron stable and reliable and don’t worry about it not running; I can’t actually remember a time that a cron job just got dropped, which precludes the need for an email just to confirm it ran. Should it stop running, I’ll notice for other reasons soon enough.


#9

Okay, just bear with me because I am still learning how cron and scripts work, and I would really like to hear more about how you do things.

you said:
“My standard is to nullify the output from cron, write logs to an .htaccess protected and oddly named web directory (for easy viewing), and if I need to be alerted then send an alert email”

My script currently runs without sending any email, unless there is a problem and the cron job cannot run. For example, I was doing some editing and accidentally changed the permissions on my script to no access; later I received an email from cron telling me about the problem.

Before I added the -q switch to the rsync lines in my script, I was getting an email every time it ran, with the full rsync output. I did not want that, but I do want an email in case of problems.

That sounds like what you are doing, minus the log files, which I do not really want for this script since it is just a repetitive backup/sync.

Having said that, I can see how a log would be nice for a script doing something more complex. If it were you, would you bother with log files for something as simple as a backup? I cannot really think of a use case where I would need to look at the logs. If the backup does not run, cron notifies me. If it does run, it just does its thing and exits.


#10

You never want logs until you wish you had them, which is why I don’t email them.

I commented because you wrote that you changed to -q, but suggested people may want -v for debugging purposes.

The only real way to check that the cron job is running is to collect a logfile and have a second cron job that makes sure the log file appears, and perhaps greps it for certain contents.

I don’t know exactly what I would do in this case. Probably create a directory like domain.com/oddnamedpath/ and slap an .htaccess on it to keep nosy people out, although they shouldn’t find it anyway. I probably would not go to the extreme of a supervision cron unless I had one checking several places/domains for logfiles. I would have the script keep x number of logfiles, purging the oldest.
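For what it’s worth, a minimal sketch of such a supervision cron, assuming the backup script appends to a dated log under ~/backuplogs and that a working mail command exists on the host (all names hypothetical):

#!/bin/bash
#watchdog: schedule this to run a while after the backup should have finished
LOG=~/backuplogs/backup_$(date +%Y%m%d).txt

#alert if today's log never appeared, or if rsync reported an error in it
if [ ! -f "$LOG" ]; then
    echo "backup log missing for $(date +%Y%m%d)" | mail -s "backup alert" you@example.com
elif grep -q "rsync error" "$LOG"; then
    echo "backup reported errors, see $LOG" | mail -s "backup alert" you@example.com
fi

#keep the 14 newest logs, purging the oldest
ls -1t ~/backuplogs/backup_*.txt 2>/dev/null | tail -n +15 | xargs -r rm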


#11

You should change the above now. Anyone with a DH account, and possibly others, currently has complete access to your database.
---
You might also want to look into https://github.com/carloslima/dhsnapshot


#12

I’ve edited out the data in question, but you still need to change your database password ASAP.

Also, for what it’s worth, we already run a script that looks kind of like this internally on a daily basis. The place where the backup files are stored isn’t directly accessible by customers, but it’s what we use for the “Restore DB” function in the MySQL panel.


#13

Thanks. Still, it is nice to have a routine that is easily accessible, especially if there are just a few files that need to be restored.
Also, having several folders with backups from different points in time is a nice option. For example, if I updated a WordPress theme and wanted to look at the stylesheet from two months ago.
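Pulling a single old file back out of a backup is one scp command from the hosting account (the paths here are hypothetical, following the folder layout from earlier in the thread):

scp b1234567@hanjin.dreamhost.com:website_backups/yearly/Feb/yourdomain.com/wp-content/themes/yourtheme/style.css ./style-feb.css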


#14

Consider using git.
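A minimal sketch of what that could look like here, assuming git is installed on the host and you run git init once in the local staging folder first (paths follow the script above):

cd ~/.website_backups/daily
git add -A
git commit -q -m "backup $(date +%Y-%m-%d_%H-%M)"

Each run then becomes a browsable point in history instead of only the most recent copy.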