Mon Sep 7 07:21:40 CEST 2015

Bash Directory Synchronization

Here is a simple shell script which illustrates the use of find and awk to determine which files need updating when two directories are synchronized. You could use the rsync command to carry out this task for you. However, rsync is efficient but rather terse, whereas an open script, even one that is mostly awk, lets you understand precisely what is about to happen to your files. The script works by using find (with the GNU-specific -printf option) to record the modification time, type and path of every entry in both directories. From this analysis it builds a list of synchronization commands (simple mkdir, cp and rm commands) and presents it to the user, who can then decide whether or not to execute them. In my tests the analysis of 1.3 GB of files on a hard drive and a USB drive took around 3 seconds, so the speed of this script is not far from that of rsync itself.
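The data-gathering step can be tried on its own. The sketch below wraps the same GNU find -printf format the script uses in a helper; the name list_tree is hypothetical and not part of ds.sh, and GNU find is assumed because -printf is a GNU extension. Each output line holds the modification time in seconds since the epoch, a one-letter type (f for a regular file, d for a directory) and the path relative to the directory, separated by tabs.

```sh
#!/bin/sh
# list_tree DIR (hypothetical helper): print "mtime<TAB>type<TAB>path" for
# every entry under DIR, using the same GNU find -printf format as ds.sh.
list_tree() {
  # the subshell keeps the cd local; && stops find from listing the
  # current directory if the cd fails
  (cd "$1" && find . -printf '%T@\t%y\t%p\n')
}

# Example: build a small tree and show only type and path (mtimes vary).
demo=$(mktemp -d)
mkdir "$demo/sub"
: > "$demo/sub/a file.txt"
list_tree "$demo" | cut -f2,3 | LC_ALL=C sort
rm -rf "$demo"
```

Tab separators matter here: a name such as "a file.txt" contains a space, so splitting on tabs keeps it intact.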

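The usage reporting and error checking are minimal. The script takes three arguments, all of them compulsory:

ds.sh directory1 directory2 time-window

Here directory1 is the source, directory2 is the target that is brought into line with it, and time-window is the number of seconds by which two modification times may differ before a file counts as changed. So, for example, you might type: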

./ds.sh usb_documents drive_documents 2

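to make drive_documents match usb_documents: files that are newer or missing in drive_documents are copied from the USB drive, and anything found only in drive_documents is offered for removal. A window of a couple of seconds is useful here because FAT-formatted USB drives store modification times with only 2-second resolution.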
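The time-window test that chooses between copying, warning and doing nothing can be sketched as a small shell function. The name decide and its output words are hypothetical; the logic mirrors the awk script, which truncates both modification times to whole seconds and compares their difference with the window.

```sh
#!/bin/sh
# decide SRC_MTIME DST_MTIME WINDOW (hypothetical helper mirroring ds.sh):
# prints "copy" if the source is newer by more than WINDOW seconds,
# "warn" if the target is newer by more than WINDOW seconds, else "same".
decide() {
  # drop any fractional part, as int() does in the awk script
  src=${1%%.*}
  dst=${2%%.*}
  delta=$((src - dst))
  if [ "$delta" -gt "$3" ]; then
    echo copy
  elif [ "$delta" -lt $((-$3)) ]; then
    echo warn
  else
    echo same
  fi
}

decide 1441602105.7 1441602100.2 2   # source 5 s newer: copy
decide 1441602100 1441602101 2       # 1 s apart, inside the window: same
decide 1441602100 1441602110 2       # target 10 s newer: warn
```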

This script is experimental: please feel free to use it, at your own risk. If you have questions or comments, please let me know.

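#!/bin/sh
# usage check: two existing directories and a time window in seconds
if [ $# -ne 3 ] || [ ! -d "$1" ] || [ ! -d "$2" ]; then
  echo "usage: $0 directory1 directory2 time-window" >&2
  exit 1
fi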
awk '
BEGIN{
  timewindow=ARGV[3]+0   # force a numeric value for the comparisons below
  print "The time window is: " timewindow
  dir1="\"" ARGV[1] "\"/"
  dir2="\"" ARGV[2] "\"/"
  readdir(dir1, lista, typea)
  readdir(dir2, listb, typeb)
  for(filea in lista){
    if(filea in listb){
      if(typea[filea] == "f"){
        timediff=lista[filea]-listb[filea]
        if(timediff > timewindow){
          com[++ncom]="# file in source directory newer than target"
          com[++ncom]="cp -a " dir1 "\"" filea "\"" " " dir2 "\"" filea "\""
        }
        if(timediff < -timewindow){
          print "# WARNING NEWER FILE IN TARGET DIRECTORY"
          print "# files concerned are: "
          print "# " dir1 "\"" filea "\"" " " dir2 "\"" filea "\""
        }
      }
    }else{
      if(typea[filea] == "d" ){
        dcom[++ndcom]="# directory needs to be created in the target"
        dcom[++ndcom]="mkdir -p " substr(dir2,1,length(dir2)-2) \
                       substr(filea,2) "\""
      } else {
        com[++ncom]="# file needs to be copied to the target"
        com[++ncom]="cp -a " substr(dir1,1,length(dir1)-2) substr(filea,2) \
                 "\"" " " substr(dir2,1,length(dir2)-2) substr(filea,2) "\""
      }
    }
  }
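  for(fileb in listb){
    if(!(fileb in lista)){
      if(typeb[fileb] == "d"){
        # rm -f cannot remove a directory; -r also removes anything left inside
        com[++ncom]="# need to remove directory in target not in source"
        com[++ncom]="rm -rf " dir2 "\"" fileb "\""
      } else {
        com[++ncom]="# need to remove file in target not in source"
        com[++ncom]="rm -f " dir2 "\"" fileb "\""
      }
    }
  }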
  if(!ncom && !ndcom){   # mkdir commands alone also count as updates
    print "No updates required"
    exit
  }
  print "The following commands are needed to synchronize directories:"
  for(i=1;i<=ndcom;i++){
    print dcom[i]
  }
  for(i=1;i<=ncom;i++){
    print com[i]
  }
  print "Do you want to execute these commands? (y/n)"
  getline ans < "/dev/tty"
  if( ans == "y" || ans == "Y"){
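    for(i=1;i<=ndcom;i++){
      print "Executing: " dcom[i]
      system(escapefilename(dcom[i]))
    }
    for(i=1;i<=ncom;i++){
      print "Executing: " com[i]
      system(escapefilename(com[i]))
    }
  }
}
# awk passes scalars by value, so the escaped command must be returned,
# not modified in place. Inside double quotes the shell still interprets
# \ $ and `, so those are backslash-escaped; ( and ) need no escaping there.
# Names containing a double quote are still not handled.
function escapefilename(name,    out, i, c){
  out=""
  for(i=1;i<=length(name);i++){
    c=substr(name,i,1)
    if(c=="\\" || c=="$" || c=="`") out=out "\\"
    out=out c
  }
  return out
}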
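function readdir(dir, list, type,    cmd, line, t, timestamp, ftype, name){
  # && ensures find does not run in the current directory if cd fails
  cmd="cd " dir " && find . -printf \"%T@\\t%y\\t%p\\n\""
  print "Building list of files in: " dir
  while ((cmd | getline line) > 0){
    # split on the first two tabs only, so spaces in names are preserved
    t=index(line, "\t")
    timestamp=substr(line, 1, t-1)
    line=substr(line, t+1)
    t=index(line, "\t")
    ftype=substr(line, 1, t-1)
    name=substr(line, t+1)
    list[name]=int(timestamp)
    type[name]=ftype
  }
  close(cmd)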
}' "$1" "$2" "$3"
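At its core the analysis compares two path lists: entries only in the source must be created or copied, and entries only in the target are candidates for removal. The script does this with awk associative arrays; as an alternative sketch (a swapped-in technique, not the method used above), the same classification can be done with sorted lists and the POSIX comm utility. The helper names paths and only_in are hypothetical, and mktemp is assumed to be available.

```sh
#!/bin/sh
# paths DIR (hypothetical helper): sorted list of paths under DIR, relative to DIR.
paths() {
  (cd "$1" && find . -print) | LC_ALL=C sort
}

# only_in SRC DST (hypothetical helper): paths present under SRC but not under DST.
only_in() {
  a=$(mktemp) b=$(mktemp)
  paths "$1" > "$a"
  paths "$2" > "$b"
  # comm -23 keeps lines unique to the first file; both inputs must be sorted
  LC_ALL=C comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# Example with two throwaway directories.
src=$(mktemp -d) dst=$(mktemp -d)
mkdir "$src/docs"; : > "$src/docs/new.txt"; : > "$dst/old.txt"
only_in "$src" "$dst"   # entries to create or copy into the target
only_in "$dst" "$src"   # entries ds.sh would offer to remove
rm -rf "$src" "$dst"
```

Unlike the awk version, this sketch does not look at modification times; it only answers the "missing on which side" question.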

Posted by ZFS | File under: bash