Sat Jul 18 09:49:17 PDT 2015

Transferring Files Efficiently

Say you want to collect all your jpg graphics files from one computer and transfer them to another computer - what is the best way of going about this?

If you are well organized you can just transfer one directory from one machine to the other. However, frequently we spread files around on a machine (or our various programs do), so what is needed first is a search to find the important files, then a step to collect them all in one place, and finally a transfer to the new location. Here is how you can go about collecting and saving all the jpg files on a machine.

Firstly - if you have a Windows machine - make sure that you have access to Cygwin so that you can use the general Linux command line options and commands that Cygwin makes available. If you have a Linux or OS X machine - you are good to go. To achieve your objective you are going to need to:

    1. Find the files
    2. Clean up the names
    3. Make the files list into a tar archive
    4. Transfer the archive to the other machine (using ftp, rsync, etc.)
    5. Extract the files (using tar -xvf jpg.tar)

To find the files - the find command is the appropriate tool:

find . -name "*.jpg" -print

Resulting in:

find . -name "*.jpg" -print
./My Music/AlbumArtSmall.jpg
./My Music/Folder.jpg
./My Pictures/DSC00388.jpg
./My Pictures/MyPicture.jpg
./My Pictures/UpgradeDialog.jpg

However, there are also files which have the extension .JPG, and there might be files with .jpeg, in any combination of upper and lower case. So use the following find command:

find . \( -name "*.[jJ][pP][eE][gG]" -o -name "*.[jJ][pP][gG]" \) -print

This might seem complex, but the segments in square brackets (like [jJ]) enable the find command to select files with any possible capitalization pattern of jpg or jpeg as an extension - and print out the path of the file. The output of the command is now:

find . \( -name "*.[jJ][pP][eE][gG]" -o -name "*.[jJ][pP][gG]" \) -print
./a.jpEg
./My Music/AlbumArtSmall.jpg
./My Music/Folder.jpg
./My Pictures/DSC00388.jpg
./My Pictures/IMG_0175.JPG
./My Pictures/IMG_0176.JPG
./My Pictures/IMG_0180.JPG
./My Pictures/MyPicture.jpg
./My Pictures/New Folder/IMG_0391.JPG
./My Pictures/New Folder/IMG_0396.JPG
./My Pictures/New Folder/IMG_0398.JPG
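
A possible shortcut, offered as a sketch rather than as part of the recipe above: GNU find (Linux, Cygwin) and BSD find (OS X) accept -iname, a case-insensitive version of -name. It is an extension rather than part of POSIX, so the bracket form stays the portable choice. The throwaway directory and the file names below are made up for the comparison.

```shell
# Build a small throwaway tree so the two searches can be compared safely.
demo=$(mktemp -d)
mkdir -p "$demo/My Pictures"
touch "$demo/a.jpEg" "$demo/My Pictures/IMG_0175.JPG" \
      "$demo/My Pictures/MyPicture.jpg" "$demo/notes.txt"

# Bracket form: every letter of the extension lists both cases.
bracket_count=$(find "$demo" \( -name "*.[jJ][pP][eE][gG]" -o -name "*.[jJ][pP][gG]" \) -print | wc -l | tr -d ' ')

# Case-insensitive form: -iname ignores case (GNU/BSD extension, not POSIX).
iname_count=$(find "$demo" \( -iname "*.jpeg" -o -iname "*.jpg" \) -print | wc -l | tr -d ' ')

echo "bracket: $bracket_count  iname: $iname_count"

# Remove the throwaway tree again.
rm -rf "$demo"
```

Both searches should report the same number of files; notes.txt is there only to show that non-matching names are skipped.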

So, finding the files is no longer a problem. However, they must be saved in a suitable archive so that they can be transferred together. There are several possible commands (zip is one possible choice), but let's use the simple tar command. To glue the find output and the tar command together, use the xargs command. This takes a list of items on its standard input, typically file names as provided by find, and passes them as arguments to the command named as its first argument.

As xargs uses whitespace to delimit its input items, it is necessary to make sure that any spaces in file names are appropriately escaped. Hence some sed is required. The short sed script shown below has the effect of escaping every character in the file name that is not a letter or a digit. This is also a good remedy for the various other characters which may appear in Windows file names and that on occasion can confuse the Cygwin command line (like ampersands and dollars, for instance). The sed command says 'for each character which is not alphanumeric, replace it with the character itself with a backslash prepended'.

There are 5 backslashes in the sed script: inside double quotes the shell reduces the first four to two, which sed reads as one literal backslash, and the final backslash belongs to the backslash-1 notation that sed uses to refer to the matched token (the non-alphanumeric character). So the command to output cleaned up filenames now looks like this:

find . \( -name "*.[jJ][pP][eE][gG]" -o -name "*.[jJ][pP][gG]" \) \
-print | sed -r "s/([^a-zA-Z0-9])/\\\\\1/g"
\.\/a\.jpEg
\.\/My\ Music\/AlbumArtSmall\.jpg
\.\/My\ Music\/Folder\.jpg
\.\/My\ Pictures\/DSC00388\.jpg
\.\/My\ Pictures\/IMG\_0175\.JPG
\.\/My\ Pictures\/IMG\_0176\.JPG
\.\/My\ Pictures\/IMG\_0178\.JPG
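
To see the escaping on a single name, here is a minimal sketch. The file name is hypothetical, and the script is written in single quotes, which is an assumption that differs from the command above: the shell then leaves the backslashes alone, so three are enough instead of five. The -E option is used as the more widely accepted spelling of the -r option.

```shell
# A hypothetical file name containing a space and an ampersand.
name='./My Pictures/a&b.jpg'

# In single quotes the shell passes the script through unchanged:
# \\ is a literal backslash for sed, \1 is the matched character.
escaped=$(printf '%s\n' "$name" | sed -E 's/([^a-zA-Z0-9])/\\\1/g')
printf '%s\n' "$escaped"

# xargs removes the backslashes again and hands the original name,
# as one single argument, to the command it runs (here: printf).
restored=$(printf '%s\n' "$escaped" | xargs printf '%s\n')
printf '%s\n' "$restored"
```

The first line printed is the escaped form, the second is the name exactly as it started out, which is what tar will receive.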

As you can see from the output - the term 'cleaned up' is used loosely. However, the good news is that xargs can deal with this input easily. The command to hook up tar to this output is "xargs tar -rvf jpg.tar", which tells xargs to take the stream of file names it receives and provide them as arguments to tar, which runs in append mode (-r) and adds them to the tar archive (-f) jpg.tar. Append mode matters because xargs may run tar more than once when the list of names is long, and each run must add to the archive rather than overwrite it. The (-v) option makes tar run in verbose mode so that you can see what it is doing. Here is the command now:

find . \( -name "*.[jJ][pP][eE][gG]" -o -name "*.[jJ][pP][gG]" \) \
-print | sed -r "s/([^a-zA-Z0-9])/\\\\\1/g" | xargs tar -rvf jpg.tar
./a.jpEg
./My Music/AlbumArtSmall.jpg
./My Music/Folder.jpg
./My Pictures/DSC00388.jpg
./My Pictures/IMG_0175.JPG
./My Pictures/IMG_0176.JPG
./My Pictures/IMG_0178.JPG
./My Pictures/IMG_0179.JPG
./My Pictures/IMG_0180.JPG
./My Pictures/MyPicture.jpg
./My Pictures/New Folder/IMG_0391.JPG
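
An alternative worth knowing, sketched here under the assumption that your find and xargs support the -print0 and -0 options (the GNU and BSD versions do, but they are not part of POSIX): the names are passed separated by NUL bytes, so no sed escaping is needed at all. The directories and file names are invented for the demonstration, and the archive is written outside the tree being searched.

```shell
# Throwaway source tree and output directory, so nothing real is touched.
src=$(mktemp -d)
out=$(mktemp -d)
mkdir -p "$src/My Pictures/New Folder"
touch "$src/a.jpEg" "$src/My Pictures/IMG_0175.JPG" \
      "$src/My Pictures/New Folder/a&b.jpg"

# -r appends, so remove any old archive first to avoid duplicate entries.
rm -f "$out/jpg.tar"

# -print0 / -0 separate names with NUL bytes; spaces and ampersands are safe.
(cd "$src" && find . \( -name "*.[jJ][pP][eE][gG]" -o -name "*.[jJ][pP][gG]" \) \
    -print0 | xargs -0 tar -rvf "$out/jpg.tar")

# List what ended up in the archive and count the entries.
listing=$(tar -tf "$out/jpg.tar")
entries=$(printf '%s\n' "$listing" | wc -l | tr -d ' ')
echo "archived $entries files"

# Clean up the throwaway directories.
rm -rf "$src" "$out"
```

Removing the old archive first is a small design choice: because -r always appends, running the command twice against the same jpg.tar would otherwise store every file twice.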

Now all the jpg files are safely contained within a single archive, jpg.tar - and this file can be transferred to another computer. The files can then be extracted using:

tar -xvf jpg.tar
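
If you would rather check the archive before unpacking it, or unpack it somewhere other than the current directory, the following sketch may help. It builds a small stand-in archive first so that it can be run on its own; the paths and the file contents are made up.

```shell
# Self-contained stand-in for the transferred archive.
work=$(mktemp -d)
mkdir -p "$work/src/My Pictures" "$work/restore"
echo "fake image data" > "$work/src/My Pictures/MyPicture.jpg"
(cd "$work/src" && tar -cf "$work/jpg.tar" "./My Pictures/MyPicture.jpg")

# -t lists the contents of the archive without extracting anything.
tar -tf "$work/jpg.tar"

# -C changes to the given directory before extracting.
tar -xvf "$work/jpg.tar" -C "$work/restore"

# Compare the restored file with the original, byte for byte.
if cmp -s "$work/src/My Pictures/MyPicture.jpg" \
          "$work/restore/My Pictures/MyPicture.jpg"; then
    result="match"
else
    result="differ"
fi
echo "$result"

# Clean up the throwaway directory.
rm -rf "$work"
```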

And you are done...!


Posted by ZFS | File under: bash