As part of my backup system, I am working with md5sums files. These files contain relative paths, and in order to combine them I must update the relative paths written in each md5sums file.

What I want to do

In md5sums.txt we have a lot of lines like this

2d845bca08dce83401788a60df88db45  ./it support.wmv

and we want them to look like this

2d845bca08dce83401788a60df88db45  funstuff/it support.wmv

Using Awk

The awk command can do some of this for us. For example,

awk '{ print $1 "  " $2 }' md5sums.txt

will output the first two columns with two spaces between them, as md5sum wants it. Two problems remain: the “./” part is still there, and because awk splits columns on whitespace by default, the space between “it” and “support” cuts the filename short.
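To see the second problem in isolation, here is a small sketch that runs the command on a one-line sample file. The temporary directory and the variable name default_split are only there for the illustration.

```shell
# Build a one-line sample file in a temporary directory (illustrative setup).
tmpdir=$(mktemp -d)
printf '%s  %s\n' 2d845bca08dce83401788a60df88db45 './it support.wmv' > "$tmpdir/md5sums.txt"

# Default splitting: any run of blanks separates columns,
# so $2 holds only "./it" and the rest of the filename is lost.
default_split=$(awk '{ print $1 "  " $2 }' "$tmpdir/md5sums.txt")
echo "$default_split"

rm -r "$tmpdir"
```

The printed line ends in “./it”, so “support.wmv” never makes it into the output.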

The -F parameter tells awk how to divide columns

awk -F"  " '{ print $1 "  " $2 }' md5sums.txt

will effectively give the same output as just “cat md5sums.txt”, because the whole filename, single spaces included, now ends up in $2.

Using the substitute command of awk gives a good result

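awk -F"  " '{ sub(/^\.\//, "", $2); print $1 "  " "funstuff/" $2 }' md5sums.txt

This removes the leading “./” from the second column by substituting it with “” (i.e. nothing). The dot is escaped and the pattern is anchored with ^, so it only matches a literal “./” at the start of the path; an unescaped dot would match any character followed by a slash. To make it more robust, awk supports comparisons and conditions, but that is something for another time.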
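As a rough idea of what such a comparison could look like, the sketch below only rewrites lines whose path starts with “./” and passes every other line through untouched. The second sample line, with its made-up checksum and path, is purely illustrative.

```shell
# Two sample lines: one with a "./" path, one without (second line is made up).
tmpdir=$(mktemp -d)
printf '%s  %s\n' \
  2d845bca08dce83401788a60df88db45 './it support.wmv' \
  0123456789abcdef0123456789abcdef 'already/prefixed.txt' > "$tmpdir/md5sums.txt"

# The pattern before the braces is the comparison: only lines whose
# second column begins with "./" get rewritten; "next" skips the fallback rule.
rewritten=$(awk -F"  " '
  $2 ~ /^\.\// { sub(/^\.\//, "", $2); print $1 "  funstuff/" $2; next }
  { print }
' "$tmpdir/md5sums.txt")
echo "$rewritten"

rm -r "$tmpdir"
```

Lines that do not match are printed exactly as they came in, so running this on a file that was already rewritten should not add the prefix twice.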

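If you have double spaces in the filenames, we need to do something more elaborate: the filename is then split into several columns, and these have to be printed back with the two-space separator between them

awk -F"  " 'BEGIN { ORS = "" } { sub(/^\.\//, "", $2); print $1 "  funstuff/" $2; for (i = 3; i <= NF; i++) print "  " $i; print "\n" }' md5sums.txt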
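A different way to sidestep the column splitting altogether is to cut each line by position. This is only a sketch, and it assumes that every line is a 32-character MD5 hex digest followed by exactly two spaces (md5sum's text-mode format; binary mode writes “ *” instead). The variable names prefix, hash and name are my own, and the sample filename is made up.

```shell
# Sample line whose filename contains runs of two and three spaces (made up).
tmpdir=$(mktemp -d)
printf '%s  %s\n' 2d845bca08dce83401788a60df88db45 './it  support   copy.wmv' > "$tmpdir/md5sums.txt"

# Assumption: 32 hex characters, two separator characters, then the path.
prefixed=$(awk -v prefix="funstuff/" '{
  hash = substr($0, 1, 32)   # the checksum
  name = substr($0, 35)      # everything after the two-character separator
  sub(/^\.\//, "", name)     # drop a leading "./" if present
  print hash "  " prefix name
}' "$tmpdir/md5sums.txt")
echo "$prefixed"

rm -r "$tmpdir"
```

Since the filename is never split into columns, any spacing inside it is kept as it is, and the prefix can be changed from the command line with -v.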




September 17, 2012

