bulkGetter – A Script Wrapped Around wget

bulkGetter is a command-line download tool that reads the links to download from an input file. It can download files, save them to a specified location, rename them, and resume interrupted downloads.

For this tool to work, you need the ‘wget’ tool. If you run Linux, it is most likely already included in your distribution. If you run OSX, you need to download it yourself, and the catch is that you have to compile it. Get wget. If you do not want to go through the hassle of compiling it, you can use the compiled version I have included with bulkGetter.
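Before running bulkGetter, you can quickly check whether wget is already available. This is a small sketch, not part of bulkGetter; `have_wget` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of bulkGetter):
# succeeds if a wget executable can be found on the current PATH.
have_wget() {
  command -v wget >/dev/null 2>&1
}

if have_wget; then
  echo "wget found at: $(command -v wget)"
else
  echo "wget not found; install it or add its directory to PATH" >&2
fi
```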


If you are running OSX like me, after you download wget you need to add its directory to your PATH variable and make it executable. If for any reason you do not know what the PATH variable is, you can read about it here: PATH Variable.
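Here is a minimal sketch of those two steps, assuming you keep personal executables such as the compiled wget in a `~/bin` directory (the directory name is an assumption; any directory works).

```bash
#!/usr/bin/env bash
# Assumption: personal executables (e.g. the compiled wget) live in ~/bin.
bin_dir="$HOME/bin"
mkdir -p "$bin_dir"

# Make wget executable if it has already been copied there.
if [[ -f "$bin_dir/wget" ]]; then
  chmod +x "$bin_dir/wget"
fi

# Prepend ~/bin to PATH for this shell session unless it is already present.
case ":$PATH:" in
  *":$bin_dir:"*) ;;
  *) export PATH="$bin_dir:$PATH" ;;
esac

# To make it permanent, add this line to ~/.bash_profile (OSX) or ~/.bashrc:
#   export PATH="$HOME/bin:$PATH"
echo "PATH is now: $PATH"
```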

My tool can be used in four different ways. Below is the basic usage:

# Usage: ./bulkGetter.sh inputFile saveToPath [-rs | -rb | -rm] [newFileName]
# -rs --rename single file
# -rb --rename multiple files
# -rm --rename files in bulk with multiple names

The argument order is strict. The first argument must be your input file, the second the path where you want your files saved, the third either ‘-rs’, ‘-rb’ or ‘-rm’, and the fourth the new name for your file(s). Arguments three and four are optional, but ‘-rs’ and ‘-rb’ require the fourth argument, while ‘-rm’ takes no fourth argument because the new names come from the input file. For more usage information, please refer to the file ‘README.MD’ included in bulkGetter.zip at the end of this post. Running the script without any arguments displays its usage.
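The argument rules can be summarized with a small helper that mirrors the checks in the script and prints which mode a command line selects. This is a sketch for illustration only; `bulkgetter_mode` and the mode names it prints are hypothetical, not part of bulkGetter.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring bulkGetter's argument checks:
# prints the mode a given command line selects, or "invalid".
bulkgetter_mode() {
  case $# in
    2) echo "download" ;;                       # inputFile saveToPath
    3) if [[ $3 == "-rm" ]]; then echo "rename-multiple"; else echo "invalid"; fi ;;
    4) case $3 in
         -rs) echo "rename-single" ;;
         -rb) echo "rename-bulk" ;;
         *)   echo "invalid" ;;                 # -rm never takes a fourth argument
       esac ;;
    *) echo "invalid" ;;
  esac
}

bulkgetter_mode urls.txt ~/Downloads                 # download
bulkgetter_mode urls.txt ~/Downloads -rs "My File"   # rename-single
bulkgetter_mode pairs.txt ~/Downloads -rm            # rename-multiple
bulkgetter_mode pairs.txt ~/Downloads -rm "Name"     # invalid
```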

Each option also requires a differently formatted input file. bulkGetter.zip, available at the end of this post, includes sample input files.
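If you do not have the zip at hand, the sketch below creates input files in the layouts the script appears to expect, assuming one URL per line for the no-rename, ‘-rs’ (exactly one line) and ‘-rb’ cases, and one `URL;new name` pair per line for ‘-rm’ (inferred from the ‘;’ IFS in the script; the sample files in the zip are authoritative). The URLs and file names are placeholders.

```bash
#!/usr/bin/env bash
# Assumed input layouts; URLs and names below are placeholders.
work=$(mktemp -d)

# No rename, or -rb: one URL per line
cat > "$work/urls.txt" <<'EOF'
http://example.com/files/episode01.mp4
http://example.com/files/episode02.mp4
EOF

# -rs: a single URL
echo "http://example.com/files/report.pdf" > "$work/single.txt"

# -rm: URL and new file name separated by ';' (names may contain spaces)
cat > "$work/pairs.txt" <<'EOF'
http://example.com/files/a1b2.mp4;Intro
http://example.com/files/c3d4.mp4;Chapter One
EOF

# Read the -rm file pair by pair
while IFS=';' read -r url name; do
  echo "would save <$url> as <$name>"
done < "$work/pairs.txt"
```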

Update 20130323: bulkGetter version 0.02 passes a proxy server and a user agent string to wget by default (set them in the proxyServer and userAgent variables at the top of the script). If you do not want to use them, use bulkGetter 0.01 (included in bulkGetter.zip) or remove them from v0.02.
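Instead of deleting the options by hand, one possible approach is to add them to wget's command line only when the variables are non-empty. This is a sketch, not bulkGetter's code; `build_wget_opts` and `wget_opts` are hypothetical names, and the proxy address in the comment is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: only pass proxy / user-agent options when they are set.
proxyServer=""   # e.g. "http://proxy.example.com:8080" (placeholder)
userAgent=""     # e.g. a browser's User-Agent string

build_wget_opts() {
  wget_opts=(-t10 -c -np)                 # same base options bulkGetter uses
  if [[ -n $proxyServer ]]; then
    wget_opts+=(--execute="http_proxy=$proxyServer")
  fi
  if [[ -n $userAgent ]]; then
    wget_opts+=(--user-agent="$userAgent")
  fi
}

build_wget_opts
printf '%s\n' "${wget_opts[@]}"
# A download would then run as: wget "${wget_opts[@]}" -P "$saveTo" "$url"
```

Using an array keeps each option a single word, so a user agent string containing spaces is passed to wget intact.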

Below is the code.

#!/bin/bash
########################################################################################################
# Simple script that uses wget to retrieve internet files from a given input file. The script has four
# options that accept three different type of input files depending if you want to rename the downloaded
# files or not. Refer to 'readme.txt' for more usage information.
#
# Author: Esau Silva (jgezau)
# Version: 0.02
#
# wget used options
# -t10						--try each download up to 10 times
# -c						--resume broken downloads
# -np						--do not ascend to the parent directory
# -P						--save files to a specific directory
# -O						--rename the downloaded file
# --execute=http_proxy=		--route wget's requests through a proxy server
# --user-agent=				--send a browser-like user agent string instead of wget's own
#
# ** Change History **
# User		Date		Description
# jgezau	20121020	Initial coding
# jgezau	20121030	Added wrapping quotation marks ("") to saveTo when creating directory
# jgezau	20121219	Added "-p" to make directory structure that is several directories deep
# jgezau	20130110	Changed IFS from ',' to ';'
# jgezau	20130323	Added public proxy server support w/o password
#						Added user agent support to mask wget like a browser
#						Added 5 sec delay between each download when -rb or -rm is used
# jgezau	20130810	Print to terminal number of files to download and file number currently
#						downloading
########################################################################################################

# Variables
inFile=$1
saveTo=$2
renameOption=$3
newFileName=$4
userAgent=""	# read "NOTE 2" in readme.txt
proxyServer=""	# read "NOTE 2" in readme.txt
sleeping=5

# Flags
good2go=0
changeIFS=0
noRename=0
rename=0

# Displays usage when the script is run without arguments
if [[ $# -eq 0 ]]; then
	echo -e "\t Usage: ./bulkGetter.sh inputFile saveToPath [-rs | -rb | -rm] [newFileName]"
	echo -e "\t	-rs	--rename single file"
	echo -e "\t	-rb	--rename files in bulk"
	echo -e "\t	-rm	--rename files in bulk with multiple names"
	exit
fi

# Check if input file exists
if [[ ! -f "$inFile" ]]; then
	echo -e "\t '$inFile' does not exist"
	echo -e "\t Now exiting script"
	exit 1
fi

# Check for correct arguments
if [[ $# -eq 4 ]]; then
	if [[ $renameOption != "-rs" && $renameOption != "-rb" ]]; then
		echo -e "\tWith four arguments, argument three needs to be '-rs' or '-rb' ('-rm' takes only three arguments)"
		echo -e "\tRun script w/o arguments for usage information...Now exiting script"
		exit 1
	fi
	good2go=1
	rename=1
elif [[ $# -eq 2 ]]; then
	good2go=1
	noRename=1
elif [[ $# -eq 3 ]]; then
	if [[ $renameOption == "-rm" ]]; then
		changeIFS=1
		good2go=1
		rename=1
	else
		echo -e "\tYou need one more argument."
		echo -e "\tRun script w/o arguments for usage information...Now exiting script"
		exit 1
	fi
elif [[ $# -gt 4 || $# -eq 1 ]]; then
	echo -e "\tbulkGetter accepts 2, 3 (with '-rm') or 4 arguments."
	echo -e "\tRun script w/o arguments for usage information...Now exiting script"
	exit 1
fi

# Perform bulkGetter's job
if [[ $good2go -eq 1 ]]; then
	
	# If destination directory does not exist, create it
	if [[ ! -d "$saveTo" ]]; then
		mkdir -p "$saveTo"
	fi

	# Count the lines in the input file (one download per line); tr strips the padding BSD wc adds on OSX
	filesToDownload=$(wc -l < "$inFile" | tr -d ' ')
	echo "Downloading $filesToDownload files"

	# No Rename Files
	if [[ $noRename -eq 1 ]]; then
		for url in $(cat "$inFile"); do
			wget -t10 -c -np --execute=http_proxy="$proxyServer" -P "$saveTo" --user-agent="${userAgent}" "$url"
		done
	
	# Rename Files
	elif [[ $rename -eq 1 ]]; then 
		
		# Rename single file
		if [[ $renameOption == "-rs" ]]; then
			url=$(cat "$inFile")
			fileExt=$(echo "$url" | awk -F. '{if (NF>1) {print $NF}}')
			wget -t10 -c -np --execute=http_proxy="$proxyServer" -O "${saveTo}/${newFileName}.${fileExt}" --user-agent="$userAgent" "$url"
		
		# Rename files in bulk   
		elif [[ $renameOption == "-rb" ]]; then
			part=1
			for url in $(cat "$inFile"); do
				echo "Counter: $part of $filesToDownload"
				fileExt=$(echo "$url" | awk -F. '{if (NF>1) {print $NF}}')
				wget -t10 -c -np --execute=http_proxy="$proxyServer" -O "${saveTo}/${newFileName} ${part}.${fileExt}" --user-agent="${userAgent}" "$url"
				sleep $sleeping
				(( part++ ))
			done
		
		# Rename files in bulk with multiple names
		elif [[ $renameOption == "-rm" ]]; then
			
			# Change IFS to ';'
			OLDIFS=$IFS
			IFS=';'
			
			inOneLiner=$(sed 's:$:;:' "$inFile" | tr -d '\n' | sed 's:;$::')

			# Adding URL and New FileName to an array
			count=1
			for line in $inOneLiner; do
				isOdd=$(( $count % 2 ))
				if [[ $isOdd -eq 1 ]]; then
					url[$count]=$line
				else
					file[$count]=$line
				fi
				(( count++ ))
			done
			
			# Change IFS back to default
			IFS=$OLDIFS

			count2=1
			for (( i = 1; i < $count; i++ )); do
				echo "Counter: $count2 of $filesToDownload"
				fileExt=$(echo "${url[$i]}" | awk -F. '{if (NF>1) {print $NF}}')
				wget -t10 -c -np --execute=http_proxy="$proxyServer" -O "${saveTo}/${file[$i+1]}.${fileExt}" --user-agent="${userAgent}" "${url[$i]}"
				sleep $sleeping
				(( count2++ ))
				(( i++ ))
			done
		fi
	fi
fi
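One caveat about the extension lookup: `awk -F.` splits the whole URL on dots and prints the last piece, so for a URL without a file extension, such as http://example.com/download, it yields `com/download`, and wget's `-O` would then be asked to write into a directory that does not exist. A possible alternative, sketched below, looks only at the last path segment and ignores any query string or fragment; `url_ext` is a hypothetical helper, not part of bulkGetter.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the extension of the last path segment of a URL,
# ignoring "?query" and "#fragment"; print nothing if there is no extension.
url_ext() {
  local path=${1%%[?#]*}      # drop the query string and fragment
  local base=${path##*/}      # keep the last path segment
  if [[ $base == *.* ]]; then
    printf '%s\n' "${base##*.}"
  fi
}

url_ext "http://example.com/files/report.pdf"         # pdf
url_ext "http://example.com/video.mp4?token=abc.123"  # mp4
url_ext "http://example.com/download"                 # (no output)

# Only append ".ext" when an extension was found
ext=$(url_ext "http://example.com/download")
echo "name${ext:+.$ext}"                              # name
```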

For your convenience, I have zipped this script together with the wget utility and some sample input files. Get bulkGetter.

You are more than welcome to fork my repo on GitHub and use or modify the script in any way you like. I ask just one thing: please do not remove my credit 😉

If you have any suggestions for improving bulkGetter, post them below and I might consider them. For the time being, it satisfies my needs 🙂

Update 20121030: I had left out the enclosing quotation marks (“”) around the “$saveTo” variable when creating a new directory, so the script failed if the new directory's name contained spaces. All is good now 🙂
Update 20121219: Added “-p” to mkdir so it can create a directory structure that is several directories deep
Update 20130110: Changed IFS from ‘,’ to ‘;’
Update 20130323: Added public proxy server support (without password); added user agent support so wget identifies itself like a browser; added a 5-second delay between downloads when -rb or -rm is used
Update 20130810: The script now prints the number of files to download and the number of the file currently downloading
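To see why the 20121030 fix matters, this small demonstration runs both forms of the mkdir call in a scratch directory. The directory name “My Downloads” is just an example.

```bash
#!/usr/bin/env bash
# Unquoted expansions are split on spaces; quoted ones are not.
demo=$(mktemp -d)
saveTo="My Downloads"

(
  cd "$demo" || exit 1
  mkdir -p $saveTo      # unquoted: word-split into two directories, "My" and "Downloads"
  mkdir -p "$saveTo"    # quoted: one directory named "My Downloads"
)
ls -1 "$demo"
```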


8 thoughts on “bulkGetter – A Script Wrapped Around wget”

  1. Esau Silva says:

    RT @jgezau: #bulkGetter – A Script Wrapped Around #wget http://t.co/PmO8yfHA

  2. […] I have developed a script to do exactly this fairly easy. You just have to use "-rb" option. Refer to link bulkGetter […]

  3. Orm says:

    Thank you so much for this. Great tool. FYI: There is a mistake in the help text. It says that lists and filenames should be separated by a colon, but in reality it’s a semicolon.

    • jgezau says:

      Thanks for letting me know. I originally had it with a comma, but changed it later to a semicolon. It has been corrected now in the zip archive.

  4. Orm says:

    You can run bulkGetter.sh from anywhere by moving it to a folder included in the system's paths.
    1. Check your paths with the command:
    echo $PATH
    2. Move bulkGetter.sh to one of them, e.g.:
    mv bulkGetter.sh /usr/bin
    3. Now you can run bulkGetter.sh without ./

    And don’t forget to make bulkGetter.sh executable with:
    chmod a+x bulkGetter.sh

    • jgezau says:

      Absolutely…I just don’t like to mix my scripts with the system scripts/paths. The way I do it now is I create a folder called ‘bin’ right under home and include that path in my environment variable 🙂 (just personal preference)

  5. Orm says:

    One more thing. wget is a bit slow. It would be very cool to see another version of the wrapper for aria2. aria2 supports threaded downloads.
    http://stackoverflow.com/questions/3430810/wget-download-with-multiple-connection-simultaneously
