bulkGetter is a command-line download tool that takes an input file listing the link(s) you want to download. It can download files, save them to a specified location, rename them, and resume interrupted downloads.
For this tool to work, you need the ‘wget’ utility. If you are on Linux, it is more than likely already included in your distribution. If you are on OSX, you need to download it, and the only catch is that you have to compile it yourself: Get wget. If you do not want to go through the hassle of compiling, you can download the compiled version I have provided with bulkGetter.
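If you do compile it, the build is the usual configure-and-make routine; roughly something like this, though the exact steps depend on the wget version you grabbed and where you want it installed:

cd wget-1.14          # or whatever version you downloaded
./configure
make
sudo make install     # or just copy the resulting wget binary somewhere in your PATH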
If you are running OSX, like me, after you download wget you need to add it to your PATH variable and make it executable. If for any reason you do not know what the PATH variable is, you can read about it here: PATH Variable.
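As a rough sketch, assuming you dropped the wget binary into ~/bin (adjust the path to wherever you actually put it):

chmod +x ~/bin/wget
export PATH="$HOME/bin:$PATH"    # put this line in ~/.bash_profile so it sticks across sessions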
My tool can be used in four different ways. Below is the basic usage:
# Usage: ./bulkGetter.sh inputFile saveToPath [-rs | -rb | -rm] [newFileName]
#   -rs --rename single file
#   -rb --rename files in bulk
#   -rm --rename files in bulk with multiple names
This tool is very specific about its arguments. The first argument must be your input file, the second the path where you want your files saved, the third either ‘-rs’, ‘-rb’ or ‘-rm’, and the fourth the new file name for your file; arguments three and four are optional. For more usage information, refer to the ‘README.MD’ file included in bulkGetter.zip at the end of this post. Running the script without any arguments displays its usage.
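For example, a call like this (the file and folder names are made up for illustration) downloads every URL listed in links.txt into ~/Downloads/ebooks and, assuming the URLs end in .pdf, saves them as ‘Linux Guide 1.pdf’, ‘Linux Guide 2.pdf’, and so on:

./bulkGetter.sh links.txt ~/Downloads/ebooks -rb "Linux Guide"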
Also, each option requires a different input file format. You can download bulkGetter.zip at the end of this post, which includes sample input files.
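These are not the actual samples from the zip, just what the script’s parsing implies. For plain downloads and ‘-rb’ the input file is simply one URL per line (a single URL for ‘-rs’), while for ‘-rm’ each line pairs a URL with its new name, separated by a semicolon; the name should not include an extension because the script appends it. The URLs and names below are made up:

Input file for no-rename, -rs or -rb (one URL per line):
http://example.com/files/chapter1.pdf
http://example.com/files/chapter2.pdf

Input file for -rm (URL;newName per line):
http://example.com/files/chapter1.pdf;Introduction
http://example.com/files/chapter2.pdf;Basics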
Update 20130323: bulkGetter, now at version 0.02, uses a proxy server and user-agent string by default. If you do not want to use them, either use bulkGetter 0.01 (included in bulkGetter.zip) or remove them from v0.02.
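If you do keep them, the two variables near the top of the script need values roughly like these (the proxy address is a placeholder, and any public HTTP proxy plus any browser user-agent string will do):

userAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:20.0) Gecko/20100101 Firefox/20.0"
proxyServer="http://proxy.example.com:8080"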
And below is the code.
#!/bin/bash
########################################################################################################
# Simple script that uses wget to retrieve internet files from a given input file. The script has four
# options that accept three different types of input files, depending on whether you want to rename the
# downloaded files or not. Refer to 'readme.txt' for more usage information.
#
# Author: Esau Silva (jgezau)
# Version: 0.02
#
# wget used options
# -t10 --retry each download up to 10 times
# -c --resume broken downloads
# -np --do not ascend to the parent directory
# -P --save files to a specific directory
# -O --rename the downloaded file
# --execute=http_proxy= --hides wget behind a proxy server
# --user-agent= --masks the user agent so wget looks like a browser
#
# ** Change History **
# User Date Description
# jgezau 20121020 Initial coding
# jgezau 20121030 Added wrapping quotation marks ("") to saveTo when creating directory
# jgezau 20121219 Added "-p" to make directory structure that is several directories deep
# jgezau 20130110 Changed IFS from ',' to ';'
# jgezau 20130323 Added public proxy server support w/o password
# Added user agent support to mask wget like a browser
# Added 5 sec delay between each download when -rb or -rm is used
# jgezau 20130810 Print to terminal number of files to download and file number currently
# downloading
########################################################################################################
# Variables
inFile=$1
saveTo=$2
renameOption=$3
newFileName=$4
userAgent="" # read "NOTE 2" in readme.txt
proxyServer="" # read "NOTE 2" in readme.txt
sleeping=5
# Flags
good2go=0
changeIFS=0
noRename=0
rename=0
# Displays usage when the script is run w/o arguments
if [[ $# -eq 0 ]]; then
echo -e "\t Usage: ./bulkGetter.sh inputFile saveToPath [-rs | -rb | -rm] [newFileName]"
echo -e "\t -rs --rename single file"
echo -e "\t -rb --rename files in bulk"
echo -e "\t -rm --rename files in bulk with multiple names"
exit
fi
# Check if input file exists
if [[ ! -f $inFile ]]; then
echo -e "\t '$inFile' does not exist"
echo -e "\t Now exiting script"
exit
fi
# Check for correct arguments
if [[ $# -eq 4 ]]; then
if [[ $renameOption != "-rs" && $renameOption != "-rb" ]]; then
echo -e "\tArgument three needs to be '-rs', '-rb' or '-rm'"
echo -e "\tRun script w/o arguments for usage information...Now exiting script"
exit 1
fi
good2go=1
rename=1
elif [[ $# -eq 2 ]]; then
good2go=1
noRename=1
elif [[ $# -eq 3 ]]; then
if [[ $renameOption == "-rm" ]]; then
changeIFS=1
good2go=1
rename=1
else
echo -e "\tYou need one more argument."
echo -e "\tRun script w/o arguments for usage information...Now exiting script"
exit 1
fi
elif [[ $# -gt 4 || $# -eq 1 ]]; then
echo -e "\tbulkGetter accepts either 2 or 4 arguments."
echo -e "\tRun script w/o arguments for usage information...Now exiting script"
exit 1
fi
# Perform bulkGetter's job
if [[ $good2go -eq 1 ]]; then
# If the destination directory does not exist, create it
if [[ ! -d "$saveTo" ]]; then
mkdir -p "$saveTo"
fi
# Initialize the counter
filesToDownload=$(cat $inFile | wc -l)
echo "Downloading $filesToDownload files"
# No Rename Files
if [[ $noRename -eq 1 ]]; then
for url in $(cat $inFile); do
wget -t10 -c -np --execute=http_proxy="$proxyServer" -P "$saveTo" --user-agent="${userAgent}" "$url"
done
# Rename Files
elif [[ $rename -eq 1 ]]; then
# Rename single file
if [[ $renameOption == "-rs" ]]; then
url=$(cat $inFile)
fileExt=$(echo "$url" | awk -F. '{if (NF>1) {print $NF}}')
wget -t10 -c -np --execute=http_proxy="$proxyServer" -O "${saveTo}/${newFileName}.${fileExt}" --user-agent="$userAgent" "$url"
# Rename files in bulk
elif [[ $renameOption == "-rb" ]]; then
part=1
for url in $(cat $inFile); do
echo "Counter: $part of $filesToDownload"
fileExt=$(echo "$url" | awk -F. '{if (NF>1) {print $NF}}')
wget -t10 -c -np --execute=http_proxy="$proxyServer" -O "${saveTo}/${newFileName} ${part}.${fileExt}" --user-agent="${userAgent}" "$url"
sleep $sleeping
(( part++ ))
done
# Rename files in bulk with multiple names
elif [[ $renameOption == "-rm" ]]; then
# Change IFS to ';'
OLDIFS=$IFS
IFS=';'
inOneLiner=$(cat $inFile | sed 's:$:;:' | tr -d '\n' | sed 's:;$::')
# Adding URL and New FileName to an array
count=1
for line in $inOneLiner; do
isOdd=$(( $count % 2 ))
if [[ $isOdd -eq 1 ]]; then
url[$count]=$line
else
file[$count]=$line
fi
(( count++ ))
done
# Change IFS back to default
IFS=$OLDIFS
count2=1
for (( i = 1; i < $count; i++ )); do
echo "Counter: $count2 of $filesToDownload"
fileExt=$(echo "${url[$i]}" | awk -F. '{if (NF>1) {print $NF}}')
wget -t10 -c -np --execute=http_proxy="$proxyServer" -O "${saveTo}/${file[$i+1]}.${fileExt}" --user-agent="${userAgent}" "${url[$i]}"
sleep $sleeping
(( count2++ ))
(( i++ )) # extra increment: skip the filename entry so the next iteration lands on a URL (URLs sit at odd indices)
done
fi
fi
fi
For your convenience, I have zipped this script together with the wget utility and some sample input files. Get bulkGetter.
You are more than welcome to fork my repo on GitHub and use or modify the script in any way you might think possible; just one thing I ask: please do not remove my credit 😉
If you have any suggestions for improving bulkGetter, you can post them below and I might consider them; for the time being, this satisfies my needs 🙂
Update 20121030: I had left out the enclosing quotation marks (“”) around the “$saveTo” variable when creating a new directory, so the script was failing if the new directory had spaces in it. All is good now 🙂
Update 20121219: Added “-p” to mkdir to make directory structure that is several directories deep
Update 20130110: Changed IFS from ‘,’ to ‘;’
Update 20130323: Added public proxy server support w/o password; Added user agent support to mask wget like a browser; Added 5 sec delay between each download when -rb or -rm is used
Update 20130810: Print to terminal number of files to download and file number currently downloading
Thank you so much for this. Great tool. FYI: There is a mistake in the help text. It reads that lists and filenames should be separated by a colon, but in reality it’s a semicolon.
Thanks for letting me know. I originally had it with a comma, but changed it later to a semicolon. It has been corrected now in the zip archive.
You can run bulkGetter.sh from anywhere by moving it to a folder included in the system’s PATH.
1. Check your paths with the command:
echo $PATH
2. Move bulkGetter.sh to one of them, e.g.:
mv bulkGetter.sh /usr/bin
3. Now you can run bulkGetter.sh without ./
And don’t forget to make bulkGetter.sh executable with:
chmod a+x bulkGetter.sh
Absolutely…I just don’t like to mix my scripts with the system scripts/paths. The way I do it now is to create a folder called ‘bin’ right under my home directory and include that path in my PATH environment variable 🙂 (just personal preference)
One more thing: wget is a bit slow. It would be very cool to see another version of the wrapper for aria2, which supports threaded downloads (rough example after the link below).
http://stackoverflow.com/questions/3430810/wget-download-with-multiple-connection-simultaneously
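Something along these lines, as a rough idea (aria2c’s -x sets the maximum connections per server and -s the number of splits; the URL is just a placeholder):

aria2c -x 8 -s 8 http://example.com/big-file.iso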
Thanks. I’ll look into aria2, never heard of it