bulkGetter is a command-line downloading tool that takes an input file listing the link(s) you want to download. It can download files, save them to a specified location, rename them, and resume interrupted downloads.
For this tool to work, you need the ‘wget’ utility. If you are on Linux, it is more than likely already included in your distribution. If you are on OS X, you need to download it and compile it yourself: Get wget. If you do not want to go through the hassle of compiling, you can use the precompiled binary I have bundled with bulkGetter.
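If you do decide to compile it yourself, it is the usual autotools routine; something along these lines (the version number and download URL here are only an example, and on OS X you may need to point ./configure at an SSL library):

# Grab and unpack the wget source (version shown is just an example)
curl -O https://ftp.gnu.org/gnu/wget/wget-1.14.tar.gz
tar -xzf wget-1.14.tar.gz
cd wget-1.14

# Standard build and install; add --with-ssl=openssl if configure complains about SSL
./configure
make
sudo make install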
If you are running OS X, like me, after you download wget you need to make it executable and include its location in the PATH variable. If for any reason you do not know what the PATH variable is, you can read about it here: PATH Variable.
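In practice that boils down to something like this (I am assuming here that you kept the wget binary inside the unzipped bulkGetter folder; adjust the path to wherever you actually put it):

# Make the bundled wget executable
chmod a+x ~/Downloads/bulkGetter/wget

# Add that folder to PATH for the current shell session
export PATH="$HOME/Downloads/bulkGetter:$PATH"

# Confirm the shell can now find it
which wget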
My tool can be used in four different ways. Below is the basic usage:
# Usage: ./bulkGetter.sh inputFile saveToPath [-rs | -rb | -rm] [newFileName]
#   -rs  --rename single file
#   -rb  --rename multiple files
#   -rm  --rename files in bulk with multiple names
This tool is very specific: the first argument must be your input file, the second argument the path where you want your files saved, the third argument can be either ‘-rs’, ‘-rb’ or ‘-rm’, and the fourth argument the new file name for your file(s); arguments three and four are optional. For more usage information, please refer to the ‘README.MD’ file included in bulkGetter.zip at the end of this post. Running the script without any arguments will display its usage.
Also, each option requires a different input file format. You can download bulkGetter.zip at the end of this post, which includes sample input files.
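To give you an idea before you open the zip, here is roughly what the calls look like (the file names and URLs below are made up; the samples in bulkGetter.zip are the reference):

# downloads.txt -- one URL per line; no renaming, so only 2 arguments
./bulkGetter.sh downloads.txt ~/Downloads

# single.txt -- exactly one URL; -rs renames that single download
./bulkGetter.sh single.txt ~/Downloads -rs vacationPhoto

# bulk.txt -- one URL per line; -rb saves them as "Trip 1.jpg", "Trip 2.jpg", ...
./bulkGetter.sh bulk.txt ~/Downloads -rb Trip

# multi.txt -- one "URL;NewName" pair per line (semicolon separated); -rm takes 3 arguments
#   http://example.com/files/a.jpg;Beach
#   http://example.com/files/b.jpg;Mountain
./bulkGetter.sh multi.txt ~/Downloads -rm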
Update 20130323: bulkGetter v0.02 now uses a proxy server and a user agent string by default. If you do not want to use them, either use bulkGetter v0.01 (included in bulkGetter.zip) or delete them from v0.02.
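In the listing below those two values are left blank; if you want to use them, set the two variables near the top of the script to whatever you prefer (the values shown here are placeholders, not the defaults shipped in the zip):

# Placeholders only -- substitute your own proxy and user agent string
userAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8) AppleWebKit/537.36"
proxyServer="123.45.67.89:8080"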
And below is the code.
#!/bin/bash
########################################################################################################
# Simple script that uses wget to retrieve internet files from a given input file. The script has four
# options that accept three different types of input files, depending on whether you want to rename the
# downloaded files or not. Refer to 'readme.txt' for more usage information.
#
# Author: Esau Silva (jgezau)
# Version: 0.02
#
# wget used options
#   -t10  --retry each download up to 10 times
#   -c    --resume broken downloads
#   -np   --do not ascend to the parent directory
#   -P    --save files to a specific directory
#   -O    --rename the downloaded file
#   --execute=http_proxy=   --hides wget behind a proxy server
#   --user-agent=           --mask the user agent so wget looks like a browser
#
# ** Change History **
# User     Date       Description
# jgezau   20121020   Initial coding
# jgezau   20121030   Added wrapping quotation marks ("") to saveTo when creating directory
# jgezau   20121219   Added "-p" to make directory structure that is several directories deep
# jgezau   20130110   Changed IFS from ',' to ';'
# jgezau   20130323   Added public proxy server support w/o password
#                     Added user agent support to mask wget like a browser
#                     Added 5 sec delay between each download when -rb or -rm is used
# jgezau   20130810   Print to terminal number of files to download and file number currently
#                     downloading
########################################################################################################

# Variables
inFile=$1
saveTo=$2
renameOption=$3
newFileName=$4
userAgent=""     # read "NOTE 2" in readme.txt
proxyServer=""   # read "NOTE 2" in readme.txt
sleeping=5

# Flags
good2go=0
changeIFS=0
noRename=0
rename=0

# Displays usage when script is run w/o arguments
if [[ $# -eq 0 ]]; then
  echo -e "\t Usage: ./bulkGetter.sh inputFile saveToPath [-rs | -rb | -rm] [newFileName]"
  echo -e "\t -rs --rename single file"
  echo -e "\t -rb --rename files in bulk"
  echo -e "\t -rm --rename files in bulk with multiple names"
  exit
fi

# Check if input file exists
if [[ ! -f $inFile ]]; then
  echo -e "\t '$inFile' does not exist"
  echo -e "\t Now exiting script"
  exit
fi

# Check for correct arguments
if [[ $# -eq 4 ]]; then
  if [[ $renameOption != "-rs" && $renameOption != "-rb" ]]; then
    echo -e "\tArgument three needs to be '-rs', '-rb' or '-rm'"
    echo -e "\tRun script w/o arguments for usage information...Now exiting script"
    exit 1
  fi
  good2go=1
  rename=1
elif [[ $# -eq 2 ]]; then
  good2go=1
  noRename=1
elif [[ $# -eq 3 ]]; then
  if [[ $renameOption == "-rm" ]]; then
    changeIFS=1
    good2go=1
    rename=1
  else
    echo -e "\tYou need one more argument."
    echo -e "\tRun script w/o arguments for usage information...Now exiting script"
    exit 1
  fi
elif [[ $# -gt 4 || $# -eq 1 ]]; then
  echo -e "\tbulkGetter accepts 2, 3 or 4 arguments."
  echo -e "\tRun script w/o arguments for usage information...Now exiting script"
  exit 1
fi

# Perform bulkGetter's job
if [[ $good2go -eq 1 ]]; then
  # If destination directory does not exist, create it
  if [[ ! -d "$saveTo" ]]; then
    mkdir -p "$saveTo"
  fi

  # Initialize the counter
  filesToDownload=$(cat $inFile | wc -l)
  echo "Downloading $filesToDownload files"

  # No Rename Files
  if [[ $noRename -eq 1 ]]; then
    for url in $(cat $inFile); do
      wget -t10 -c -np --execute=http_proxy="$proxyServer" -P "$saveTo" --user-agent="${userAgent}" "$url"
    done
  # Rename Files
  elif [[ $rename -eq 1 ]]; then
    # Rename single file
    if [[ $renameOption == "-rs" ]]; then
      url=$(cat $inFile)
      fileExt=$(echo "$url" | awk -F. '{if (NF>1) {print $NF}}')
      wget -t10 -c -np --execute=http_proxy="$proxyServer" -O "${saveTo}/${newFileName}.${fileExt}" --user-agent="$userAgent" "$url"
    # Rename files in bulk
    elif [[ $renameOption == "-rb" ]]; then
      part=1
      for url in $(cat $inFile); do
        echo "Counter: $part of $filesToDownload"
        fileExt=$(echo "$url" | awk -F. '{if (NF>1) {print $NF}}')
        wget -t10 -c -np --execute=http_proxy="$proxyServer" -O "${saveTo}/${newFileName} ${part}.${fileExt}" --user-agent="${userAgent}" "$url"
        sleep $sleeping
        (( part++ ))
      done
    # Rename files in bulk with multiple names
    elif [[ $renameOption == "-rm" ]]; then
      # Change IFS to ';'
      OLDIFS=$IFS
      IFS=';'
      inOneLiner=$(cat $inFile | sed 's:$:;:' | tr -d '\n' | sed 's:;$::')

      # Adding URL and New FileName to an array
      count=1
      for line in $inOneLiner; do
        isOdd=$(( $count % 2 ))
        if [[ $isOdd -eq 1 ]]; then
          url[$count]=$line
        else
          file[$count]=$line
        fi
        (( count++ ))
      done

      # Change IFS back to default
      IFS=$OLDIFS

      count2=1
      for (( i = 1; i < $count; i++ )); do
        echo "Counter: $count2 of $filesToDownload"
        fileExt=$(echo "${url[$i]}" | awk -F. '{if (NF>1) {print $NF}}')
        wget -t10 -c -np --execute=http_proxy="$proxyServer" -O "${saveTo}/${file[$i+1]}.${fileExt}" --user-agent="${userAgent}" "${url[$i]}"
        sleep $sleeping
        (( count2++ ))
        (( i++ ))
      done
    fi
  fi
fi
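If the -rm branch looks cryptic, this standalone snippet reproduces its input-flattening pipeline on a made-up two-line file so you can see what the loop actually iterates over (multi.txt and its contents are hypothetical):

# Hypothetical input: one "URL;NewName" pair per line
printf 'http://example.com/a.jpg;Beach\nhttp://example.com/b.jpg;Mountain\n' > multi.txt

# Same pipeline the script uses: append ';' to each line, strip newlines, drop the trailing ';'
inOneLiner=$(cat multi.txt | sed 's:$:;:' | tr -d '\n' | sed 's:;$::')
echo "$inOneLiner"
# -> http://example.com/a.jpg;Beach;http://example.com/b.jpg;Mountain
# Splitting that on ';' yields URL, name, URL, name, ... which is why the loop treats
# odd-numbered items as URLs and even-numbered items as the new file names.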
For your convenience, I have zipped this script together with the wget utility and some sample input files. Get bulkGetter.
You are more than welcome to fork my repo on GitHub and use or modify the script in any way you see fit; just one thing I ask: please do not remove my credit 😉
If you have any suggestions for improving bulkGetter, you can post them below and I might consider them; for the time being, it satisfies my needs 🙂
Update 20121030: I had left out the enclosing quotation marks (“”) around the “$saveTo” variable when creating a new directory, so the script was failing if the new directory had spaces in it. All is good now 🙂
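For anyone curious, this is the difference the quotes make (the path is just an example):

saveTo="My Downloads/ISOs"
mkdir -p $saveTo     # unquoted: word-splits and creates "My" plus "Downloads/ISOs"
mkdir -p "$saveTo"   # quoted: creates the single "My Downloads/ISOs" directory tree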
Update 20121219: Added “-p” to mkdir so it can create a directory structure that is several directories deep
Update 20130110: Changed IFS from ‘,’ to ‘;’
Update 20130323: Added public proxy server support w/o password; Added user agent support to mask wget like a browser; Added 5 sec delay between each download when -rb or -rm is used
Update 20130810: Print to terminal number of files to download and file number currently downloading
Thank you so much for this. Great tool. FYI: There is a mistake in the help text. It reads that lists and filenames should be separated by a colon, but in reality it’s a semicolon.
Thanks for letting me know. I originally had it with a comma, but changed it later to a semicolon. It has been corrected now in the zip archive.
You can run bulkGetter.sh from anywhere by moving it to a folder included in the system’s PATH.
1. Check your PATH with the command:
echo $PATH
2. Move bulkGetter.sh to one of them, e.g.:
mv bulkGetter.sh /usr/bin
3. Now you can run bulkGetter.sh without ./
And don’t forget to make bulkGetter.sh executable with:
chmod a+x bulkGetter.sh
Absolutely…I just don’t like to mix my scripts with the system scripts/paths. The way I do it now is I create a folder called ‘bin’ right under home and include that path in my PATH environment variable 🙂 (just personal preference)
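In case anyone wants to copy that setup, it is basically this (assuming bash reads ~/.bash_profile, which it does on OS X):

# One-time setup of a personal bin folder for my scripts
mkdir -p ~/bin
mv bulkGetter.sh ~/bin
chmod a+x ~/bin/bulkGetter.sh

# Put ~/bin on PATH permanently, then open a new terminal (or source the file)
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bash_profile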
One more thing. wget is a bit slow. It would be very cool to see another version of the wrapper for aria2. aria2 supports threaded downloads.
http://stackoverflow.com/questions/3430810/wget-download-with-multiple-connection-simultaneously
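For a single file, a segmented aria2 download looks roughly like this (placeholder URL; the connection counts are just an example):

# -x: max connections per server, -s: number of segments to split into, -d: download directory
aria2c -x 8 -s 8 -d ~/Downloads http://example.com/big-file.iso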
Thanks. I’ll look into aria2, never heard of it