Bash script: downloading consecutively numbered files with wget

https://stackoverflow.com/questions/1426522

07-07-2019

Question

I have a web server that saves the numbered log files of a web application. Example file names:

dbsclog01s001.log
dbsclog01s002.log
dbsclog01s003.log

The last 3 digits are the counter, and it can go up to 100.

I usually open a web browser, browse to a file such as:

http://someaddress.com/logs/dbsclog01s001.log

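and save the file. This of course gets a bit annoying when there are 50 logs. I tried to come up with a Bash script that uses wget and passes it

http://someaddress.com/logs/dbsclog01s*.log

but my script does not work. Does anyone have a sample of how to do this?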

Thanks!

Solution

#!/bin/sh

if [ $# -lt 3 ]; then
        echo "Usage: $0 url_format seq_start seq_end [wget_args]"
        exit
fi

url_format=$1
seq_start=$2
seq_end=$3
shift 3

printf "$url_format\\n" `seq $seq_start $seq_end` | wget -i- "$@"

Save the above as seq_wget, give it execute permission (chmod +x seq_wget), and then run, for example:

$ ./seq_wget http://someaddress.com/logs/dbsclog01s%03d.log 1 50

Or, if you have Bash 4.0, you could just type

$ wget http://someaddress.com/logs/dbsclog01s{001..050}.log

Or, if you have curl instead of wget, you could follow Dennis Williamson's answer.
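
The script relies on printf reusing its format string once for every remaining argument, so a single call emits one URL per number. A minimal sketch of that behaviour, using the question's URL pattern; the variable name url_list is only for illustration:

```shell
#!/bin/sh
# printf repeats the format for each argument it receives,
# so three numbers produce three zero-padded URLs.
url_list=$(printf 'http://someaddress.com/logs/dbsclog01s%03d.log\n' 1 2 3)
echo "$url_list"
```

Piping such a list into wget -i- makes wget read the URLs from standard input, which is what the script does with the output of seq.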

Other tips

curl seems to support ranges. From the man page:

URL  
       The URL syntax is protocol dependent. You’ll find a detailed description
       in RFC 3986.

       You  can  specify  multiple  URLs or parts of URLs by writing part sets
       within braces as in:

        http://site.{one,two,three}.com

       or you can get sequences of alphanumeric series by using [] as in:

        ftp://ftp.numericals.com/file[1-100].txt
        ftp://ftp.numericals.com/file[001-100].txt    (with leading zeros)
        ftp://ftp.letters.com/file[a-z].txt

       No nesting of the sequences is supported at the moment, but you can use
       several ones next to each other:

        http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html

       You  can  specify  any amount of URLs on the command line. They will be
       fetched in a sequential manner in the specified order.

       Since curl 7.15.1 you can also specify step counter for the ranges,  so
       that you can get every Nth number or letter:

        http://www.numericals.com/file[1-100:10].txt
        http://www.letters.com/file[a-z:2].txt

You may have noticed that it says "with leading zeros"!
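
Applied to the question's log files, the range syntax quoted above might be used as follows. This is a sketch, not a tested download: the bounds 1 and 50 and the variable names are assumptions, and the curl call is left commented out so the snippet can be run without network access.

```shell
#!/bin/sh
# Build a curl range URL with zero-padded bounds.
first=1
last=50
url=$(printf 'http://someaddress.com/logs/dbsclog01s[%03d-%03d].log' "$first" "$last")
echo "$url"

# Uncomment to download. The quotes stop the shell from treating the
# brackets as a glob pattern, -O keeps the remote file names, and -f
# keeps curl from saving HTTP error pages as log files.
# curl -f -O "$url"
```

Because the range is expanded by curl itself, this works in any shell and does not depend on Bash 4 brace expansion.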

You can use an echo-style sequence (Bash brace expansion) in the wget URL to download a series of numbers:

wget http://someaddress.com/logs/dbsclog01s00{1..3}.log

This also works with letters:

{a..z} {A..Z}

You can combine a for loop in bash with the printf command (changing echo to wget as needed, of course):

$ for i in {1..10}; do echo "http://www.com/myurl$(printf '%03d' "$i").html"; done
http://www.com/myurl001.html
http://www.com/myurl002.html
http://www.com/myurl003.html
http://www.com/myurl004.html
http://www.com/myurl005.html
http://www.com/myurl006.html
http://www.com/myurl007.html
http://www.com/myurl008.html
http://www.com/myurl009.html
http://www.com/myurl010.html

Not sure exactly what problems you were running into, but it sounds like a simple for loop in bash would do it for you.

for i in {1..100}; do
    n=$(printf '%03d' "$i")    # zero-pad so the name matches dbsclog01s001.log
    wget "http://someaddress.com/logs/dbsclog01s$n.log" -O "your_local_output_dir_$n"
done

Interesting task, so I wrote a full script for you (combining several answers and more). Here it is:

#!/bin/bash
# fixed vars
URL=http://domain.com/logs/     # URL address up to the logfile name
PREF=logprefix                  # logfile prefix (before number)
POSTF=.log                      # logfile suffix (after number)
DIGITS=3                        # how many digits the logfile number has
DLDIR=~/Downloads               # download directory
TOUT=5                          # wget timeout in seconds
# code
for ((i = 1; i < 10**DIGITS; ++i))
do
        file=$PREF$(printf "%0${DIGITS}d" "$i")$POSTF   # local file name
        dl=$URL$file                                    # full URL to download
        echo "$dl -> $DLDIR/$file"                      # monitoring, can be commented out
        if ! wget -T "$TOUT" -q "$dl" -O "$DLDIR/$file" # stop at the first failed download
        then
                rm -f "$DLDIR/$file"                    # wget -O leaves an empty file on failure
                exit
        fi
done

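At the beginning of the script you can set the URL, the log file prefix and suffix, how many digits the numbering part has, and the download directory. The loop downloads every log file it finds and exits automatically at the first one that does not exist (relying on wget's timeout and exit status).

Note that this script assumes that log file indexing starts at 1, not zero, as in your example.

Hope this helps.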
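
The stop-at-the-first-missing-file logic can be tried without a network by swapping the download for a stand-in. In the sketch below, fetch_one and fetch_until_missing are hypothetical names, and fetch_one simply pretends that only the first three logs exist:

```shell
#!/bin/sh
# fetch_one is a hypothetical hook: it receives a URL and must return
# non-zero when the file is missing. This stand-in pretends only the
# first three logs exist, so the loop can be tried offline.
fetch_one() {
    case $1 in
        *001.log|*002.log|*003.log) return 0 ;;
        *) return 1 ;;
    esac
}

# Try numbers 1..max in order and stop at the first failure.
# Prints how many files were fetched.
fetch_until_missing() {
    base=$1
    max=$2
    i=1
    while [ "$i" -le "$max" ]; do
        url=$(printf '%s%03d.log' "$base" "$i")
        fetch_one "$url" || break
        i=$((i + 1))
    done
    echo $((i - 1))
}

fetch_until_missing http://domain.com/logs/logprefix 100
```

Replacing the body of fetch_one with something like wget -q -T 5 "$1" would turn the stand-in into a real download, since wget returns a non-zero exit status when the request fails.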

Here you can find a Perl script that looks like what you want:

http://osix.net/modules/article/?id=677

#!/usr/bin/perl
use strict;
use warnings;
my $program="wget"; #change this to proz if you have it ;-)
my $count=1; #the lesson number starts from 1
my $base_url= "http://www.und.nodak.edu/org/crypto/crypto/lanaki.crypt.class/lessons/lesson";
my $format=".zip"; #the format of the file to download
my $max=24; #the total number of files to download
my $url;

for($count=1;$count<=$max;$count++) {
    if($count<10) {
    $url=$base_url."0".$count.$format; #insert a '0' and form the URL
    }
    else {
    $url=$base_url.$count.$format; #no need to insert a zero
    }
    system("$program $url");
}

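I just had a look at the wget man page's discussion of 'globbing':

By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently. You may have to quote the URL to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix FTP servers (and the ones emulating Unix "ls" output).

So wget http://... will not work with globbing.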

Check whether your system has seq; if it does, this is easy:

for i in $(seq -f "%03g" 1 10); do wget "http://.../dbsclog${i}.log"; done

If your system has the jot command instead of seq:

for i in $(jot -w "http://.../dbsclog%03d.log" 10); do wget "$i"; done
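
If it is unclear whether seq is installed, the check can be made part of the script. A minimal sketch, assuming a hypothetical helper named padded_range that falls back to a plain shell loop; it only prints the URLs, so echo would be swapped for wget to download:

```shell
#!/bin/sh
# padded_range FIRST LAST: print zero-padded (3-digit) numbers, one per line.
# Uses seq when it is installed, otherwise a plain shell loop.
padded_range() {
    if command -v seq >/dev/null 2>&1; then
        seq -f '%03g' "$1" "$2"
    else
        n=$1
        while [ "$n" -le "$2" ]; do
            printf '%03d\n' "$n"
            n=$((n + 1))
        done
    fi
}

for i in $(padded_range 9 11); do
    echo "http://.../dbsclog${i}.log"
done
```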

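Oh! This is similar to a problem I ran into when I was learning bash to automate manga downloads.

Something like this should work:

for a in `seq 1 999`; do
    if [ ${#a} -eq 1 ]; then
        b="00"
    elif [ ${#a} -eq 2 ]; then
        b="0"
    else
        b=""    # three digits: no padding needed
    fi
    echo "$a of 231"
    wget -q http://site.com/path/fileprefix$b$a.jpg
done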
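
Counting digits by hand can also be left to printf, whose %03d format pads to three digits. A small sketch, assuming a hypothetical helper named pad3 and printing the URLs instead of downloading them:

```shell
#!/bin/sh
# pad3 is a hypothetical helper: zero-pad a decimal number to 3 digits.
pad3() {
    printf '%03d' "$1"
}

for a in 7 42 100; do
    echo "http://site.com/path/fileprefix$(pad3 "$a").jpg"
done
```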

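Late to the party, but a really easy solution that requires no coding is the DownThemAll Firefox add-on, which can retrieve ranges of files. That was my solution when I needed to download 800 consecutively numbered files.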

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow