Bash script: downloading consecutively numbered files with wget

https://stackoverflow.com/questions/1426522

07-07-2019

Question

I have a web server that saves the log files of a web application, numbered sequentially. Example file names would be:

dbsclog01s001.log
dbsclog01s002.log
dbsclog01s003.log

The last three digits are a counter, and it can sometimes go up to 100.

I usually open a web browser and browse to a file such as:

http://someaddress.com/logs/dbsclog01s001.log

and save the file. Of course, this gets a bit annoying when there are 50 logs. I tried to come up with a Bash script that uses wget and passes it

http://someaddress.com/logs/dbsclog01s*.log

but I am having problems with my script. Anyway, does anyone have a sample of how to do this?

Thanks!

Solution

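#!/bin/sh

if [ $# -lt 3 ]; then
        echo "Usage: $0 url_format seq_start seq_end [wget_args]" >&2
        exit 1
fi

url_format=$1
seq_start=$2
seq_end=$3
shift 3

# printf reuses the format for every number that seq prints, producing one
# URL per line; wget -i- then reads that list from standard input.
printf "$url_format\\n" $(seq "$seq_start" "$seq_end") | wget -i- "$@"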

Save the above as seq_wget, give it execute permission (chmod +x seq_wget), and then run it, for example:

$ ./seq_wget http://someaddress.com/logs/dbsclog01s%03d.log 1 50

Or, if you have Bash 4.0, you can just type:

$ wget http://someaddress.com/logs/dbsclog01s{001..050}.log

Or, if you have curl instead of wget, you can follow Dennis Williamson's answer.
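The seq_wget script works because printf repeats its format string until every argument has been consumed, so one format plus a list of numbers becomes one URL per line. Here is a minimal sketch of just that step; the helper name list_urls is made up for illustration and is not part of the answer above.

```shell
# Hypothetical helper mirroring the core line of seq_wget.
# $1: printf format containing one %d-style placeholder
# $2, $3: first and last number of the sequence
list_urls() {
    # seq prints the numbers; printf repeats the format once per number.
    printf "$1\n" $(seq "$2" "$3")
}

# Only lists the URLs, nothing is downloaded here.
list_urls 'http://someaddress.com/logs/dbsclog01s%03d.log' 1 3
# To download instead of listing, pipe the list into wget:
#   list_urls 'http://someaddress.com/logs/dbsclog01s%03d.log' 1 50 | wget -i-
```

One caveat: the URL is used as a printf format, so a literal % in it (for example in %20) would have to be written as %%.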

Other tips

curl seems to support ranges. From the man page:

URL  
       The URL syntax is protocol dependent. You’ll find a  detailed  descrip‐
       tion in RFC 3986.

       You  can  specify  multiple  URLs or parts of URLs by writing part sets
       within braces as in:

        http://site.{one,two,three}.com

       or you can get sequences of alphanumeric series by using [] as in:

        ftp://ftp.numericals.com/file[1-100].txt
        ftp://ftp.numericals.com/file[001-100].txt    (with leading zeros)
        ftp://ftp.letters.com/file[a-z].txt

       No nesting of the sequences is supported at the moment, but you can use
       several ones next to each other:

        http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html

       You  can  specify  any amount of URLs on the command line. They will be
       fetched in a sequential manner in the specified order.

       Since curl 7.15.1 you can also specify step counter for the ranges,  so
       that you can get every Nth number or letter:

        http://www.numericals.com/file[1-100:10].txt
        http://www.letters.com/file[a-z:2].txt

You may have noticed that it says "with leading zeros"!
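Applied to the log files from the question, the range syntax might be used roughly like this. This is only a sketch: fetch_range and DRY_RUN are names invented here so the command can be inspected without network access, and the URL is the placeholder from the question. The URL is single-quoted because [ and ] are also glob characters for the shell.

```shell
# Hypothetical helper: fetch a curl range URL, or only print the command
# when DRY_RUN=1 so the sketch can be tried without touching the network.
fetch_range() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "curl --fail --remote-name '$1'"
    else
        # --remote-name (-O) saves each file under its remote name;
        # --fail makes curl treat HTTP errors such as 404 as failures.
        curl --fail --remote-name "$1"
    fi
}

DRY_RUN=1 fetch_range 'http://someaddress.com/logs/dbsclog01s[001-050].log'
```

Dropping DRY_RUN=1 would run the real download, with curl expanding [001-050] itself.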

You can use brace sequences, the kind you would pass to echo, in the wget URL to download a run of numbers:

wget http://someaddress.com/logs/dbsclog01s00{1..3}.log

This also works with letters:

{a..z} {A..Z}
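It may help to see that the braces are expanded by Bash before wget ever runs, so wget simply receives a list of complete URLs. A small sketch that prints the expanded list instead of downloading; expand_log_urls is a made-up name, and bash is called explicitly only so the snippet also behaves when pasted into a plain POSIX sh.

```shell
# Hypothetical helper: show what Bash turns the braces into.
# Brace expansion is a Bash feature, so the expansion runs inside bash -c.
expand_log_urls() {
    bash -c 'printf "%s\n" http://someaddress.com/logs/dbsclog01s00{1..3}.log'
}

# Prints one fully expanded URL per line; wget would get these as arguments.
expand_log_urls
```

Because the expansion is purely textual, it cannot stop at the first missing file: every URL in the range is requested.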

You can use a combination of a for loop and the printf command (changing echo to wget as needed, of course):

$ for i in {1..10}; do echo "http://www.com/myurl`printf "%03d" $i`.html"; done
http://www.com/myurl001.html
http://www.com/myurl002.html
http://www.com/myurl003.html
http://www.com/myurl004.html
http://www.com/myurl005.html
http://www.com/myurl006.html
http://www.com/myurl007.html
http://www.com/myurl008.html
http://www.com/myurl009.html
http://www.com/myurl010.html

I'm not sure exactly what problems you ran into, but it sounds like a simple for loop in Bash would do it for you.

for i in {1..999}; do
    n=$(printf '%03d' "$i")   # zero-pad so the name matches dbsclog01s001.log
    wget -k "http://someaddress.com/logs/dbsclog01s$n.log" -O "your_local_output_dir_$i"
done

Interesting task, so I wrote a full script for you (combining several answers and more). Here it is:

#!/bin/bash
# fixed vars
URL=http://domain.com/logs/     # URL up to the logfile name
PREF=logprefix                  # logfile prefix (before number)
POSTF=.log                      # logfile suffix (after number)
DIGITS=3                        # how many digits the logfile number has
DLDIR=~/Downloads               # download directory
TOUT=5                          # timeout for quit
# code
for ((i = 1; i < 10**DIGITS; ++i))
do
        file=$PREF$(printf "%0${DIGITS}d" "$i")$POSTF   # local file name
        dl=$URL$file                                    # full URL to download
        echo "$dl -> $DLDIR/$file"                      # monitoring, can be commented out
        # save into DLDIR, as the message above says; stop at the first failure
        if ! wget -T "$TOUT" -q "$dl" -O "$DLDIR/$file"
        then
                rm -f "$DLDIR/$file"                    # wget -O may leave an empty file behind
                exit
        fi
done

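At the beginning of the script you can set the URL, the log file prefix and suffix, the number of digits, and the download directory. The loop downloads every log file it finds and exits automatically at the first one that does not exist (using wget's timeout).

Note that this script assumes that log file indexing starts at 1, not zero, as you mentioned in your example.

Hope this helps.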
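The "stop at the first missing file" idea can be tried without a server by passing the download step in as a command. The sketch below is an illustration under that assumption: fetch_until_missing, fake_fetch and real_fetch are invented names, and fake_fetch simply pretends that only logs 001 to 004 exist.

```shell
# Hypothetical helper: call "$fetch" with 001, 002, ... up to $max and stop
# at the first failure. Prints how many files were fetched successfully.
# Variables are global because POSIX sh has no "local".
fetch_until_missing() {
    fetch=$1 max=$2 i=1
    while [ "$i" -le "$max" ]; do
        n=$(printf '%03d' "$i")
        "$fetch" "$n" || break
        i=$((i + 1))
    done
    echo $((i - 1))
}

# Stand-in for wget: pretend only logs 001-004 exist on the server.
fake_fetch() {
    case $1 in
        001|002|003|004) return 0 ;;
        *) return 1 ;;
    esac
}

# What a real fetcher could look like, using the values from the script above.
real_fetch() {
    wget -q -T 5 "http://domain.com/logs/logprefix$1.log"
}

fetch_until_missing fake_fetch 10
```

With the stand-in fetcher the loop gives up at 005 and reports 4; swapping in real_fetch would make it download until the server returns an error or the timeout is hit.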

Here you can find a Perl script that looks like what you want:

http://osix.net/modules/article/?id=677

#!/usr/bin/perl
$program="wget"; #change this to proz if you have it ;-)
my $count=1; #the lesson number starts from 1
my $base_url= "http://www.und.nodak.edu/org/crypto/crypto/lanaki.crypt.class/lessons/lesson";
my $format=".zip"; #the format of the file to download
my $max=24; #the total number of files to download
my $url;

for($count=1;$count<=$max;$count++) {
    if($count<10) {
    $url=$base_url."0".$count.$format; #insert a '0' and form the URL
    }
    else {
    $url=$base_url.$count.$format; #no need to insert a zero
    }
    system("$program $url");
}

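I just had a look at the wget man page's discussion of "globbing":

By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently. You may have to quote the URL to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix FTP servers (and the ones emulating Unix "ls" output).

So wget http://... won't work with globbing.

Check whether your system has seq; if it does, this is easy: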

for i in $(seq -f "%03g" 1 10); do wget "http://.../dbsclog${i}.log"; done

If your system has the jot command instead of seq:

for i in $(jot -w "http://.../dbsclog%03d.log" 10); do wget $i; done
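If the same script has to run on systems with either tool, the check can be done at run time. A rough sketch, assuming a helper called padded_range (a name invented here) that prints three-digit, zero-padded numbers and falls back to a plain loop when neither seq nor jot is installed:

```shell
# Hypothetical helper: print zero-padded numbers from $1 to $2, one per line,
# using whichever of seq or jot is available.
padded_range() {
    if command -v seq >/dev/null 2>&1; then
        seq -f '%03g' "$1" "$2"
    elif command -v jot >/dev/null 2>&1; then
        # jot takes a repetition count and a start value, not an end value.
        jot -w '%03d' "$(( $2 - $1 + 1 ))" "$1"
    else
        i=$1
        while [ "$i" -le "$2" ]; do
            printf '%03d\n' "$i"
            i=$((i + 1))
        done
    fi
}

# Print the URLs that would be fetched; replace echo with wget to download.
for n in $(padded_range 8 10); do
    echo "http://someaddress.com/logs/dbsclog01s${n}.log"
done
```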

Oh! This is similar to a problem I ran into when I was learning Bash to automate manga downloads.

Something like this should work:

for a in `seq 1 999`; do
if [ ${#a} -eq 1 ]; then
    b="00"
elif [ ${#a} -eq 2 ]; then
    b="0"
else
    b=""    # three digits: no padding needed
fi
echo "$a of 231"
wget -q http://site.com/path/fileprefix$b$a.jpg
done

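Late to the party, but a really easy solution that requires no coding is to use the DownThemAll Firefox add-on, which can retrieve ranges of files. That was my solution when I needed to download 800 consecutively numbered files.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow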