Question

The following script is a study from my work with a script that I use in order to scrape web pages with links to an unknown number of sequentially numbered files. In order to create as few useless requests as possible, the scraping script guesses the number of files in a "binary" way.

This "study" script optionally takes a guess as $1 as a point of departure in order to find the number of .txt files in a directory. As I was writing it, i had the feeling of reinventing the wheel.

What is this way of counting called? How could it be done less clumsily in Bash?

#!/bin/bash

set_SEARCH()
{
    DIVIDED=$SEARCH
    SEARCH=$(($SEARCH * 2))
}

increase_SEARCH()
{
    SEARCH=$(($SEARCH + $DIVIDED))
    DIVIDED=$(($DIVIDED / 2))
    #echo $DIVIDED
}

decrease_SEARCH()
{
    SEARCH=$(($SEARCH - $DIVIDED))
    DIVIDED=$(($DIVIDED / 2))
    #echo $DIVIDED
}

test_SEARCH()
{
    while [ -f "$(($SEARCH * 2)).txt" ]
    do
        set_SEARCH
    done
    if [ -f "${SEARCH}.txt" ]
    then
        echo "Trying $SEARCH"
        if [ "$DIVIDED" = 0 ]
        then
            NUMBER=$(($SEARCH + 0))
            echo "The number is $NUMBER"
            exit
        fi
        increase_SEARCH
        test_SEARCH
    else
        echo "Trying $SEARCH"
        if [ "$DIVIDED" = 0 ]
        then
            NUMBER=$(($SEARCH - 1))
            echo "The number is $NUMBER"
            exit
        fi
        decrease_SEARCH
        test_SEARCH
    fi 
}

SEARCH=2
while   [[ "$SEARCH" -le "$1" ]] ;
do
    SEARCH=$(($SEARCH * 2))
done
DIVIDED=$(($SEARCH / 2))

test_SEARCH
Was it helpful?

Solution

This is binary search with unknown upper bound. Knowing this, we can easily write/adapt implementations:

upper=42  # an arbitrary number >= 1
lower=0
while [[ -f $upper.txt ]]
do
    lower=$upper
    (( upper*= 2 ))
done

while (( lower < upper-1 ))
do
    middle=$(((lower+upper)/2))
    if [[ -f $middle.txt ]]
    then
        lower=$middle
    else
        upper=$middle
    fi
done

echo "The number is $lower"
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top