I have an interesting question and I cannot find a reply.

I would like to create a database of prime numbers. On this page Prime numbers in a given range , I can find various examples of SQL code that can help me out reach the goal thanks to the Sieve of Eratosthenes.

So if I run this code:

CREATE PROC dbo.PrintPrimeNumbers
    @startnum int,
    @endnum int
AS 
WITH 
     t4 AS (SELECT n FROM (VALUES(0),(0),(0),(0)) t(n))
    ,t256 AS (SELECT 0 AS n FROM t4 AS a CROSS JOIN t4 AS b CROSS JOIN t4 AS c CROSS JOIN t4 AS d)
    ,t16M AS (SELECT ROW_NUMBER() OVER (ORDER BY (a.n)) AS num FROM t256 AS a CROSS JOIN t256 AS b CROSS JOIN t256 AS c)
SELECT num
FROM t16M AS Dividend
WHERE
    Dividend.num <= @endnum
    AND NOT EXISTS(
        SELECT 1
        FROM t16M AS Divisor
        WHERE
            Divisor.num <= @endnum
            AND Divisor.num BETWEEN 2 AND SQRT(Dividend.num)
            AND Dividend.num % Divisor.num = 0
            AND Dividend.num <= @endnum
    );
GO
EXEC dbo.PrintPrimeNumbers 1, 100;
GO

I can easily generate a list of all prime numbers from 0 to 100.

But let's say I want the query to run for the next 20 years, so I set as limit a very large number:

EXEC dbo.PrintPrimeNumbers 1, 1000000000000000000000000000000000000000000000000000000000...;

The query will start and will run... let's say for the next 20 years.

But the Sieve of Eratosthenes has a peculiarity:

If it gets interrupted you have to restart from the beginning.

So my questions start here:

  • How can I manage to change the CPU, upgrade RAM, change Hard Drive, etc...
  • How can I manage a failover disaster scenario in order to avoid that script to stop?
  • Will the failover guarantee that the stored procedure won't stop?
  • Cloud redundancy and failover to different providers (Azure / AWS / GCE) will this assure that the script won't stop?
  • What if the failover is from Azure to AWS in 2 different parts of the globe? Will this failover keep the stored procedure running?
  • Can I take a backup, let's say every month, of the status of that stored procedure, and eventually resume from that image?
  • If I run the stored procedure on a virtual machine and I take snapshots, can I resume the stored procedure?

I know for sure that someone is doing this: this online database of factorized prime numbers http://factordb.com/status.php has managed to increase from 200MB (in 2014) to nearly 800MB today (2019).

enter image description here

有帮助吗?

解决方案

1st thought

You have an XY Problem. You need to use an algorithm that allows you to restart from a specific point.

Code Review

  • Dividend.num <=@endnum exists in two places.
    • Get rid of the 2nd one
  • Divisor.num between 2 and sqrt(dividend.num) is more restrictive thanDivisor.num <= @endnum`.
    • get rid of the less restrictive one.
  • Since the NOT EXISTS uses the appropriate range of rows for all Dividend rows...
    • You can safely change Dividend.num <= @endnum to Dividend.num between @startnum and @endnum
  • Generate T16M once will allow you to reuse the results
    • That is: make is a regular table

Algortihm Notes

Now that the SQL statement defines a RANGE...you can run for 20 years

For all numbers between A and B:

  1. Chunk them up into smaller, manageable ranges.
  2. Run n Chunks in parallel

Unlike the Segmented Sieve method, you are using all values for each chunk, not just known Primes.

Segmented Sieve

Thanks to LowlyDBA for URL.

Required changes to implement Segmented Sieve

  • the key is to use the values in the table T16M, not a CTE.
  • Modify the code to DELETE the non-prime numbers in T16M. (eg DELETE FROM T16M WHERE num in ( .... ) )
  • Serially run each chunk in lowest to highest order.

Notes

The lifespan of the universe (and the required amount of disk space) will probably limit you to how big of a number you can find.

Original Questions Concerns

  1. Long Running System

At 40 years and still going, one of the longest running computer programs is that of Voyager 1/Voyager 2.

https://www.space.com/26041-nasa-voyager-probes-solar-system-legacy.html

Voyager employs three dual-redundant computer systems per spacecraft. https://history.nasa.gov/computers/Ch6-2.html

  1. adding CPU/RAM/Disks

Adding/replacing Disks can be done via SAN system.

Hot adding physical CPU/RAM may require non-Intel equipment.

CPU and memory resources can be non-disruptively added to the system.... https://en.wikipedia.org/wiki/IBM_Z

Chunking

Oracle users can use DMBS_PARALLEL_EXECUTE to create Chunks and run them. Other RDBMS will need to implement their own API that does that.

其他提示

It would be easier to mod the script so that it writes out either the last value, or a value every n times so that you can restart the script with the last known value in case of failure. Linux hardware is pretty reliable, I've had servers that have had over 4 year uptime but of course nothing is guaranteed.

Preliminary remark: There were some (now deleted) comments around the feasibility of changing to an approach, which allows to restart the on the Sieve of Eratosthenes based search for primes…

Something along the following should work:

Setting the Stage/Seed:

CREATE TABLE Prime (
    id INT NOT NULL IDENTITY PRIMARY KEY,
    prime INT NOT NULL UNIQUE
);
CREATE TABLE LastOne (
    id INT NOT NULL PRIMARY KEY
);
BEGIN TRANSACTION;
  INSERT INTO Prime (prime) VALUES (2);
  INSERT INTO LastOne VALUES (2);
COMMIT;

Be Ready To Do the Job Whenever Possible:

CREATE PROC dbo.SieveOfEratosthenes AS
DECLARE
    @nextPrimeCandidate INT = NULL
BEGIN
  SET @nextPrimeCandidate = (
    SELECT id + 1
    FROM LastOne AS Dividend
    WHERE
      (Dividend.id + 1) % 2 = 1
      AND
      NOT EXISTS (
        SELECT 1
        FROM Prime
        WHERE
          prime BETWEEN 2 AND SQRT(Dividend.id + 1)
          AND
          (Dividend.id + 1) % prime = 0
    )
  )
  ;
  BEGIN TRANSACTION
    IF (@nextPrimeCandidate IS NOT NULL)
    INSERT INTO Prime (prime) VALUES (@nextPrimeCandidate);
    UPDATE LastOne SET id = id + 1;
  COMMIT;
END;

See it in action: db<>fiddle.

N.B.: Just the idea - T-SQL is not really my cup of tea…

Please comment, if and as this requires adjustment / further detail.

许可以下: CC-BY-SA归因
不隶属于 dba.stackexchange
scroll top