Question

I need to write an application or query to export a large number of PDFs to files and create a delimited text file that records where each file is and contains the ID of its record.

What I was thinking of doing was using a console application that writes the entry to the text file right after exporting each PDF from the DB. That way I would have all the information together when writing the text file, so I could make sure that the data in the delimited text file is accurate.

At first I was thinking about using a DataSet to do this, but there are going to be more than 50,000 rows of data. I am not sure that a DataTable would be any better.

I was also looking into the BCP utility, but from what I have read, the export doesn't give me a PDF file back from the data. Is this true?

I would consider myself a beginner at programming something like this. What should I use as a data structure for this? Would I use a cursor, and if so, how would I set it up to fit what I am doing?

Update

I am going to try the DataSet option, but limit it to one day's worth of data at a time using a do-while loop, so that I can loop through every day from the beginning of the data until today's date. I will process one day's worth of data, get rid of the DataSet, and then move on to the next date.

Does anyone see anything in my logic that would cause issues?

Solution

When I finally finished researching the different approaches to the problem, it turned out to be quite simple to code. I did not use BCP at all.

I created variables for the information that I was extracting and wanted inside the text file:

  1. Filename
  2. Date (From SQL Table for the Original Creation Date)
  3. Case Number (internal Identifier for the 3rd party program to link to)
  4. Description (Taken from the SQL Table to describe the document)
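As a rough illustration of how those four values could become one row of the delimited file, here is a minimal sketch. The `ManifestLine` class and its members are hypothetical names, not part of the original program. It also makes two assumptions that the original code does not: dates are written in a fixed, culture-independent format, and any `~` or line break inside the description is replaced with a space so it cannot break the row. Case numbers and paths are assumed not to contain the delimiter.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only.
public static class ManifestLine
{
    public const char Delimiter = '~';

    // Builds one row in the order: case number, date, file path, description.
    public static string Format(string caseNumber, DateTime fileDate, string filePath, string description)
    {
        // Assumption: a fixed date format is acceptable to the importing program.
        string date = fileDate.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);
        return string.Join(Delimiter.ToString(), new[]
        {
            caseNumber, date, filePath, Clean(description)
        });
    }

    // Assumption: replacing the delimiter and line breaks with spaces is acceptable.
    private static string Clean(string value)
    {
        if (value == null) return string.Empty;
        return value.Replace(Delimiter, ' ').Replace('\r', ' ').Replace('\n', ' ');
    }
}
```

Whether the third-party program expects this date format is something to check against its import rules before relying on it.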

Then I put the application to work writing the binary data out to PDF files, one at a time:

using (SqlConnection Conn = new SqlConnection(strSQLConn))
        {
            //open the connection
            Conn.Open();
            Console.WriteLine("the connection is open");

            //Variables needed for looping
            DateTime Today = System.DateTime.Now;
            DateTime StartDate = Convert.ToDateTime("2008-06-11 00:00:00");
            //DateTime StartDate = Today.AddDays(-10);
            Console.WriteLine("Converting the Documents from " + StartDate.ToString() + " - TO - " + Today.ToString());
            Console.WriteLine("Press Any Key to continue.");
            Console.ReadLine();
            int RecordCount = 0;
            ulong ByteCount = 0;
            int i = 1;
            foreach (DateTime day in EachDay(StartDate, Today))
            {
                String strDay = day.ToString();
                // Create a SQLCommand to retrieve Data
                SqlCommand getRecords = new SqlCommand("spRecapturePDF", Conn);
                getRecords.CommandType = CommandType.StoredProcedure;
                getRecords.Parameters.Add(new SqlParameter("@OneDay", strDay));
                SqlDataReader reader = getRecords.ExecuteReader();
                //stuff exporting the binary code to the PDF format
                FileStream fs;
                BinaryWriter bw;
                int buffersize = 100;
                byte[] outbyte = new byte[buffersize];
                long retval;
                long startIndex = 0;

                int j = 1;

                while (reader.Read())
                {
                    strFileName = reader.GetString(0) + "-" + i + "-" + j;
                    strDock_no = reader.GetString(0);
                    dtFiledate = reader.GetDateTime(2);
                    strDescription = reader.GetString(4);   
                    // FileMode.Create truncates any file left over from an earlier run
                    fs = new FileStream("c:\\FolderName\\" + strFileName + ".pdf", FileMode.Create, FileAccess.Write);
                    bw = new BinaryWriter(fs);
                    startIndex = 0;
                    retval = reader.GetBytes(1,startIndex,outbyte,0,buffersize);
                    while (retval == buffersize)
                    {
                        bw.Write(outbyte);
                        bw.Flush();
                        startIndex += buffersize;
                        retval = reader.GetBytes(1,startIndex,outbyte,0,buffersize);
                    }
                    //write the remaining buffer.
                    bw.Write(outbyte,0,(int)retval);
                    ByteCount = ByteCount + Convert.ToUInt64(fs.Length);
                    bw.Flush();
                    //close the output file
                    bw.Close();
                    fs.Close();
                    //need to write to the Text file here.
                    TextWriter tw = new StreamWriter(path,true);
                    tw.WriteLine(strDock_no + "~" + dtFiledate.ToString() + "~" + "c:\\FolderName\\" + strFileName + ".pdf" + "~" + strDescription);
                    tw.Close();
                    // increment the J variable for the Next FileName
                    j++;
                    RecordCount++;
                }
                //close the reader (the connection is closed by the using block)
                reader.Close();
                i++;
            }
            Console.WriteLine("Number of Records Processed:  " + RecordCount.ToString());
            Console.WriteLine("for a Total of : " + ByteCount + " Bytes");

            // Convert the byte total to megabytes and gigabytes
            Decimal MByteCount = Convert.ToDecimal(ByteCount) / 1024 / 1024;
            Decimal GByteCount = MByteCount / 1024;

            Console.WriteLine("Total MBs : " + MByteCount.ToString() + " MB");
            Console.WriteLine("Total GBs : " + GByteCount.ToString() + " GB");
            Console.WriteLine("Press Enter to Continue ...");
            Console.ReadLine();
        }

This code is enclosed in a foreach statement that goes day by day, from a start date to an end date. Inside that foreach, the application calls a stored procedure, passing it the current day, to retrieve the records that were entered on that day.

The variables i and j exist because I needed a unique filename even when several records have the same case number. i represents the day (because I went day by day in my select statement) and j represents the record number within that day.
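The naming scheme can be sketched as a small helper. `PdfFileName.Build` is a hypothetical name, and the replacement of characters that are not allowed in file names is an added precaution that the original code does not have; it assumes case numbers might contain characters such as `/`.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class PdfFileName
{
    // Produces caseNumber-dayIndex-recordIndex.pdf, mirroring the i and j counters.
    public static string Build(string caseNumber, int dayIndex, int recordIndex)
    {
        string safe = caseNumber;
        // Assumption: invalid file-name characters can be replaced with underscores.
        foreach (char c in Path.GetInvalidFileNameChars())
        {
            safe = safe.Replace(c, '_');
        }
        return safe + "-" + dayIndex + "-" + recordIndex + ".pdf";
    }
}
```

For example, `PdfFileName.Build("08-123", 1, 2)` gives `08-123-1-2.pdf`.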

The foreach and while loops are enclosed in a using (Conn) block so that the connection is closed no matter what happens.
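The same using pattern can be applied to the output file, and the chunked GetBytes loop can be pulled into one method. The following is a sketch under stated assumptions, not the code that was actually run: `BlobExport.CopyToFile` is a hypothetical name, and the 64 KB default buffer is an arbitrary choice in place of the 100-byte buffer in the listing. It is written against `IDataRecord`, which `SqlDataReader` implements.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical helper, for illustration only.
public static class BlobExport
{
    // Copies the binary column at 'ordinal' of the current row to 'path'
    // and returns the number of bytes written.
    public static long CopyToFile(IDataRecord record, int ordinal, string path, int bufferSize = 64 * 1024)
    {
        byte[] buffer = new byte[bufferSize];
        long offset = 0;

        // The using block closes the file even if GetBytes or Write throws.
        // FileMode.Create truncates a file left over from an earlier run.
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            long read;
            // GetBytes returns how many bytes it copied; 0 means the end of the field.
            while ((read = record.GetBytes(ordinal, offset, buffer, 0, buffer.Length)) > 0)
            {
                fs.Write(buffer, 0, (int)read);
                offset += read;
            }
        }
        return offset;
    }
}
```

Inside the `while (reader.Read())` loop this would be called as something like `BlobExport.CopyToFile(reader, 1, fullPath)`, and the return value could feed the byte counter.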

At the end of the while loop I write to the text file. The text file is created before any of the loops so that I can append to it rather than overwrite it. That code is:

string path = @"c:\FolderName\TextFile.txt";
        if (!File.Exists(path))
        {
            TextWriter tw = new StreamWriter(path, false);
            tw.WriteLine("Dock_No~Date~FileName(Location)~Description");
            tw.Close();
        }
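The export loop opens and closes the text file once for every record. One alternative, sketched here with the hypothetical `ManifestFile.OpenForAppend` helper, is to open a single `StreamWriter` in append mode before the loops and write the header only when the file is new or empty.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class ManifestFile
{
    // Opens the text file for appending and writes the header row
    // only when the file does not exist yet or is empty.
    public static StreamWriter OpenForAppend(string path, string header)
    {
        bool needsHeader = !File.Exists(path) || new FileInfo(path).Length == 0;
        StreamWriter writer = new StreamWriter(path, true); // true = append
        if (needsHeader)
        {
            writer.WriteLine(header);
        }
        return writer;
    }
}
```

The returned writer would be wrapped in a using block around the day loop, with `WriteLine` called once per exported PDF. The trade-off is that rows may sit in the writer's buffer until it is flushed or closed, whereas opening and closing per record puts each row on disk immediately, which can matter if the run is interrupted.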

I hope that this helps someone else. I left out the Console.WriteLine and Console.ReadLine code that wasn't necessary to the functionality I was looking for. I also added some code that counts the bytes written and the records processed. That is just fun stuff to know, and I still need to clean it up at the end.

These are the guts of what it took to accomplish a mass extract of PDFs from a BLOB field in SQL Server, minus some connection mumbo jumbo.

Foreach Day Set up

This is the code that I used to make the foreach work the way that I wanted it to:

static public IEnumerable<DateTime> EachDay(DateTime Startdate, DateTime EndDate)
    {
        for (var day = Startdate.Date; day.Date <= EndDate.Date; day = day.AddDays(1))
            yield return day;
    }
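
Two properties of this iterator are easy to miss: both end dates are included, and the time of day is dropped because of the `.Date` calls. The sketch below repeats the iterator inside a hypothetical `DayRange` class so that it compiles on its own, and adds a `CountDays` helper (also hypothetical) to make the inclusive behaviour visible.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper class, for illustration only.
public static class DayRange
{
    // Same logic as the iterator above: one value per calendar day, both ends included.
    public static IEnumerable<DateTime> EachDay(DateTime startDate, DateTime endDate)
    {
        for (var day = startDate.Date; day <= endDate.Date; day = day.AddDays(1))
            yield return day;
    }

    // Counts how many days the loop would visit.
    public static int CountDays(DateTime startDate, DateTime endDate)
    {
        int count = 0;
        foreach (DateTime day in EachDay(startDate, endDate))
            count++;
        return count;
    }
}
```

With a start of 2008-06-11 15:30 and an end of 2008-06-13 08:00, the loop visits three days, each at midnight. If the end date is earlier than the start date, it visits none.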
Licensed under: CC-BY-SA with attribution