Question

Setup Environment:

I'm developing an Excel 2010 Application Level Add-in using vb.net with .NET framework 4.


My goal:

  1. Have user type in multiple names to search for
  2. Use the list of names to execute a SQL query on a LARGE spreadsheet (30,000+ rows)
  3. Return the recordset and paste into new worksheet

Performance is my priority. I'd like to know the fastest way to do this by leveraging the .NET framework.

Using an ADO Connection Object in my code works, but the process takes too long (5 - 8 seconds).


This is the SQL query I'm using on the table named wells:

    SELECT * 
    FROM wells 
    WHERE padgroup in 

    (SELECT padgroup 
     FROM wells 
     WHERE name LIKE 'TOMCHUCK 21-30'
             OR name LIKE 'FEDERAL 41-25PH')


Here's a portion of what the table looks like:

Excel Table


I'm using this code right now to create an ADO Connection Object to retrieve my results:

    'Create Recordset Object
    rsCon = CreateObject("ADODB.Connection")
    rsData = CreateObject("ADODB.Recordset")

    rsCon.Open(szConnect)
    rsData.Open(mySQLQueryToExecute, rsCon, 0, 1, 1)

    'Check to make sure data is received, then copy the data
    If Not rsData.EOF Then

        TargetRange.Cells(1, 1).CopyFromRecordset(rsData)

    Else

        MsgBox("No records returned from : " & SourceFile, vbCritical)

    End If

    'Clean up the Recordset object
    rsData.Close()
    rsData = Nothing
    rsCon.Close()
    rsCon = Nothing


Based on what I know, Excel spreadsheets are stored in the Open XML format and the .NET framework includes native support for parsing XML.

After researching it, I came across a few different options:


Could somebody provide a pointer on what would be the best method to use? I'd really appreciate it.


Additional Notes:

  • All queries need to be able to perform without connectivity to an online database
  • I only need access to the spreadsheet once to extract the raw data from rows


Right now I just embed the spreadsheet as a project resource.

Then, at run-time I create the file, run the query, store the results in-memory, and delete the file.

   'Create temp file path in the commonapplicationdata folder
    Dim excelsheetpath As StringBuilder

    excelsheetpath = New StringBuilder(Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData))

    excelsheetpath.Append("\MasterList.xlsm")

    'Save resources into temp location in HD
    System.IO.File.WriteAllBytes(excelsheetpath.ToString, My.Resources.MasterList)

    'Now call the function to use ADO to get records from the MasterList.xlsm file just created
    GetData(excelsheetpath.ToString, "Sheet1", "A1:S40000", True, False)

    'Store the results in-memory and display by adding to a datagridview control (in a custom task pane)

    'Delete the spreadsheet
    System.IO.File.Delete(excelsheetpath.ToString())
Was it helpful?

Solution 3

My Solution:

I tried three different approaches:

  • ADO Connection Object with SQL (slowest)
  • VSTO and Excel's Autofilter (reliable)
  • LINQ to XML (fastest)

LINQ to XML offered the best performance. I converted my table into an XML file:

XML Table


Then, in my code, I used StringReader to bring in the XMLwellData file (which is saved as a project resource).

    'welldoc will be the file to do queries on using LINQ to XML
    Dim stream As System.IO.StringReader
    stream = New StringReader(My.Resources.XMLwellData)

    welldoc = XDocument.Load(stream)

    'clean up stream now that it's no longer needed
    stream.Close()
    stream.Dispose()


    '***** later in the code perform my query on XML file *********

        Dim query = _
     From well In welldoc.<wellList>.<well> _
     Where well.<name>.Value Like "TOMCHUCK 21-30" _
     Select well

     For Each well in query

        MessageBox.Show(well.<padgroup>.value)

     Next

This was amazingly simple to do what I wanted and best of all it was FAST.

Thank you for all the help and suggestions. It made all the difference for me figuring this out.


Alternative Method using Excel's Autofilter


If you're trying to use the code suggested in the other answer, this will only filter on two values:

    worksheet.Cells.AutoFilter(column, criteria, Excel.XlAutoFilterOperator.xlOr);


So, to filter with multiple criteria using Excel's Auotfilter, you must pass your arguments as an array and filter on xlFilterValues.


    Dim wrkbk As Excel.Workbook
    Dim wrksht As Excel.Worksheet
    Dim myRange As Excel.Range
    Dim cell As Excel.Range

    'You would add all of your wellnames to search to this List
    Dim wellNames As New List(Of String)

    wrksht = wrkbk.Sheets(1)

    'In my excel file there is a Named Range which includes the all the information
    myRange = wrksht.Range("WellLookUpTable")

    'Notice, for `Criteria1:=` you MUST convert the List to an array
    With wrksht.Range("WellLookUpTable")
        .AutoFilter(Field:=2, Criteria1:=wellNames.ToArray, Operator:=Excel.XlAutoFilterOperator.xlFilterValues)
    End With

    myRange = wrksht.Range("A2", wrksht.Range("A2").End(Excel.XlDirection.xlDown)).Cells.SpecialCells(Excel.XlCellType.xlCellTypeVisible)

    For Each cell In myRange

        'column 11 is padgroup
        MessageBox.Show(cell.Offset(0, 11).Value)

    Next

OTHER TIPS

You are doing VSTO the wrong way ;) Don't use SQL with Excel. If you have the need for speed, exploit VSTO and the native Excel APIs. You can skip the overhead of the ADODB/OLEDB layers and go straight to the Excel object model to use the blazing fast Autofilter in Excel, the SpecialCells method to get only visible cells into a multi-area range, and the Value method to quickly copy a range to an array.

Here is a sample VSTO 2010 customized workbook that quickly searches against a list of 58k words for words containing "aba", "cat", or "zon".

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Xml.Linq;
using Microsoft.Office.Tools.Excel;
using Microsoft.VisualStudio.Tools.Applications.Runtime;
using Excel = Microsoft.Office.Interop.Excel;
using Office = Microsoft.Office.Core;

namespace ExcelWorkbook1
{
    public partial class ThisWorkbook
    {
        private void ThisWorkbook_Startup(object sender, System.EventArgs e)
        {
            const int Sheet1 = 1; // you can use Linq to find a sheet by name if needed
            const int ColumnB = 2;
            List<List<object>> results = Query(Sheet1, ColumnB, "aba", "cat", "zon");

            foreach (List<object> record in results)
            {
                System.Diagnostics.Debug.Print("{0,-10} {1,30} {2}", record[0], record[1], record[2]);
            }
        }

        private void ThisWorkbook_Shutdown(object sender, System.EventArgs e)
        {
        }

        /// <summary>
        ///     Removes any existing Excel autofilters from the worksheet
        /// </summary>
        private void ClearFilter(Microsoft.Office.Interop.Excel._Worksheet worksheet)
        {
            if (worksheet.AutoFilter != null)
            {
                worksheet.Cells.AutoFilter();
            }
        }

        /// <summary>
        ///     Applies an Excel Autofilter to the worksheet for search for an array of substring predicates
        /// </summary>
        private void ApplyFilter(Microsoft.Office.Interop.Excel._Worksheet worksheet, int column, params string[] predicates)
        {
            string[] criteria = new string[predicates.Length];
            int i = 0;

            ClearFilter(worksheet);

            foreach (string value in predicates)
            {
                criteria[i++] = String.Concat("=*", value, "*");
            }

            worksheet.Cells.AutoFilter(column, criteria, Excel.XlAutoFilterOperator.xlOr); 
        }

        /// <summary>
        ///     Returns a list of rows that are hits on a search for an array of substrings in Column B of Sheet1
        /// </summary>
        private List<List<object>> Query(int sheetIndex, int columnIndex, params string[] words)
        {
            Microsoft.Office.Interop.Excel._Worksheet worksheet;
            Excel.Range range;
            List<List<object>> records = new List<List<object>>();
            List<object> record;
            object[,] cells;
            object value;
            int row, column, rows, columns;
            bool hit;

            try
            {
                worksheet = (Microsoft.Office.Interop.Excel._Worksheet)Globals.ThisWorkbook.Sheets[sheetIndex];
                if (null == worksheet)
                {
                    return null;
                }

                // apply the autofilter
                ApplyFilter(worksheet, columnIndex, words);

                // get the 
                range = worksheet.Range["$A:$C"].SpecialCells(Excel.XlCellType.xlCellTypeVisible);
                foreach (Excel.Range subrange in range.Areas)
                {
                    // copy the cells to a multidimensional array for perfomance
                    cells = subrange.Value;

                    // transform the multidimensional array to a List
                    for (row = cells.GetLowerBound(0), rows = cells.GetUpperBound(0); row <= rows; row++)
                    {
                        record = new List<object>();
                        hit = false;

                        for (column = cells.GetLowerBound(1), columns = cells.GetUpperBound(1); column <= columns; column++)
                        {
                            value = cells[row, column];
                            hit = hit || (null != value);

                            if (hit)
                            {
                                record.Add(cells[row, column]);
                            }
                        }

                        if (hit)
                        {
                            records.Add(record);
                        }
                    }
                }
            }
            catch { }
            finally
            {
                // use GC.Collect over Marshal.ReleaseComObject() to release all RCWs per http://stackoverflow.com/a/17131389/1995977 and more
                cells = null;
                GC.Collect();
                GC.WaitForPendingFinalizers();
            }

            return records;
        }

        #region VSTO Designer generated code

        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InternalStartup()
        {
            this.Startup += new System.EventHandler(ThisWorkbook_Startup);
            this.Shutdown += new System.EventHandler(ThisWorkbook_Shutdown);
        }

        #endregion

    }
}

Excel 2010 files are not quite XML. Take an XLSX (or XMSM) file, and rename it with the .zip extension. Then extract it to a new folder. The files in the sub-folders will be XML files, yes, but the actual XLSX file is a zip file containing a collection of folders containing XML files.

Your best bet, I think, is to use the ACE drivers (JET is not supported any more) and access it as via ODBC. If this is not fast enough, you may be able to take extracts at certain times and upload them to a Database which you can run your queries from; the queries should be faster, but will be potentially out of date.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top