How can I prevent Microsoft ACE and JET in VB6 duplicating data from the first row in Excel spreadsheet?

StackOverflow https://stackoverflow.com/questions/7242537

Question

I'm dealing with a legacy application written in VB6 which reads in Excel spreadsheets and inserts them into a database.
It works for the most part, but if the data in the spreadsheet does not start on the first row, the first data row gets duplicated.
Say for example the first 3 rows in the spreadsheet are blank, the top four rows of data look like the following:

_|  A  |  B  |  C  |  D  |  E  |  F  |  G  |
1|     |     |     |     |     |     |     |
2|     |     |     |     |     |     |     |
3|     |     |     |     |     |     |     |
4|  99 |Text1|Text2|Text3|Text4|Text5|  77 |

The application connects to the Excel spreadsheet and reads it in using the following code:

Public Function obtainConnectionExcel(sql_conn, uid) As Variant
Dim cn As Object

Set cn = CreateObject("ADODB.Connection")
On Error Resume Next

cn.Provider = "Microsoft.ACE.OLEDB.12.0"
cn.Properties("Extended Properties").Value = "Excel 12.0;ReadOnly=True;HDR=No;IMEX=1"

If (Err <> 0) Then
    cn.Provider = "Microsoft.Jet.OLEDB.4.0"
    cn.Properties("Extended Properties").Value = "Excel 8.0;ReadOnly=True;HDR=No;IMEX=1"
End If

On Error Resume Next
cn.open getSpreadsheetPath(sql_conn, uid)
Set obtainConnectionExcel = cn
Exit Function
End Function
.....
Public Function extractAllData(parameters) As String

..... 'Variable declarations etc
On Error Resume Next
Set dbo_conn = obtainConnectionExcel(sql_conn, uid)
If Err <> 0 Then
    ....'logs error, goes to error handler
End If
On Error GoTo ErrorHandler

If (dbo_conn.State = 1) Then
    rownumber = 1

    Do While rownumber <= numberOfRowsToGet

        For x = lettercount To lettercount + lettercount_offset

            letter = Chr(x)
            sSql = "SELECT * FROM [" & worksheet & "$" & letter & rownumber & ":" & letter & rownumber & "]"
            On Error Resume Next
            Set rs = dbo_conn.execute(sSql)

            If (Not rs.EOF) Then
                'inserts the data into the db
            End If

        Next x

        rownumber = rownumber + 1
    Loop

    .... 'Post processing

Exit Function

....'Error handlers

End Function

That should be the relevant code. The issue occurs at the lines:

sSql = "SELECT * FROM [" & worksheet & "$" & letter & rownumber & ":" & letter & rownumber & "]"
        On Error Resume Next
        Set rs = dbo_conn.execute(sSql)

When the data is read in, regardless of whether we are using JET or ACE, the data is returned this way:

_|  A  |  B  |  C  |  D  |  E  |  F  |  G  |
1|  99 |     |     |     |     |     |  77 |
2|  99 |     |     |     |     |     |  77 |
3|  99 |Text1|Text2|Text3|Text4|Text5|  77 |
4|  99 |Text1|Text2|Text3|Text4|Text5|  77 |

I have tried connecting to the spreadsheet and getting the data in a multitude of ways, but nothing seems to work - either the connection would fail, or the data would be nothing but nulls.
I have found some work-arounds - for example, if I enter a space character into cell A1, the issue no longer occurs. However, I would like a program-based solution, rather than having to tell users to do extra steps to avoid this.
It only copies the first row of data. If the data in the cell is a number, then it copies the data into every cell above it in that column, if it is text then it only goes up by one level.
An interesting note is that, if I alter the spreadsheet to say that all of the data is text, it then copies every cell as if they are numbers (i.e. into every single cell above, rather than one single row)

All in all this is rather irritating - since I don't have any luck when searching for this issue, I can only conclude that we are doing something wrong, or very few people have bothered with this type of test data.

[edit] After some investigation I have made some headway in solving this - "The provider assumes that your table of data begins with the upper-most, left-most, non-blank cell on the specified worksheet" (http://support.microsoft.com/default.aspx?scid=kb;en-us;257819 ). This is confirmed if I use a statement to select the entire worksheet - it only returns the block of data.
So when I select any cell that is outside of that range, the provider, rather than doing something sensible like returning null, returns the data from the upper-most non-empty cell from that particular column.
I could hypothetically change the system so that it simply grabs all of the data and assumes that the upper-most left-most cell is cell A1, but that would break compatibility with data that is already out there.
What I now need is a way to get the cell-references for the data that is returned, so I can treat it appropriately, or a way to force this to no longer happen.

Was it helpful?

Solution 2

To answer my own question: It looks like you can't. You can, however, try to code around the issue.

The JET and ACE providers both take the upper-most left-most non-empty cell as being the start of the dataset (http://support.microsoft.com/default.aspx?scid=kb;en-us;257819 ) So, when you try to get the value from a cell that occurs before the start of that dataset, rather than doing something sensible and returning null, the providers return a guess based upon the top row of data.

I could find no way to get the cell-references of the returned dataset from the provider - it labelled everything as F1, F2 etc. ("Field 1", "Field 2")

So there were two remaining solutions:

1) Grab the entire dataset at once, assume the data starts at A1, and insert it into the DB using that assumption. This unfortunately would result in breaking compatability with pre-existing data.

2) Programmatically figure out the cell references, and input the data correctly. I did this using the following abbreviated code

sSql = "SELECT * FROM [" & worksheet & "$]"
Set rs = dbo_conn.execute(sSql)

rownumber = 1
If Not rs.EOF Then
    oledata_array = rs.GetRows()
Else
    ReDim oledata_array(0, 0)
End If

Do While rownumber <= numberOfRowsToGet
    col_number = 1
    For x = lettercount To lettercount + lettercount_offset

        letter = Chr(x)
        sSql = "SELECT * FROM [" & worksheet & "$" & letter & rownumber & ":" & letter & rownumber & "]"
        On Error Resume Next
        Set rs = dbo_conn.execute(sSql)
        If Not rs.EOF Then
            If rs(0) <> "" Then
                If x < furthest_left Then
                    furthest_left = x
                End If
                If x > furthest_right Then
                    furthest_right = x
                End If
                If rownumber > bottom_of_set Then
                    bottom_of_set = rownumber
                End If
            Else
            End If
        End If

        col_number = col_number + 1

    Next x

    rs.MoveNext
    rownumber = rownumber + 1
Loop
rs.Close

top_of_set = bottom_of_set - UBound(oledata_array, 2)

If CLng(UBound(oledata_array, 1)) <> CLng(furthest_right - furthest_left) Then
    'log the fact that there is a discrepency, and continue
End If

'now have the co-ords of the "square" of data as it occurs in the spreadsheet

rownumber = 1
row_index = 0
Do While rownumber <= numberOfRowsToGet
    col_number = 1
    For x = lettercount To lettercount + lettercount_offset

        letter = Chr(x)

        'construct the first chunk of the sql insert string

        If (x <= furthest_right) And (x >= furthest_left) And (rownumber <= bottom_of_set) And (rownumber >= top_of_set) Then
            sSql = sSql & "'" & oledata_array(col_number - 1, row_index) & "'"
            col_number = col_number + 1
        Else
            sSql = sSql & "''"
        End If

        'finish the sql string and execute

    Next x

OTHER TIPS

A popular blog post and thread (originally by OneDayWhen) on a problem similar to yours makes mention of a registry tweak which changes the way in which Excel guesses the data type of a cell.

I believe this 'guessing' behaviour may be the root of your problem.

External Data - Mixed Data Types

In summary, use TypeGuessRows to get Jet to detect whether a ‘mixed types’ situation exists or use it to ‘trick’ Jet into detecting a certaint data type as being the majority type. In the event of a ‘mixed types’ situation being detected, use ImportMixedTypes to tell Jet to either use the majority type or coerce all values as ‘Text’ (max 255 characters).

Others talk of amending your connection string to include MaxScanRows=0 but this does not appear to solve the problem. I understand you might be looking for a portable solution but I am unable to find such a workaround.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top