Question

Apologies, this question is a little abstract and as such is a little hard to define so I will probably need to edit the question a couple of times to clarify:

I've got a configuration file that I need to parse where each relevant line contains one of the following formats:

FieldName = Value
FieldName(Index) = Value
FieldName(Index1, Index2) = Value
FieldName(Index1, Index2, ...IndexN) = Value

For example:

Field0 = 0
Field1(0, 0) = 0.01
Field1(0, 1) = 0.02
Field1(1, 0) = 0.03
Field1(1, 1) = 0.04
Field1(2, 0) = ADF0102BC5
Field1(2, 1) = ADF0102BC6
Field2(0, 0) = 0
Field2(0, 1) = 2
Field3(1) = 5
Field3(2) = 7
Field3(3) = 9
Field4(0, 0, 1) = 64.75
Field4(0, 1, 0) = 65.25
Field4(1, 0, 0) = 72.25
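
For readers who want something concrete, the line shapes above could be parsed roughly as follows. This is only a sketch: the pattern `LINE_RE` and the helper `parse_line` are hypothetical names, and it assumes field names are word characters, indexes are non-negative integers, and the value is kept as a raw string for later parsing.

```python
import re

# Hypothetical pattern for the four line shapes listed above: a name, an
# optional parenthesised list of integer indexes, "=", and a raw value.
LINE_RE = re.compile(
    r"^\s*(?P<name>\w+)\s*"
    r"(?:\(\s*(?P<indexes>\d+(?:\s*,\s*\d+)*)\s*\))?"
    r"\s*=\s*(?P<value>.*?)\s*$"
)


def parse_line(line):
    """Return (field_name, indexes, raw_value), or None for an irrelevant line."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    raw_indexes = match.group("indexes")
    # int() tolerates the spaces left around each index after splitting.
    indexes = tuple(int(part) for part in raw_indexes.split(",")) if raw_indexes else ()
    return match.group("name"), indexes, match.group("value")


if __name__ == "__main__":
    for text in ["Field0 = 0", "Field1(2, 1) = ADF0102BC6", "Field4(0, 0, 1) = 64.75"]:
        print(parse_line(text))
```

Returning the indexes as a tuple of any length keeps the parser independent of how many indexes a field happens to have.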

The relevant lines are simple enough to parse from the file using regular expressions and I've got that bit handled already. What I'm having a problem with is how to model the data in the database so that as a new index comes into scope for a field, it can automatically be added without requiring new columns to be added to the table.

The FieldName is always a Varchar of max length 50

The Value is always a numeric value represented in one of many string formats that need parsing individually and for the purpose of this question is largely irrelevant.

Each of the indexes (if a field has them) is an integer value. Each has a meaning in its own right, but they are used together to map a set of values to a field name.

Each instance of a field name, e.g. Field1, will have a constant number of indexes, i.e. you will never have both Field1(0, 0) and Field1(0, 0, 0). If Field1 has 2 indexes in one line of the configuration file, then all instances of Field1 will have 2 indexes.

I need for the system to be flexible enough to parse the file and attach as many indexes as necessary for each field.

I'm in two minds: do I treat the entire left side of the "equation" as the label, so that Field1(0, 0) becomes the "FieldName" (which makes querying by index quite difficult), or do I model my data such that these indexes effectively become coordinates for the field's value?

If the indexes remained constant across all files I could model this using:

Create Table Fields(
    FieldId Integer Identity(1, 1) Primary Key,
    FieldName VarChar(50)
)

Create Table FieldValues(
    FieldId Integer Constraint FK_FV_FID Foreign Key References Fields(FieldId),
    Index1 Integer,
    Index2 Integer,
    Index3 Integer,
    Index4 Integer,
    Value  Varchar(50)
)

Unfortunately, the number of indexes is unknown until the file is parsed, which makes modeling that relationship more complex.

Once the data is stored, I need to be able to query it simply, either by field name alone to get a list of all corresponding index references with their values, or by field name plus one or more index values, e.g.

Field1
------
0, 0 = 0.01
0, 1 = 0.02
1, 0 = 0.03
1, 1 = 0.04
2, 0 = ADF0102BC5
2, 1 = ADF0102BC6

Or

Field1 Where Index1 = 0
-----------------------
0, 0 = 0.01
0, 1 = 0.02

Or

Field1 Where Index2 = 1
-----------------------
0, 1 = 0.02
1, 1 = 0.04
2, 1 = ADF0102BC6

Or

Field1 Where Index1 = 0 And Index2 = 1
--------------------------------------
0, 1 = 0.02

If I've got a complicated table structure, it makes simplified querying a bit more of a pain in the neck.


Solution

Here is my thinking on this situation. There will be two main kinds of queries: those where results are not sliced by IndexPosition and/or IndexPositionValue, and those where they are.

No single table design gives me both without a trade-off, whether in storage, performance, or query complexity.

The solution below gives up some storage in exchange for performance and simple queries against this schema.

The first kind of query uses only the table "SO_FieldIndexValue".

The second kind, where the result is filtered by IndexPosition/IndexPositionValue, needs it joined with the other two tables.

Schema Design

    IF OBJECT_ID('SO_FieldIndexPositionValue') IS NOT NULL 
        DROP TABLE SO_FieldIndexPositionValue
    IF OBJECT_ID('SO_FieldIndexValue') IS NOT NULL 
        DROP TABLE SO_FieldIndexValue
    IF OBJECT_ID('SO_IndexPositionValue') IS NOT NULL 
        DROP TABLE SO_IndexPositionValue

    CREATE TABLE SO_FieldIndexValue
        (
          FIV_ID        BIGINT NOT NULL IDENTITY
            CONSTRAINT XPK_SO_FieldIndexValue PRIMARY KEY NONCLUSTERED
          ,FieldName    NVARCHAR(50)NOT NULL
          ,FieldIndex   NVARCHAR(10) NOT NULL
          ,FieldValue   NVARCHAR(500) NULL
        )
    CREATE UNIQUE CLUSTERED INDEX CIDX_SO_FieldIndexValue
    ON SO_FieldIndexValue(FIV_ID ASC,FieldName ASC,FieldIndex ASC)
    CREATE NONCLUSTERED INDEX NCIDX_SO_FieldIndexValue
    ON SO_FieldIndexValue (FIV_ID,FieldName) 
    INCLUDE (FieldIndex,FieldValue)

    CREATE TABLE SO_IndexPositionValue
        (
            IPV_ID              BIGINT  NOT NULL IDENTITY
                CONSTRAINT XPK_SO_IndexPositionValue PRIMARY KEY NONCLUSTERED
            ,IndexName          SYSNAME NOT NULL
            ,IndexPosition      INT     NOT NULL
            ,IndexPositionValue BIGINT  NOT NULL
        )
    CREATE UNIQUE CLUSTERED INDEX CIDX_SO_IndexPositionValue 
    ON SO_IndexPositionValue(IPV_ID ASC,IndexPosition ASC, IndexPositionValue ASC)

    CREATE TABLE SO_FieldIndexPositionValue
        (
          FIPV_ID       BIGINT NOT NULL IDENTITY
                CONSTRAINT XPK_SO_FieldIndexPositionValue PRIMARY KEY NONCLUSTERED
          ,FIV_ID           BIGINT NOT NULL REFERENCES SO_FieldIndexValue (FIV_ID)
          ,IPV_ID       BIGINT NOT NULL REFERENCES SO_IndexPositionValue (IPV_ID)
        )
    CREATE CLUSTERED INDEX CIDX_SO_FieldIndexPositionValue 
    ON SO_FieldIndexPositionValue(FIPV_ID ASC,FIV_ID ASC,IPV_ID ASC)

I have provided a simple SQL API (a single stored procedure) to demonstrate how inserts into this schema can be handled easily. As written, it expects input of the form Name(i1,i2,...), so an un-indexed field such as Field0 would need separate handling.

There is plenty of room to customize this API as needed, for example by adding validation that the input is in the proper format.

    IF object_id('pr_FiledValueInsert','p') IS NOT NULL
        DROP PROCEDURE pr_FiledValueInsert
    GO
    CREATE PROCEDURE pr_FiledValueInsert
    (
        @FieldIndexValue    NVARCHAR(MAX)
        ,@FieldValue        NVARCHAR(MAX)=NULL
    )
    AS
    BEGIN
    SET NOCOUNT ON
    BEGIN TRY
    BEGIN TRAN
            DECLARE @OriginalFiledIndex NVARCHAR(MAX)=@FieldIndexValue
            DECLARE @FieldName              sysname=''
                    ,@FIV_ID                BIGINT
                    ,@FieldIndex            sysname
                    ,@IndexName             sysname
                    ,@IndexPosition         BIGINT
                    ,@IndexPositionValue    BIGINT
                    ,@IPV_ID                BIGINT
                    ,@FIPV_ID               BIGINT
                    ,@CharIndex1            BIGINT
                    ,@CharIndex2            BIGINT
                    ,@StrLen                BIGINT
                    ,@StartPos              BIGINT
                    ,@EndPos                BIGINT

            SET @CharIndex1 = CHARINDEX('(',@OriginalFiledIndex)
            SET @StrLen     = LEN(@OriginalFiledIndex)
            SET @CharIndex2 = CHARINDEX(')',@OriginalFiledIndex)
            SET @FieldName  = RTRIM(LTRIM(SUBSTRING(@OriginalFiledIndex,1,@CharIndex1-1)))
            SET @FieldIndex = RTRIM(LTRIM(SUBSTRING(@OriginalFiledIndex,@CharIndex1+1,@StrLen-@CharIndex1-1)))


            --Insert FieldIndexValue and Get @FIV_ID
            SELECT @FIV_ID = FIV_ID 
            FROM SO_FieldIndexValue 
            WHERE FieldName=@FieldName
            AND FieldIndex=@FieldIndex
            IF @FIV_ID IS NULL
            BEGIN
                INSERT INTO SO_FieldIndexValue ( FieldName,FieldIndex,FieldValue )
                SELECT @FieldName,@FieldIndex,@FieldValue
                SELECT @FIV_ID = SCOPE_IDENTITY()
            END
            ELSE
            BEGIN
                RAISERROR('Field and Index combination already exists',16,1)
            END


            --Find the First IndexPosition and IndexPositionValue and Get @IPV_ID
            SELECT @StartPos=CHARINDEX('(',@OriginalFiledIndex,1)+1
            SELECT @EndPos = CASE   WHEN CHARINDEX(',',@OriginalFiledIndex,@StartPos)<>0
                                    THEN  CHARINDEX(',',@OriginalFiledIndex,@StartPos)- @StartPos
                                    ELSE CHARINDEX(')',@OriginalFiledIndex,@StartPos) - @StartPos
                                END
            SELECT @IndexPosition = 1
            SELECT @IndexPositionValue = SUBSTRING(@OriginalFiledIndex,@StartPos,@EndPos)
            SELECT @IndexName = 'Index'+CAST(@IndexPosition AS Sysname)

            --Insert IndexPositionvalue
            SELECT @IPV_ID = IPV_ID
            FROM SO_IndexPositionValue
            WHERE IndexPosition=@IndexPosition
            AND IndexPositionValue = @IndexPositionValue
            IF @IPV_ID IS NULL
            BEGIN
                INSERT SO_IndexPositionValue
                        ( IndexName ,
                          IndexPosition ,
                          IndexPositionValue
                        )
                SELECT @IndexName,@IndexPosition,@IndexPositionValue
                SET @IPV_ID = SCOPE_IDENTITY()          
            END

            --Insert the First FieldIndexPositionValue
            IF NOT EXISTS(
                            SELECT TOP(1) 1 
                            FROM SO_FieldIndexPositionValue
                            WHERE FIV_ID = @FIV_ID
                            AND IPV_ID = @IPV_ID
                        )
            BEGIN
                INSERT SO_FieldIndexPositionValue( FIV_ID, IPV_ID )
                SELECT @FIV_ID,@IPV_ID
            END

            --If more than one index exists, process the remaining index positions
            WHILE @StrLen>@StartPos+@EndPos
            BEGIN           
                SET @StartPos = @StartPos+@EndPos+1
                SET @EndPos = CASE WHEN CHARINDEX(',',@OriginalFiledIndex,@StartPos)<>0
                                    THEN  CHARINDEX(',',@OriginalFiledIndex,@StartPos)- @StartPos
                                    ELSE CHARINDEX(')',@OriginalFiledIndex,@StartPos) - @StartPos
                                END

                SELECT @IndexPosition = @IndexPosition+1
                SELECT @IndexPositionValue = SUBSTRING(@OriginalFiledIndex,@StartPos,@EndPos)
                SELECT @IndexName = 'Index'+CAST(@IndexPosition AS Sysname)

                --Insert IndexPositionvalue
                SET @IPV_ID = NULL
                SELECT @IPV_ID = IPV_ID
                FROM SO_IndexPositionValue
                WHERE IndexPosition=@IndexPosition
                AND IndexPositionValue = @IndexPositionValue
                IF @IPV_ID IS NULL
                BEGIN
                    INSERT SO_IndexPositionValue
                            ( IndexName ,
                              IndexPosition ,
                              IndexPositionValue
                            )
                    SELECT @IndexName,@IndexPosition,@IndexPositionValue
                    SET @IPV_ID = SCOPE_IDENTITY()
                END

                --Insert FieldIndexPositionValue
                IF NOT EXISTS(
                                SELECT TOP(1) 1 
                                FROM SO_FieldIndexPositionValue
                                WHERE FIV_ID = @FIV_ID
                                AND IPV_ID = @IPV_ID
                            )
                BEGIN
                    INSERT SO_FieldIndexPositionValue( FIV_ID, IPV_ID )
                    SELECT @FIV_ID,@IPV_ID
                END
            END
    COMMIT TRAN
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRAN
        SELECT ERROR_MESSAGE()
    END CATCH
    SET NOCOUNT OFF
    END
    GO

Sample Input Data

    EXECUTE pr_FiledValueInsert 'FIELD1(0,1,0)',101
    EXECUTE pr_FiledValueInsert 'FIELD1(0,1,2)','ABCDEF'
    EXECUTE pr_FiledValueInsert 'FIELD1(1,0,1)','hello1'

    EXECUTE pr_FiledValueInsert 'FIELD2(1,0,0)',102
    EXECUTE pr_FiledValueInsert 'FIELD2(1,1,0)','hey2'
    EXECUTE pr_FiledValueInsert 'FIELD2(1,0,1)','hello2'

Sample Query1

    SELECT FieldName,FieldIndex,FieldValue 
    FROM dbo.SO_FieldIndexValue
    WHERE FieldName = 'Field1'

Sample Result1

[image: SampleResult1]

Sample Query2

    SELECT FieldName,FieldIndex AS CompleteIndex,IndexPosition,IndexPositionValue,FieldValue
    FROM SO_FieldIndexPositionValue fipv
    JOIN dbo.SO_IndexPositionValue ipv
        ON ipv.IPV_ID=fipv.IPV_ID
    JOIN dbo.SO_FieldIndexValue fiv
        ON fiv.FIV_ID=fipv.FIV_ID
    WHERE
    (IndexPosition=2 AND IndexPositionValue=1)
    AND FieldName = 'Field1'

Sample Result2

[image: SampleResult2]
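
To experiment with this three-table layout without a SQL Server instance, the following sketch reproduces its core in SQLite from Python. It is an approximation under stated assumptions, not a port of the procedure above: the IndexName column and the surrogate FIPV_ID key are left out, the hypothetical function `insert_field_value` takes the indexes already split into integers, and SQLite compares text case-sensitively, so the field name must match exactly.

```python
import sqlite3

# Simplified SQLite version of the three tables (assumption: no IndexName, no FIPV_ID).
SCHEMA = """
CREATE TABLE SO_FieldIndexValue (
    FIV_ID     INTEGER PRIMARY KEY,
    FieldName  TEXT NOT NULL,
    FieldIndex TEXT NOT NULL,
    FieldValue TEXT,
    UNIQUE (FieldName, FieldIndex)
);
CREATE TABLE SO_IndexPositionValue (
    IPV_ID             INTEGER PRIMARY KEY,
    IndexPosition      INTEGER NOT NULL,
    IndexPositionValue INTEGER NOT NULL,
    UNIQUE (IndexPosition, IndexPositionValue)
);
CREATE TABLE SO_FieldIndexPositionValue (
    FIV_ID INTEGER NOT NULL REFERENCES SO_FieldIndexValue (FIV_ID),
    IPV_ID INTEGER NOT NULL REFERENCES SO_IndexPositionValue (IPV_ID),
    PRIMARY KEY (FIV_ID, IPV_ID)
);
"""


def insert_field_value(conn, field_name, indexes, value):
    """Store one value plus one link row per (position, index value) pair."""
    field_index = ",".join(str(i) for i in indexes)
    # The UNIQUE constraint rejects a duplicate field/index combination.
    cur = conn.execute(
        "INSERT INTO SO_FieldIndexValue (FieldName, FieldIndex, FieldValue) "
        "VALUES (?, ?, ?)",
        (field_name, field_index, value),
    )
    fiv_id = cur.lastrowid
    for position, index_value in enumerate(indexes, start=1):
        # Reuse the (position, value) row if it already exists.
        conn.execute(
            "INSERT OR IGNORE INTO SO_IndexPositionValue "
            "(IndexPosition, IndexPositionValue) VALUES (?, ?)",
            (position, index_value),
        )
        ipv_id = conn.execute(
            "SELECT IPV_ID FROM SO_IndexPositionValue "
            "WHERE IndexPosition = ? AND IndexPositionValue = ?",
            (position, index_value),
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO SO_FieldIndexPositionValue (FIV_ID, IPV_ID) VALUES (?, ?)",
            (fiv_id, ipv_id),
        )
    return fiv_id


def values_for_field(conn, field_name):
    """First kind of query: everything for a field, single table only."""
    return conn.execute(
        "SELECT FieldIndex, FieldValue FROM SO_FieldIndexValue "
        "WHERE FieldName = ? ORDER BY FIV_ID",
        (field_name,),
    ).fetchall()


def values_for_field_at(conn, field_name, position, index_value):
    """Second kind of query: filter by one index position and its value."""
    return conn.execute(
        """
        SELECT fiv.FieldIndex, fiv.FieldValue
        FROM SO_FieldIndexPositionValue AS fipv
        JOIN SO_IndexPositionValue AS ipv ON ipv.IPV_ID = fipv.IPV_ID
        JOIN SO_FieldIndexValue AS fiv ON fiv.FIV_ID = fipv.FIV_ID
        WHERE fiv.FieldName = ? AND ipv.IndexPosition = ? AND ipv.IndexPositionValue = ?
        ORDER BY fiv.FIV_ID
        """,
        (field_name, position, index_value),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for name, indexes, value in [
    ("FIELD1", (0, 1, 0), "101"),
    ("FIELD1", (0, 1, 2), "ABCDEF"),
    ("FIELD1", (1, 0, 1), "hello1"),
    ("FIELD2", (1, 0, 0), "102"),
]:
    insert_field_value(conn, name, indexes, value)

print(values_for_field(conn, "FIELD1"))
print(values_for_field_at(conn, "FIELD1", 2, 1))
```

The two helper queries correspond to Sample Query1 and Sample Query2; the second one touches all three tables, which is the storage-for-simplicity trade-off described earlier.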

Other suggestions

Not sure this is the only answer, but here is an idea:

field
-------
field_id
name

index
---------
index_id
field_id
position
value

field_value
------------
field_id
index_id
value

One thing my SQL experience has taught me - if you don't know how many of them there are, then they belong in rows rather than in columns.

I suggest two tables structured like this:

Row

Row_Id, Field_Name, Value

Index

Row_Id, Index_Position, Index_Value

To look up a parameter value by its indices, you would do multiple joins to the Index table e.g.

select r.Row_Id, r.Value from Row r
join Index i1 on r.Row_Id = i1.Row_Id
join Index i2 on r.Row_Id = i2.Row_Id
join Index i3 on r.Row_Id = i3.Row_Id
where
i1.Index_Position = 1 and i1.Index_Value = '3' and
i2.Index_Position = 2 and i2.Index_Value = '7' and
i3.Index_Position = 3 and i3.Index_Value = '42'
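
Since the number of joins depends on how many index positions are constrained, the query text could be generated rather than written by hand. The sketch below is one possible way to do that, using SQLite from Python; the helper `build_lookup` and the demo rows are invented for illustration, the table names are quoted because they collide with SQL keywords, and index values are stored as integers rather than strings.

```python
import sqlite3


def build_lookup(constraints):
    """Build SQL and parameters for a lookup constrained by {position: index_value}.

    One join to "Index" is generated per constrained position, as in the
    hand-written query above; unconstrained positions are simply not joined.
    """
    joins, conditions, params = [], [], []
    for n, (position, index_value) in enumerate(sorted(constraints.items()), start=1):
        alias = f"i{n}"
        joins.append(f'JOIN "Index" AS {alias} ON {alias}.Row_Id = r.Row_Id')
        conditions.append(f"{alias}.Index_Position = ? AND {alias}.Index_Value = ?")
        params.extend([position, index_value])
    sql = 'SELECT r.Row_Id, r.Value FROM "Row" AS r ' + " ".join(joins)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY r.Row_Id", params


# Made-up demo data: two rows that differ only in their second index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Row" (Row_Id INTEGER PRIMARY KEY, Field_Name TEXT NOT NULL, Value TEXT);
CREATE TABLE "Index" (
    Row_Id         INTEGER NOT NULL REFERENCES "Row" (Row_Id),
    Index_Position INTEGER NOT NULL,
    Index_Value    INTEGER NOT NULL,
    PRIMARY KEY (Row_Id, Index_Position)
);
INSERT INTO "Row" VALUES (1, 'Param', 'a'), (2, 'Param', 'b');
INSERT INTO "Index" VALUES (1, 1, 3), (1, 2, 7), (1, 3, 42), (2, 1, 3), (2, 2, 8), (2, 3, 42);
""")

sql, params = build_lookup({1: 3, 2: 7, 3: 42})
print(conn.execute(sql, params).fetchall())
```

Passing fewer positions, for example `build_lookup({1: 3})`, yields a query with fewer joins that matches every row sharing that first index.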

EDIT: this basically comes down to conforming to first normal form. Having multiple pieces of information within one column (e.g. allowing your FieldName column to contain "FieldName(0,1)") violates this, which will lead to headaches later (as you noted: how to parse? how to compare rows with different numbers of entries? how to query?).

EDIT 2: sample data for the first three rows of the config file listed in your question. Every row in the config file maps to an entry in the Row table, and every single index parameter maps to an entry in the Index table (with a link back to the row it came from):

Row

Row_Id, Field_Name, Value
1, "Field0", "0"
2, "Field1", "0.01"
3, "Field1", "0.02"

Index

Row_Id, Index_Position, Index_Value
2, 1, 0
2, 2, 0
3, 1, 0
3, 2, 1
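
As a rough check that this layout also serves the "everything for one field" query from the question, the sample rows above can be loaded into SQLite and read back with each row's indexes reassembled in position order. The function `values_by_field` is a hypothetical helper, and SQLite stands in for whatever database is actually used.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Row" (Row_Id INTEGER PRIMARY KEY, Field_Name TEXT NOT NULL, Value TEXT);
CREATE TABLE "Index" (
    Row_Id         INTEGER NOT NULL REFERENCES "Row" (Row_Id),
    Index_Position INTEGER NOT NULL,
    Index_Value    INTEGER NOT NULL,
    PRIMARY KEY (Row_Id, Index_Position)
);
-- The sample data from EDIT 2.
INSERT INTO "Row" VALUES (1, 'Field0', '0'), (2, 'Field1', '0.01'), (3, 'Field1', '0.02');
INSERT INTO "Index" VALUES (2, 1, 0), (2, 2, 0), (3, 1, 0), (3, 2, 1);
""")


def values_by_field(conn, field_name):
    """Return [(indexes, value), ...] for one field, ordered by Row_Id."""
    cursor = conn.execute(
        """
        SELECT r.Row_Id, r.Value, i.Index_Value
        FROM "Row" AS r
        LEFT JOIN "Index" AS i ON i.Row_Id = r.Row_Id
        WHERE r.Field_Name = ?
        ORDER BY r.Row_Id, i.Index_Position
        """,
        (field_name,),
    )
    result = []
    # Consecutive result rows with the same Row_Id belong to one config line.
    for (_, value), group in groupby(cursor, key=lambda row: (row[0], row[1])):
        # The LEFT JOIN yields a single NULL index for fields without indexes.
        indexes = tuple(row[2] for row in group if row[2] is not None)
        result.append((indexes, value))
    return result


print(values_by_field(conn, "Field1"))
print(values_by_field(conn, "Field0"))
```

The LEFT JOIN is what keeps un-indexed fields such as Field0 in the result; an inner join would drop them.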

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow