Question

I have multiple tables in an Oracle schema that have columns with incorrect values:

  • Number columns where the value is zero, instead of null.
  • Text columns where the value is a space, instead of null.

I’m in the process of cleaning up the data. I will eventually create constraints on the columns to prevent bad data being entered in the first place. But before I do that, I need to find the columns that have existing problems and investigate them.

To start, I would like to produce a list of columns that have incorrect values.

  • For each table in a schema, provide a list of columns that have zeros or spaces as values.
table_name    column_name    datatype    value    count
—————————————————————————————————-————————————-———————-
roads         width          number      0        500
sidewalks     description    varchar2    [space]  10000
sidewalks     const_year     number      0        2000

Question:

What would it take to produce a list like this in Oracle?

(Note: I’m a public works technician by day, not a database administrator. Layman’s terms would be appreciated.)

Was it helpful?

Solution

You should look into querying the information schema based tables (such as ALL_TAB_COLUMNS or DBA_TAB_COLUMNS) to get a list of all tables and all columns: https://docs.oracle.com/en/database/oracle/oracle-database/18/refrn/ALL_TAB_COLUMNS.html#GUID-F218205C-7D76-4A83-8691-BFD2AD372B63

https://docs.oracle.com/en/database/oracle/oracle-database/18/refrn/DBA_TAB_COLUMNS.html#GUID-91F945AE-5778-45A6-A07E-775A006A9AA1

Then look into how to write dynamic SQL to easily generate the code you need to query every column of every table: https://docs.oracle.com/en/database/oracle/oracle-database/18/lnpls/dynamic-sql.html#GUID-7E2F596F-9CA3-4DC8-8333-0C117962DB73

Here is some psuedo-code on what your thought process should be when creating the dynamic SQL (*note this is not Oracle PL/SQL syntax, so just take it as a general example on how to build dynamic SQL, and focus mostly on step 3):

-- 1) Create a temp table to hold the results
DROP TABLE IF EXISTS #Results
CREATE TABLE #Results (TableName VARCHAR(100), ColumnName VARCHAR(100), [Value] VARCHAR(100) NULL)

-- 2) Dynamic SQL variable used to build the dynamic SQL string
DECLARE @DynamicSQL AS NVARCHAR(MAX) = ''

-- 3) Build the dynamic SQL string using the DBA_TAB_COLUMNS information schema view. First query gets any row whose column has 0 as a value, second query gets any row whose column has a NULL value
SELECT @DynamicSQL = @DynamicSQL + '
    INSERT INTO #Results 
    SELECT ''' + TABLE_NAME+ ''' AS TableName, ''' + COLUMN_NAME + ''' AS ColumnName, [' + COLUMN_NAME + '] AS ColumnValue 
    FROM [' + TABLE_NAME+ '] 
    WHERE [' + COLUMN_NAME + '] = ''0'';' 
    + '
    INSERT INTO #Results 
    SELECT ''' + TABLE_NAME + ''' AS TableName, ''' + COLUMN_NAME + ''' AS ColumnName, [' + COLUMN_NAME + '] AS ColumnValue 
    FROM [' + TABLE_NAME + '] 
    WHERE [' + COLUMN_NAME + '] IS NULL;'
FROM sys.DBA_TAB_COLUMNS

-- 4) Execute the dynamic SQL string
EXEC sp_ExecuteSQL @DynamicSQL

-- 5) Select the final results, grouping up on TableName, ColumnName, and Value to get the counts of 0 and NULL values per column per table
SELECT R.TableName, R.ColumnName, TC.DATA_TYPE AS DataType, R.[Value], COUNT(1) AS [Count]
FROM #Results AS R
INNER JOIN sys.DBA_TAB_COLUMNS AS TC
    ON R.TableName = TC.TableName
    AND R.ColumnName = R.ColumnName
GROUP BY R.TableName, R.ColumnName, R.[Value]

OTHER TIPS

Here's what I came up with:

--DROP TABLE incorrect_value_results;
--CREATE TABLE incorrect_value_results (id NUMBER, table_name VARCHAR2(30), column_name VARCHAR2(30), val_count NUMBER, value varchar2(30));
TRUNCATE TABLE incorrect_value_results;

DECLARE
   l_count NUMBER;
   l_inv_char_str VARCHAR2(2000);
   TYPE invalid_char_val_rec IS RECORD(
      cnt      NUMBER,
      inv_char VARCHAR2(20)
   );
   TYPE invalid_char_val_tab IS TABLE OF invalid_char_val_rec
      INDEX BY PLS_INTEGER;
   l_inv_char_vals invalid_char_val_tab;
   l_index NUMBER;
BEGIN
   l_index := 0;
   -- Loop through each table in the schema
   FOR i IN (SELECT table_name
               FROM user_tables) 
   LOOP
      -- Loop through each relevant column for this table
      -- Exclude column EVT_FROM
      FOR j IN (SELECT column_name, data_type
                  FROM user_tab_cols
                 WHERE table_name = i.table_name
                   AND column_name  NOT IN ('EVT_FROM','ANGLE','UND','OFFSET')
                   AND table_name   NOT LIKE 'A_%'
                   AND table_name   NOT LIKE 'SDE%'
                   AND table_name   NOT LIKE '%ANNO%'
                   AND table_name   NOT IN ('INCORRECT_VALUE_RESULTS','NUMBERS')
                   AND data_type    IN ('VARCHAR2', 'CHAR', 'NCHAR', 'NVARCHAR2', 'NUMBER')) 
      LOOP
         IF j.data_type IN ('VARCHAR2', 'CHAR', 'NCHAR', 'NVARCHAR2') THEN
            EXECUTE IMMEDIATE 
               'SELECT COUNT(1), '||j.column_name||
                ' FROM '||i.table_name|| 
                ' WHERE UPPER('||j.column_name||') IN('' '', ''  '', ''0'', ''-'', ''NULL'', ''<NULL>'' )
                  GROUP BY '||j.column_name
                 BULK COLLECT INTO l_inv_char_vals;
            
         ELSIF j.data_type = 'NUMBER' THEN
            EXECUTE IMMEDIATE 
               'SELECT COUNT(1), '||j.column_name||
                ' FROM '||i.table_name|| 
                ' WHERE UPPER('||j.column_name||') <= 0
                  GROUP BY '||j.column_name
                 BULK COLLECT INTO l_inv_char_vals;
         END IF;     
         -- If there are results then log them
         l_index := l_index + 1;
         FORALL k IN 1..l_inv_char_vals.COUNT
            INSERT INTO incorrect_value_results (id, table_name, column_name, val_count, value)
            VALUES (l_index,
                    i.table_name, 
                    j.column_name, 
                    l_inv_char_vals(k).cnt,
                    l_inv_char_vals(k).inv_char);
      END LOOP;
   END LOOP;
END;
/
COMMIT;
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top