Check for spaces or zeros in all columns in all tables
-
09-03-2021 - |
سؤال
I have multiple tables in an Oracle schema that have columns with incorrect values:
- Number columns where the value is zero, instead of null.
- Text columns where the value is a space, instead of null.
I’m in the process of cleaning up the data. I will eventually create constraints on the columns to prevent bad data being entered in the first place. But before I do that, I need to find the columns that have existing problems and investigate them.
To start, I would like to produce a list of columns that have incorrect values.
- For each table in a schema, provide a list of columns that have zeros or spaces as values.
table_name column_name datatype value count —————————————————————————————————-————————————-———————- roads width number 0 500 sidewalks description varchar2 [space] 10000 sidewalks const_year number 0 2000
Question:
What would it take to produce a list like this in Oracle?
(Note: I’m a public works technician by day, not a database administrator. Layman’s terms would be appreciated.)
المحلول
You should look into querying the information schema based tables (such as ALL_TAB_COLUMNS or DBA_TAB_COLUMNS) to get a list of all tables and all columns: https://docs.oracle.com/en/database/oracle/oracle-database/18/refrn/ALL_TAB_COLUMNS.html#GUID-F218205C-7D76-4A83-8691-BFD2AD372B63
Then look into how to write dynamic SQL to easily generate the code you need to query every column of every table: https://docs.oracle.com/en/database/oracle/oracle-database/18/lnpls/dynamic-sql.html#GUID-7E2F596F-9CA3-4DC8-8333-0C117962DB73
Here is some psuedo-code on what your thought process should be when creating the dynamic SQL (*note this is not Oracle PL/SQL syntax, so just take it as a general example on how to build dynamic SQL, and focus mostly on step 3):
-- 1) Create a temp table to hold the results
DROP TABLE IF EXISTS #Results
CREATE TABLE #Results (TableName VARCHAR(100), ColumnName VARCHAR(100), [Value] VARCHAR(100) NULL)
-- 2) Dynamic SQL variable used to build the dynamic SQL string
DECLARE @DynamicSQL AS NVARCHAR(MAX) = ''
-- 3) Build the dynamic SQL string using the DBA_TAB_COLUMNS information schema view. First query gets any row whose column has 0 as a value, second query gets any row whose column has a NULL value
SELECT @DynamicSQL = @DynamicSQL + '
INSERT INTO #Results
SELECT ''' + TABLE_NAME+ ''' AS TableName, ''' + COLUMN_NAME + ''' AS ColumnName, [' + COLUMN_NAME + '] AS ColumnValue
FROM [' + TABLE_NAME+ ']
WHERE [' + COLUMN_NAME + '] = ''0'';'
+ '
INSERT INTO #Results
SELECT ''' + TABLE_NAME + ''' AS TableName, ''' + COLUMN_NAME + ''' AS ColumnName, [' + COLUMN_NAME + '] AS ColumnValue
FROM [' + TABLE_NAME + ']
WHERE [' + COLUMN_NAME + '] IS NULL;'
FROM sys.DBA_TAB_COLUMNS
-- 4) Execute the dynamic SQL string
EXEC sp_ExecuteSQL @DynamicSQL
-- 5) Select the final results, grouping up on TableName, ColumnName, and Value to get the counts of 0 and NULL values per column per table
SELECT R.TableName, R.ColumnName, TC.DATA_TYPE AS DataType, R.[Value], COUNT(1) AS [Count]
FROM #Results AS R
INNER JOIN sys.DBA_TAB_COLUMNS AS TC
ON R.TableName = TC.TableName
AND R.ColumnName = R.ColumnName
GROUP BY R.TableName, R.ColumnName, R.[Value]
نصائح أخرى
Here's what I came up with:
--DROP TABLE incorrect_value_results;
--CREATE TABLE incorrect_value_results (id NUMBER, table_name VARCHAR2(30), column_name VARCHAR2(30), val_count NUMBER, value varchar2(30));
TRUNCATE TABLE incorrect_value_results;
DECLARE
l_count NUMBER;
l_inv_char_str VARCHAR2(2000);
TYPE invalid_char_val_rec IS RECORD(
cnt NUMBER,
inv_char VARCHAR2(20)
);
TYPE invalid_char_val_tab IS TABLE OF invalid_char_val_rec
INDEX BY PLS_INTEGER;
l_inv_char_vals invalid_char_val_tab;
l_index NUMBER;
BEGIN
l_index := 0;
-- Loop through each table in the schema
FOR i IN (SELECT table_name
FROM user_tables)
LOOP
-- Loop through each relevant column for this table
-- Exclude column EVT_FROM
FOR j IN (SELECT column_name, data_type
FROM user_tab_cols
WHERE table_name = i.table_name
AND column_name NOT IN ('EVT_FROM','ANGLE','UND','OFFSET')
AND table_name NOT LIKE 'A_%'
AND table_name NOT LIKE 'SDE%'
AND table_name NOT LIKE '%ANNO%'
AND table_name NOT IN ('INCORRECT_VALUE_RESULTS','NUMBERS')
AND data_type IN ('VARCHAR2', 'CHAR', 'NCHAR', 'NVARCHAR2', 'NUMBER'))
LOOP
IF j.data_type IN ('VARCHAR2', 'CHAR', 'NCHAR', 'NVARCHAR2') THEN
EXECUTE IMMEDIATE
'SELECT COUNT(1), '||j.column_name||
' FROM '||i.table_name||
' WHERE UPPER('||j.column_name||') IN('' '', '' '', ''0'', ''-'', ''NULL'', ''<NULL>'' )
GROUP BY '||j.column_name
BULK COLLECT INTO l_inv_char_vals;
ELSIF j.data_type = 'NUMBER' THEN
EXECUTE IMMEDIATE
'SELECT COUNT(1), '||j.column_name||
' FROM '||i.table_name||
' WHERE UPPER('||j.column_name||') <= 0
GROUP BY '||j.column_name
BULK COLLECT INTO l_inv_char_vals;
END IF;
-- If there are results then log them
l_index := l_index + 1;
FORALL k IN 1..l_inv_char_vals.COUNT
INSERT INTO incorrect_value_results (id, table_name, column_name, val_count, value)
VALUES (l_index,
i.table_name,
j.column_name,
l_inv_char_vals(k).cnt,
l_inv_char_vals(k).inv_char);
END LOOP;
END LOOP;
END;
/
COMMIT;