Resolving frequent LGWR switch events recorded in the alert log with heavy dblink/materialized view activity - Oracle 11g XE

dba.stackexchange https://dba.stackexchange.com/questions/15267


Question

We use a student information system, PowerSchool, marketed by Pearson. PowerSchool uses Oracle as its backend database, and we need to create some of our own reporting views. Pearson says that if you make any changes to the production machine you void the license, so to that end we're running Oracle 11gR2 Express almost strictly for creating views over PowerSchool data pulled across a dblink. We also integrate with a SQL Server machine using ODBC and the heterogeneous services support in Oracle.

To reduce load on the PowerSchool machine, we have most of our more costly queries set up as materialized views that refresh every hour (or as appropriate). From googling wait events before, I get the sense that we use dblinks and materialized views way more than your average bear.
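
For reference, a typical one is created along these lines (the table, column, and dblink names here are made up for illustration):

 -- Complete refresh every hour, pulling across the dblink to PowerSchool
 CREATE MATERIALIZED VIEW mv_students
   BUILD IMMEDIATE
   REFRESH COMPLETE
   START WITH SYSDATE
   NEXT SYSDATE + 1/24
 AS
 SELECT id, last_name, first_name, grade_level
   FROM students@pslink;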

In poking around the alert_xe file, I notice frequent events that indicate that the log is advancing to a new sequence:

 LOG
 Sun Mar 18 22:16:22 2012
 Thread 1 cannot allocate new log, sequence 26914
 Checkpoint not complete
 Current log# 1 seq# 26913 mem# 0:      C:\ORACLEXE\APP\ORACLE\FLASH_RECOVERY_AREA\XE\ONLINELOG\O1_MF_1_70W1H0SF_.
 LOG
 Thread 1 advanced to log sequence 26914 (LGWR switch)

longer output at http://pastebin.com/m3j5YT0B
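
To see how often this is happening, a query along these lines against v$log_history counts the switches per hour:

 -- Log switches per hour, from the redo log history
 SELECT TRUNC(first_time, 'HH24') AS hour, COUNT(*) AS switches
   FROM v$log_history
  GROUP BY TRUNC(first_time, 'HH24')
  ORDER BY hour;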

I did some research and saw advice indicating that I should change the frequency of my checkpoints or make my logs larger. Here on dba.stackexchange, the only reference I could find to "Checkpoint not complete" was this question, which doesn't seem to address what I'm seeing.

From this Ask Tom question (http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1364093900346690427):

"You either allocate more logs of the same size you have (so we have longer to complete that checkpoint) or you can make your log files larger (actually, you have to create new larger ones then drop old smaller ones). But the point would be to have sufficient redo logs configured to carry you through the peak loads without any "cannot allocate new log" messages."

On one of the threads, someone asked for the output of:

 select group#, bytes/1024/1024 from v$log;

mine is:

 GROUP#                 BYTES/1024/1024        
 ---------------------- ---------------------- 
 1                      50                     
 2                      50  

Output of

 SELECT OPTIMAL_LOGFILE_SIZE FROM V$INSTANCE_RECOVERY; 

is null.

I ran

 show parameter target

after reading this, and the output is:

 NAME                                               TYPE        VALUE                                                                                                
 -------------------------------------------------- ----------- ---------------------
 archive_lag_target                                 integer     0                                                                                                    
 db_flashback_retention_target                      integer     1440                                                                                                 
 fast_start_io_target                               integer     0                                                                                                    
 fast_start_mttr_target                             integer     0                                                                                                    
 memory_max_target                                  big integer 0                                                                                                    
 memory_target                                      big integer 0                                                                                                    
 parallel_servers_target                            integer     16                                                                                                   
 pga_aggregate_target                               big integer 90M                                                                                                  
 sga_target                                         big integer 272M

So what I have gathered from all this is that each log switch triggers a checkpoint, and because the logs are switching so frequently, Oracle is trying to reuse a log before the checkpoint covering it has completed. All of this gets recorded in my alert_xe file.

From the Oracle guide, my online redo log files need to be sized according to "the amount of redo your system generates." I suspect that we generate much more than your average bear, because we have frequent automated refreshes of materialized views.
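
One rough way to measure that is the cumulative "redo size" statistic since instance startup:

 -- Total redo generated (in bytes) since the instance started
 SELECT value AS redo_bytes
   FROM v$sysstat
  WHERE name = 'redo size';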

So, my questions:

1) Is this the right diagnosis of the problem?

2) Is it problematic that my optimal_logfile_size parameter is null?

3) Given that we are potentially hitting materialized views (and thus generating a lot of redo) harder than most, what course of action is appropriate to resolve the checkpoint not complete/LGWR switching I'm seeing in my alert log? Is this just a matter of increasing the size of the redo logs?

Solution

  1. Close, I think you should be concerned with the redo itself rather than the log switches it is causing.
  2. I think this is because fast_start_mttr_target is set to zero, but again this should not be the issue to focus on.
  3. Here is the heart of the matter:

Since you can't change the source database to add a materialized view log, every MV refresh has to be a complete refresh: delete all the records and re-insert them, which is what generates your excessive redo. There are many different approaches you could take to this issue. Here are some.

  • Simply increase the redo log size. This will decrease the impact of frequent switches, but the redo will still be generated.
  • Roll your own Materialized View. This is more work, but would probably generate significantly less redo.
    • Your refresh could truncate the table and then do an insert with the APPEND hint, if you can tolerate the absence of data during the refresh (see the sketch after this list).
    • Dropping the table and doing a CTAS from the source is another option that sacrifices availability for speed and reduced redo.
    • If you need the data to be available you could selectively update and insert data based on what actually changed. This would introduce a bit more load on the source database, but would probably decrease your redo significantly.
    • Depending on your data needs, you might get away with only handling inserts above the last known sequence number, and/or updates only when certain significant columns have changed. The more custom the solution, the less redo, but also the more work to maintain.
  • A more drastic approach would be to recover the tablespace involved to your second database periodically, but since you are using XE, that probably isn't an option.
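
As a minimal sketch of the truncate-and-append refresh mentioned above (table, column, and dblink names are illustrative):

 -- Hand-rolled refresh: truncate, then direct-path insert over the dblink.
 -- A direct-path (APPEND) insert generates far less undo, and on a NOLOGGING
 -- table far less redo, than the delete/re-insert of a complete MV refresh.
 TRUNCATE TABLE local_students;

 INSERT /*+ APPEND */ INTO local_students
 SELECT id, last_name, first_name, grade_level
   FROM students@pslink;

 COMMIT;

The table is empty between the truncate and the commit, which is the availability trade-off mentioned above.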
Licensed under: CC-BY-SA with attribution