Experiencing CLR error on SQL Azure DB
-
07-01-2021 - |
Domanda
Out of nowhere we started seeing this error and it seems to occur fairly frequently when making calls to the database or any other database in our Elastic Pool. The DTUs aren't being maxed out and the resource dmvs don't seem bad.
Failed to enter Common Language Runtime (CLR) with HRESULT 0x80131022. This may be due to low resource conditions. (db)
Here's what I got from the system resource governor pool
Resource Pool Name cache_memory (MB) used_memory (MB)
internal 104.773437 1577.125000
default 37.609375 38.796875
SloSecSharedPool 2.914062 8.156250
InMemBackupRestorePool 26.210937 101.437500
InMemDmvCollectorPool 186.195312 203.406250
InMemMetricsDownloaderPool 2.234375 2.250000
InMemDTAPool 0.000000 0.000000
SloHkPool 0.000000 0.031250
InMemQueryStorePool 22.453125 35.304687
InMemWIAutoTuningPool 3.312500 4.062500
InMemXdbLoginPool 3.976562 6.250000
PVSCleanerPool 0.000000 0.000000
InMemTdeScanPool 0.000000 0.000000
SloSharedPool1 1108.890625 1234.312500
From the sys.dm_os_performance_counters this is what it looks like for the past 3 hours.
cpu% data_io% log_write% memory_usage% max_worker% sessions%
22.37 73.51 16.54 41.56 5.50 0.43
This doesn't seem like a common error for SQL Azure as I cannot find anything that pertains to this happening past SQL Server 2008. Any help would be appreciated.
Update:
We got a response from the Microsoft support rep and it looks like our memory usage was completely capped out.
After moving the elastic pool up a tier the error has gone away. We have ~56GB of memory instead of the ~28GB we had previously and the error ceased. This would have probably also moved us to a different server on Azure which may have fixed the problem for now. The site has been running at around 78% on memory usage with the buffer cache hit ratio @ 8588 and PLE of 87332. Now it's operating at usage under 2% across the board for CPU, Workers, DataIO, etc. during normal load which seems like a huge waste.
We could not definitively identify what was causing the errors but we assume it was the complete memory usage as the site is running fine for now...
Soluzione
While Azure SQL Database (not the new Managed Instance) does not support custom SQLCLR Assemblies (i.e. the "CLR enabled" server-level configuration option), CLR is still used internally for the following features:
- CLR datatypes:
- Geometry
- Geography
- Hierarchyid
- Built-in functions:
- introduced in SQL Server 2012:
FORMAT
PARSE
TRY_PARSE
- introduced in SQL Server 2016:
AT TIME ZONE
COMPRESS
DECOMPRESS
sys.time_zone_info
- maybe others
- introduced in SQL Server 2012:
- SSIS (Fuzzy Lookup /
sp_FuzzyLookupTableMaintenanceInvoke
, etc) - Change Data Capture (CDC)
- Replication
- Master Data Services
- Policy Based Managemen (PBM; originally named "Dynamic Management Framework (DMF)" )
- External Tables (including External Data Sources and possibly External File Formats): This functionality is new as of SQL Server 2016, and this functionality using CLR internally has been mentioned in other answers here by Joe and Henrik. Both Joe and Henrik indicate being told by Microsoft that External Tables rely upon CLR, and while I have not been able to confirm this directly (by seeing the system App Domain get created when using any of this functionality), I was at least able to confirm that when "Lightweight Pooling" mode is enabled, all 3 of these components fail with the following error:
Msg 7432, Level 16, State 0, Line XXXXX
Heterogeneous queries and use of OLEDB providers are not supported in fiber mode. - maybe others
Of course, this does not point to what would be hogging memory. But it should help identify the affected area(s).
Altri suggerimenti
Using Azure SQL DB (not managed instance) and it most definitely uses the CLR for things like geo-spatial and AT TIME ZONE
I have been having this same problem and could not find a solution. Working with MS support and this is what we came up with as a possible resolution - just did it 10 minutes ago, but seems legit:
The hypothesis is that the CLR errors we have been experiencing are due to the use of external tables which leverage the CLR (via elastic search). The external table implementation has been corrupting CLR threads and is under investigation by Microsoft.
I was only using external tables for some operational maintenance, so I was able to drop those. I then had to reset the databases to reset all the threads. There is an undocumented DBCC command to do this:
dbcc stackdump('')
This takes down your database and then it fails over to another redundant server. When I ran that, there was about a 5 second outage
As you can read on this documentation, CLR is not supported on Azure SQL Database. However, the new Azure Managed Instance does support CLR.
When I had this error, I ran this query in the master database:
select DATEADD(hour, DATEDIFF(hour, 0, rs.start_time), 0) as hr, max(avg_instance_memory_percent) as MaxMem
from sys.elastic_pool_resource_stats rs
group by dateadd(hour, datediff(hour, 0, rs.start_time), 0)
order by 1
It showed that the max memory had gone to 100% in the hours prior to the error.
So now I run it on a schedule, and I get a warning if it goes over 95%.
Then I can schedule a restart of the Azure SQL database outside office hours, or get Microsoft to look at it, before everything stops working.
Unfortunately it has turned out that the error is not totally related to memory usage. I opened a case with Microsoft, and was told that there are currently a problem with External Tables (which is currently just in preview/beta)
I think I'll just start running the PowerShell command Invoke-AzSqlDatabaseFailover on a schedule.