Question

When I run my Shark queries, memory gets hoarded in main memory. This is my top command result:


Mem:  74237344k total, 70080492k used,  4156852k free,   399544k buffers
Swap:  4194288k total,      480k used,  4193808k free, 65965904k cached


This doesn't change even if I kill/stop the Shark, Spark, and Hadoop processes. Right now, the only way to clear the cache is to reboot the machine.

Has anyone faced this issue before? Is it a configuration problem or a known issue in Spark/Shark?

Solution

To remove all cached data:

sqlContext.clearCache()

Source: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/SQLContext.html

If you want to remove a specific DataFrame from the cache:

df.unpersist()
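
For context, here is a minimal sketch of how the two calls fit together (Spark 1.x-style API; the file name, the val name df, and the spark-shell-provided sqlContext are assumptions, not from the original post):

val df = sqlContext.read.json("people.json")  // hypothetical input file
df.cache()                                    // mark the DataFrame for in-memory caching
df.count()                                    // an action is needed to actually materialize the cache
df.unpersist()                                // evict just this DataFrame
sqlContext.clearCache()                       // or evict everything Spark SQL has cached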

OTHER TIPS

Are you using the cache() method to persist RDDs?

cache() just calls persist(), so to remove the cache for an RDD, call unpersist().
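
A minimal spark-shell sketch of that, assuming sc is the SparkContext (the data here is made up):

import org.apache.spark.storage.StorageLevel

val rdd = sc.parallelize(1 to 1000000)
rdd.cache()                                   // equivalent to persist(StorageLevel.MEMORY_ONLY) for an RDD
rdd.count()                                   // action that actually fills the cache
rdd.unpersist()                               // releases the blocks held by Spark's BlockManager
rdd.persist(StorageLevel.MEMORY_AND_DISK)     // persist() also accepts an explicit storage level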

This is weird. The question asked has nothing to do with the answers. The cache the OP posted is owned by the operating system and has nothing to do with Spark. It is an optimization of the OS, and we shouldn't be worried about that particular cache.

And the Spark cache is usually in memory, but it will show up in the RSS section, not in the cache section of the OS.

I followed this approach and it worked fine for me:

for ((id, rdd) <- sc.getPersistentRDDs) {
  rdd.unpersist()   // evict every RDD that is currently marked as persistent
}

sc.getPersistentRDDs is a Map from RDD id to RDD for everything that is currently marked as persistent; after the loop above it comes back empty:

scala> sc.getPersistentRDDs
res48: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()

The solution proposed:

sqlContext.clearCache()

gave me an error and I had to use this one instead:

sqlContext.catalog.clearCache()
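
For what it's worth, on Spark 2.x the same functionality hangs off the SparkSession's catalog (a sketch; spark is the session the spark-shell provides, and "my_table" is a placeholder name):

spark.catalog.clearCache()                    // drop every cached table/DataFrame
spark.catalog.uncacheTable("my_table")        // or drop a single cached table by name
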
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow