Question

I set up Storm on a production server and it worked well until an abrupt power failure. Now I get the error "supervisor [ERROR] Error on initialization of server mk-supervisor" whenever I try to push a topology. The Storm UI no longer shows my workers: the count used to be 4 and is now 0. I understand that the supervisor is no longer working properly, but re-installing the supervisor does not resolve the issue. I had this problem with a previous setup and had to redo the whole setup to get it working again, but I cannot keep redoing the setup every time the supervisor fails.

2014-04-06 23:59:48 supervisor [INFO] Starting Supervisor with conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.fall.back.on.java.serialization" true, "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000, "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500, "nimbus.monitor.freq.secs" 10, "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/var/storm", "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "192.168.254.145", "storm.zookeeper.port" 2181, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "supervisor.enable" true, "storm.zookeeper.servers" ["192.168.254.145"], "transactional.zookeeper.root" "/transactional", "topology.acker.executors" 1, "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil, "drpc.queue.size" 128, "worker.childopts" "-Xmx768m -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, "topology.spout.wait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval" 1000, "topology.sleep.spout.wait.strategy.time.ms" 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" [6700 6701 6702 6703], 
"topology.debug" false, "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Djava.net.preferIPv4Stack=true", "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "topology.acker.tasks" nil, "topology.disruptor.wait.strategy" "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port" 3773, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "topology.state.synchronization.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true", "storm.cluster.mode" "distributed", "topology.optimize" true, "topology.max.task.parallelism" nil}

  2014-04-06 23:59:48 supervisor [ERROR] Error on initialization of server mk-supervisor
java.lang.RuntimeException: java.io.EOFException
    at backtype.storm.utils.Utils.deserialize(Utils.java:68)
    at backtype.storm.utils.LocalState.snapshot(LocalState.java:24)
    at backtype.storm.utils.LocalState.get(LocalState.java:28)
    at backtype.storm.daemon.supervisor$standalone_supervisor$reify__4810.prepare(supervisor.clj:486)
    at backtype.storm.daemon.supervisor$fn__4757$exec_fn__1228__auto____4758.invoke(supervisor.clj:329)
    at clojure.lang.AFn.applyToHelper(AFn.java:167)
    at clojure.lang.AFn.applyTo(AFn.java:151)
    at clojure.core$apply.invoke(core.clj:601)
    at backtype.storm.daemon.supervisor$fn__4757$mk_supervisor__4782.doInvoke(supervisor.clj:327)
    at clojure.lang.RestFn.invoke(RestFn.java:436)
    at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:477)
    at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:506)
    at clojure.lang.AFn.applyToHelper(AFn.java:159)
    at clojure.lang.AFn.applyTo(AFn.java:151)
    at backtype.storm.daemon.supervisor.main(Unknown Source)
Caused by: java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2792)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:800)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298)
    at backtype.storm.utils.Utils.deserialize(Utils.java:63)

I cannot tell what actually happened during the power failure. I am lost here, so any help would be appreciated.


Solution

I think you need to clear all of the local directories that Storm uses (or at least move them somewhere else, so that you have a backup if you need one).

The local directories are the ones you configure in conf/storm.yaml (the storm.local.dir setting, which is "/var/storm" in the log above).
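The move-aside step could be sketched as below. This is only an illustration, not part of Storm: the class name StormLocalDirBackup and the method moveAside are hypothetical, the default path "/var/storm" is taken from the storm.local.dir value in the log above, and the sketch assumes the Storm daemons on the host have already been stopped and that it runs as the same user Storm runs as.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Storm.
final class StormLocalDirBackup {

    // Renames localDir to a sibling named "<name>.bak-<suffix>" and recreates
    // an empty localDir in its place. Returns the backup location.
    // Assumes the Storm daemons on this host are stopped before it is called.
    static Path moveAside(Path localDir, String suffix) throws IOException {
        Path backup = localDir.resolveSibling(localDir.getFileName() + ".bak-" + suffix);
        // A plain rename; this fails if localDir is itself a mount point.
        Files.move(localDir, backup);
        Files.createDirectories(localDir);
        return backup;
    }

    public static void main(String[] args) throws IOException {
        // "/var/storm" is the storm.local.dir value from the question's log;
        // pass a different path as the first argument if yours differs.
        Path localDir = Paths.get(args.length > 0 ? args[0] : "/var/storm");
        Path backup = moveAside(localDir, String.valueOf(System.currentTimeMillis()));
        System.out.println("Moved " + localDir + " to " + backup);
    }
}
```

Renaming rather than deleting keeps the old state available for inspection. Once the supervisor starts cleanly and the workers reappear in the UI, the backup directory can be removed.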

This happens because Storm was not shut down properly when the power failed.
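The stack trace in the question fails inside ObjectInputStream.readStreamHeader, which is what Java does when the data it is asked to deserialize ends before the 4-byte stream header is complete. That is consistent with a local state file that was left empty or cut short by the power loss, although the trace alone does not prove which file it was. The small sketch below reproduces the same exception outside Storm; the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Hypothetical demo class, not part of Storm.
final class TruncatedStateDemo {

    // Serializes a value with standard Java serialization.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Deserializes with ObjectInputStream, the class that appears in the stack trace.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Returns true when deserialization fails with EOFException.
    static boolean failsWithEof(byte[] data) throws ClassNotFoundException {
        try {
            deserialize(data);
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] intact = serialize(new HashMap<String, String>());
        byte[] empty = new byte[0]; // stands in for a zero-length state file

        System.out.println("intact data fails with EOF: " + failsWithEof(intact)); // false
        System.out.println("empty data fails with EOF:  " + failsWithEof(empty));  // true
    }
}
```

If this is the cause, no amount of re-installing will help, because the damaged file lives in the local directory and survives the re-install. Moving that directory aside removes the file the supervisor cannot read.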

Licensed under: CC-BY-SA with attribution