Question

I am using Cassandra 2.0.3 and I would like to use PySpark (the Apache Spark Python API) to create an RDD object from Cassandra data.

PLEASE NOTE: I do not want to import CQL and run a CQL query from the PySpark API; rather, I would like to create an RDD on which I can perform some transformations.

I know this can be done in Scala, but I am not able to find out how to do it from PySpark.

I would really appreciate it if anyone could guide me on this.

No correct solution

OTHER TIPS

This might not be relevant to you anymore, but I was looking for the same thing and couldn't find anything I was happy with. So I did some work on this: https://github.com/TargetHolding/pyspark-cassandra. It needs a lot of testing before use in production, but I think the integration works quite nicely.
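A minimal sketch of what reading a table looks like with that package. The `CassandraSparkContext` and `cassandraTable` names follow the project's README at the time of writing; the host, keyspace, table, and column names below are placeholders, so check the repository for the version you actually install.

```python
def connector_settings(host):
    # Pure helper: the Spark conf entry the connector reads for the
    # Cassandra contact point (kept separate so it can be tested offline).
    return {"spark.cassandra.connection.host": host}


if __name__ == "__main__":
    from pyspark import SparkConf
    # CassandraSparkContext is provided by the pyspark-cassandra package.
    from pyspark_cassandra import CassandraSparkContext

    conf = SparkConf().setAppName("cassandra-rdd")
    for key, value in connector_settings("127.0.0.1").items():
        conf = conf.set(key, value)

    sc = CassandraSparkContext(conf=conf)

    # Each element is a row (dict-like); ordinary RDD transformations apply,
    # which is the point of the question: no CQL query, just an RDD.
    rows = sc.cassandraTable("my_keyspace", "my_table")
    names = rows.map(lambda row: row["name"]).distinct()
    print(names.take(10))
```

The job would typically be launched with `spark-submit` so the connector jar and Python package are on the classpath.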

I am not sure if you have looked at this example yet: https://github.com/apache/spark/blob/master/examples/src/main/python/cassandra_inputformat.py. I have read from Cassandra using a similar pattern.
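For reference, a sketch of the `newAPIHadoopRDD` approach that example uses. The InputFormat, converter class names, and configuration keys are taken from that example and may differ across Spark/Cassandra versions; the host, keyspace, and column family below are placeholders.

```python
def cassandra_input_conf(host, keyspace, column_family):
    # Pure helper building the Hadoop InputFormat configuration
    # (kept separate so it can be tested without a cluster).
    return {
        "cassandra.input.thrift.address": host,
        "cassandra.input.thrift.port": "9160",
        "cassandra.input.keyspace": keyspace,
        "cassandra.input.columnfamily": column_family,
        "cassandra.input.partitioner.class": "Murmur3Partitioner",
        "cassandra.input.page.row.size": "3",
    }


if __name__ == "__main__":
    from pyspark import SparkContext

    sc = SparkContext(appName="CassandraInputFormat")
    # Reads Cassandra rows through the Hadoop InputFormat bridge; the
    # converters turn the Java key/value Maps into Python dicts.
    rdd = sc.newAPIHadoopRDD(
        "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat",
        "java.util.Map",
        "java.util.Map",
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "CassandraCQLKeyConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "CassandraCQLValueConverter",
        conf=cassandra_input_conf("127.0.0.1", "my_keyspace", "my_cf"))

    # rdd is an ordinary RDD of (key, value) pairs; transformations
    # like map/filter/reduceByKey work as usual from here.
    print(rdd.count())
```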

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow