Question

I've written a simple Java class called Person.java to create Person objects.

For example:

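import java.io.Serializable;

// Excerpt completed with the class declaration and the name field.
// Serializable is assumed here, since Spark has to serialize Person instances.
public class Person implements Serializable {
    private String name;

    public Person() {
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}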

I then implemented the code below in the Apache Spark main driver class.

JavaRDD<Person> people = ctx.textFile(logFile).map(
            new Function<String, Person>() {
                public Person call(String line) throws Exception {
                    String[] parts = line.split("\\|");

                    Person trans = new Person();
                    trans.setName(parts[0]);

                    return trans;
                }
            });

The code above compiles and runs fine, but I'm not sure how to query the people dataset. How do I get the stored data?

When I tried: people.first();

the output was: Person@3f03a49

I assume this is a problem with casting? How do I convert it to something human-readable?

Solution

I'd recommend learning some Java in general before trying to work with a complex library like Spark. Person@3f03a49 is not a problem with casting; it is how an object of class Person is converted to a String by default. The inherited Object.toString() returns the class name, an @ sign, and the object's hash code in hexadecimal. You just need to define

@Override
public String toString() {
    return "Person(" + name + ")";
}

inside the Person class.
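To see the difference without running Spark, here is a small standalone sketch. The names ToStringDemo, PlainPerson and NamedPerson and the sample value "Alice" are made up for illustration; only the toString() body comes from the snippet above.

```java
public class ToStringDemo {
    // Hypothetical class without toString(): it falls back to Object.toString().
    static class PlainPerson {
        String name;

        PlainPerson(String name) {
            this.name = name;
        }
    }

    // Hypothetical class with the override suggested above.
    static class NamedPerson {
        String name;

        NamedPerson(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return "Person(" + name + ")";
        }
    }

    public static void main(String[] args) {
        // Default form: class name, '@', hash code in hex (the hex part varies per run).
        System.out.println(new PlainPerson("Alice"));
        // Overridden form: prints "Person(Alice)".
        System.out.println(new NamedPerson("Alice"));
    }
}
```

The same applies to the object returned by people.first(): once Person overrides toString(), printing it shows the name instead of the hash code.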

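How do I get the stored data?

With any action: first, collect, take, and so on. For example, people.collect() returns a java.util.List<Person> on the driver, which you can loop over and call getName() on. But note that by default Spark doesn't store the data: an RDD is recomputed from its source each time an action runs, unless you ask Spark to keep it in memory with people.cache().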

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow