Use SparkSubmit to initialize JVM (maybe?) #94
So are you looking for a way to add custom JARs? If so, we have `add_jar`.
add_jar does not really cut it: the package has many dependencies, and I'd really like Spark/Maven to load them for me. I couldn't find a method similar to addJar that would add packages :/
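For reference, adding a single local JAR would look roughly like the sketch below (the exact `add_jar` signature and the `SparkSession` keywords are assumed here, not taken from this thread). It illustrates the limitation: one file is shipped, but that JAR's own Maven dependencies are not resolved.

```julia
using Spark

spark = SparkSession(master="local")   # constructor keywords assumed

# add_jar ships a single local file to the cluster; it does not pull in
# that JAR's own Maven dependencies, which is the limitation described above.
add_jar(spark, "/path/to/some-library.jar")
```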
You can try something like:

```julia
config = Dict("spark.jars.packages" => "...")
spark = SparkSession(..., config=config)
```

This should be equivalent to setting the same property via spark-submit's `--packages` option.
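Spelled out fully, that might look like this (a sketch only; the `master` keyword and the Maven coordinate are assumptions on top of the snippet above):

```julia
using Spark

# "group:artifact:version" Maven coordinates; Spark downloads the listed
# packages and their transitive dependencies when the session starts.
config = Dict("spark.jars.packages" => "org.apache.spark:spark-avro_2.12:3.3.0")
spark = SparkSession(master="local", config=config)
```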
I looked at spark-submit a few years ago when I worked on this package, and it seemed too complicated -- I did not really understand how it worked. The way we load the JVM here seemed easier and more appropriate to me.
PySpark seems to start the JVM using the spark-submit script: https://github.com/apache/spark/blob/master/python/pyspark/java_gateway.py#L63. That has some benefits; I'm specifically looking for an easy way to add dependencies using the `spark.jars.packages` config. However, I don't know how they call Java methods... I think Spark.jl could call the SparkSubmit.main method using jcall, which should lead to basically the same behavior, but the JVM will remain under Julia's control.
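A minimal sketch of that idea with JavaCall.jl, assuming Spark's JARs are available locally and that JavaCall's usual String-array argument conversion applies (the classpath, master URL, package coordinate, and application names below are all illustrative):

```julia
using JavaCall

# Put Spark's JARs on the JVM classpath before starting it; the path is
# illustrative and depends on the local Spark installation.
JavaCall.addClassPath("/opt/spark/jars/*")
JavaCall.init()

# org.apache.spark.deploy.SparkSubmit.main(String[]) is the entry point
# that the spark-submit shell script ultimately invokes.
SparkSubmit = @jimport org.apache.spark.deploy.SparkSubmit
args = ["--master", "local[*]",
        "--packages", "org.apache.avro:avro:1.11.3",   # hypothetical dependency
        "--class", "example.Main", "example.jar"]      # hypothetical application
jcall(SparkSubmit, "main", Nothing, (Vector{JString},), args)
```

Whether SparkSubmit.main then cooperates with an in-process Spark.jl session is exactly the open question here.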
Honestly, I'm quite confused about how spark-submit works; maybe I'm just missing something obvious. I thought it might be possible to execute a Julia script using spark-submit after the dependencies are handled, but that also does not work :/