Build Spark and debug it remotely in IntelliJ


Build from the command line

$ git clone --branch v3.3.0 --depth 1 https://github.com/apache/spark.git 

Install Java 8 with asdf.

$ brew install asdf
$ echo -e "\n. $(brew --prefix asdf)/libexec/asdf.sh" >> ${ZDOTDIR:-~}/.zshrc
$ asdf --version
v0.10.2

$ asdf plugin-add java
$ asdf list-all java
$ asdf install java corretto-8.342.07.3
$ asdf global java corretto-8.342.07.3
$ echo ". ~/.asdf/plugins/java/set-java-home.zsh" >> ~/.zprofile
$ java -version
openjdk version "1.8.0_342"
OpenJDK Runtime Environment Corretto-8.342.07.3 (build 1.8.0_342-b07)
OpenJDK 64-Bit Server VM Corretto-8.342.07.3 (build 25.342-b07, mixed mode)
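Optionally, the Java version can also be pinned to the Spark checkout with a per-project .tool-versions file instead of relying only on the global setting. The commands below also move into the cloned directory, which the build steps that follow assume.

$ cd spark
$ asdf local java corretto-8.342.07.3
$ cat .tool-versions
java corretto-8.342.07.3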

Build with Maven and check that it succeeds.

$ export MAVEN_OPTS="-Xss64m -Xmx2g -XX:ReservedCodeCacheSize=1g"
$ ./build/mvn -DskipTests clean package
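When iterating on a single module, Maven's -pl/-am flags can rebuild just that module and the modules it depends on, which is much faster than a full build. For example, for the SQL core module (sql/core is its path in the Spark repo):

$ ./build/mvn -pl sql/core -am -DskipTests package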

Build in IntelliJ

Open the code as a Maven project via “New > Project from Existing Sources.” The JDK is under ~/.asdf/installs/java/, so make hidden directories visible with “Command + Shift + .” and select it. After that, run “Generate Sources and Update Folders For All Projects” from the Maven window, and Build Project succeeds.

Remote debugging

Start debugging with a “Listen to remote JVM” run configuration in IntelliJ and pass the following options via spark.driver.extraJavaOptions; breakpoints then work.

Debug a Java application running on a remote machine by enabling JDWP - sambaiz-net

$ ./bin/spark-shell --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=localhost:5005"
scala> spark.sql("select 1+1").collect()
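With suspend=y instead of suspend=n, the driver JVM stays suspended until the debugger connection is established, so breakpoints in code that runs during startup can be hit as well (the IntelliJ listener must already be running, since server=n makes the JVM connect out):

$ ./bin/spark-shell --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,suspend=y,address=localhost:5005"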

With sbt, the same options can be passed as follows.

$ ./build/sbt
sbt:spark-parent> project core
sbt:spark-core> set javaOptions in Test += "-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=localhost:5005"
sbt:spark-core> testOnly *SparkContextSuite -- -t "Only one SparkContext may be active at a time"
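On sbt 1.x, where the `in` syntax is deprecated, the equivalent slash syntax should work the same way:

sbt:spark-core> set Test / javaOptions += "-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=localhost:5005"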

References

Install Java with asdf | Peaceful Revolution

How to set up a Scala development environment with asdf (Mac / IntelliJ)