How to Pythonize a JVM Library
Python is becoming increasingly popular within the Spark ecosystem, but some functionality, especially complex logic, cannot be expressed efficiently with PySpark SQL functions or user-defined functions. You can write Scala or Java code instead, but what if you also want users to call your solution from Python? This is where py4j comes in: a Python package that connects the Python and JVM worlds. It makes it possible to create Python wrappers that call Java code underneath.
During this meetup I will try to explain:
- How to use py4j to translate a JVM library to Python
- How to write a Spark SQL function in Scala/Java and use it directly from Python
- How to translate and use any Scala/Java Spark code from Python
- How to test the solution
- How we translated Apache Sedona (incubating) to Python.
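
To give a rough idea of what such a wrapper can look like, here is a minimal sketch. It is not taken from the talk or from Apache Sedona: the class name `JvmStringList` is hypothetical, and `java.util.ArrayList` is used only because it is available in every JVM. The `jvm` argument is assumed to be a py4j JVM view, for example `JavaGateway().jvm` or, inside PySpark, `spark._jvm`.

```python
from typing import Any


class JvmStringList:
    """Hypothetical Python facade over a java.util.ArrayList living in the JVM."""

    def __init__(self, jvm: Any) -> None:
        # Assumption: `jvm` is a py4j JVM view; attribute access walks the
        # Java package path and the final call invokes the Java constructor.
        self._jlist = jvm.java.util.ArrayList()

    def append(self, value: str) -> None:
        # Delegates to ArrayList.add(Object) on the JVM side.
        self._jlist.add(value)

    def __len__(self) -> int:
        # Delegates to ArrayList.size().
        return self._jlist.size()

    def __getitem__(self, index: int) -> str:
        # Delegates to ArrayList.get(int).
        return self._jlist.get(index)


# Possible usage, assuming py4j is installed and a Java GatewayServer is running:
#   from py4j.java_gateway import JavaGateway
#   names = JvmStringList(JavaGateway().jvm)
#   names.append("sedona")
```

The wrapper only needs an object that resolves `java.util.ArrayList`, so the Python side can be unit-tested with a small stand-in for the JVM view before running it against a real gateway.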