What kinds of Spark optimizations are there? Are the adjustable Spark configs in the pipeline settings the only way I can optimize?
Hi Juliette, Prophecy does not apply any automatic optimization of its own.
But we give users full control to define optimizations themselves:
- Spark configs can be defined in the pipeline settings.
- The Repartition gem can be used to repartition the data.
- Join hints can be set in the Advanced tab of the Join gem.
- Cache can be used wherever the pipeline branches out (to avoid recomputing the shared part of the DAG).
- Checkpointing and persisting to disk can be done in a Script gem.
Otherwise, the Catalyst engine in Spark will apply optimizations just as it would for a normal Spark application.
Be sure to use Selective-type interims if you want Spark optimizations to be applied to your Interactive Runs while developing.
