Data shuffling and optimization

What kinds of spark optimizations are there? Are these adjustable spark configs in the pipeline settings the only way I can optimize?

Hi Juliette, Prophecy does not do any automatic optimization itself.

But it gives you the full ability to define optimizations yourself:

  • Spark configs can be defined in the pipeline settings.
  • A Repartition gem can be used to repartition the data.
  • Join hints can be set in the Advanced tab of the Join gem.
  • Cache can be used wherever a branch point exists (to avoid recomputing part of the DAG).
  • Checkpointing and persisting to disk can be done in a Script gem.

Otherwise, Spark's Catalyst engine will apply optimizations just as it would for any normal Spark application.

Be sure to use Selective-type interims if you want Spark optimizations to be applied to your interactive runs while developing.