Data shuffling and optimization

What kinds of spark optimizations are there? Are these adjustable spark configs in the pipeline settings the only way I can optimize?

Hi Juliette, Prophecy does not do any automatic optimization itself.

But it gives you the full ability to define optimizations yourself:

  • Spark configs can be defined in the pipeline settings.
  • A Repartition gem can be used to repartition the data.
  • Join hints can be set in the Advanced tab of the Join gem.
  • Cache can be used wherever a branch point exists (to avoid recomputing part of the DAG).
  • Checkpointing and persisting to disk can be done in a Script gem.

Otherwise, Spark's Catalyst engine will apply optimizations just as it would for any normal Spark application.

Be sure to use Selective-type interims if you want Spark optimizations to be applied to your interactive runs while developing.