AI Prompting Best Practices for Creating Data Pipelines & Analyses in Prophecy

1. Introduction

Prophecy provides AI-powered tools to help you build data pipelines using natural language prompts. Whether you’re starting from scratch, migrating existing SQL or SAS code, or iterating on an existing pipeline, effective prompting is the key to getting the best results from Prophecy’s AI capabilities.

This guide covers best practices for working with two primary AI tools in Prophecy:

  • AI Agent (Chat) — Available in SQL projects for conversational pipeline development
  • Data Copilot — Available in Spark/PySpark projects for suggestions and code generation

2. General Prompting Principles

2.1 Be Specific and Explicit

The more precise your prompt, the better the result. Avoid vague instructions and instead describe exactly what you need.

❌ Instead of: Clean the data

✅ Try: Remove duplicate customer records based on customer_id, and fill null values in the email column with 'unknown@example.com'

❌ Instead of: Join the tables

✅ Try: Inner join the orders table with the customers table on customer_id, keeping all columns from orders and only customer_name and email from customers

2.2 Use Step-by-Step Instructions

Break complex transformations into smaller, sequential requests rather than trying to do everything in one prompt. This approach:

  • Improves the reliability of the generated pipeline
  • Makes it easier to debug or adjust individual steps
  • Lets you inspect intermediate results at each stage

Example workflow:

  1. Read the notes table and the note_templates table
  2. Join notes with note_templates on NoteTemplateId
  3. Filter to keep only rows where NoteTemplateName equals 'Request for Information'
  4. Group by ServiceAuthId and count the number of notes per authorization
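Taken together, steps 2–4 correspond to a join → filter → aggregate query. A hedged sketch of the same logic, with invented schemas and sample data:

```python
import sqlite3

# Hypothetical schemas and sample rows for the notes example.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE notes (NoteId INTEGER, ServiceAuthId INTEGER, NoteTemplateId INTEGER);
CREATE TABLE note_templates (NoteTemplateId INTEGER, NoteTemplateName TEXT);
INSERT INTO notes VALUES (1, 100, 1), (2, 100, 1), (3, 200, 2), (4, 200, 1);
INSERT INTO note_templates VALUES (1, 'Request for Information'), (2, 'General');
""")

# Step 2: join, Step 3: filter, Step 4: group and count.
counts = dict(con.execute("""
    SELECT n.ServiceAuthId, COUNT(*) AS note_count
    FROM notes n
    JOIN note_templates t ON n.NoteTemplateId = t.NoteTemplateId
    WHERE t.NoteTemplateName = 'Request for Information'
    GROUP BY n.ServiceAuthId
""").fetchall())
```

Prompting for each step separately produces one gem per step, so you can inspect the joined, filtered, and aggregated data independently rather than debugging one opaque query.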

2.3 Reference Tables and Gems with @ Mentions

Use the @ symbol to reference specific tables in your warehouse or gems already on your pipeline canvas. This removes ambiguity and helps the AI locate the exact objects you mean.

Examples:

  • How many records are in the @transactions table?
  • Aggregate the @orders_cleaned gem to show sum of order quantity per month
  • Add a filter after @last_aggregation_gem to show only records from 2024

As you type @, Prophecy will suggest matching table names and gem labels.

2.4 Use Descriptive Column Names

Prophecy’s AI suggestions work best with meaningful column names. Descriptive names give the AI semantic cues to generate accurate expressions.

❌ Not recommended: col1, field_a, x

✅ Recommended: customer_email, order_date, total_amount

If your source data has ambiguous column names, consider renaming them early in your pipeline to improve AI suggestions downstream.
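A rename step early in the pipeline can be as simple as a column-name mapping. A plain-Python sketch — the mapping and the sample row are invented for illustration:

```python
# Hypothetical mapping from ambiguous source names to descriptive ones.
RENAMES = {"col1": "customer_email", "field_a": "order_date", "x": "total_amount"}

def rename_columns(row: dict) -> dict:
    """Rename known columns; pass unknown columns through unchanged."""
    return {RENAMES.get(name, name): value for name, value in row.items()}

renamed = rename_columns({"col1": "a@b.com", "field_a": "2024-01-01", "x": 9.99, "id": 7})
```

Every prompt downstream of the rename can then say "filter on order_date" instead of "filter on field_a", which gives the AI a real semantic cue.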

3. Building Pipelines with the AI Agent (SQL Projects)

3.1 Start with Data Exploration

Before building transformations, explore your available data:

  • Find datasets with information about service authorizations
  • Show me sample data from @notes where NoteTypeKey = 'DECR'
  • What columns are available in the @serviceauth table?

This helps you understand your data before writing transformation logic.

3.2 Build Incrementally

Rather than prompting for an entire complex pipeline at once, build it step by step:

Step 1 — Add sources: Add the notes and note_templates tables to the pipeline

Step 2 — Join: Join @notes with @note_templates on NoteTemplateId

Step 3 — Filter: Filter to keep only rows where NoteTemplateName is 'Request for Information' and NoteTypeKey is 'DECR'

Step 4 — Transform: Add a column called NoteReviewer_Name using COALESCE(DisplayName, CONCAT_WS(' ', FirstName, LastName))

Step 5 — Aggregate: Group by ServiceAuthId and count the number of notes

Step 6 — Save: Save the final output as a table

After each step, use Inspect to review the generated gems and verify the output before continuing.
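The Step 4 expression is worth sanity-checking before you rely on it: COALESCE returns its first non-null argument, and CONCAT_WS joins only the non-null parts. A plain-Python rendering of that logic, with illustrative function names and sample values:

```python
def concat_ws(sep: str, *parts):
    """CONCAT_WS semantics: join the non-null parts, skipping nulls."""
    return sep.join(p for p in parts if p is not None)

def note_reviewer_name(display_name, first_name, last_name):
    # COALESCE(DisplayName, CONCAT_WS(' ', FirstName, LastName))
    return display_name if display_name is not None else concat_ws(" ", first_name, last_name)

full = note_reviewer_name("Dr. Ada Lovelace", "Ada", "Lovelace")
fallback = note_reviewer_name(None, "Grace", "Hopper")   # falls back to first + last
partial = note_reviewer_name(None, "Grace", None)        # NULL last name is skipped
```

The third case is the subtle one: because CONCAT_WS skips nulls, a missing LastName yields just the first name rather than a stray trailing separator.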

3.3 Inspect and Verify

After each transformation:

1. Click Inspect to review the configuration of modified gems (highlighted in yellow)

2. Use Previous/Next to navigate through all modified gems

3. Examine both input and output data to confirm the transformation produces expected results

4. If something looks wrong, click Restore or Rollback to revert to a previous state and try a different prompt

3.4 Use Specific Phrases to Generate Visuals & Dashboards

If you’d like to see charts of your data, you can prompt the AI to create visuals using specific phrases. If you have a specific chart in mind, include it in your prompt.

Examples:

  1. Data exploration: Create an analysis dashboard of electronic good sales from 2023 from the @ecommerce data

    • This will create a new analysis tab, built on the full dataset, with whichever charts the AI judges most helpful
  2. Specific visuals: Create a pie chart of total revenue by gender

    • This will create a new pie chart and add it to the existing analysis dashboard
  3. Filtering: Add a dynamic filter to the electronics_sales_2023_dashboard for sale_month

    • Adds a filter that applies to all visuals on the dashboard; to see the results, enter preview mode, change the filter, and run the analysis again.

Once the analysis dashboards are created, click on the edit button to inspect the columns and aggregations used. You can also change chart types, colors, and manually add more charts.

4. Prompting for Code Migration Use Cases

When migrating existing SAS, SQL, or other code to Prophecy pipelines, follow these strategies:

4.1 Provide Business Requirements as Context

The most effective prompts for migration include a business requirements document that describes the pipeline logic in plain English. This helps the AI understand the intent behind the code, not just the syntax.

A good business requirements prompt document should include:

  • Source Tables: List all tables with their full paths and friendly names
  • Step-by-Step Logic: Describe each transformation step in plain English, including join conditions, filter criteria, and column derivations
  • Column Definitions: Specify how derived columns should be calculated, including any SQL expressions or business rules
  • Expected Output: Describe the final output table structure and what each row represents

4.2 Include Reference SQL

Alongside business requirements, providing the original SQL or SAS code as a reference helps the AI generate accurate transformations. You can include:

  • The original source code (SAS, SQL, etc.) for the AI to translate
  • A simplified Databricks SQL version that serves as a more direct mapping
  • Both a high-level summary and a detailed version of the requirements

4.3 Use a Structured Prompt Format

For complex migration prompts, organize your prompt document into clear sections:

Section 1 — Source Tables: List each table with its path, alias, and purpose

Section 2 — Transformation Steps: Number each step and describe the logic

Section 3 — Join Conditions: Specify join types, keys, and any special conditions

Section 4 — Filter Criteria: List all WHERE clause conditions in plain English

Section 5 — Column Derivations: Define calculated columns with formulas

Section 6 — Output Specification: Describe the target table schema and write destination
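Put together, such a prompt document might look like the following skeleton. Every table name, path, and condition below is a placeholder, not a real schema:

```text
Section 1 — Source Tables
  catalog.sales.orders     (orders)     one row per order line
  catalog.sales.customers  (customers)  one row per customer

Section 2 — Transformation Steps
  1. Join orders to customers
  2. Filter to completed 2024 orders
  3. Derive total_amount
  4. Aggregate by region

Section 3 — Join Conditions
  orders.customer_id = customers.customer_id (left join; keep unmatched orders)

Section 4 — Filter Criteria
  order_status = 'COMPLETED' and order_date falls in calendar year 2024

Section 5 — Column Derivations
  total_amount = quantity * unit_price

Section 6 — Output Specification
  Write to catalog.reporting.sales_by_region; one row per region
```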

4.4 Iterate and Refine

Migration is rarely a one-shot process. Plan to:

1. Generate an initial pipeline from the prompt

2. Run the pipeline against sample data

3. Compare output with the expected results from the legacy system

4. Refine the prompt or manually adjust gems as needed

5. Repeat until output matches expectations

5. Sample Prompts for Common Operations

Data Exploration:

  • Find datasets containing customer information
  • Show me 5 random records from @sales_data
  • Visualize number of patients per city in @patients_raw_data

Filtering:

  • Filter to only include orders from 2024
  • Remove rows where email is null
  • Keep only rows where SATypeKey is 'IP', 'AS', or 'MO'
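The last filter prompt maps naturally to a SQL IN list. A minimal sqlite3 sketch with an invented table:

```python
import sqlite3

# Hypothetical service-authorization table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE serviceauth (ServiceAuthId INTEGER, SATypeKey TEXT)")
con.executemany("INSERT INTO serviceauth VALUES (?, ?)",
                [(1, "IP"), (2, "AS"), (3, "XX"), (4, "MO")])

# "Keep only rows where SATypeKey is 'IP', 'AS', or 'MO'"
kept = [r[0] for r in con.execute(
    "SELECT ServiceAuthId FROM serviceauth "
    "WHERE SATypeKey IN ('IP', 'AS', 'MO') ORDER BY ServiceAuthId"
)]
```

Listing the exact codes in the prompt, as here, avoids the AI guessing at which type keys count as "relevant".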

Joins:

  • Join the orders and customers tables on customer_id
  • Left join @notes with @note_templates on NoteTemplateId

Aggregations:

  • Group by region and calculate average sales
  • Count the number of records per ServiceAuthId

Transformations:

  • Calculate total revenue as quantity * price
  • Rename cust_id to customer_id and amt to amount
  • Convert the date from UTC to Eastern Time and format as MM/dd/yyyy
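The last prompt combines a timezone shift with a format change. The equivalent logic in plain Python is below — the sample timestamp is invented, and in a Spark pipeline the generated code would instead use functions such as from_utc_timestamp and date_format:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Invented sample: 02:30 UTC on July 4 is still the evening of July 3 Eastern.
utc_ts = datetime(2024, 7, 4, 2, 30, tzinfo=timezone.utc)
eastern = utc_ts.astimezone(ZoneInfo("America/New_York"))
formatted = eastern.strftime("%m/%d/%Y")  # MM/dd/yyyy
```

Note that the calendar date itself can change when converting timezones, which is exactly the kind of edge case worth checking with Inspect.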

Output:

  • Save the final output as a table
  • Show me and save the final output of the pipeline

6. Troubleshooting Tips

Agent can’t find tables: Reindex your Databricks connection so the Knowledge Graph includes up-to-date table metadata.

Conversation feels stuck: Click the “Reset” option in the chat interface to start a fresh session with a clean context.

Unexpected results: Use Inspect to review each gem’s configuration. Check input and output data at each step to find where the logic diverges from expectations.

Complex expressions not generated correctly: Break the expression into simpler parts. Generate each part separately, then combine them manually in the gem configuration.

7. Summary of Key Tips

1. Be specific — Describe exactly what columns, tables, filters, and transformations you need

2. Build incrementally — One step at a time, inspecting results along the way

3. Use @ mentions — Reference tables and gems by name to reduce ambiguity

4. Use descriptive names — Meaningful column and gem names improve AI suggestions

5. Provide context — For migrations, include business requirements and reference SQL

6. Inspect everything — Review generated gems and data before proceeding

7. Iterate — Refine your prompts based on results; migration is a multi-pass process

8. Combine AI with manual editing — Use the AI for scaffolding, then fine-tune in the visual editor

For more information, visit the Prophecy documentation at docs.prophecy.ai
