Running Scripts in Pentaho Kettle

WAIT! Before you read this, please know that I’ve published an update to this article here.  I think the update is much more useful, so maybe read the update and then come back?  Or don’t come back at all?  Whatever you prefer!

Recently, I found myself needing to run a Python script from inside Pentaho Kettle.  I wanted to convert a CSV file into JSON format and found that Kettle was running for an extremely long time just to complete this simple process.  Since I already had a solid Python script in place to complete this task, I decided I could use that instead of relying on the traditional Kettle steps.
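For reference, here is a minimal sketch of what a CSV-to-JSON conversion script of this kind might look like. It is not the script used in this post. The function name `csv_to_json` is hypothetical, and the sketch assumes the CSV file has a header row and is UTF-8 encoded.

```python
import csv
import json


def csv_to_json(csv_path, json_path):
    """Read a CSV file with a header row and write it as a JSON array of objects.

    Assumption: the first line of the CSV holds the column names.
    Returns the number of data rows written.
    """
    with open(csv_path, newline="", encoding="utf-8") as src:
        # DictReader yields one dict per data row, keyed by the header names.
        rows = list(csv.DictReader(src))
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(rows, dst, indent=2)
    return len(rows)
```

Calling `csv_to_json("input.csv", "output.json")` (both file names are placeholders) would write one JSON object per CSV row. All values stay strings, since the `csv` module does not infer types.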

To run an external script in Kettle, you need to use the Shell step.  This step is currently (as of Pentaho Kettle 5.2.0.0) only available in Jobs – not Transformations.  The step icon looks like this:

[Image: Shell Command step icon]

When you double-click on the step, you will encounter a menu like this:

[Image: Shell step configuration menu]

The main items you should be concerned with are the fields “Script file name” and “Working directory”.

  • Script file name: This is the script you would like to run. Alternatively, if you would like to enter custom code, you can use the Script tab to do so. In my case, I had a saved .py script, so I specified that script in this field. Make sure to include the full path to the file, not just its name.
  • Working directory: This is the directory the script is run from. In my case, I pointed it at the folder on the C drive where the Python shell can be found. You do not need to specify the exact executable, just the folder in which it’s present.

And that’s it! This is how my final configured step looked:

[Image: the final configured Shell step]

Since all my scripting was in the included Python file, I did not rely on the Script tab for anything. When a file is specified, Kettle automatically fills the Script tab with the command “python [file name]”.
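As a sanity check outside of Kettle, roughly the same thing can be reproduced by hand: run the interpreter on the script from a chosen working directory and look at the exit code and output. The sketch below is only an approximation of what the step does, not Kettle's actual implementation, and the helper name `run_script` is hypothetical.

```python
import subprocess
import sys


def run_script(script_path, working_dir):
    """Run a Python script from a given working directory and capture its output.

    Returns a (return code, stdout, stderr) tuple.
    """
    result = subprocess.run(
        [sys.executable, script_path],  # similar in spirit to "python [file name]"
        cwd=working_dir,                # plays the role of the "Working directory" field
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr
```

If the script fails here with a non-zero return code, it will most likely fail inside Kettle too, and the captured stderr is usually the quickest place to see why.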

As can be seen from the screenshots, Kettle also provides options for logging. If your script outputs information about its progress, timing, or anything else, you can store that information in a particular file.
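On the script side, progress and timing information could be produced along these lines. This is a sketch under the assumption, not verified here, that the step's log file picks up what the script writes to standard output. The helper names `make_logger` and `timed_step` are hypothetical.

```python
import logging
import sys
import time


def make_logger(name="csv_to_json"):
    """Return a logger that writes timestamped lines to standard output."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def timed_step(logger, label, func, *args):
    """Run func(*args), logging when it starts and how long it took."""
    logger.info("starting %s", label)
    start = time.perf_counter()
    result = func(*args)
    logger.info("finished %s in %.3f s", label, time.perf_counter() - start)
    return result
```

Wrapping the main conversion call in `timed_step` gives one "starting" line and one "finished" line per run, which is usually enough to spot a slow step.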

Kettle also offers specific options for iterating scripts over every row.
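For a script to be useful once per row, it needs to take its inputs from outside rather than hard-coding them. The sketch below assumes the values from each row reach the script as command-line arguments. I have not confirmed how Kettle passes them. The script name `convert.py` and the function `parse_row_args` are hypothetical.

```python
import sys


def parse_row_args(argv):
    """Split command-line arguments into a (csv_path, json_path) pair.

    Assumption: argv[0] is the script name, followed by exactly two values.
    """
    if len(argv) != 3:
        raise SystemExit("usage: convert.py <csv_path> <json_path>")
    return argv[1], argv[2]
```

Inside the script, this would be called as `parse_row_args(sys.argv)` before doing any work, so that a row with missing values fails fast with a usage message.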

I did not experiment with either the logging options or the iteration options, but I would be interested in hearing from anybody who has. Happy scripting!

4 thoughts on “Running Scripts in Pentaho Kettle”

  1. Pingback: Running Scripts in Pentaho Kettle, the Sequel | Moran Nachum

  2. I have tried using Python scripts and have set my Python working directory correctly, but I get errors like “command not found” for every command.

    Any idea why it is failing?

    • Errors could come up if the commands you are attempting to run through the shell step are not present in your environment. To troubleshoot, I would run those same commands in your terminal from the same working directory and see whether they fail there as well.

      If they fail in the terminal, that indicates the command is not available on your machine. At that point, I would re-install the command or check your environment variables to make sure the machine knows where to find it.

      If the command succeeds in the terminal, then the issue is with Kettle and it becomes a bit trickier to resolve. I would encourage you to look at the detailed messages in the log to see if it’s a path error (the command not being able to connect to the proper path) or something else.

      I recall instances where certain commands would not work for me via the shell step. For the most part, those were environment issues: I had specific commands installed on my own machine, but those same commands were missing from the machine where I was attempting to run my Kettle jobs.

      If you are not able to isolate the issue, feel free to send over the job and I can review on my end.
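One quick way to run the first check above is to ask Python itself whether it can see the command on the PATH. This is a minimal sketch. The helper name `diagnose_command` is hypothetical, and the result only reflects the environment of the process that runs it, which may differ from the environment Kettle runs in.

```python
import os
import shutil


def diagnose_command(name):
    """Report whether a command can be resolved on the current PATH."""
    resolved = shutil.which(name)
    if resolved is None:
        # Include the PATH that was searched to make the problem easier to spot.
        return f"{name}: not found on PATH ({os.environ.get('PATH', '')})"
    return f"{name}: found at {resolved}"
```

Running `print(diagnose_command("python"))` both from a terminal and from a script launched by the shell step makes it easy to compare the two environments.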

  3. I had an issue setting up a script (VBScript). It would only run if the shared directory was written as a mapped drive. I normally prefer to write the full network path, so that any computer can read it (example: “\\myserver.mycompany.com\users\scripts” vs “H:users\scripts”).
