How to Troubleshoot Crontab Issues

The other day, I updated the operating system on the server where my crontab was installed. I thought that this would be a harmless upgrade to the system, nothing too crazy. However, I was wrong: the crontab that had been working perfectly well on the old operating system now seemingly ceased to work. Nothing had changed in my setup of the scripts or the crontab, so why was it not working?

I set about troubleshooting the issue with some guidance from this article.

Disclaimer: After all my troubleshooting, I ultimately did not resolve the core of my issue and instead decided to give up on crontab and use a different solution. So, if you’re looking for a solution to my crontab problems at the end of this article, you will not find it.

  1. Verify my crontab settings: First thing I did was open up my crontab to make sure that everything was configured properly (and as I’d left it). I did this using the command…
    env EDITOR=nano crontab -e

    And sure enough the setup of my crontab was as I had left it:

    0 * * * * cd ~/Desktop/how\ to\ troubleshoot\ crontab/ && python2.7 hello_world.py

    While I was here, I thought to update the “0” in my job to “*” so that the job would run every minute. This would make it easier to troubleshoot what was going on. I then saved and closed the crontab.

  2. Make sure the command can run successfully from Terminal: My second thought, after seeing that everything was all right with the file, was to run the full command from my crontab file in Terminal and see if it worked:



    It did!
  3. Attempt to install crontab with admin permissions: Once I verified that the command could run outside of crontab without an issue, I started considering that maybe I didn’t have proper permissions to install the crontab on the server. So, I re-opened the crontab file with sudo privileges:
    sudo env EDITOR=nano crontab -e

    This is what I saw:

    What happened to my crontab command? Well, just to be sure, I went ahead and entered the same command into this crontab file that I had opened with the sudo permissions and checked if that produced the result I wanted.  It did not.

  4. Make sure the application has proper permissions: When I closed my last crontab edit (with sudo permissions), this little dialog box showed up:

    So, I speculated that perhaps iTerm, the Terminal application I was using to edit and install the crontab did not have proper permissions to install the crontab file on my server. I opened up my Mac’s “System Preferences”, accessible from the Apple icon in the upper left corner of the screen:

    Navigated to the “Security & Privacy” tab:

    Scrolled to the “Full Disk Access” portion and unlocked the screen:

    Then added the iTerm application to the list of applications with “Full Disk Access” using the “+” button under the list of applications to the right:

    With iTerm now having “Full Disk Access”, I re-attempted to save the crontab (using sudo permissions), but had no success.

  5. Brute-force a crontab into its expected placement: So, I turned to the internet and found the article listed at the top that seemed pretty helpful. It stated, “There is a system-wide /etc/crontab file…”

    I neglected to read the rest of the sentence and instead went searching for the existence of the system-wide crontab file in the specified /etc/ directory.

    This is a directory that is otherwise hidden, so I used the “Spotlight” search feature on Mac (accessed via Cmd + Space) to search for the /etc/ directory and found it:

    And sure enough, once I opened that directory, I noticed that it had no file called “crontab” in it.

    In the absence of a “crontab” file, I decided I’d add one myself in a rather unorthodox manner. I opened a new file in Sublime Text, wrote my crontab command into the file, saved it to my “Desktop”, and then dragged it from that directory into the “etc” directory. What could go wrong?

    And yet, despite all this, nothing seemed to change! My crontab job was still failing to run.

     

  6. Check crontab run logs: So, I scanned the web article for some other ideas on what may be going on and spotted a recommendation to run the command…
    ps -ef | grep cron | grep -v grep

    …to check if my crontab had run. And sure enough it had! Multiple times even!

    To better understand this output, I looked up how to display the column headers for the ps output and found this helpful writeup. I went ahead and amended my command to the following:

    ps -ef | head -1 && ps -ef | grep cron | grep -v grep

    Now that it seemed that something was running, I did some accounting: up to this point, I had potentially set up three different crontab files. One was set up via crontab using my own user permissions, another was set up via crontab using sudo permissions, and the last one was brute-forced into the “/etc/” directory using a text file.

    Of those three, it seemed that one or two were running. I suspected that the command labeled “(cron)” was a result of my sudo permissions manipulation and that the command labeled “/usr/sbin/cron” was coming from the one set up with my standard user permissions. As to why two instances existed for both, I was not sure. So, in an effort to verify this, I deleted my brute-forced crontab file from the “/etc/” directory and waited a minute to see if the logs looked any different.

    The logs did not look any different. Thus, my next step was to delete the crontab I added using sudo permissions. To do this, I opened the crontab file again using sudo permissions, deleted its contents, and saved. Again I waited a minute and re-ran the logs to see if anything had changed:

    This time around, I noted that the duplicate commands had been removed. This let me know a few things: a) the crontab I had originally configured using my standard user permissions was running, and b) it was generating two commands. I suspected two commands were showing because I was concatenating two commands in my crontab: one to change directories and another to run a Python script.
     

  7. Split the command and leverage logs: I was more befuddled than ever at this point. It seemed that all my poking around had been for nothing. My crontab was installed and running successfully. So maybe it was something in my crontab that wasn’t working well. I decided to employ logs in order to troubleshoot further. I broke up my command into different components and recorded output messages into separate logs in order to see where an error may have occurred:
    * * * * * echo 'success' >> /tmp/log1.log
    * * * * * cd ~/Desktop/how\ to\ troubleshoot\ crontab/ && echo 'success' >> /tmp/log2.log
    * * * * * cd ~/Desktop/how\ to\ troubleshoot\ crontab/ && python2.7 hello_world.py >> /tmp/log3.log
    * * * * * cd ~/Desktop/how\ to\ troubleshoot\ crontab/ && python2.7 hello_world.py && echo 'success' >> /tmp/log4.log

    The first thing I noted when running the new crontab was that each command line in the file generated two commands in the crontab run logs:

    So my previous assumption that the number of commands in the log corresponded to the number of commands in a line was not correct.

    Second, I noted that while I expected four logs to be generated, only three were:

    And, unfortunately, they were all empty. This was not what I expected at all since when I ran the third command line from my crontab file in Terminal…

    cd ~/Desktop/how\ to\ troubleshoot\ crontab/ && python2.7 hello_world.py >> /tmp/log3.log

    …it generated a log file with a value in it:


    I was starting to suspect that crontab was having trouble running my Python script.

  8. Try running an executable file: Better informed about what was happening within my crontab, but somewhat exasperated by my situation, I decided to convert my Python script into an executable file. I thought that an executable file might be easier for crontab to process since it would remove the dependency on the “python2.7” command (a sketch of this approach appears just after this list). Unfortunately, even this failed!
  9. Use a different solution: So, feeling completely out of options, I decided to give up on crontab and try something different altogether: Automator.
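
For reference, the executable-file approach from step 8 generally looks like the following. This is a minimal sketch using the file and directory names from my own setup; the idea is a shebang line plus execute permissions so that cron can run the file without invoking “python2.7” explicitly:

    # add this as the very first line of hello_world.py:
    #   #!/usr/bin/env python2.7
    chmod +x hello_world.py      # mark the script as executable
    ./hello_world.py             # it can now be run directly
    # and the crontab line becomes:
    # * * * * * cd ~/Desktop/how\ to\ troubleshoot\ crontab/ && ./hello_world.py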

I apologize for the disappointing conclusion. Nonetheless, I think the troubleshooting process for my issue is helpful in understanding how crontab works and how to check that crontab is running. The main takeaways:

  • When troubleshooting with crontab, update the command to run every minute to provide a quicker feedback loop.
  • Verify the format of your crontab.
  • Attempt the commands from the crontab file in Terminal to make sure they work on their own.
  • Leverage crontab logs to check that crontab is installed and running correctly.
  • Add output logs to your commands in the crontab file to better understand where failure is occurring (see the sketch below).
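
To make these takeaways concrete, here is the kind of minimal, debugging-friendly entry I wish I had started with. It is a sketch: the log path is illustrative, the full interpreter path (check yours with “which python2.7”) avoids relying on cron’s minimal PATH, and the “2>&1” captures error output as well as standard output, which a plain “>>” redirect does not:

    * * * * * cd ~/Desktop/how\ to\ troubleshoot\ crontab/ && /usr/bin/python2.7 hello_world.py >> /tmp/cron_debug.log 2>&1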

Running Scripts in Pentaho Kettle, the Sequel

Surprise!  One of this blog’s most successful posts is about how to run scripts in Pentaho Kettle.

Confession: I wrote that post a long time ago (in fact, it was one of my very first posts about Pentaho Kettle).  And since then, I’ve learned a lot more about Kettle and about running scripts in Kettle.  Therefore, I wanted to offer a refresher to the original post and a new recommendation on how to better run scripts in Kettle.

More recently, I’ve been running scripts in Kettle like this:

scripts_general

What’s different?

  1. The “Insert Script” option is checked, meaning that the second tab, “Script”, is now available for us to fill in.  This tab acts like our Terminal in Kettle.  Anything that you can run in Terminal, you can execute in the Script tab, but I’ll get more into that later.
  2. The “Working Directory” is now an environment variable.  This is an improvement over our previous configuration, since it allows for greater transferability of the Kettle job from one person to another.

On the “Script” tab, this is my configuration:

scripts_script

In here, I’m using environment variables to specify my output directories, providing greater ease of transferability when exchanging jobs with other people.  Additionally, I am not relying on the machine’s version of Python, but rather on a version of Python specific to a virtualenv.  This, again, better ensures that when I transfer my job to other people, they are able to recreate the virtual environment I’ve created on my machine and run the job without a problem.
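
For illustration, with the “Working Directory” field set to ${Internal.Job.Filename.Directory}, the Script tab then just contains the command you would otherwise type in Terminal. A sketch (the script name, the link, and the ${OUTPUT_DIR} variable are placeholders; “example” is the virtualenv discussed below):

    example/bin/python my_script.py -l "https://some-api/endpoint" -o ${OUTPUT_DIR}/output.json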

In Practice

Let’s say I wrote a script that:

  1. Pings an API
  2. Places the returned data in a JSON file

The script takes in two inputs: a link to ping the API and an output filename where the returned JSON will be placed.  This is my script:

import requests
import argparse
import json
import datetime

# Writes to a JSON file.
# Input: filename/location string, data in the form of array of
# dictionaries
###################################################################
def toJSON(filename, data):
    with open(filename, 'w') as outfile:
        json.dump(data, outfile)

# Call a given link.
# Input: API link to ping
###################################################################
def callAPI(link):
    response = requests.get(link)
    return response

# Parses incoming information
######################################################################
def commandLineSetup():
    commandParser = argparse.ArgumentParser(description="Pings an API link for data "
                                                        "and outputs data to JSON file")
    commandParser.add_argument("-l", "--link", help="API link to ping for information")
    commandParser.add_argument("-o", "--outputfile", help="Name/Path of the JSON output file")

    args = commandParser.parse_args()

    return args.link, args.outputfile

######################################################################
# MAIN
######################################################################

def main():
    LINK, OUTPUT = commandLineSetup()

    # Check that proper inputs were provided
    if not LINK or not OUTPUT:
        print str(datetime.datetime.now()) + " - Insufficient inputs provided"
        exit()

    print str(datetime.datetime.now()) + " - Calling link %s" % LINK
    response = callAPI(LINK)
    print str(datetime.datetime.now()) + " - Outputting to file %s" % OUTPUT
    toJSON(OUTPUT, response.json())
    print str(datetime.datetime.now()) + " - Done!"

if __name__ == "__main__":
    main()

Notice that my script relies on two modules: requests and argparse.  I use requests to ping the API to retrieve data and argparse to parse passed-in information from the command line.  Of these, requests is not part of the Python standard library (argparse ships with Python 2.7 and later), so to accommodate it, I create a virtual environment called “example”, which has a requirements.txt file.
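
Setting up that virtualenv is roughly the following (a sketch of the commands involved, run from the working directory):

    virtualenv example               # create the "example" environment
    source example/bin/activate      # activate it
    pip install requests             # install the external dependency
    pip freeze > requirements.txt    # record it for anyone else running the job
    deactivate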

Once my virtualenv is configured, I can test my Python script against that virtualenv from a terminal window by running a command from within the working directory:

scripts_terminal

My working directory in this case is ~/Desktop/Personal/Sites/Blog/Projects.  This is also where I have my job saved:

scripts_job

Therefore, when configuring my script for execution within Kettle, I can use the variable ${Internal.Job.Filename.Directory} to specify my working directory and enter the same command as I did in Terminal, and everything will execute just as it did there:

scripts_job_general

scripts_job_script

To check out my example transformation, please download this file (make sure to create the virtual environment before attempting to run the job; name the virtual environment “example”).

Recap

When executing scripts in Kettle, it is better to use the “Insert script” option, since it allows for:

  • Better job transferability
  • Easier compatibility of virtual environments
  • Integration of Kettle environment variables

I hope you find this useful!

Workaround for python setup.py egg_info Error

Recently, I was working in Python and trying to install the pandas module using the pip command, but kept getting an error like this:

InstallationError: Command python setup.py egg_info failed with error 
code 1 in /var/www/python/virtualenv

So frustrating!

I spent about two hours trying to figure out how to resolve this issue and wanted to share my solution with you here.

To summarize, I downloaded the necessary package using apt-get and then moved the module into my virtual environment with a bash command. So, even if you did not experience the same error message as I did, this post might be helpful if you’re just looking to move modules from the system-wide Python directory into your virtual environment.

For those looking to get a better understanding of the difference between apt-get and pip, I recommend Aditya’s answer on this StackOverflow post (I don’t have much experience in this).

For those looking to resolve the error:

  1. To start, you need to install the desired module using the apt-get command. For pandas, that’s this:
    sudo apt-get install python-pandas

    This installs the module in a system-wide location on your server.

  2. Find where your recently-downloaded module is located on the server. It should be in your dist-packages directory under your installation of Python.

    The complete path for me was located at /usr/lib/python2.7/dist-packages.

    You can also launch Python with the command python -v and see where all the different loaded packages reside as the interpreter starts up (see the one-liner after this list for a more direct check).

  3. Create a directory for the module in your virtual environment:
    cd /var/www/python/virtualenv/lib/python2.7/site-packages && mkdir pandas
  4. Now move all files from the system-wide module directory into your virtual environment:
     mv /usr/lib/python2.7/dist-packages/pandas/* /var/www/python/virtualenv/lib/python2.7/site-packages/pandas/
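
A more direct way to find where the system-wide copy of the module landed is to ask Python itself. A quick sketch, assuming the apt-get install succeeded and you are using the system Python:

    python -c "import pandas; print(pandas.__file__)"
    # e.g. /usr/lib/python2.7/dist-packages/pandas/__init__.pyc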

And that’s it! You should now be able to access the module in your virtual environment. You can verify this by launching your virtual environment version of Python and attempting to import the module. No errors means success.
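
For example, a quick check from the shell might look like this, using the virtualenv path from the steps above (adjust it to wherever your environment lives):

    /var/www/python/virtualenv/bin/python -c "import pandas; print(pandas.__version__)"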

Do note that given discrepancies between apt-get and pip, the version of the module you transferred may not be the most recent version available.

Also, this might not be the best solution for your problem! I was stuck on the error for a long time and opted for a workaround like this to alleviate my issues after trying a number of different recommendations across the web. If you find a better solution, please forward it my way.

Running Scripts from Sublime Directly in Your Virtual Environment

What’s better than having a virtual environment setup? Being able to run scripts directly from within Sublime on that virtual environment!

Here’s how you set this up:

  1. Open up Sublime.
  2. Navigate to Tools → Build System → New Build System.

    Build Systems Menu in Sublime
  3. A new tab will open up in your Sublime application. It’ll look like this:

    New Build Tab
  4. Erase the contents of the tab and copy/paste the code below in its place:

    {
        "cmd": ["/absolute-file-path-to-your-env/bin/python", "$file"],
        "selector": "source.python"
    }

    Make sure to change the path (the first item in the “cmd” array) to the absolute path to your virtual environment’s version of Python (see the tip after this list for one way to find it).

    New Build Code in Tab

  5. Save the file. Make sure to name it as you would like it to appear in the Build System menu.

    Saving the New Build
  6. Now select the virtual environment build from the Build System menu and get to programming.

    Selecting the New Build from the Build System Menu
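
If you’re not sure what that absolute path is, one quick way to find it is to activate the virtual environment in Terminal and ask which Python is in use (a sketch; substitute your own environment’s location):

    source /absolute-file-path-to-your-env/bin/activate
    which python    # prints the absolute path to the environment’s Python binary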

Thank you to Shang Liang for originally posting on this.

Setting Up Virtual Environments for Python

I’ve been greatly encouraged by colleagues and friends to try working on a MacBook instead of a Windows machine. Given that there is no cost to me, since the computer is provided by my company, I thought I’d give it a try. Thus far, I’ve been enjoying it greatly. And one of my favorite things that I’ve learned thus far is the ease with which one can build and manage virtual environments.

A virtual environment is different from a virtual machine. A virtual machine requires allocation of disk space and RAM. A virtual environment is simply an isolated working copy of Python. The two main benefits to setting up virtual environments are

  • that you can work in two completely different environments simultaneously, and
  • that changes made to the environment are tracked and can later be replicated by a team member who picks up your project

Now, if you’re like me when I first heard about this, you’re probably feeling a bit intimidated. No worries! Let’s do a quick step-by-step on how to set all of this up so that you can start configuring and using your own virtual environments.

  1. Start by installing the virtualenv module on your machine via Terminal. To do this, use the command
    $ pip install virtualenv

    If you don’t have pip already installed, I would encourage you to install it using the following command:

    $ sudo easy_install pip

    It’ll definitely come in handy later when you’re installing packages in all your new environments.

    Installing virtualenv

  2. Now that you have the virtualenv module installed on your machine, you can start setting up virtual environments. Yes, it’s that simple! To start off, it’s best practice to set up your virtual environment within the folder of the project that you are working on. So, through the Terminal, navigate to the folder where your project is stored.

    Navigating to Directories via Terminal
  3. Once within the directory, create the virtual environment using the command
    $ virtualenv project

    Another best practice is to name your virtual environment the name of the project on which you are working. In my case, the name project is befitting, but make sure to select the right name for yourself.

    Note that if you have multiple instances of Python installed on your machine, you can also specify which instance you would like to use in this particular virtual environment (see the consolidated example after this list for how to do that with the -p flag). We do not specify that in the command above, so the version used will just be the default version on the machine.

  4. Now, before we start using our new environment, we need to configure it with modules specific to our project. To do so, activate the environment using the command
    $ source project/bin/activate

    Make sure to change the first word after “source” to the name of your own virtual environment. The name of your environment will be added in front of your command line once you’ve activated the environment.

    When your virtual environment is active, configure it as you wish. In this example, I install the elasticsearch module in my project virtual environment:

    Creating Your Virtual Environment

  5. Once you complete configuring your environment, quit configuration by using the command
    $ deactivate
  6. To verify that your virtual environment is indeed different from your default installation of Python, you can run a simple check for the new modules in the default Python version and then again in the virtual environment version. Note in my example that elasticsearch is present in my virtual environment, but not in my default version of Python:

    Testing Virtual Environment Against Default Python Version
  7. Now that you have your virtual environment set up, you can run scripts specific to your project within that virtual environment without having to modify your default version of Python or affect other projects you may be working on. Additionally, you can share your virtual environment with collaborating colleagues. To share your environment, while in your project’s directory within the Terminal, use the following command to create a requirements file:
    $  pip freeze > requirements.txt

    This will create a file in your project folder with all the modules present in your virtual environment.

    Creating the requirements.txt File

    When collaborating with other people, send them this requirements file, have them place it in their project directory, navigate to that directory via their Terminal, and run the following command:

    $  pip install -r requirements.txt

    Run from within their own virtual environment, this will install the same modules on their machine so that they can run your files without any issues.
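
To tie the steps together, here is what a typical session might look like from start to finish. It is a sketch: the project path and Python path are illustrative, and the -p flag (the optional part from step 3) pins a specific Python interpreter:

    cd ~/path/to/project                        # step 2: work inside the project folder
    virtualenv -p /usr/bin/python2.7 project    # step 3: create the environment, optionally pinning a Python
    source project/bin/activate                 # step 4: activate it
    pip install elasticsearch                   #         install project-specific modules
    pip freeze > requirements.txt               # step 7: record the environment for collaborators
    deactivate                                  # step 5: leave the environment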

For those seeking to learn more about virtual environments and all the things you can do with them, I highly encourage you to check out the full documentation for the module, which can be found here. A big thank you to my colleague who took time out of her day today to educate me about this.
