How to Troubleshoot Crontab Issues

The other day, I updated the operating system on the server on which my crontab was installed. I thought that this would be a harmless upgrade to the system — nothing too crazy. However, I was wrong, the crontab that was working perfectly well on the old operating system now seemingly ceased to work. Nothing had changed in my setup of the scripts or the crontab, so why was it not working?

I set about troubleshooting the issue with some guidance from this article:

Disclaimer: After all my troubleshooting, I ultimately did not resolve the core of my issue and instead decided to give up on crontab and use a different solution. So, if you’re looking for a solution to my crontab problems at the end of this article, you will not find it.

  1. Verify my crontab settings: First thing I did was open up my crontab to make sure that everything was configured properly (and as I’d left it). I did this using the command…
    env EDITOR=nano crontab -e

    And sure enough the setup of my crontab was as I had left it:

    0 * * * * cd ~/Desktop/how\ to\ troubleshoot\ crontab/ && python2.7 hello_world.py

    While I was here, I thought to update the “0” in my job to “*” so that the job will run regularly every minute. This will make it easier to troubleshoot what was going on. I then closed and saved the crontab.

  2. Make sure the command can run successfully from Terminal: My second thought, after seeing that everything was all right with the file, was attempting to run the full command from my crontab file in Terminal and seeing if it worked:



    It did!
  3. Attempt to install crontab with admin permissions: Once I verified that the command could run outside of crontab without an issue, I started considering that maybe I didn’t have proper permissions to install the crontab on the server. So, I re-opened the crontab file with sudo privileges:sudo env EDITOR=nano crontab -e

    This is what I saw:

    What happened to my crontab command? Well, just to be sure, I went ahead and entered the same command into this crontab file that I had opened with the sudo permissions and checked if that produced the result I wanted.  It did not.

  4. Make sure the application has proper permissions: When I closed my last crontab edit (with sudo permissions), this little dialog box showed up:

    So, I speculated that perhaps iTerm, the Terminal application I was using to edit and install the crontab did not have proper permissions to install the crontab file on my server. I opened up my Mac’s “System Preferences”, accessible from the Apple icon in the upper left corner of the screen:

    Navigated to “Security & Privacy” tab:

    Scrolled to the “Full Disk Access” portion and unlocked the screen:

    Then added the iTerm application to the list of applications with “Full Disk Access” using the “+” button under the list of applications to the right:

    With iTerm now having “Full Disk Access”, I re-attempted to save the crontab (using sudo permissions), but had no success.

  5. Brute-force a crontab into its expected placement: So, I turned to the internet and found the article listed at the top that seemed pretty helpful. It stated, “There is a system-wide /etc/crontab file…”

    I neglected to read the rest of the sentence and instead went searching for the existence of the system-wide crontab file in the specified /etc/ directory.

    This is a directory that is otherwise hidden, so I used the “Spotlight” search feature on Mac (accessed via Cmd + Space) to search for the /etc/ directory and found it:

    And sure enough, once I opened that directory, I noticed that it had no file called “crontab” in it.

    In the absence of a “crontab” file, I decided I’d add one myself in a rather unorthodox manner. I opened a new file in Sublime Text, wrote my crontab command into the file, saved it to my “Desktop”, and then dragged it from that directory into the “etc” directory. What could go wrong?

    And yet, despite all this, nothing seemed to change! My crontab job was still failing to run.

     

  6. Check crontab run logs: So, I scanned the web article for some other ideas on what may be going on and spotted a recommendation to run the command…
    ps -ef | grep cron | grep -v grep

    …to check if my crontab has run. And sure enough it had! Multiple times even!

    To better understand this output, I looked up how to get the column headers for the grep command and found this helpful writeup. I went ahead and amended my command to the following:

    ps -ef | head -1 && ps -ef | grep cron | grep -v grep

    Now that it seemed that something was running, I did some accounting: Up to this point, it’s possible that I may have set up three different crontab files. One was set up via crontab using my own user permissions, another was set up via crontab using sudo permissions, and the last one was brute-forced into the “/etc/” directory using a text file.

    Of those three, it seemed that one or two were running. I suspected that the command labeled “(cron)” was a result of my sudo permissions manipulation and that the command labeled “/usr/sbin/cron” was coming from the one setup with my standard user permissions. As to why two instances existed for both, I was not sure. So, in an effort to verify this, I deleted my brute-forced crontab file from the “/etc/” directory and waited a minute to see if the logs looked any different.

    The logs did not look any different. Thus, my next step was to delete the crontab I added using sudo permissions. To do this, I opened the crontab file again using sudo permissions, deleted its contents, and saved. Again I waited a minute and re-ran the logs to see if anything had changed:

    This time around, I noted that duplicate commands have been removed. This let me know a few things: a) the crontab I had originally configured using my standard user permissions was running and b) it was generating two commands. I suspected two commands were showing because I was concatenating two commands in my crontab: one command to change directories and another to run a Python script.
     

  7. Split the command and leverage logs: I was more befuddled than ever at this point. It seems that all my poking around had been for nothing. My crontab was installed and running successfully. So maybe it was something in my crontab that wasn’t working well.I decided to employ logs in order to troubleshoot further. I broke up my command into different components and recorded output messages into separate logs in order to see where an error may have occurred:
    * * * * * echo 'success' >> /tmp/log1.log
    * * * * * cd ~/Desktop/how\ to\ troubleshoot\ crontab/ && echo 'success' >> /tmp/log2.log
    * * * * * cd ~/Desktop/how\ to\ troubleshoot\ crontab/ && python2.7 hello_world.py >> /tmp/log3.log
    * * * * * cd ~/Desktop/how\ to\ troubleshoot\ crontab/ && python2.7 hello_world.py && echo 'success' >> /tmp/log4.log

    First thing I noted when running the new crontab is that each command line in the file generated two commands in the crontab run logs:

    So my previous assumption that the number of commands in the log corresponded to the number of commands in a line was not correct.

    Second, I noted that while I expected 4 logs to generate, only 3 did:

    And, unfortunately, they were all empty. This was not what I expected at all since when I ran the third command line from my crontab file in Terminal…

    cd ~/Desktop/how\ to\ troubleshoot\ crontab/ && python2.7 hello_world.py >> /tmp/log3.log

    …it generated a log file with a value in it:


    I was starting to suspect that crontab was having trouble running my Python script.

  8. Try running an executable file: Better informed about what was happening within my crontab, but somewhat exasperated by my situation, I thought to convert my Python script file to an executable file. I thought that an executable file may be easier for crontab to process since it would remove the dependency on the “python2.7” command. Unfortunately, even this failed!
  9. Use a different solution: So, feeling completely out of options, I decided to give up on crontab and try something different altogether: Automator.

I apologize for the disappointing conclusion. Nonetheless, I think the troubleshooting process for my issue is helpful in understanding how crontab works and how to check that crontab is running. The main takeaways:

  • When troubleshooting with crontab, update the command to run every minute to provide a quicker feedback loop.
  • Verify the format of your crontab.
  • Attempt the commands from the crontab file in Terminal to make sure they work on their own.
  • Leverage crontab logs to check that crontab is installed and running correctly.
  • Add output logs to your commands in the crontab file to better understand where failure is occurring.

Pentaho Kettle MySQL Connection Errors

It’s been a minute.

I’ve been working on a number of personal projects recently and have not taken the time to document my development. I’ve learned a ton about everything from JavaScript, to ETL software Pentaho Kettle, to MySQL and Python and feel like now might be a good time to recap some of the issues I ran into during my development and also the solutions I found/devised as workarounds. More than anything, I hope this helps others that run into similar issues (and possibly also help me if by accident I run into these issues again).

The first issue I would like to recount is trouble in connecting to my GoDaddy MySQL server via Pentaho Kettle. For the longest time, this was an absolute struggle for me with no seeming explanation in site. I had set up my MySQL server on my hosting service, GoDaddy, without any issue and had been able to connect successfully via HeidiSQL. However, when I was attempting to set up my MySQL connection in Pentaho Kettle, I kept getting this absolutely horrendous error message:


Error connecting to database [devdb2] : org.pentaho.di.core.exception.KettleDatabaseException: 
Error occured while trying to connect to the database

Exception while loading class
org.gjt.mm.mysql.Driver


org.pentaho.di.core.exception.KettleDatabaseException: 
Error occured while trying to connect to the database

Exception while loading class
org.gjt.mm.mysql.Driver


    at org.pentaho.di.core.database.Database.normalConnect(Database.java:368)
    at org.pentaho.di.core.database.Database.connect(Database.java:317)
    at org.pentaho.di.core.database.Database.connect(Database.java:279)
    at org.pentaho.di.core.database.Database.connect(Database.java:269)
    at org.pentaho.di.core.database.DatabaseFactory.getConnectionTestReport(DatabaseFactory.java:86)
    at org.pentaho.di.core.database.DatabaseMeta.testConnection(DatabaseMeta.java:2464)
    at org.pentaho.ui.database.event.DataHandler.testDatabaseConnection(DataHandler.java:533)
    at sun.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.pentaho.ui.xul.impl.AbstractXulDomContainer.invoke(AbstractXulDomContainer.java:329)
    at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke(AbstractXulComponent.java:139)
    at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke(AbstractXulComponent.java:123)
    at org.pentaho.ui.xul.swt.tags.SwtButton.access$500(SwtButton.java:26)
    at org.pentaho.ui.xul.swt.tags.SwtButton$4.widgetSelected(SwtButton.java:119)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.eclipse.jface.window.Window.runEventLoop(Window.java:820)
    at org.eclipse.jface.window.Window.open(Window.java:796)
    at org.pentaho.ui.xul.swt.tags.SwtDialog.show(SwtDialog.java:378)
    at org.pentaho.ui.xul.swt.tags.SwtDialog.show(SwtDialog.java:304)
    at org.pentaho.di.ui.core.database.dialog.XulDatabaseDialog.open(XulDatabaseDialog.java:115)
    at org.pentaho.di.ui.core.database.dialog.DatabaseDialog.open(DatabaseDialog.java:62)
    at org.pentaho.di.ui.spoon.delegates.SpoonDBDelegate.newConnection(SpoonDBDelegate.java:493)
    at org.pentaho.di.ui.spoon.delegates.SpoonDBDelegate.newConnection(SpoonDBDelegate.java:478)
    at org.pentaho.di.ui.spoon.Spoon.newConnection(Spoon.java:7770)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.pentaho.ui.xul.impl.AbstractXulDomContainer.invoke(AbstractXulDomContainer.java:329)
    at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke(AbstractXulComponent.java:139)
    at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke(AbstractXulComponent.java:123)
    at org.pentaho.ui.xul.swt.tags.SwtMenuitem.access$100(SwtMenuitem.java:27)
    at org.pentaho.ui.xul.swt.tags.SwtMenuitem$1.widgetSelected(SwtMenuitem.java:77)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1183)
    at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:6966)
    at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:567)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.pentaho.commons.launcher.Launcher.main(Launcher.java:134)
Caused by: org.pentaho.di.core.exception.KettleDatabaseException: 
Exception while loading class
org.gjt.mm.mysql.Driver

    at org.pentaho.di.core.database.Database.connectUsingClass(Database.java:423)
    at org.pentaho.di.core.database.Database.normalConnect(Database.java:352)
    ... 50 more
Caused by: java.lang.ClassNotFoundException: org.gjt.mm.mysql.Driver
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Unknown Source)
    at org.pentaho.di.core.database.Database.connectUsingClass(Database.java:414)
    ... 51 more

I still shudder looking at this.

Fortunately, the solution to this problem is not a complicated one. I found the answer on this post from StackOverflow. The reason the connection is failing is due to a missing MySQL Java connector library in Pentaho Kettle. This is how I solved the issue on my computer:

  1. I downloaded the most recent version of the MySQL Java Connector. (I needed to create an Oracle MySQL account in order to download this driver. To get the .zip format, I selected the Platform Independent option from the drop-down menu.)
  2. Extract the files from the downloaded .zip file and find the one labeled mysql-connector-java-5.1.34-bin.jar.
  3. Place this file into your Pentaho Kettle lib folder. The address for that should be something like C:\Program Files\pentaho\kettle\data-integration\lib.
  4. Restart Pentaho Kettle.

Troubleshooting SQL-Based Software

The company where I currently work produces a software solution that relies on SQL databases to store its information.  A lot of the issues I encounter on a daily basis have to do with performance: how come a certain module is taking so long to open or why is a certain procedure running long?

Issues with performance are exciting.  In a way, the process of finding optimal performance is a never-ending hunt.  Performance can be improved in a multitude of ways: by enhancing the manner in which the software’s graphic user interface interacts with the underlying tables, changing the indexes used on certain tables, rewriting the queries used to access information, etc.

The possibilities are endless!

But I’m getting ahead of myself.  With this post, I hope to focus on the method I use to determine the source of a performance issue.  Thus, I’ll be walking you through some of the steps I take when troubleshooting such issues.  I hope that these examples can be of use to you.

The Issue

Fictional e-mail:

Hi Moran,

 

I was trying to open a certain module in the application today and noticed that the process was running a bit longer than usual.  Do you know what could be the cause?

 

Thanks,

Boss

So, as copied above, I received a fictional e-mail from my manager this morning.  It seems that he’s having trouble opening a certain module in the application.  The module is able to open, but it takes a long time to do so–longer than it used to take.

Given that I am familiar with the make-up of this application, I know several things and can begin my troubleshooting there.  These are my assumptions:

  1. The software is built on SQL Server databases.
  2. Information within the software is accessed using queries that collect information from the SQL Server databases.
  3. This is not an issue with the configuration of the software given that opening this module is a routine operation.

With this in mind, I formulate my plan of attack.  I decide on the following:

  1. Recreate the issue on my local machine.
  2. While the issue is occurring, conduct a review of the SQL processes.
  3. Dig deeper, as necessary.

Step 1: Recreating the Issue

Given that my manager and I are able to connect to the same servers and access the same user account, I am able to easily recreate the issue on my machine.

In other instances, this might be harder to do. You may need to screen share with the user and watch as they recreate the issue or, in the case that the issue is very unique, have them provide you with a series of detailed steps on how they ran into the issue.

For those short on writing, I recommend using Screencast-o-Matic for quick and easy recording of your desktop. No installation necessary!

Step 2: Initial Backend Review

Once the issue has been recreated, the first thing I like to do is run the following command in SQL Server Management Studio:

sp_who2 active

You can run this command on any database.  The output provides you with a record of all active SPIDs on your server.  When running this, I’m interested in finding which SPID and process is associated with the issue I’m witnessing and whether there are any blocks in the system (represented by SPID values in the “BlkBy” column).  This is my output from running the command:

sp_who2 active with blk

The first thing I notice is that there are no blocks occurring in my system.

Next, I try to discern which activities are of interest to me.  I identify them by the information reported in the HostName, DBName, and LastBatch columns.  I am looking for SPIDs initiated from my machine (MNACHUM), in the “AdventureWorks” database, and that were launched about five minutes ago.  SPID 56 is the one I identify as corresponding to my issue.  SPID 55 comes close, but is actually connected to the sp_who2 command I just launched in SSMS; I am able to discern this fact by noting that the last batch for that SPID was executed just seconds ago.

As reported in my query output, SPID 56 is SUSPENDED.  This might be the reason why my manager’s application is taking a long time to open the module.  I do some more digging by running the following diagnostic command:

DBCC INPUTBUFFER(56)

The value in the parentheses references the SPID for the activity of interest.  The command, when run, returns information about the last command sent out by that particular activity.  This is my output:

dbcc inputbuffer

 

From this, I can discern that the issue is occurring with a particular stored procedure, dbo.sp_run_a_while.  However, this might be a long procedure with many moving parts.  To troubleshoot the exact cause of slow performance within the stored procedure, I’ll need to dig deeper.

Step 3: Digging Deeper

So, I’ve determined that the slowness witnessed by my manager is due to a particular stored procedure.  But what part of it?

If I am able to open the stored procedure, I could review it and look for a command similar to the one associated with the activity SPID reported by my sp_who2 command (if you go back and review the screenshot from earlier, you’ll notice that the associated command for SPID 56 is WAITFOR).

However, I don’t consider this method particularly efficient, since it requires scouring through the code of the stored procedure.  At times, these procedures can be thousands of lines and have many similar commands.

A better procedure to follow is to do one of two things:

  1. Query the sys.dm_exec_requests table with a CROSS APPLY on the sys.dm_exec_sql_text table to determine the last command that was run by the stored procedure.
  2. Set up and run a trace on the stored procedure to determine where it is getting hung up.

For the first option, you can use the following code I developed:

SELECT sql_handle,
statement_start_offset,
statement_end_offset,
plan_handle,
total_elapsed_time,
query_hash,
text
FROM sys.dm_exec_requests
CROSS APPLY sys.dm_exec_sql_text(sql_handle)

The last column (text) of the output specifies the last command initiated by any of the active SPIDs on your server.  Once again, be discerning in recognizing which of the rows actually corresponds to your particular SPID.  In my case, it is the second row:

cross apply

I can expand the column or copy and paste the text into the editor to review it further.

Alternatively, I can set up a trace to see where the procedure gets hung up.  To do so,

  1. From the top menu options, navigate to Tools –> SQL Server Profiler.
  2. In the new window, connect to your server.
  3. Open a new trace and configure it with the following Events under the Event Selection tab (make sure to check the “Show All Events” and “Show All Columns” options in the bottom right-hand corner to access these events):
    1. Maintain the default settings
    2. Under the Stored Procedure heading, select “SP: Completed”, “SP: Starting”, “SP: StmtCompleted”, “SP: StatementStarting”
    3. Under the TSQL heading, select “SQL: StmtCompleted” and “SQL: StmtStarting”
  4. Open the Column Filters window by selecting the option in the bottom right-hand corner and filter on the following items:
    1. ApplicationName: should be like the one for your SPID (i.e. MNACHUM)
    2. DatabaseName: should be like the one for your SPID (i.e. AdventureWorks)
    3. Duration: helpful for issues with slow performance, set to “Greater Than 4000”
    4. ObjectName: should be like your stored procedure (i.e. sp_run_a_while)
  5. Start the trace.

In order to capture the long-running event, you will need to recreate the issue again while the trace is running.

When I did so in my environment, I found that the command within the stored procedure causing my manager slow performance was

WAITFOR DELAY '15:00';

Now that I know what’s causing the slowness I can communicate with the writers of the stored procedure and begin discussions as to whether the command is necessary and what can be done to improve its efficiency.

(Obviously, a WAITFOR command by nature is inefficient in this kind of setting.  Maybe we’ll explore the troubleshooting of more complex queries in a different post.)

A big thank you to my colleague, Maxwell, who trained me and guided me on a lot of these troubleshooting techniques.

Create a free website or blog at WordPress.com.