How to debug a Hadoop job with Eclipse (or any other IDE)

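Before we get started, just a quick note: this only works as long as your job hasn’t been submitted to a cluster, i.e. as long as it runs locally. The debugger attaches to the JVM started by the hadoop command on your machine, so it only sees code that runs in that process.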

That makes it the right approach if you want to debug configuration parameters or other setup-related steps. I used it to debug a CLI call to a Scalding job, for instance.

1) The first thing you need to do is to enable remote debugging for Hadoop:


export HADOOP_OPTS="$HADOOP_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=8999"

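This needs to be added to your conf/hadoop-env.sh or exported in your shell environment. Since no suspend option is given, the JVM falls back to the JDWP default (suspend=y) and waits for a debugger to attach on port 8999 before it runs the job.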
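As a minimal sketch of this step, the snippet below builds the same option string with the port pulled out into a variable and the suspend behaviour spelled out. The variable names DEBUG_PORT and JDWP_OPTS, as well as the jar and class name in the commented-out call, are placeholders of mine and not part of any Hadoop convention.

```shell
#!/bin/sh
# Sketch: enable JDWP remote debugging for a locally running Hadoop job.
# DEBUG_PORT and JDWP_OPTS are illustrative names, not Hadoop settings.
DEBUG_PORT=8999

# suspend=y is the JDWP default; it is written out here so it is obvious
# that the JVM pauses at startup until a debugger connects.
JDWP_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=${DEBUG_PORT}"

# Append to whatever HADOOP_OPTS already contains (empty if unset).
export HADOOP_OPTS="${HADOOP_OPTS:-} ${JDWP_OPTS}"

echo "HADOOP_OPTS=${HADOOP_OPTS}"

# Hypothetical job invocation; my-job.jar and com.example.MyJob are placeholders.
# hadoop jar my-job.jar com.example.MyJob input/ output/
```

If you would rather have the job start immediately and attach later, suspend=n should do that, at the cost of missing anything that runs before you connect.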

2) Now Eclipse:

Choose Run -> Debug Configurations -> Remote Java Application

create a new configuration and set the port to 8999 in the connection settings (the host is localhost for a job that runs locally). That’s what it should look like:

[Screenshot: Eclipse Remote Java Application debug configuration with port 8999]
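If you want to check the connection without an IDE, the JDK's command-line debugger jdb can attach to the same socket. This is only a sketch: it assumes jdb is on your PATH and that the Hadoop JVM is already waiting on port 8999. DEBUG_HOST, DEBUG_PORT and ATTACH_CMD are illustrative names.

```shell
#!/bin/sh
# Sketch: build the jdb command that attaches to the waiting Hadoop JVM.
# DEBUG_HOST, DEBUG_PORT and ATTACH_CMD are illustrative names.
DEBUG_HOST=localhost
DEBUG_PORT=8999

# SocketAttach is the JDI connector for transport=dt_socket with server=y.
ATTACH_CMD="jdb -connect com.sun.jdi.SocketAttach:hostname=${DEBUG_HOST},port=${DEBUG_PORT}"

# Print the command instead of running it, so this snippet is harmless
# when no JVM is listening yet.
echo "$ATTACH_CMD"

# Run the printed command once the Hadoop JVM has reported:
#   Listening for transport dt_socket at address: 8999
```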
