Author Archives: itellity

HBase and Hive with Avro Column size limits

Yesterday, we had the privilege of having to create Hive tables on top of an HBase table with Avro columns. Piece of cake, we thought, since you just have to specify the schema location in Hive (since Hive 1.0, it’s …

Posted in Uncategorized

Scalding merge, concatenation and joins of pipes

I recently built a Scalding job that ran every day, collecting a set of IDs with timestamps to determine the newest and oldest occurrence of each ID, while merging the result with the previously aggregated set. A very simple task, involving simple mapping …
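The excerpt does not show the job itself. What follows is only a minimal sketch of the aggregation it describes, written with plain Scala collections rather than Scalding pipes. The `Span` case class, the `IdSpans` object and its methods are hypothetical names. Timestamps are assumed to be `Long` values such as epoch milliseconds.

```scala
// Hypothetical sketch: plain Scala collections, NOT the Scalding API.
// Assumption: each event is an (id, timestamp) pair with a Long timestamp.
final case class Span(id: String, oldest: Long, newest: Long)

object IdSpans {
  // Combine two spans for the same id, keeping the earliest and latest timestamps.
  def combine(a: Span, b: Span): Span =
    Span(a.id, math.min(a.oldest, b.oldest), math.max(a.newest, b.newest))

  // Collapse one day's raw events into a single span per id.
  def fromEvents(events: Seq[(String, Long)]): Map[String, Span] =
    events.groupBy(_._1).map { case (id, evs) =>
      val ts = evs.map(_._2)
      id -> Span(id, ts.min, ts.max)
    }

  // Merge today's spans into the previously aggregated ones:
  // ids seen before are widened, new ids are simply added.
  def merge(previous: Map[String, Span], today: Map[String, Span]): Map[String, Span] =
    today.foldLeft(previous) { case (acc, (id, span)) =>
      acc.updated(id, acc.get(id).map(combine(_, span)).getOrElse(span))
    }
}
```

For example, merging a previous span for `"a"` covering 5 to 10 with today's events `"a" -> 3L` and `"a" -> 12L` widens it to 3 to 12. A new ID such as `"b"` is added as-is. In a Scalding job the same shape would presumably be a concatenation of the previous and daily pipes followed by a per-ID reduce, but the actual pipe operations used in the post are not shown here.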

Posted in Scala, Uncategorized

ClassNotFoundException in Flume with HDFS Sink

If you ever encounter this error while setting up Flume with the HDFS sink, just add the following JAR to your classpath in

Posted in Uncategorized

Writing Avro records to HBase columns

I’ve been digging around recently to see how to store Avro records in HBase without “exploding” their values into single columns. This has been a viable alternative since Hive 0.14, with its support for Avro queries in HBase columns. What you basically …

Posted in Uncategorized

Counters using Cascading Flow Listeners in Scalding

As of now, Scalding doesn’t provide full support for counters: you will find a few pull requests and the Stats class, nothing more. This will probably change in the future; until then, I found using Cascading FlowListeners for counters …

Posted in Uncategorized

If you’re wondering why your iPhone’s battery is empty again

Look no further: ditch Facebook. In terms of battery usage, it’s the worst hog ever, and its data usage is quite massive too. Once you terminate the app and switch to using Safari for Facebook, you’ll notice the difference.

Posted in Uncategorized

How to debug a Hadoop job with Eclipse (or any other IDE)

Before we get started, just a quick note: this will only work as long as your job hasn’t been submitted to a cluster, that is, as long as your jobs run locally. This is basically just the right thing …

Posted in Uncategorized