Category Archives: Uncategorized
Yesterday, we had the privilege of having to create Hive tables on top of an HBase table with Avro columns. Piece of cake, we thought, since you just have to specify the schema location in Hive (since Hive 1.0, it’s … Continue reading
I recently built a Scalding job that ran every day, collecting a set of IDs with timestamps to determine the newest and oldest occurrence of a set, whilst merging that with a previously aggregated set. A very simple task, involving simple mapping … Continue reading
If you ever encounter this error while setting up Flume with HDFS Sink: Just add the following JAR to your classpath in flume-env.sh:
I’ve been digging around recently to see how to store Avro records in HBase without “exploding” values to single columns; this has been a viable alternative since Hive 0.14, with its support for Avro queries in HBase columns. What you basically … Continue reading
As of now, Scalding doesn’t provide full support for counters – you will find a few pull requests and the Stats class, nothing more. This will probably change in the future; until then, I found using Cascading FlowListeners for counters … Continue reading
Look no further: ditch Facebook. In terms of battery usage, it’s the worst hog ever. Data usage is quite massive too. Once you terminate the app and switch to using Safari for Facebook, you’ll notice the difference.
Before we get started – just a quick note: this will only work as long as your jobs haven’t been submitted to a cluster or as long as your jobs run locally. This is basically just the right thing … Continue reading