Reading Avro files from HDFS

If you want to read Avro files from HDFS and you’re using schema-generated classes instead of GenericRecords, you’ll have to use a SpecificDatumReader rather than the generic datum reader.

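    // Imports: org.apache.avro.mapred.FsInput, org.apache.avro.file.*, org.apache.avro.io.DatumReader, org.apache.avro.specific.SpecificDatumReader
    SeekableInput input = new FsInput(path, getConfiguration());
    DatumReader<SpecificSchemaClass> reader = new SpecificDatumReader<SpecificSchemaClass>(SpecificSchemaClass.class);
    FileReader<SpecificSchemaClass> fileReader = DataFileReader.openReader(input, reader);
    try {
        while (fileReader.hasNext()) {
            SpecificSchemaClass event = fileReader.next();
            // process the deserialized record here
        }
    } finally {
        fileReader.close(); // always release the reader when done
    }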

So it’s basically as easy as reading GenericRecords: only the datum reader and the record type change.

Don’t forget to add the dependencies if you’re using Maven:

    <dependencies>
        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro</artifactId>
            <version>1.7.5</version>
        </dependency>

        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-tools</artifactId>
            <version>1.7.5</version>
        </dependency>
    </dependencies>
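
The schema-generated class itself (SpecificSchemaClass above) has to come from somewhere. One common way to produce it is the avro-maven-plugin. The fragment below is only a sketch of that setup: the plugin version is assumed to match the dependencies above, and the src/main/avro and src/main/java directories are assumptions, not something taken from this post.

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.avro</groupId>
                <artifactId>avro-maven-plugin</artifactId>
                <!-- assumed to match the Avro version used above -->
                <version>1.7.5</version>
                <executions>
                    <execution>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>schema</goal>
                        </goals>
                        <configuration>
                            <!-- assumed location of the .avsc schema files -->
                            <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                            <!-- assumed target for the generated Java classes -->
                            <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

With something like this in place, running the build should regenerate the classes from the schemas before compilation, so the reader code and the schema stay in sync.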