With years of experience working with cloud computing and distributed data architectures, Mauricio is passionate about creating value with technology. He is an industry-recognized leader in technical architecture for cloud-hosted data solutions.
This post shows architects and developers how to set up Hadoop to communicate with S3, run Hadoop commands directly against S3, use distcp to transfer data between Hadoop and S3, and use distcp to update the copy on a regular basis by transferring only the differences.
While on paper it should be a seamless transition to run Impala code in Hive, in reality it’s more like playing a relentless game of whack-a-mole. This post provides hints to make the transition easier.