Hadoop Blueprints

Anurag Shrivastava · Tanmay Deshpande

Sep 2016 · Packt Publishing Ltd

Ebook · 316 pages


About this ebook

Use Hadoop to solve business problems by learning from a rich set of real-life case studies.

About This Book

Solve real-world business problems using Hadoop and other Big Data technologies.
Build efficient data lakes in Hadoop, and develop systems for various business cases, such as improving marketing campaigns, fraud detection, and more.
Power-packed with six case studies to get you going with Hadoop for Business Intelligence.

Who This Book Is For

If you are interested in building efficient business solutions using Hadoop, this is the book for you. This book assumes that you have basic knowledge of Hadoop, Java, and any scripting language.

What You Will Learn

Learn about the evolution of Hadoop as the big data platform
Understand the basics of Hadoop architecture
Build a 360-degree view of your customer using Sqoop and Hive
Build and run classification models on Hadoop using BigML
Use Spark and Hadoop to build a fraud detection system
Develop a churn detection system using Java and MapReduce
Build an IoT-based data collection and visualization system
Get to grips with building a Hadoop-based Data Lake for large enterprises
Learn about the coexistence of NoSQL and In-Memory databases in the Hadoop ecosystem

In Detail

If you have a basic understanding of Hadoop and want to put your knowledge to use to build fantastic Big Data solutions for business, then this book is for you. Build six real-life, end-to-end solutions using the tools in the Hadoop ecosystem, and take your knowledge of Hadoop to the next level.

Start off by understanding various business problems that can be solved using Hadoop. You will also get acquainted with the common architectural patterns used to build Hadoop-based solutions. Build a 360-degree view of the customer by working with different types of data, and build an efficient fraud detection system for a financial institution. You will also develop a system in Hadoop to improve the effectiveness of marketing campaigns. Build a churn detection system for a telecom company, develop an Internet of Things (IoT) system to monitor the environment in a factory, and build a data lake, all making use of the concepts and techniques covered in this book.

The book also covers other technologies and frameworks, such as Apache Spark, Hive, and Sqoop, and shows how they can be used in conjunction with Hadoop. You will be able to try out the solutions explained in the book and use the knowledge gained to extend them further in your own problem space.

Style and approach

This is an example-driven book in which each chapter covers a single business problem and describes its solution by explaining the structure of the dataset and the tools required to process it. Every project is demonstrated with a step-by-step approach and explained in an easy-to-understand manner.

About the author

Anurag Shrivastava is an entrepreneur, blogger, and manager living in Almere near Amsterdam in the Netherlands. He started his IT journey 30 years ago by writing a small poker program on a mainframe computer, and he fell in love with software technology. In his 24-year career in IT, he has worked for companies of various sizes, ranging from Internet start-ups to large system integrators in Europe. Anurag kick-started the Agile software movement in North India when he set up the Indian business unit of the Dutch software consulting company Xebia. He led the growth of Xebia India as its managing director for over 6 years and made the company a well-known name in the Agile consulting space in India. He also started the Agile NCR Conference, which has become a well-attended annual event on Agile best practices in the National Capital Region around New Delhi. Anurag became active in the big data space when he joined ING Bank in Amsterdam as the manager of the customer intelligence department, where he set up their first Hadoop cluster and implemented several transformative technologies, such as Netezza and R, in his department. He is now active in payment technology and APIs, using technologies such as Node.js and MongoDB. Anurag loves to cycle on the reclaimed island of Flevoland in the Netherlands. He also likes listening to Hindi film music.

Tanmay Deshpande is a Hadoop and big data evangelist. He's interested in a wide range of technologies, such as Apache Spark, Hadoop, Hive, Pig, NoSQL databases, Mahout, Sqoop, Java, and cloud computing. He has vast experience in application development in various domains, such as finance, telecoms, manufacturing, security, and retail. He enjoys solving machine learning problems and spends his time reading anything he can get his hands on. He has a great interest in open source technologies and promotes them through his lectures. He has been invited to various computer science colleges to conduct brainstorming sessions with students on the latest technologies. Through his innovative thinking and dynamic leadership, he has successfully completed various projects. Tanmay is currently working with Schlumberger as the lead big data developer. Before Schlumberger, Tanmay worked with Lumiata, Symantec, and Infosys. Tanmay is the author of books such as Hadoop Real World Solutions Cookbook, Second Edition; DynamoDB Cookbook; and Mastering DynamoDB, all by Packt Publishing.
