Big Data and Hadoop: A Cloud Use Case

Tuesday night I drove to Santa Clara for the Big Data Camp, learned more about Hadoop, and even ran into a few Dell colleagues. Thanks to Dave Nielsen for organizing the camp and to Ken Krugler for his great overview of Hadoop.

While the phrase "big data" lacks precision, it matters to a growing number of enterprises. With data flooding in from web sites, mobile devices, and ever-present sensors, processing it all and deriving business value from it becomes steadily harder. Data becomes big when it exceeds the storage and processing capabilities of a single computer. An enterprise could spend lots of money on high-end, specialized hardware finely tuned for a specific database, but at some point that hardware will become either inadequate or too expensive.

Once data exceeds the capabilities of a single computer, we face the problem of distributing its storage and processing. To solve this problem, big data innovators are using Hadoop, an open source platform capable of scaling across thousands of nodes of commodity hardware. Hadoop includes both a distributed file system (HDFS) and a mechanism for distributing batch processes (MapReduce). It is scalable, reliable, fault tolerant, and simple in its design.
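To make the batch mechanism a little more concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count. It runs in a single Python process and does not use Hadoop's API; the function names and the sample blocks are my own illustrations, with each block standing in for a chunk of a file that HDFS would store on a different node.

```python
from collections import defaultdict


def map_phase(block):
    """Emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(blocks):
    # Hypothetical stand-in: in a real cluster each block would be mapped
    # on the node that stores it; here we simply loop over the blocks.
    emitted = []
    for block in blocks:
        emitted.extend(map_phase(block))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    sample_blocks = ["big data big clusters", "data on commodity hardware"]
    print(word_count(sample_blocks))
```

Because each map call looks only at its own block and each reduce call looks only at its own key, the work can in principle be spread over as many machines as there are blocks and keys.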

Many enterprises, however, need to process big data only periodically, perhaps daily or weekly. When running a big data batch job, an organization might want to distribute the load over hundreds or thousands of nodes to ensure completion within a time window. But when the batch job completes, those nodes may not be needed for Hadoop until the time comes for the next batch job.
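As a rough back-of-the-envelope illustration of sizing a cluster for a time window, the sketch below estimates a node count. The figures and the function name are hypothetical, and it assumes throughput scales linearly with the number of nodes, which real jobs only approximate.

```python
import math


def nodes_needed(data_gb, gb_per_node_hour, window_hours):
    """Estimate nodes required to process data_gb within window_hours.

    Assumes perfectly linear scaling (an idealization) and ignores
    startup time, data skew, and failed tasks.
    """
    if data_gb <= 0 or gb_per_node_hour <= 0 or window_hours <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(data_gb / (gb_per_node_hour * window_hours))


if __name__ == "__main__":
    # Hypothetical job: 50,000 GB, 25 GB per node per hour, 4-hour window.
    print(nodes_needed(50_000, 25, 4))  # prints 500
```

Halving the window doubles the estimate, which is why a tight batch window can call for far more machines than the enterprise needs the rest of the week.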

This use case, in which the need for computing resources expands and contracts dynamically, fits well with the capabilities of cloud computing, whether private, public, or hybrid. (In this case, I mean cloud as in Infrastructure-as-a-Service.) A cloud infrastructure could spin up the nodes when needed and free the resources when that need goes away. In public cloud computing, freeing those resources would directly reduce cost, since you pay only for what you use.
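To see how much the freeing of resources might matter, here is a small sketch comparing nodes rented only for the duration of each batch run with the same nodes kept running all month. Every number here, including the price, is a made-up assumption for illustration and not a quote from any provider; prices are in whole cents to keep the arithmetic exact.

```python
def monthly_cost_cents(nodes, hours_per_run, runs_per_month,
                       cents_per_node_hour, hours_in_month=720):
    """Return (on_demand, always_on) monthly cost in cents.

    on_demand: nodes exist only while a batch job runs.
    always_on: the same nodes run for the whole month.
    """
    on_demand = nodes * hours_per_run * runs_per_month * cents_per_node_hour
    always_on = nodes * hours_in_month * cents_per_node_hour
    return on_demand, always_on


if __name__ == "__main__":
    # Hypothetical: 500 nodes, one 4-hour job per day, 10 cents per node-hour.
    on_demand, always_on = monthly_cost_cents(500, 4, 30, 10)
    print(f"on demand: ${on_demand / 100:,.2f}")
    print(f"always on: ${always_on / 100:,.2f}")
```

Under these assumptions the on-demand cluster costs one sixth of the always-on one, simply because the nodes are in use 120 of the month's 720 hours.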

During his presentation, Ken Krugler suggested that we think of Hadoop as a distributed operating system, in that Hadoop enables many computers to store and process data as if they formed a single machine. I'd add that cloud computing (the virtualization, automation, and processes that enable an agile infrastructure) may be needed to complete this analogy. With it, the distributed operating system not only manages distributed resources but does so efficiently across a multitude of use cases.

Extra Credit Reading (Dell Resources on Hadoop):

Philippe Julio, Hadoop Architecture, http://www.slideshare.net/PhilippeJulio/hadoop-architecture

Joey Jablonski, Hadoop in the Enterprise, http://www.slideshare.net/jrjablo/hadoop-in-the-enterprise

Aurelian Dumitru, Hadoop Chief Architect, Dell’s Big Data Solutions including Hadoop Overview, http://www.youtube.com/watch?v=OTjX4FZ8u2s


That’s me on the left; Aurelian Dumitru, Dell’s Hadoop Chief Architect, in the center; and Barton George, Director of Marketing for Dell’s Web & Tech vertical, on the right. Thanks to DJ Cline for the photo. See http://bit.ly/iHJJIl for more photos of the Big Data Camp.

I'm the Director of Threat Solutions at Shape Security, a top 50 startup defending the world's leading websites and mobile apps against malicious automation. Request our 2017 Credential Spill Report at ShapeSecurity.com to get the big picture of the threats we all face. See my LinkedIn profile at http://www.linkedin.com/in/jamesdowney and follow me on Twitter at http://twitter.com/james_downey.

Posted in Cloud Computing, Hadoop
5 comments on “Big Data and Hadoop: A Cloud Use Case”
  1. […] James Downey has written about this event! […]

  2. Erik Bansleben says:

    Mr. Downey,

    I found your blog and wanted to see if you might be interested in writing about a Certificate program at the University of Washington on the topic of Cloud Computing. It’s a program which is available both in the classroom and online and we’re trying to get the word out among key bloggers within this community.
    Would you be interested in participating in a conference call with other bloggers to learn more about the program? As program director I would participate along with some faculty and/or board members from the program so you could learn more about this program offering during the call. We can also provide you with some useful content that you might find helpful in writing a piece about the program. Lastly, if you have colleagues or fellow bloggers that you know about and whom you recommend we should include, please feel free to let us know.
    Thanks very much for considering this request.
    Erik

    Erik Bansleben, Ph.D.
    Program Development Director, Academic Programs
    UW Professional & Continuing Education
    ebansleben@pce.uw.edu
    206-221-6243

  3. […] Downey (@james_downey) posted Big Data and Hadoop: A Cloud Use Case on […]

  4. […] Downey, J. (2011, July 1). Big Data and Hadoop: A Coud Use Case. Retrieved from Jim Downey: https://jimdowney.net/2011/07/01/big-data-and-hadoop-a-cloud-use-case/ […]
