Posts tagged ‘Hadoop’

Using Search Analytics To See Into Gartner’s $232B Big Data Forecast

By combining search analytics and the latest Gartner forecast on big data published last week, it’s possible to get a glimpse into this area’s highest-growth industry sectors.  Big data is consistently a leading search term on Gartner.com, which is the source of the twelve months of data used for the analysis.

In addition, data from Gartner’s latest report, Big Data Drives Rapid Changes in Infrastructure and $232 Billion in IT Spending Through 2016 by Mark A. Beyer, John-David Lovelock, Dan Sommer, and Merv Adrian, is also used.  These authors have done a great job of explaining how big data is rapidly emerging as a market force, not just a single market unto itself.  This distinction pervades their analysis, and the following table showing Total IT Spending Driven by Big Data reflects the composite market approach.  Use cases from enterprise software spending, storage management, IT services, social media and search forecasts are the basis of the Enterprise Software Spending for Specified Sub-Markets Forecast.  Social Media Analytics are the basis of the Social Media Revenue Worldwide forecast.

Additional Take-Aways

  • Enterprise software spending for specified sub-markets will attain a 16.65% compound annual growth rate (CAGR) in revenue from 2011 to 2016.
  • Attaining a 96.77% CAGR from 2011 through 2016, Social Media Revenue is one of the primary use case catalysts of this latest forecast.
  • Big Data IT Services Spending will attain a 10.20% CAGR from 2011 to 2016.
  • $29B will be spent on big data throughout 2012 by IT departments.  Of this figure, $5.5B will be for software sales and the balance for IT services.
  • Gartner is projecting a 45% per year average growth rate for social media, social network analysis and content analysis from 2011 to 2016.
  • Gartner projects a 20 times ratio of IT Services to Software in the short term, dropping as this market matures and more expertise is available.
  • By 2020, big data functionality will be part of the baseline of enterprise software, with enterprise vendors enhancing the value of their applications with it.
  • Organizations are already replacing early implementations of big data solutions – and Gartner is projecting this will continue through 2020.
  • By 2016, spending on Application Infrastructure and Middleware becomes one of the most dominant categories for big data in Enterprise Software-Specified Sub Markets.

  • $232B is projected to be sold in total across all categories in the forecast from 2011 to 2016. From $24.4B in 2011 to $43.7B in 2016, this presents a 12.42% CAGR in total market growth.
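The growth rates quoted in these take-aways are compound annual growth rates. As a rough check, the sketch below applies the standard CAGR formula to the rounded endpoints given above ($24.4B in 2011 and $43.7B in 2016). The function name `cagr` is an illustrative assumption, not something from the Gartner report, and because the inputs are rounded the result lands at roughly 12.4% rather than matching the quoted 12.42% to the last digit.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 == 10%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    # CAGR = (end / start) ** (1 / years) - 1
    return (end_value / start_value) ** (1 / years) - 1


# Rounded endpoints quoted in the forecast: $24.4B (2011) and $43.7B (2016),
# which spans five annual growth periods.
total_growth = cagr(24.4, 43.7, years=5)
print(f"Total market CAGR, 2011 to 2016: {total_growth:.2%}")
```

The same function can be pointed at any of the sub-market figures once their start and end values are known.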

Search Analytics and Big Data

Big data is continually one of the top terms searched on Gartner.com, and over the last twelve months, this trend has accelerated.  The following time series graph shows the weekly number of inquiries Gartner clients have made, with the red line being the logarithmic trend.

Banking (25%), Services (15%) and Manufacturing (15%) are the three most active industries in making inquiries about big data to Gartner over the last twelve months.  The majority of these are large organizations (63%) located in North America (59%) and Europe (19%).

What unifies all of these industries from a big data standpoint is how critical the stability of their customer relationships is to their business models.  Banks have become famous for bad service and, according to the American Customer Satisfaction Index (ACSI), have shown anemic growth in customer satisfaction in the latest period measured, 2010 to 2011.  The potential for using big data to become more attuned to customer expectations and deliver more effective customer experiences in this and all services industries shows great upside.

Bottom line: Companies struggling with flat or dropping rankings on the ACSI need to consider big data strategies based on structured and unstructured customer data.  In adopting this strategy the potential exists to drastically improve customer satisfaction, loyalty, and ultimately profits.

Gartner Search Analytics Shows Spike in Hadoop Inquiries in 2012 – Good News For CRM

Hadoop was one of the most-searched terms on Gartner’s website from 2011 through 2012, with search volume spiking 601.8% over the last twelve months alone.  Additional insights from the Search Analytics on Hadoop include the following:

  • 27% of all inquiries are from banking, finance and insurance industries, followed by manufacturing (14%), government (13%), services (10%) and healthcare (8%).
  • North America (75.9%) and EMEA (13.5%) are the two most dominant geographies in terms of query volume.
  • Here is the trend line from Gartner Search Analytics:

What’s driving Hadoop’s meteoric rise in searches is a combination of industry hype about big data, CIOs getting serious about using Hadoop distributions that minimize time and risk yet deliver value, and the dominant role Amazon is playing in bringing Hadoop into the cloud.  Today Amazon offers Elastic MapReduce as a Web Service that relies on a hosted Hadoop framework running on Amazon Elastic Compute Cloud (EC2) in conjunction with Amazon Simple Storage Service (S3).

Microsoft also scored a major hiring win this week, announcing that Raghu Ramakrishnan, former chief scientist for three divisions of Yahoo, has joined the company.  Raghu is now a technical fellow working in the Server and Tools Business (STB), where he’ll focus on big data and integration with STB platforms.  Big Data on Azure will accelerate now that he is on board.

Hadoop’s Potentially Galvanizing Effect on CRM and Social CRM Analytics

The quickening pace of Hadoop adoption in the enterprise is good news for CRM and especially social CRM. Analytics and Business Intelligence (BI) are the “glue” that unifies CRM and keeps it in context. One of Hadoop’s greatest potential contributions is the analysis, categorization and use of unstructured content.  Marketing and sales won’t have to run three or four systems to gain insights into customer data; they can run a single analytics platform that fuels the entire selling cycle and lifetime customer value chain of their businesses.  Hadoop has the potential to make unstructured content more meaningful while also reporting the impact of customer insights on financial performance, profitability and lifetime customer value.

Translating terabytes of customer, sales, services and partner data into meaningful analytics and business intelligence (BI) is emerging as a priority for CIOs, who are sharing responsibility for driving top-line revenue growth.   Hadoop shows potential to be the “glue” or galvanizing technology base that unifies all CRM and Social CRM strategies.

To get a perspective on how fast Hadoop is being evaluated and adopted, it’s useful to look at the Hype Cycle for Data Management, the latest edition of which was published in July 2011.   This is another indicator of how quickly Hadoop and big data are gaining CIO mindshare.  Big Data and extreme information management are in the Technology Trigger area of the hype cycle.  The Hype Cycle for Data Management is shown below:

Bottom line:  CRM and Social CRM will benefit more than any other area of an enterprise as Hadoop’s adoption continues to accelerate.  CIOs are increasingly called upon to be strategists, and with the ability to translate terabytes of data into strategies that deliver dollars, look for Hadoop’s contributions to drive top-line revenue growth.

Data Without Limits – Insights from Werner Vogels of Amazon.com

O’Reilly Media’s Strata: Making Data Work conference, held February 1st – 3rd, 2011 in Santa Clara, California, was one of the most interesting and multifaceted events of the year.  Included were presentations on data science, real-time data processing and analytics, data acquisition and crowdsourcing, and visualization, in addition to many other topics.  You can find the complete list of speaker slides and videos for the event at this link, Strata 2011 Speaker Slides & Videos.

What enriches this conference is the quality of the case studies presented.  Be sure to check out the presentation from DJ Patil of LinkedIn on Innovating Data Teams.  His discussion illustrates just how critical big data is to LinkedIn and how their approach to managing it enriches the user experience, and is transforming LinkedIn functionality at the same time.

One of the best overall presentations, titled Data Without Limits, features Dr. Werner Vogels, CTO of Amazon.com.  The video below provides a glimpse into how pervasive AWS is becoming as a foundation for accessing, aggregating and transforming data in real time.

Data Science Shows Potential To Redefine Cloud-based Analytics

The emerging field of data science is a fascinating one that has major implications for the potential of cloud-based analytics, CRM, search, supply chain management and logistics.

Instead of relying purely on latent semantic indexing or the Google PageRank algorithm to define relevance of a search, data science techniques analyze content and its context to determine relevance.  Google today looks at the content of a page; data science considers its surrounding data and relevance.

Earlier this month TechCrunch published the blog post Marissa Mayer’s Next Big Thing: “Contextual Discovery” — Google Results Without Search.  The techniques of contextual discovery Google is experimenting with rely on a very rapid aggregation and transforming of data, which are part of the methodologies of data science.   When Google moves fully into contextual discovery the potential exists for cloud-based analytics, CRM, search, supply chain management and logistics to be completely revolutionized by solving the big data problems associated with each of these areas.

In CRM, this would mean finally being able to access external and internal content (including the massive amount of data on social networks), aggregate the data, and transform it into meaningful analysis.  The vision of social CRM would be realized once data science serves as the catalyst of contextual search or as Google calls it, contextual discovery.

Exploring Data Science

Two of the best blog posts on the emerging topic of data science are both from O’Reilly Radar.  What is data science? by Mike Loukides and Six months after “What is data science?” by Mac Slocum are worth reading and giving some serious thought to.  O’Reilly has also created a free report titled What is Data Science, which can be downloaded here.

Authors Mike Loukides and Mac Slocum show how transformational data science has the potential to be by concentrating on the nascent area of data products.  A data product is the result of accessing, aggregating and transforming content regardless of its location – and capturing data on its attributes – not just the data itself. Both authors point to reference systems and guided reference engines on e-commerce sites as just the beginning.  Yet after reading their assessments and listening to Roger Magoulas, O’Reilly’s Director of Research, interviewed about data science below, it is clear there are many more potential uses of this evolving area.

Potential Impact of Data Science on Analytics

The blog posts by Mike Loukides and Mac Slocum go into detail explaining how each area of data science is in varying levels of maturity.  After reading these over and considering the big data problems in cloud-based analytics, CRM, search, supply chain management and logistics, the following methodology starts to make sense:

Access – For data science to realize its full potential, there needs to be a technology layer that provides real-time access to structured and unstructured content both within and outside an enterprise.  More than a traditional Enterprise Application Integration (EAI) layer, the technologies driving data access need to selectively pull all available content from every unstructured and structured data source available.  Mike Loukides mentions Google Goggles and how MapReduce has made this application possible.  Hadoop as a means to create greater access across federated content has much potential in this phase as well.

Aggregate – Called data conditioning by Mike Loukides, the aggregation phase is where contextual discovery happens.  This could be accomplished through contextual search filters, taxonomies defined by specific alerts, or the use of the MapReduce and Hadoop query and relevance tools in use today.

Transform – This is where Hadoop could be used to drive data analysis, a level of analysis Mike Loukides calls data jiujitsu.   Examples are mentioned by both Mike Loukides and Mac Slocum, including the Hadoop Online Prototype (HOP), which does real-time stream processing, and several others.  The impact of the access, aggregate and transform methodology on visualization can be seen at Flowing Data, one of the best sites on the Web for seeing how MapReduce, Hadoop and other data science-related techniques are taking on massive amounts of data and delivering insights.
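To make the three phases more concrete, here is a toy sketch of access, aggregate and transform chained together in plain Python. Everything in it is an assumption made for illustration: the in-memory lists stand in for real structured and unstructured sources, the customer names are invented, and the keyword-based score is only a placeholder for real analysis. It is not how Hadoop or any of the tools mentioned above implement these steps.

```python
from collections import defaultdict

# Hypothetical stand-ins for a structured source (CRM) and an
# unstructured source (social posts); real systems would query these.
crm_records = [
    {"customer": "acme", "text": "renewed contract"},
    {"customer": "globex", "text": "support ticket opened"},
]
social_posts = [
    {"customer": "acme", "text": "great service"},
    {"customer": "globex", "text": "slow response, poor service"},
]

# Placeholder keyword lists; an assumption, not a real sentiment model.
POSITIVE = {"great", "renewed"}
NEGATIVE = {"slow", "poor"}


def access(*sources):
    """Access: pull every record from every available source."""
    for source in sources:
        yield from source


def aggregate(records):
    """Aggregate: group the text of all records by customer."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["customer"]].append(record["text"])
    return grouped


def transform(grouped):
    """Transform: turn grouped text into a simple per-customer score."""
    scores = {}
    for customer, texts in grouped.items():
        words = " ".join(texts).lower().replace(",", " ").split()
        positives = sum(word in POSITIVE for word in words)
        negatives = sum(word in NEGATIVE for word in words)
        scores[customer] = positives - negatives
    return scores


scores = transform(aggregate(access(crm_records, social_posts)))
print(scores)
```

Keeping each phase as its own function mirrors the methodology: any one stage could be swapped for a distributed implementation without changing the other two.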

Conclusion

Solving the big data problems of social media monitoring, sentiment analysis, forming a scalable platform for social CRM, integrating CRM, supply chain management and logistics data with demand management – and tying all of these areas to financial performance – is potentially achievable with data science.  Deploying it as a cloud-based platform opens up even greater potential for getting the most use out of social networks, free data sources, and third-party databases than is possible today.

Be sure to check out the video below of Roger Magoulas, O’Reilly’s Director of Research, where he was interviewed about data science.

Article links:

What is data science? by Mike Loukides, O’Reilly Radar
Six months after “What is data science?” by Mac Slocum, O’Reilly Radar

Hadoop Predicted To Be More Disruptive than Linux

Abhishek Mehta is Managing Director for Big Data and Analytics for Bank of America, and serves as Executive in Residence at MIT Media Lab.  SiliconAngle.tv founder John Furrier and Wikibon co-founder David Vellante interviewed him at Hadoop World last month.  Abhishek sees Hadoop as being more disruptive than Linux, and leading to the formation of data factories.  He also sees Hadoop giving programmers greater freedom to concentrate on creating algorithms that solve much larger, more complex problems than is possible today.

Here is a quote from the interview:
“So these data factories are going to emerge as the new drivers of innovation of a massive revolution that will change fundamentally how business models extract value, because data is going to be, is the core asset in a multitude of industries.”

At just under 30 minutes, this is a fascinating look into the future of Hadoop.

Transcript of the interview by Bert Latamore

Source attribution: SiliconAngle.tv video

Taking A Quick Tour of Apache Hadoop and HBase

Cloudera’s Todd Lipcon has put together an excellent overview of Hadoop and HBase and it is provided below.   Apache Hadoop is a scalable software framework capable of supporting highly data intensive applications.

The last twelve months have seen a steady increase in interest in Hadoop and HBase due to their implications for all variations of cloud-based applications, services, and testing.  Google Trends shows the following results on Hadoop, for example.  Please click on the Trends graphic to see a larger image.

The Google File System (GFS) and Google MapReduce concepts serve as the theoretical foundation of Hadoop and HBase.  Doug Cutting created Hadoop while working at Yahoo.  The 45 slides in the following presentation provide an excellent overview of Hadoop and HBase and show their implications for cloud application development and services.
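For readers new to the MapReduce concept mentioned here, the following is a minimal single-process sketch of the idea using the classic word count example. It is plain Python with no Hadoop involved, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions rather than part of any Hadoop API. In a real cluster the map and reduce steps run in parallel across many machines and the framework handles the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data and Hadoop", "Hadoop and HBase"])
print(counts)
```

Because each map call looks at only one document and each reduce call looks at only one key, the work can be spread across machines, which is the property that makes the model attractive for very large data sets.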
