Data Science Shows Potential To Redefine Cloud-based Analytics
The emerging field of data science is a fascinating one with major implications for the potential of cloud-based analytics, CRM, search, supply chain management and logistics.
Instead of relying purely on latent semantic indexing or the Google PageRank algorithm to define relevance of a search, data science techniques analyze content and its context to determine relevance. Google today looks at the content of a page; data science considers its surrounding data and relevance.
Earlier this month TechCrunch published the blog post Marissa Mayer’s Next Big Thing: “Contextual Discovery” — Google Results Without Search. The contextual discovery techniques Google is experimenting with rely on very rapid aggregation and transformation of data, both of which are part of the methodology of data science. When Google moves fully into contextual discovery, the potential exists for cloud-based analytics, CRM, search, supply chain management and logistics to be completely revolutionized by solving the big data problems associated with each of these areas.
In CRM, this would mean finally being able to access external and internal content (including the massive amount of data on social networks), aggregate the data, and transform it into meaningful analysis. The vision of social CRM would be realized once data science serves as the catalyst of contextual search or, as Google calls it, contextual discovery.
Exploring Data Science
Two of the best blog posts on the emerging topic of data science are both from O’Reilly Radar: What is data science? by Mike Loukides and Six months after “What is data science?” by Mac Slocum. Both are worth reading and giving some serious thought to. O’Reilly has also created a free report titled What is Data Science, which can be downloaded here.
Authors Mike Loukides and Mac Slocum lay the foundation for how transformational data science could be by concentrating on the nascent area of data products. A data product is the result of accessing, aggregating and transforming content regardless of its location – and capturing data on its attributes – not just the data itself. Both authors point to reference systems and guided reference engines on e-commerce sites as just the beginning. Yet after reading their assessments and listening to Roger Magoulas, O’Reilly’s Director of Research, interviewed about data science below, it is clear there are many more potential uses of this evolving area.
Potential Impact of Data Science on Analytics
The blog posts by Mike Loukides and Mac Slocum explain in detail how the areas of data science are at varying levels of maturity. After reading these over and considering the big data problems in cloud-based analytics, CRM, search, supply chain management and logistics, the following methodology starts to make sense:
Access – For data science to realize its full potential, there needs to be a technology layer that provides real-time access to structured and unstructured content both within and outside an enterprise. More than a traditional Enterprise Application Integration (EAI) layer, the technologies driving data access need to selectively pull content from every structured and unstructured data source available. Mike Loukides mentions Google Goggles and how MapReduce has made this application possible. Hadoop, as a means of creating greater access across federated content, has much potential in this phase as well.
Aggregate – Called data conditioning by Mike Loukides, the aggregation phase is where contextual discovery happens. This could be accomplished through contextual search filters, taxonomies defined by specific alerts, or the use of the MapReduce and Hadoop query and relevance tools in use today.
Transform – This is where Hadoop could be used to drive data analysis, at a level of analysis Mike Loukides calls data jiujitsu. Both Mike Loukides and Mac Slocum mention examples, including the Hadoop Online Prototype (HOP), which does real-time stream processing, and several others. The impact of the access, aggregate and transform methodology on visualization can be seen at Flowing Data, one of the best sites on the Web for seeing how MapReduce, Hadoop and other data science-related techniques are taking on massive amounts of data and delivering insights.
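To make the access, aggregate and transform phases more concrete, here is a minimal Python sketch of the idea in the map-and-reduce style that MapReduce and Hadoop popularized. It is only an illustration under stated assumptions: the SOURCES dictionary, the function names and the tiny in-memory data set are all hypothetical, and a real deployment would run these steps over distributed storage instead of a Python dictionary.

```python
from collections import defaultdict

# Hypothetical stand-ins for internal (CRM) and external (social) content.
SOURCES = {
    "crm_notes": ["late delivery complaint", "great support call"],
    "social": ["delivery was late again", "great product"],
}


def access(sources):
    """Access: pull (source, text) records from every available source."""
    for name, docs in sources.items():
        for text in docs:
            yield name, text


def aggregate(records):
    """Aggregate: map each record to terms and group the terms with their source."""
    grouped = defaultdict(list)
    for source, text in records:
        for term in text.lower().split():
            grouped[term].append(source)
    return grouped


def transform(grouped):
    """Transform: reduce each group to a count plus the sources it came from."""
    return {
        term: {"count": len(found_in), "sources": sorted(set(found_in))}
        for term, found_in in grouped.items()
    }


def run_pipeline(sources):
    """Chain the three phases into one pass over the data."""
    return transform(aggregate(access(sources)))


if __name__ == "__main__":
    for term, summary in sorted(run_pipeline(SOURCES).items()):
        print(term, summary)
```

The point of the sketch is that the transform step keeps an attribute of the data (which sources a term appeared in) alongside the data itself, which is a very small version of the context a data product relies on.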
Solving the big data problems of social media monitoring, sentiment analysis, forming a scalable platform for social CRM, integrating CRM, supply chain management and logistics data to demand management – and tying all of these areas to financial performance – is potentially achievable with data science. Deploying data science as a cloud-based platform opens up even greater potential for getting more use from social networks, free data sources, and third-party databases than is possible today.
Be sure to check out the video below of Roger Magoulas, O’Reilly’s Director of Research, being interviewed about data science.