59% of all large enterprises are deploying data science (DS) and machine learning (ML) today.
Nearly 50% of all organizations have 25 or more ML models in use today.
29% of enterprises are refreshing their data science and machine learning models every day.
The higher the data literacy an enterprise can achieve before launching Data Science & Machine Learning initiatives, the higher the probability of success.
These and many other insights defining the state of the data science and machine learning market in 2021 are from Dresner Advisory Services’ 2021 Data Science and Machine Learning Market Study. The 7th annual report is noteworthy for its depth of analysis and insight into how data science and machine learning adoption is growing stronger in enterprises. In addition, the study explains which factors drive adoption and determine the key success factors that matter the most when deploying data science and machine learning techniques. The methodology uses crowdsourcing techniques to recruit respondents from over 6,000 organizations and vendors’ customer communities. As a result, 52% of respondents are from North America and 34% from EMEA, with the balance from Asia-Pacific and Latin America.
“The perceived importance of data science and machine learning correlates with organizational success with BI, with users that self-report as completely successful with BI almost twice as likely to rate data science as critical,” said Jim Ericson, vice president and research director at Dresner Advisory. “The perceived level of data literacy also correlates directly and positively with the current or likely future use of data science and machine learning in 2021.”
Key insights from the study include the following:
59% of large enterprises are deploying data science and machine learning in production today. Enterprises with 10K employees or more lead all others in adopting and using DS and ML techniques, most often in R&D and Business Intelligence Competency Center (BICC)-related work. Large-scale enterprises often rely on DS and ML to identify how internal processes and workflows can be streamlined and made more cost-efficient. For example, the CEO of a manufacturing company explained on a recent conference call that DS and ML pilots bring much-needed visibility and control across multiple plants and help troubleshoot inventory management and supply chain allocation problems.
The importance of data science and ML to enterprises has nearly tripled in seven years, jumping from 25% in 2014 to 70% in 2021. The Dresner study notes that a record level of enterprises sees data science and ML as critically important to their business in 2021. Furthermore, 90% of enterprises consider these technologies essential to their operations, rating them critically important or very important. Successful projects in Business Intelligence Competency Centers (BICC) and R&D helped data science and ML gain broad adoption across all organizations. Larger-scale enterprises with over 10K employees are successfully scaling data science and ML to improve visibility, control, and profitability in organizations today.
Enterprises dominate the recruiting and retention of data science and machine learning talent. Large-scale enterprises with over 10K employees are the most likely to have BI experts and data scientists/statisticians on staff. In addition, large-scale enterprises lead hiring and retention in seven of the nine roles included in the survey. It’s understandable how the Business Intelligence (BI) expertise of professionals in these roles is helping remove the roadblocks to getting more business value from data science and machine learning. Enterprises are learning how to scale data science and ML models to take on problems that were too complex to solve with analytics or BI alone.
80% of DS and ML respondents most want model lifecycle management, model performance monitoring, model version control, and model lineage and history at a minimum. Keeping track of the state of each model, including version control, is a challenge for nearly all organizations adopting ML today. Enterprises reach ML scale when they can manage ML models across their lifecycles using an automated system. The next four most popular features are model rollback, a searchable model repository, collaborative model co-creation tools, and model registration and certification, consistent with feedback from data science teams on what they need most in an ML platform.
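The lifecycle features respondents ask for can be approximated even without a dedicated platform. Below is a minimal, illustrative Python sketch of a model registry supporting version history and rollback; the model name and metrics are hypothetical, and production teams would typically rely on a purpose-built model registry rather than hand-rolled code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ModelVersion:
    version: int
    metrics: dict
    registered_at: str

class ModelRegistry:
    """Tracks version history for each named model, supporting rollback."""

    def __init__(self):
        self._models = {}

    def register(self, name, metrics):
        """Record a new version of a model along with its evaluation metrics."""
        versions = self._models.setdefault(name, [])
        record = ModelVersion(
            version=len(versions) + 1,
            metrics=metrics,
            registered_at=datetime.now(timezone.utc).isoformat(),
        )
        versions.append(record)
        return record.version

    def latest(self, name):
        return self._models[name][-1]

    def rollback(self, name):
        """Discard the newest version, e.g., after monitoring shows degradation."""
        self._models[name].pop()
        return self.latest(name)

registry = ModelRegistry()
registry.register("churn_model", {"auc": 0.81})
registry.register("churn_model", {"auc": 0.74})  # worse: candidate for rollback
restored = registry.rollback("churn_model")
```

The same pattern, extended with model artifacts, lineage metadata, and certification states, covers most of the other features respondents rank highly.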
Financial Services firms prioritize model lifecycle management and model performance monitoring to achieve greater scale from the tens of thousands of models they’re using today. Consistent with other research that tracks ML adoption by industry, the Dresner study found that Financial Services leads all other industries in its need for the two most valuable features of ML platforms: model lifecycle management and model performance monitoring. Retail and Wholesale are reinventing their business models in real time to become more virtual while also providing greater real-time visibility across supply chains. ML models in these two industries need automated model version control, model lineage and history, model rollback, collaborative model co-creation tools, and model registration and certification. In addition, Retailers and Wholesalers are doubling down on data science and machine learning to support new digital businesses, improve supply chain performance, and increase productivity.
Enterprises need support for their expanding range of regression models, text analytics functions, and ensemble learning. Over the last seven years, the popularity of text analytics functions and sentiment analysis has grown continually. Martech vendors and the marketing technologists driving the market are increasing the practicality and importance of sentiment analysis. Recommendation engines and geospatial analysis are also experiencing greater adoption as martech changes the nature of customer- and market-driven analysis and predictive modeling.
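As a rough illustration of what a text analytics function does, here is a minimal lexicon-based sentiment scorer; the word lists and reviews are hypothetical toy examples, and production sentiment analysis relies on trained models and far larger curated lexicons.

```python
import re

# Tiny illustrative lexicon; real systems use trained models or
# much larger curated word lists.
POSITIVE = {"great", "love", "excellent", "fast", "recommend"}
NEGATIVE = {"poor", "slow", "broken", "disappointed", "refund"}

def sentiment_score(text):
    """Return a score in [-1, 1]: positive minus negative word share."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    hits = pos + neg
    return 0.0 if hits == 0 else (pos - neg) / hits

reviews = [
    "Love this product, excellent build and fast shipping",
    "Arrived broken, very disappointed, want a refund",
]
scores = [sentiment_score(r) for r in reviews]
```

Even this crude approach shows why martech teams find sentiment analysis practical: a stream of reviews or social posts can be scored and aggregated with very little machinery.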
R, TensorFlow, and PyTorch are considered the three most critical open-source statistical and machine learning frameworks in 2021. Nearly 70% of respondents consider R important to getting work done in data science and ML. The R language has established itself as an industry standard and is well-respected across DevOps and IT teams in financial services, professional services, consulting, process manufacturing, and discrete manufacturing. TensorFlow and PyTorch are considered important by the majority of organizations Dresner’s research team interviewed. They’re also among the most in-demand ML frameworks today, and candidates with experience in all three are being actively recruited.
Data literacy predicts DS and ML program success rates. 64% of organizations say they have extremely high literacy rates, implying that DS and ML have reached mainstream adoption thanks partly to BI literacy rates in the past. Enterprises that prioritize data literacy by providing training, certification, and ongoing education increase success odds with ML. A bonus is that employees will have a chance to learn marketable skills they can use in their current and future positions. Investing in training to improve data literacy is a win/win.
On-database analytics and in-memory analytics (both 91%), and multi-tenant cloud services (88%) are the three most popular technologies enterprises rely on for greater scalability. Dresner’s research team observes that scaling data science and machine learning involves multiple requirements at once: high data volumes, large numbers of users, and data variety, all while sustaining analytic throughput. Apache Spark support continues to grow in enterprises and is the fourth most relied-upon technology for ML scalability.
Cybersecurity professionals with cloud security skills can gain a $15,025 salary premium by capitalizing on strong market demand for their skills in 2021.
DevOps and Application Development Security professionals can expect to earn a $12,266 salary premium based on their unique, in-demand skills.
413,687 job postings for Health Information Security professionals were posted between October 2019 and September 2020, leading all skill areas in demand.
Cybersecurity’s fastest-growing skill areas reflect the high priority organizations place on building secure digital infrastructures that can scale. Application Development Security and Cloud Security are far and away the fastest-growing skill areas in cybersecurity, with projected 5-year growth of 164% and 115%, respectively. This underscores the shift from retroactive security strategies to proactive security strategies. According to the U.S. Bureau of Labor Statistics’ Information Security Analyst outlook, cybersecurity jobs are among the fastest-growing career areas nationally. The BLS predicts cybersecurity jobs will grow 31% through 2029, over seven times faster than the national average job growth of 4%.
Key takeaways from their analysis include the following:
Cloud Security skills are the most lucrative of all, predicted to deliver a $15,008 salary boost in 2021. Demand for specific Cloud Security skills is far outpacing the broader demand for cybersecurity skills in the labor market. Burning Glass predicts the fastest-growing skills over the next five years include Azure Security (+164%), Cloud Security Infrastructure (+144%), Google Cloud Security (+135%), Public Cloud Security (+121%), and Cloud Security Architecture (+103%). There are 19,477 positions available for cybersecurity professionals with Cloud Security skills.
The fastest-growing cybersecurity skill is Application Development Security, predicted to see a 164% increase in available positions over five years. Cybersecurity professionals with skills in Application Development Security, DevSecOps, Container Security, Microservices Security, and Application Security Code Review are predicted to see an average $12,266 salary boost starting next year, given the strong marketability of their skills. Like Cloud Security, market demand for Application Development Security professionals’ skillsets far outpaces average cybersecurity job growth over five years.
Knowing where the most cybersecurity job postings are by metro area and state provides job seekers with the insights they need to narrow their job search. Cyberseek partnered with Burning Glass to create an interactive U.S.-based heat map that shows cybersecurity positions by state or metro area. The heat map can be configured to show total job openings, supply of workers, supply/demand ratio, and location quotients. You can access the heat map here.
The most common request from this blog’s readers is how to further their careers in analytics, cloud computing, data science, and machine learning. I’ve invited Alyssa Columbus, a Data Scientist at Pacific Life, to share her insights and lessons learned on breaking into the field of data science and launching a career there. The following guest post is authored by her.
Many people looking to break into data science, from undergraduates to career changers, have asked me how I attained my current data science position at Pacific Life. I’ve referred them to many different resources, including discussions I’ve had on the Dataquest.io blog and the Scatter Podcast. In the interest of providing job seekers with a comprehensive view of what I’ve learned works, I’ve put together the five most valuable lessons learned. I’ve written this article to make your data science job hunt as easy and efficient as possible.
Continuously build your statistical literacy and programming skills. Currently, there are 24,697 open Data Scientist positions on LinkedIn in the United States alone. The following list of the top 10 data science skills was created by applying data mining techniques to all open positions in the U.S. As of April 14, the top 3 most common skills requested in LinkedIn data scientist job postings are Python, R, and SQL, closely followed by Jupyter Notebooks, Unix Shell/Awk, AWS, and TensorFlow. The following graphic provides a prioritized list of the most in-demand data science skills mentioned in LinkedIn job postings today. Please click on the graphic to expand for easier viewing.
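A simplified version of that kind of job-posting analysis can be sketched in a few lines of Python. The sample postings and skill list below are invented for illustration; the real analysis parsed thousands of LinkedIn postings.

```python
from collections import Counter
import re

# Hypothetical sample of job-posting text, for illustration only.
postings = [
    "Requires Python, SQL, and experience with Jupyter notebooks",
    "Strong R and SQL skills; AWS and TensorFlow a plus",
    "Python and TensorFlow for deep learning pipelines",
]
SKILLS = ["Python", "R", "SQL", "Jupyter", "AWS", "TensorFlow"]

def rank_skills(postings, skills):
    """Count how many postings mention each skill (whole-word, case-insensitive)."""
    counts = Counter()
    for text in postings:
        for skill in skills:
            if re.search(rf"\b{re.escape(skill)}\b", text, re.IGNORECASE):
                counts[skill] += 1
    return counts.most_common()

ranking = rank_skills(postings, SKILLS)
```

Ranking skills by posting frequency like this is a reasonable first pass; a fuller analysis would also normalize synonyms (e.g., "Jupyter Notebooks" vs. "Jupyter") and weight by role seniority.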
Hands-on training is the best way to develop and continually improve statistical and programming skills, especially with the languages and technologies LinkedIn’s job postings prioritize. Getting your hands dirty with a dataset is often much better than reading through abstract concepts and not applying what you’ve learned to real problems. Your applied experience is just as important as your academic experience, and taking statistics and computer science classes helps translate theoretical concepts into practical results. The toughest thing to learn (and also to teach) about statistical analysis is the intuition for the big questions to ask of your dataset. Statistical literacy, or “how” to find the answers to your questions, comes with education and practice. Strengthening your intellectual curiosity, or insight into asking the right questions, comes through experience.
Continually build your own unique portfolio of analytics and machine learning projects. Having a good portfolio is essential to being hired as a data scientist, especially if you don’t come from a quantitative background or have prior experience in data science. Think of your portfolio as proof to potential employers that you are capable of excelling in the role of a data scientist, with both the passion and skills to do the job. When building your data science portfolio, select and complete projects that qualify you for the data science jobs you’re most interested in. Use your portfolio to promote your strengths and innate abilities by sharing projects you’ve completed on your own. Some skills I’d recommend you highlight in your portfolio include:
Your programming language of choice (e.g., Python, R, Julia, etc.).
The ability to interact with databases (e.g., your ability to use SQL).
Visualization of data (static or interactive).
Storytelling with data. This is a critical skill. In essence, can someone with no background in your project’s subject area look at it and come away with new understanding?
Deployment of an application or API. This can be done with small sample projects (e.g., a REST API for an ML model you trained or a nice Tableau or R Shiny dashboard).
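To make the deployment bullet concrete, here is a hedged sketch of the core logic such a REST endpoint would wrap: JSON features in, a JSON prediction out. The "model" here is a hypothetical linear fit with made-up coefficients; a real portfolio project would add a web framework (e.g., Flask) around this handler and load a serialized trained model.

```python
import json

# Hypothetical trained "model": coefficients from a simple linear fit.
MODEL = {"intercept": 2.0, "coef": {"age": 0.5, "income": 0.0001}}

def predict_handler(request_body: str) -> str:
    """The core logic a REST endpoint would wrap: JSON in, JSON out."""
    features = json.loads(request_body)
    score = MODEL["intercept"] + sum(
        MODEL["coef"][name] * value for name, value in features.items()
    )
    return json.dumps({"prediction": round(score, 4)})

response = predict_handler('{"age": 40, "income": 50000}')
```

Separating the prediction logic from the web framework like this also makes the project easier to test, which is itself a good signal to employers.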
Julia Silge and Amber Thomas both have excellent examples of portfolios that you can be inspired by. Julia’s portfolio is shown below.
Get (or git!) yourself a website. If you want to stand out, along with a portfolio, create and continually build a strong online presence in the form of a website. Be sure to create and continually add to your GitHub and Kaggle profiles to showcase your passion and proficiency in data science. Making your website with GitHub Pages creates a profile for you at the same time, and best of all it’s free to do. A strong online presence will not only help you in applying for jobs, but organizations may also reach out to you with freelance projects, interviews, and other opportunities.
Be confident in your skills and apply for any job you’re interested in, starting with opportunities available in your network. If you don’t meet all of a job’s requirements, apply anyway. You don’t have to know every skill (e.g., programming languages) on a job description, especially if there are more than ten listed. If you’re a great fit for the main requirements of the job’s description, you need to apply. A good general rule is that if you have at least half of the skills requested on a job posting, go for it. When you’re hunting for jobs, it may be tempting to look for work on company websites or tech-specific job boards. I’ve found, as have many others, that these are among the least helpful ways to find work. Instead, contact recruiters specializing in data science and build up your network to break into the field. I recommend looking for a data science job via the following sources, with the most time devoted to recruiters and your network:
Alyssa Columbus is a Data Scientist at Pacific Life and member of the Spring 2018 class of NASA Datanauts. Previously, she was a computational statistics and machine learning researcher at the UC Irvine Department of Epidemiology and has built robust predictive models and applications for a diverse set of industries spanning retail to biologics. Alyssa holds a degree in Applied and Computational Mathematics from the University of California, Irvine and is a member of Phi Beta Kappa. She is a strong proponent of reproducible methods, open source technologies, and diversity in analytics and is the founder of R-Ladies Irvine. You can reach her at her website: alyssacolumbus.com.
Bottom line: Enterprises are attaining double-digit improvements in forecast error rates, demand planning productivity, cost reductions and on-time shipments using machine learning today, revolutionizing supply chain management in the process.
The ten ways that machine learning is revolutionizing supply chain management include:
Machine learning-based algorithms are the foundation of the next generation of logistics technologies, with the most significant gains being made in advanced resource scheduling systems. Machine learning and AI-based techniques are the foundation of a broad spectrum of next-generation logistics and supply chain technologies now under development. The greatest gains are being made where machine learning can contribute to solving the complex constraint, cost, and delivery problems companies face today. McKinsey predicts machine learning’s most significant contributions will be in giving supply chain operators greater insight into how supply chain performance can be improved and in anticipating anomalies in logistics costs and performance before they occur. Machine learning is also providing insights into where automation can deliver the greatest scale advantages. Source: McKinsey & Company, Automation in logistics: Big opportunity, bigger uncertainty, April 2019. By Ashutosh Dekhne, Greg Hastings, John Murnane, and Florian Neuhaus
The wide variation in data sets generated from Internet of Things (IoT) sensors, telematics, intelligent transport systems, and traffic data has the potential to deliver the most value to improving supply chains using machine learning. Applying machine learning algorithms and techniques to improve supply chains starts with the data sets that have the greatest variety and variability in them. The most challenging issues supply chains face are often found in optimizing logistics so that the materials needed to complete a production run arrive on time. Source: KPMG, Supply Chain Big Data Series Part 1
Machine learning shows the potential to reduce logistics costs by finding patterns in track-and-trace data captured using IoT-enabled sensors, contributing to $6M in annual savings. BCG recently looked at how a decentralized supply chain using track-and-trace applications could improve performance and reduce costs. They found that in a 30-node configuration, when blockchain is used to share data in real time across a supplier network and is combined with better analytics insight, cost savings of $6M a year are achievable. Source: Boston Consulting Group, Pairing Blockchain with IoT to Cut Supply Chain Costs, December 18, 2018, by Zia Yusuf, Akash Bhatia, Usama Gill, Maciej Kranz, Michelle Fleury, and Anoop Nannra
Reducing forecast errors up to 50% is achievable using machine learning-based techniques. Lost sales due to products not being available are being reduced up to 65% through the use of machine learning-based planning and optimization techniques. Inventory reductions of 20 to 50% are also being achieved today when machine learning-based supply chain management systems are used. Source: Digital/McKinsey, Smartening up with Artificial Intelligence (AI) – What’s in it for Germany and its Industrial Sector? (PDF, 52 pp., no opt-in).
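As a quick illustration of how such forecast-error reductions are measured, here is a minimal comparison of mean absolute percentage error (MAPE) between a naive forecast and an ML-style forecast. All demand numbers are hypothetical; they simply show the mechanics of the metric behind claims like "forecast errors reduced up to 50%."

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(
        abs(a - f) / a for a, f in zip(actual, forecast)
    ) / len(actual)

# Hypothetical monthly demand and two forecasting approaches.
actual = [100, 120, 110, 130]
naive = [90, 100, 95, 105]          # e.g., repeating last year's values
ml_based = [98, 118, 108, 126]      # tighter fit from a learned model

naive_err = mape(actual, naive)
ml_err = mape(actual, ml_based)
reduction = 100 * (naive_err - ml_err) / naive_err
```

Tracking a single error metric like this before and after introducing an ML-based planner is how the cited improvements in forecast accuracy are typically quantified.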
DHL Research is finding that machine learning enables logistics and supply chain operations to optimize capacity utilization, improve customer experience, reduce risk, and create new business models. DHL’s research team continually tracks and evaluates the impact of emerging technologies on logistics and supply chain performance. They’re also predicting that AI will enable back-office automation, predictive operations, intelligent logistics assets, and new customer experience models. Source: DHL Trend Research, Logistics Trend Radar, Version 2018/2019 (PDF, 55 pp., no opt-in)
Detecting and acting on inconsistent supplier quality levels and deliveries using machine learning-based applications is an area manufacturers are investing in today. Based on conversations with North American mid-tier manufacturers, the second most significant growth barrier they face today is suppliers’ lack of consistent quality and delivery performance. The greatest growth barrier is the lack of available skilled labor. Using machine learning and advanced analytics, manufacturers can quickly discover who their best and worst suppliers are, and which production centers are most accurate in catching errors. Manufacturers are applying machine learning to supplier quality, delivery, and consistency challenges using dashboards much like the one below. Source: Microsoft, Supplier Quality Analysis sample for Power BI: Take a tour, 2018
Reducing risk and the potential for fraud, while improving the product and process quality based on insights gained from machine learning is forcing inspection’s inflection point across supply chains today. When inspections are automated using mobile technologies and results are uploaded in real-time to a secure cloud-based platform, machine learning algorithms can deliver insights that immediately reduce risks and the potential for fraud. Inspectorio is a machine learning startup to watch in this area. They’re tackling the many problems that a lack of inspection and supply chain visibility creates, focusing on how they can solve them immediately for brands and retailers. The graphic below explains their platform. Source: Forbes, How Machine Learning Improves Manufacturing Inspections, Product Quality & Supply Chain Visibility, January 23, 2019
Machine learning is making rapid gains in end-to-end supply chain visibility possible, providing predictive and prescriptive insights that are helping companies react faster than before. Combining multi-enterprise commerce networks for global trade and supply chain management with AI and machine learning platforms is revolutionizing supply chain end-to-end visibility. One of the early leaders in this area is Infor’s Control Center. Control Center combines data from the Infor GT Nexus Commerce Network, acquired by the company in September 2015, with Infor’s Coleman Artificial Intelligence (AI) platform. Infor chose to name their AI platform after the inspiring physicist and mathematician Katherine Coleman Johnson, whose trail-blazing work helped NASA land on the moon. Be sure to pick up a copy of the book and see the movie Hidden Figures if you haven’t already to appreciate her and many other brilliant women mathematicians’ contributions to space exploration. ChainLink Research provides an overview of Control Center in their article, How Infor is Helping to Realize Human Potential, and two screens from Control Center are shown below.
Machine learning is proving to be foundational for thwarting privileged credential abuse, the leading cause of security breaches across global supply chains. By taking a least-privilege access approach, organizations can minimize attack surfaces, improve audit and compliance visibility, and reduce risk, complexity, and the costs of operating a modern, hybrid enterprise. CIOs are solving the paradox of privileged credential abuse in their supply chains by recognizing that even when a privileged user presents the right credentials, a request arriving with risky context demands stronger verification before access is granted. Zero Trust Privilege is emerging as a proven framework for thwarting privileged credential abuse by verifying who is requesting access, the context of the request, and the risk of the access environment. Centrify is a leader in this area, with globally recognized suppliers including Cisco, Intel, Microsoft, and Salesforce being current customers. Source: Forbes, High-Tech’s Greatest Challenge Will Be Securing Supply Chains In 2019, November 28, 2018.
Capitalizing on machine learning to predict preventative maintenance for freight and logistics machinery based on IoT data is improving asset utilization and reducing operating costs. McKinsey found that predictive maintenance enhanced by machine learning allows for better prediction and avoidance of machine failure by combining data from the advanced Internet of Things (IoT) sensors and maintenance logs as well as external sources. Asset productivity increases of up to 20% are possible and overall maintenance costs may be reduced by up to 10%. Source: Digital/McKinsey, Smartening up with Artificial Intelligence (AI) – What’s in it for Germany and its Industrial Sector? (PDF, 52 pp., no opt-in).
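A bare-bones version of the predictive-maintenance idea can be sketched as a rolling z-score check on sensor readings. The vibration values, window size, and threshold below are hypothetical; production systems combine far richer models over IoT, maintenance-log, and external data, as the McKinsey research describes.

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=5, z=2.0):
    """Flag indices where a reading deviates more than z standard
    deviations from its trailing window, a common maintenance trigger."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(readings[i] - mu) > z * sigma:
            flagged.append(i)
    return flagged

# Hypothetical vibration readings from an IoT sensor on a conveyor motor;
# the spike at index 7 is the kind of early-warning signal that triggers
# a maintenance work order before the machine fails.
vibration = [0.51, 0.49, 0.50, 0.52, 0.48, 0.50, 0.51, 0.95, 0.50, 0.49]
alerts = flag_anomalies(vibration)
```

Flagging such deviations early, rather than repairing after failure, is the mechanism behind the asset-productivity and maintenance-cost improvements cited above.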
Data Scientist has been named the best job in America for three years running, with a median base salary of $110,000 and 4,524 job openings.
DevOps Engineer is the second-best job in 2018, with a median base salary of $105,000 and 3,369 job openings.
There are 29,187 Software Engineering jobs available today, making Software Engineering the job with the most Glassdoor postings in the study.
These and many other fascinating insights are from Glassdoor’s 50 Best Jobs In America For 2018. The Glassdoor Report is viewable online here. Glassdoor’s annual report highlights the 50 best jobs based on each job’s overall Glassdoor Job Score. The Glassdoor Job Score is determined by weighing three key factors equally: earning potential based on median annual base salary, job satisfaction rating, and the number of job openings. Glassdoor’s 2018 report lists jobs that excel across all three dimensions of their Job Score metric. For an excellent overview of the study by Karsten Strauss of Forbes, please see his post, The Best Jobs To Apply For In 2018.
LinkedIn’s 2017 U.S. Emerging Jobs Report found that there are 9.8 times more Machine Learning Engineers working today than five years ago with 1,829 open positions listed on their site as of last month. Data science and machine learning are generating more jobs than candidates right now, making these two areas the fastest growing tech employment areas today.
Key takeaways from the study include the following:
Six analytics and data science jobs are included in Glassdoor’s 50 best jobs In America for 2018. These include Data Scientist, Analytics Manager, Database Administrator, Data Engineer, Data Analyst and Business Intelligence Developer. The complete list of the top 50 jobs is provided below with the analytics and data science jobs highlighted along with software engineering, which has a record 29,817 open jobs today:
The median base salary of the 50 best jobs in America is $91,000, with the average salary of the six analytics and data science jobs being $94,167.
Across all six analytics and data science jobs there are 16,702 openings as of today according to Glassdoor.
Tech jobs make up 20 of Glassdoor’s 50 Best Jobs in America for 2018, up from 14 jobs in 2017.
Machine Learning Engineers, Data Scientists, and Big Data Engineers rank among the top emerging jobs on LinkedIn.
Data scientist roles have grown over 650% since 2012, but currently, 35,000 people in the US have data science skills, while hundreds of companies are hiring for those roles.
There are currently 1,829 open Machine Learning Engineering positions on LinkedIn.
Job growth in the next decade is expected to outstrip growth during the previous decade, creating 11.5M jobs by 2026, according to the U.S. Bureau of Labor Statistics.
These and many other insights are from the recently released LinkedIn 2017 U.S. Emerging Jobs Report. LinkedIn has provided an overview of the methodology in their post, The Fastest-Growing Jobs in the U.S. Based on LinkedIn Data. “Emerging jobs” refers to the job titles that saw the largest growth in frequency over that five year period. LinkedIn reports that based on their analysis, the job market in the U.S. is brimming right now with fresh and exciting opportunities for professionals in a range of emerging roles.
Key takeaways from the study include the following:
There are 9.8 times more Machine Learning Engineers working today than five years ago based on LinkedIn’s research, with 1,829 open positions listed on the site today. There are 6.5 times more Data Scientists than five years ago, and 5.5 times more Big Data Developers. The following graphic illustrates the rapid growth of data science, machine learning, big data, and full-stack developer roles, in addition to sales development and customer success manager roles.
Software engineering is a common starting point for professionals in the top five fastest-growing jobs today. The career path to Machine Learning Engineer and Big Data Developer begins with a solid software engineering background. Typical career paths for the five highest-growth jobs are shown below:
The skills most strongly represented across the 20 fastest-growing jobs include management, sales, communication, and marketing. Additional skills represented across the fastest-growing jobs include marketing expertise (analytics and marketing automation), start-ups, Python, software development, analytics, cloud computing, and knowledge of retail systems.
LinkedIn interviewed 1,200 hiring managers to determine which soft skills are most in-demand and adaptability came out on top. Additional soft skills include culture fit, collaboration, leadership, growth potential, and prioritization.