The most common request from this blog’s readers is how to further their careers in analytics, cloud computing, data science, and machine learning. I’ve invited Alyssa Columbus, a Data Scientist at Pacific Life, to share her insights and lessons learned on breaking into the field of data science and launching a career there. The following guest post is authored by her.
Earning a job in data science, especially your first one, isn’t easy given the surplus of analytics job-seekers relative to analytics jobs.
Many people looking to break into data science, from undergraduates to career changers, have asked me how I attained my current data science position at Pacific Life. I’ve referred them to many different resources, including discussions I’ve had on the Dataquest.io blog and the Scatter Podcast. In the interest of providing job seekers with a comprehensive view of what I’ve learned that works, I’ve put together the five most valuable lessons learned. I’ve written this article to make your data science job hunt as easy and efficient as possible.
- Continuously build your statistical literacy and programming skills. Currently, there are 24,697 open Data Scientist positions on LinkedIn in the United States alone. Using data mining techniques to analyze all open positions in the U.S., the following list of the top 10 data science skills was created. As of April 14, the three most common skills requested in LinkedIn data scientist job postings are Python, R, and SQL, closely followed by Jupyter Notebooks, Unix Shell/Awk, AWS, and TensorFlow. The following graphic provides a prioritized list of the most in-demand data science skills mentioned in LinkedIn job postings today.
Hands-on training is the best way to develop and continually improve statistical and programming skills, especially with the languages and technologies LinkedIn’s job postings prioritize. Getting your hands dirty with a dataset is often much better than reading through abstract concepts and not applying what you’ve learned to real problems. Your applied experience is just as important as your academic experience, and taking statistics and computer science classes helps to translate theoretical concepts into practical results. The toughest thing to learn (and also to teach) about statistical analysis is the intuition for what the big questions to ask of your dataset are. Statistical literacy, or “how” to find the answers to your questions, comes with education and practice. Strengthening your intellectual curiosity or insight into asking the right questions comes through experience.
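As a small hands-on exercise in the spirit of the skills ranking above, here is a minimal sketch of how skill mentions could be tallied across job postings. The skill list, the sample postings, and the function name `count_skill_mentions` are all hypothetical, and this is not the method used to produce the LinkedIn ranking, which the post does not describe in detail.

```python
import re
from collections import Counter

# Hypothetical skill vocabulary; a real analysis would use a much longer list.
SKILLS = ["Python", "R", "SQL", "AWS", "TensorFlow"]


def count_skill_mentions(postings, skills=SKILLS):
    """Count how many postings mention each skill at least once."""
    counts = Counter()
    for text in postings:
        for skill in skills:
            # Lookarounds keep a short name like "R" from matching inside
            # a longer word such as "required".
            pattern = r"(?<![A-Za-z0-9])" + re.escape(skill) + r"(?![A-Za-z0-9])"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[skill] += 1
    return counts


# Made-up postings for illustration only.
postings = [
    "Seeking a data scientist with Python, SQL, and AWS experience.",
    "Must know R or Python; TensorFlow is a plus.",
    "Strong SQL skills required. Experience with python preferred.",
]

counts = count_skill_mentions(postings)
for skill, n in counts.most_common():
    print(f"{skill}: mentioned in {n} of {len(postings)} postings")
```

Simple pattern matching like this has limits: a single-letter name such as "R" can still match in phrases like "R&D", so results for short skill names are worth checking by hand.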
- Continually build your own unique portfolio of analytics and machine learning projects. Having a good portfolio is essential to being hired as a data scientist, especially if you don’t come from a quantitative background or don’t have prior experience in data science. Think of your portfolio as proof to potential employers that you are capable of excelling in the role of a data scientist, with both the passion and skills to do the job. When building your data science portfolio, select and complete projects that qualify you for the data science jobs you’re the most interested in. Use your portfolio to promote your strengths and innate abilities by sharing projects you’ve completed on your own. Some skills I’d recommend you highlight in your portfolio include:
- Your programming language of choice (e.g., Python, R, Julia, etc.).
- The ability to interact with databases (e.g., your ability to use SQL).
- Visualization of data (static or interactive).
- Storytelling with data. This is a critical skill. In essence, can someone with no background in whatever area your project is in look at your project and gain some new understandings from it?
- Deployment of an application or API. This can be done with small sample projects (e.g., a REST API for an ML model you trained or a nice Tableau or R Shiny dashboard).
Julia Silge and Amber Thomas both have excellent examples of portfolios that you can be inspired by. Julia’s portfolio is shown below.
- Get (or git!) yourself a website. If you want to stand out, along with a portfolio, create and continually build a strong online presence in the form of a website. Be sure to create and continually add to your GitHub and Kaggle profiles to showcase your passion and proficiency in data science. Making your website with GitHub Pages creates a profile for you at the same time, and best of all it’s free to do. A strong online presence will not only help you in applying for jobs, but organizations may also reach out to you with freelance projects, interviews, and other opportunities.
- Be confident in your skills and apply for any job you’re interested in, starting with opportunities available in your network. If you don’t meet all of a job’s requirements, apply anyway. You don’t have to know every skill (e.g., programming languages) on a job description, especially if there are more than ten listed. If you’re a great fit for the main requirements of the job’s description, you need to apply. A good general rule is that if you have at least half of the skills requested on a job posting, go for it. When you’re hunting for jobs, it may be tempting to look for work on company websites or tech-specific job boards. I’ve found, as have many others, that these are among the least helpful ways to find work. Instead, contact recruiters specializing in data science and build up your network to break into the field. I recommend looking for a data science job via the following sources, with the most time devoted to recruiters and your network:
- Friends, family, and colleagues
- Career fairs and recruiting events
- General job boards
- Company websites
- Tech job boards.
Alyssa Columbus is a Data Scientist at Pacific Life and member of the Spring 2018 class of NASA Datanauts. Previously, she was a computational statistics and machine learning researcher at the UC Irvine Department of Epidemiology and has built robust predictive models and applications for a diverse set of industries spanning retail to biologics. Alyssa holds a degree in Applied and Computational Mathematics from the University of California, Irvine and is a member of Phi Beta Kappa. She is a strong proponent of reproducible methods, open source technologies, and diversity in analytics and is the founder of R-Ladies Irvine. You can reach her at her website: alyssacolumbus.com.
Bottom line: Enterprises are attaining double-digit improvements in forecast error rates, demand planning productivity, cost reductions and on-time shipments using machine learning today, revolutionizing supply chain management in the process.
Machine learning algorithms and the models they’re based on excel at finding anomalies, patterns and predictive insights in large data sets. Many supply chain challenges are time, cost and resource constraint-based, making machine learning an ideal technology to solve them. From Amazon’s Kiva robotics relying on machine learning to improve accuracy, speed and scale to DHL relying on AI and machine learning to power their Predictive Network Management system that analyzes 58 different parameters of internal data to identify the top factors influencing shipment delays, machine learning is defining the next generation of supply chain management. Gartner predicts that by 2020, 95% of Supply Chain Planning (SCP) vendors will be relying on supervised and unsupervised machine learning in their solutions. Gartner is also predicting that by 2023, intelligent algorithms and AI techniques will be an embedded or augmented component across 25% of all supply chain technology solutions.
The ten ways that machine learning is revolutionizing supply chain management include:
- Machine learning-based algorithms are the foundation of the next generation of logistics technologies, with the most significant gains being made with advanced resource scheduling systems. Machine learning and AI-based techniques are the foundation of a broad spectrum of next-generation logistics and supply chain technologies now under development. The most significant gains are being made where machine learning can contribute to solving complex constraint, cost and delivery problems companies face today. McKinsey predicts machine learning’s most significant contributions will be in providing supply chain operators with more significant insights into how supply chain performance can be improved, anticipating anomalies in logistics costs and performance before they occur. Machine learning is also providing insights into where automation can deliver the most significant scale advantages. Source: McKinsey & Company, Automation in logistics: Big opportunity, bigger uncertainty, April 2019. By Ashutosh Dekhne, Greg Hastings, John Murnane, and Florian Neuhaus
- The wide variation in data sets generated from the Internet of Things (IoT) sensors, telematics, intelligent transport systems, and traffic data has the potential to deliver the most value to improving supply chains by using machine learning. Applying machine learning algorithms and techniques to improve supply chains starts with data sets that have the greatest variety and variability in them. The most challenging issues supply chains face are often found in optimizing logistics, so materials needed to complete a production run arrive on time. Source: KPMG, Supply Chain Big Data Series Part 1
- Machine learning shows the potential to reduce logistics costs by finding patterns in track-and-trace data captured using IoT-enabled sensors, contributing to $6M in annual savings. BCG recently looked at how a decentralized supply chain using track-and-trace applications could improve performance and reduce costs. They found that in a 30-node configuration, when blockchain is used to share data in real-time across a supplier network and combined with better analytics insight, cost savings of $6M a year are achievable. Source: Boston Consulting Group, Pairing Blockchain with IoT to Cut Supply Chain Costs, December 18, 2018, by Zia Yusuf, Akash Bhatia, Usama Gill, Maciej Kranz, Michelle Fleury, and Anoop Nannra
- Reducing forecast errors up to 50% is achievable using machine learning-based techniques. Lost sales due to products not being available are being reduced up to 65% through the use of machine learning-based planning and optimization techniques. Inventory reductions of 20 to 50% are also being achieved today when machine learning-based supply chain management systems are used. Source: Digital/McKinsey, Smartening up with Artificial Intelligence (AI) – What’s in it for Germany and its Industrial Sector? (PDF, 52 pp., no opt-in).
- DHL Research is finding that machine learning enables logistics and supply chain operations to optimize capacity utilization, improve customer experience, reduce risk, and create new business models. DHL’s research team continually tracks and evaluates the impact of emerging technologies on logistics and supply chain performance. They’re also predicting that AI will enable back-office automation, predictive operations, intelligent logistics assets, and new customer experience models. Source: DHL Trend Research, Logistics Trend Radar, Version 2018/2019 (PDF, 55 pp., no opt-in)
- Detecting and acting on inconsistent supplier quality levels and deliveries using machine learning-based applications is an area manufacturers are investing in today. Based on conversations with North American-based mid-tier manufacturers, the second most significant growth barrier they’re facing today is suppliers’ lack of consistent quality and delivery performance. The greatest growth barrier is the lack of skilled labor available. Using machine learning and advanced analytics, manufacturers can quickly discover who their best and worst suppliers are, and which production centers are most accurate in catching errors. Manufacturers are using dashboards much like the one below for applying machine learning to supplier quality, delivery and consistency challenges. Source: Microsoft, Supplier Quality Analysis sample for Power BI: Take a tour, 2018
- Reducing risk and the potential for fraud, while improving the product and process quality based on insights gained from machine learning is forcing inspection’s inflection point across supply chains today. When inspections are automated using mobile technologies and results are uploaded in real-time to a secure cloud-based platform, machine learning algorithms can deliver insights that immediately reduce risks and the potential for fraud. Inspectorio is a machine learning startup to watch in this area. They’re tackling the many problems that a lack of inspection and supply chain visibility creates, focusing on how they can solve them immediately for brands and retailers. The graphic below explains their platform. Source: Forbes, How Machine Learning Improves Manufacturing Inspections, Product Quality & Supply Chain Visibility, January 23, 2019
- Machine learning is making rapid gains in end-to-end supply chain visibility possible, providing predictive and prescriptive insights that are helping companies react faster than before. Combining multi-enterprise commerce networks for global trade and supply chain management with AI and machine learning platforms is revolutionizing supply chain end-to-end visibility. One of the early leaders in this area is Infor’s Control Center. Control Center combines data from the Infor GT Nexus Commerce Network, acquired by the company in September 2015, with Infor’s Coleman Artificial Intelligence (AI) platform. Infor chose to name their AI platform after the inspiring physicist and mathematician Katherine Coleman Johnson, whose trail-blazing work helped NASA land on the moon. Be sure to pick up a copy of the book and see the movie Hidden Figures if you haven’t already to appreciate her and many other brilliant women mathematicians’ many contributions to space exploration. ChainLink Research provides an overview of Control Center in their article, How Infor is Helping to Realize Human Potential, and two screens from Control Center are shown below.
- Machine learning is proving to be foundational for thwarting privileged credential abuse, which is the leading cause of security breaches across global supply chains. By taking a least privilege access approach, organizations can minimize attack surfaces, improve audit and compliance visibility, and reduce risk, complexity, and the costs of operating a modern, hybrid enterprise. CIOs are solving the paradox of privileged credential abuse in their supply chains by recognizing that even when a privileged user has entered the right credentials, a request that comes in with risky context needs stronger verification before access is permitted. Zero Trust Privilege is emerging as a proven framework for thwarting privileged credential abuse by verifying who is requesting access, the context of the request, and the risk of the access environment. Centrify is a leader in this area, with globally-recognized suppliers including Cisco, Intel, Microsoft, and Salesforce being current customers. Source: Forbes, High-Tech’s Greatest Challenge Will Be Securing Supply Chains In 2019, November 28, 2018.
- Capitalizing on machine learning to predict preventative maintenance for freight and logistics machinery based on IoT data is improving asset utilization and reducing operating costs. McKinsey found that predictive maintenance enhanced by machine learning allows for better prediction and avoidance of machine failure by combining data from advanced Internet of Things (IoT) sensors and maintenance logs as well as external sources. Asset productivity increases of up to 20% are possible, and overall maintenance costs may be reduced by up to 10%. Source: Digital/McKinsey, Smartening up with Artificial Intelligence (AI) – What’s in it for Germany and its Industrial Sector? (PDF, 52 pp., no opt-in).
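To make the forecast error figures cited in the list above more concrete, here is a minimal sketch of how a reduction in forecast error could be measured. It assumes mean absolute percentage error (MAPE) as the error metric, which the cited sources do not specify, and every number in it is made up for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error as a fraction (0.10 means 10%).

    Assumes no actual value is zero, since each error is divided by it.
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal length")
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error improves on baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical weekly demand and two hypothetical forecasts of it.
actual_demand = [100, 200, 400]
baseline_forecast = [120, 160, 480]  # e.g. a simple rule-based forecast
ml_forecast = [110, 180, 440]        # e.g. a machine learning-based forecast

baseline_error = mape(actual_demand, baseline_forecast)
ml_error = mape(actual_demand, ml_forecast)
reduction = relative_reduction(baseline_error, ml_error)

print(f"Baseline MAPE: {baseline_error:.1%}")
print(f"ML MAPE: {ml_error:.1%}")
print(f"Forecast error reduction: {reduction:.1%}")
```

Other error metrics would give different reduction figures for the same forecasts, so it is worth checking which metric a vendor or study uses before comparing percentages.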
Accenture, Reinventing The Supply Chain With AI, 20 pp., PDF, no opt-in.
Bendoly, E. (2016). Fit, Bias, and Enacted Sensemaking in Data Visualization: Frameworks for Continuous Development in Operations and Supply Chain Management Analytics. Journal Of Business Logistics, 37(1), 6-17.
Boston Consulting Group, Pairing Blockchain with IoT to Cut Supply Chain Costs, December 18, 2018, by Zia Yusuf, Akash Bhatia, Usama Gill, Maciej Kranz, Michelle Fleury, and Anoop Nannra
- There are 29,187 Software Engineering jobs available today, making it the most common job in Glassdoor postings according to the study.
These and many other fascinating insights are from Glassdoor’s 50 Best Jobs In America For 2018. The Glassdoor Report is viewable online here. Glassdoor’s annual report highlights the 50 best jobs based on each job’s overall Glassdoor Job Score. The Glassdoor Job Score is determined by weighting three key factors equally: earning potential based on median annual base salary, job satisfaction rating, and the number of job openings. Glassdoor’s 2018 report lists jobs that excel across all three dimensions of their Job Score metric. For an excellent overview of the study by Karsten Strauss of Forbes, please see his post, The Best Jobs To Apply For In 2018.
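To illustrate what weighting three factors equally can look like in practice, here is a minimal sketch of an equally weighted score. The min-max scaling to a 0 to 1 range, the field names, and the sample jobs are all assumptions made for this example. Glassdoor’s actual normalization is not described here, so this is not a reproduction of the Glassdoor Job Score.

```python
def min_max(values):
    """Scale values to the 0-1 range; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_weight_scores(jobs):
    """Average three min-max scaled factors with equal weight per job."""
    salary = min_max([j["median_salary"] for j in jobs])
    satisfaction = min_max([j["satisfaction"] for j in jobs])
    openings = min_max([j["openings"] for j in jobs])
    return {
        j["name"]: (s + sat + o) / 3
        for j, s, sat, o in zip(jobs, salary, satisfaction, openings)
    }


# Hypothetical jobs with made-up figures, for illustration only.
jobs = [
    {"name": "Job A", "median_salary": 100_000, "satisfaction": 4.0, "openings": 3000},
    {"name": "Job B", "median_salary": 80_000, "satisfaction": 4.5, "openings": 1000},
    {"name": "Job C", "median_salary": 90_000, "satisfaction": 3.5, "openings": 2000},
]

scores = equal_weight_scores(jobs)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Scaling matters because salaries, ratings, and job counts sit on very different ranges. Without it, the factor with the largest raw numbers would dominate an equally weighted average.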
LinkedIn’s 2017 U.S. Emerging Jobs Report found that there are 9.8 times more Machine Learning Engineers working today than five years ago, with 1,829 open positions listed on their site as of last month. Data science and machine learning are generating more jobs than candidates right now, making these two areas the fastest-growing tech employment areas today.
Key takeaways from the study include the following:
- Six analytics and data science jobs are included in Glassdoor’s 50 Best Jobs in America for 2018. These include Data Scientist, Analytics Manager, Database Administrator, Data Engineer, Data Analyst and Business Intelligence Developer. The complete list of the top 50 jobs is provided below with the analytics and data science jobs highlighted along with software engineering, which has a record 29,817 open jobs today:
- Median base salary of the 50 best jobs in America is $91,000 with the average salary of the six analytics and data science jobs being $94,167.
- Tech jobs make up 20 of Glassdoor’s 50 Best Jobs in America for 2018, up from 14 jobs in 2017.
Source: Glassdoor Reveals the 50 Best Jobs in America for 2018
- Data scientist roles have grown over 650% since 2012, but currently only 35,000 people in the US have data science skills, while hundreds of companies are hiring for those roles.
- Job growth in the next decade is expected to outstrip growth during the previous decade, creating 11.5M jobs by 2026, according to the U.S. Bureau of Labor Statistics.
These and many other insights are from the recently released LinkedIn 2017 U.S. Emerging Jobs Report. LinkedIn has provided an overview of the methodology in their post, The Fastest-Growing Jobs in the U.S. Based on LinkedIn Data. “Emerging jobs” refers to the job titles that saw the largest growth in frequency over the five-year period the report covers. LinkedIn reports that based on their analysis, the job market in the U.S. is brimming right now with fresh and exciting opportunities for professionals in a range of emerging roles.
Key takeaways from the study include the following:
- There are 9.8 times more Machine Learning Engineers working today than five years ago based on LinkedIn’s research, with 1,829 open positions listed on the site today. There are 6.5 times more Data Scientists than five years ago, and 5.5 times more Big Data Developers. The following graphic illustrates the rapid growth of key data science, machine learning, big data and full stack developer roles, in addition to sales development and customer success managers.
- Software engineering is a common starting point for professionals who are in the top five fastest-growing jobs today. The career path to Machine Learning Engineer and Big Data Developer begins with a solid software engineering background. Typical career paths for the five highest-growth jobs are shown below:
- The skills most strongly represented across the 20 fastest growing jobs include management, sales, communication, and marketing. Additional skills represented across the highest growing jobs include marketing expertise (analytics and marketing automation), start-ups, Python, software development, analytics, cloud computing and knowledge of retail systems.
- LinkedIn interviewed 1,200 hiring managers to determine which soft skills are most in-demand, and adaptability came out on top. Additional soft skills include culture fit, collaboration, leadership, growth potential, and prioritization.
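For a sense of what the five-year growth multiples in the takeaways above imply year over year, here is a minimal sketch that converts each multiple into an annualized growth rate. It assumes smooth compounding over exactly five years, which is a simplification and not something the LinkedIn report states. The function name is hypothetical.

```python
def annualized_growth_rate(multiple, years):
    """Constant yearly growth rate that compounds to `multiple` over `years`."""
    if multiple <= 0 or years <= 0:
        raise ValueError("multiple and years must be positive")
    return multiple ** (1 / years) - 1


# Growth multiples quoted from the LinkedIn report, each over five years.
growth_multiples = {
    "Machine Learning Engineer": 9.8,
    "Data Scientist": 6.5,
    "Big Data Developer": 5.5,
}

rates = {
    role: annualized_growth_rate(multiple, years=5)
    for role, multiple in growth_multiples.items()
}

for role, rate in rates.items():
    print(f"{role}: about {rate:.0%} growth per year")
```

Under this assumption, a 9.8 times increase over five years works out to headcount growing by more than half every year.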
LinkedIn Blog: The Fastest-Growing Jobs in the U.S. Based on LinkedIn Data
LinkedIn’s 2017 U.S. Emerging Jobs Report