Every week, Analytics India Magazine reaches out to developers, practitioners and experts from the machine learning community to gain insights into their journey in data science, and the tools and skills essential for their day-to-day operations.
For this week’s column, Analytics India Magazine got in touch with Dipanjan Sarkar, a well-known face in the machine learning community. In this story, we take you through Dipanjan’s journey and how he became an ML expert.
How It All Began
Dipanjan currently works as a Data Science Lead at Applied Materials, where he leads a team of data scientists solving problems in the manufacturing and semiconductor domain by leveraging machine learning, deep learning, computer vision and natural language processing. He provides much-needed technical expertise, AI strategy, solutioning and architecture, and works with stakeholders globally.
He has a bachelor’s degree in computer science & engineering and a master’s degree in data science from IIIT Bangalore. Currently, he is pursuing a PG Diploma in ML and AI from Columbia University and an executive education certification course in AI Strategy from Northwestern University – Kellogg School of Management.
Apart from academia, Dipanjan is a big fan of MOOCs. He also beta-tests new courses for Coursera before they are made public.
Dipanjan is also a Google Developer Expert in Machine Learning and has worked with several Fortune 500 companies. Mathematics is a prerequisite for any ML expert, so we were surprised to learn that Dipanjan actually hated the subject at school. That changed in ninth grade, when he picked up statistics, linear algebra and calculus, the three pillars of machine learning.
I always loved the way you could program a computer to do specific tasks and make a machine actually learn with data!
Dipanjan’s renewed interest in mathematics was followed by a fascination for computer programming. As his interests grew from mathematics to statistics and traditional computer programming, his career choice became almost obvious.
On Becoming An ML Expert
Reminiscing about his initial days, when the term ‘data science’ wasn’t worshipped yet, Dipanjan spoke about how the field was more conceptual and theoretical. Back then, there was no active ecosystem of tools, languages and frameworks dedicated to data science. Hence, it took more time to learn theoretical concepts, since it took more effort to actually implement them or see them in practice.
With the advent of Python, R and a whole suite of tools and libraries, he believes it has become easier to tame the learning curve of data science. However, he also warns that this can be a double-edged sword if one focuses only on hands-on work without diving into the math and concepts behind algorithms and techniques to understand how they work or why they are used.
I have always been a strong advocate of self-learning, and I believe that is where you get maximum value
Mentors and proper guides, which are plentiful nowadays on LinkedIn and other forums, were scarce back then, so Dipanjan had no option but to teach himself with the help of the web and books.
For aspirants, he recommends the following books:
- Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- Pattern Recognition and Machine Learning by Christopher Bishop
- An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
- The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
To dive deep into the concepts and to get hands-on, he recommends Deep Learning with Keras, Python Machine Learning and Hands-On Machine Learning as practical books with examples. Dipanjan has also written a handful of books on practical machine learning.
When it comes to practising and deploying ML models, Dipanjan extensively uses the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework, which he considers one of the best frameworks for tackling any data science problem.
Before diving into models or data, he also insists on identifying and articulating the business problem in the right manner. For conceptualising an AI use-case, Dipanjan recommends the AI Canvas, which he learnt about at the Kellogg School of Management:
- Business Problem and Value
- Key Objective Function
- Data Strategy
- Modelling Approach
- Model Training Strategy
- Customer Value
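As a rough illustration, the six canvas components above could be captured as a simple checklist in Python. The class name `AICanvas`, its field names, the `missing_sections` helper and the example use case are all hypothetical, introduced here only for illustration; they are not part of any published AI Canvas tooling or of Dipanjan's own workflow.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class AICanvas:
    """Hypothetical container for the six canvas components listed above."""
    business_problem_and_value: str = ""
    key_objective_function: str = ""
    data_strategy: str = ""
    modelling_approach: str = ""
    model_training_strategy: str = ""
    customer_value: str = ""

    def missing_sections(self) -> List[str]:
        """Return the names of the sections that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Made-up use case: only the first two sections are filled in so far.
canvas = AICanvas(
    business_problem_and_value="Reduce unplanned downtime on a production line",
    key_objective_function="Catch as many failures as possible at a fixed false-alarm rate",
)
print(canvas.missing_sections())
```

Listing the blank sections before any modelling starts is one simple way to check that the business problem has been articulated first.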
Use the right tools for the job without waging wars of Python vs R or PyTorch vs TensorFlow
When asked about his favourite tools, Dipanjan stressed the importance of ignoring debates like Python vs R or PyTorch vs TensorFlow and simply using the right tools to get the job done.
For instance, he and his team frequently use the ecosystem of tools and libraries centered around Python. This includes the regular run-of-the-mill pandas, matplotlib, seaborn and plotly for data wrangling and exploratory data analysis. For statistical modelling, he prefers libraries like scikit-learn, statsmodels and pyod.
Dipanjan’s toolkit looks as follows:
- Statistical Modeling: scikit-learn, statsmodels and pyod
- Deep Learning Frameworks: both TensorFlow (tf.keras) and PyTorch depending on the problem at hand
- Computer Vision: OpenCV, MATLAB
- NLP: scikit-learn, spaCy, gensim and transformers
- Transfer learning: pre-trained models from TensorFlow Hub
- Building baselines: AutoML frameworks
- Explainable AI: LIME and SHAP
- Languages: Python, along with R and Java in the past, for data analysis as well as for building pipelines and web interfaces
Along with picking the right tools, he recommends that practitioners always go with the simplest solution unless complexity adds substantial value. Last but not least, he urges people not to ignore documentation.
To those looking to break into the world of data science, Dipanjan suggests following a hybrid approach, i.e. learning concepts, coding them and applying them to real-world datasets.
First, learn all the math and concepts and then try to actually apply the methods you have learnt
In the long, tedious process of learning, Dipanjan warns, people might lose focus and start wondering why they are even learning a certain method. To remedy this, he insists on learning and applying side by side, without deviating from the goal, if one aims to become a good data scientist.
On ML Hype And Its Future
Addressing the overwhelming hype around AI and ML, Dipanjan says that he is already seeing the dust settle, with companies now starting to realise both the limitations and the value of AI. Deep learning and deep transfer learning are starting to provide value for companies working on complex problems involving unstructured data like images, audio, video and text, and things are only going to get bigger and better with advanced tools and hardware in future. However, he admits that there is definitely still a fair bit of hype out there.
No matter how advanced the field gets, he believes that traditional machine learning models like linear and logistic regression will never go out of fashion, since they are the bread and butter of various organisations and use-cases out there. Models that are easy to explain, including linear models and decision trees, will also continue to be used extensively.
Going forward, he is optimistic that use-cases such as optimising manufacturing, predicting demand and sales, inventory planning, logistics and routing, optimising infrastructure management, and enhancing customer support and experience will continue to be key drivers for almost all major organisations over the next decade.
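To make the explainability point concrete, here is a minimal sketch of a one-feature linear regression fitted by ordinary least squares in plain Python. The function name `fit_line` and the toy data are assumptions made for illustration and do not come from the interview; in practice, a library such as scikit-learn or statsmodels would normally be used instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(f"Each extra unit of x adds about {slope:.2f} to the prediction")
```

The fitted slope can be read directly as the effect of the feature on the prediction, which is the kind of transparency that keeps such models in use.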
When it comes to breakthroughs, Dipanjan expects something big to happen in newer domains like self-learning, continuous-learning, meta-learning and reinforcement learning.
Always remember to challenge others’ opinions with a healthy mindset because a good data scientist doesn’t just follow instructions blindly.
Talking about his tireless efforts to guide youngsters, he recollects how not having a mentor was a major hindrance and how he had to unlearn and relearn over time to correct his misconceptions. To help aspirants avoid the same mistakes, he mentors them whenever possible.
On a concluding note, Dipanjan said that he is mightily impressed by the relentless efforts of the data science community to share ideas through blogs, vlogs and online forums. Confessing his love for Analytics India Magazine, Dipanjan spoke about how AIM has been fostering a rich analytics ecosystem in India by reaching out to the global community.