About
Harshal is a technology-driven individual with a Master's degree in Data Science, currently working with Blue Matter Consulting in their Insights & Analytics team.
Having a strong background in data science tools and technologies, he has demonstrated expertise in crafting end-to-end solutions that transform data into actionable insights to support business growth for life sciences companies. He strives to create an impactful career in the field of data science and establish long-term associations with organizations in need of data-backed business problem-solving.
In addition to his passion for data science, Harshal is also an enthusiastic public speaker, always seeking opportunities to share his knowledge and insights. Whether through presentations, hosting events, or participating in podcasts, he thrives on engaging with others and contributing to meaningful discussions.
Thank you for visiting! Feel free to explore the portfolio and reach out for work opportunities or inquiries. Together, let's harness the potential of data and make a difference!
Skills
- Programming Languages: Python, R, PySpark, C/C++, Java, JavaScript, PHP, HTML/CSS
- Database: MySQL, PostgreSQL, NoSQL, MongoDB, Google Firebase, Oracle SQL Server
- Analytics/Cloud: Tableau, Looker, Databricks, Spark, Alteryx, AWS (S3, Redshift, Athena), MLFlow, IBM Cognos, Denodo, Advanced MS Excel
- IDEs/Frameworks: Jupyter, Git, VSCode, RStudio, MS Office, Anaconda, Eclipse, Apache Spark
- Development: Angular, Webflow, WordPress, Android Studio
- Data Science: Machine Learning, Natural Language Processing, Generative Models, Time Series Forecasting, ETL, Exploratory Data Analysis, Hypothesis Testing, Statistical Analysis, A/B Testing, Quality Assurance and Control
Education
Master of Science - Data Science
Aug 2022 - Dec 2023
University of Rochester, USA
- Key Courses – NLP, Statistical Machine Learning, Time Series Analysis, Computational Statistics, Data Mining
- Recipient of a 30% tuition scholarship
- Graduate Teaching Assistant for GBA465 - Python Analytics course
- Supported accessibility and inclusivity by managing and proctoring exams for students at the Office of Disability Resources
Bachelor of Engineering - Computer Science
Jun 2016 - Aug 2020
Savitribai Phule Pune University, India
- Key Courses – Database Management Systems, Data Analytics, Data Structures, Distributed Systems
- Volunteered to organize and participated in non-technical events such as debates and treasure hunts, and technical events such as hackathons
Work Experience
Associate, Insights & Analytics
Jan 2024 - Present
Blue Matter Consulting
Data Analyst
Sep 2020 - Jun 2022
IQVIA (159 Solutions Inc.)
- Provided analytical support by designing a launch tracker and creating dashboard reports for the client's new drug launch
- Utilized Alteryx, MS Excel, and Tableau to process EDI 852, 867 sales, and EPI data to generate weekly stakeholder deliverables
- Conducted ad-hoc analysis using HCOS, DDD data, and PLD to analyze patients and sales across demographics and geographies
- Collaborated with cross-functional teams to migrate a database from Teradata & Azure to AWS S3, conducting QC and sanity checks
- Recognized with the ‘Ovation Award’ for managing an offshore workstream, which included organizing daily catch-ups, facilitating client engagement, and producing stakeholder deliverables as per client needs
Portfolio
Forecasting Bike Inventory for Citibike
PySpark, Databricks, MLFlow, ETL, EDA
Developed an application for Citibike, which has 50k+ daily users, to ensure the availability of bikes and empty docks at the station in New York City. Implemented an ETL pipeline with Spark streaming, performed EDA, built an optimized forecasting model for net bike change, and fine-tuned hyperparameters for enhanced accuracy.
Analyzing Political Interest of Indian Americans
Python, Topic Modeling, Sentiment Analysis, Tableau, Twitter API
Analyzed tweets from 2.8k unique Indian Americans to identify their political trends and biases ahead of the 2024 presidential election. Designed visuals illustrating prevalent topics and sentiment within the community based on party affiliations and geographic locations.
Dynamic QA Generator for Research Papers
Python, Large Language Models, Data Preparation, OpenAI API
Developed a QA model to aid efficient comprehension of research papers by summarizing relevant information in the form of Q&A. Fine-tuned T5 models on an OpenAI-modified QASPER dataset with 1.5k+ papers for the question generation and answer generation tasks.
COVID-19 Cases Prediction in Ohio
Ensemble Learning, Feature Engineering, Model Evaluation
Built an ML model to predict COVID-19 cases using Ohio county time series data and analyzed the impact of tweet-based social awareness. Achieved a top 30% ranking with an R² score of 0.89 in the Kaggle competition.
Classification of Tweets from Northern Europe
Python, scikit-learn, Support Vector Classifier, Feature Engineering
Built a classification ML model to predict the political polarity of multilingual text, using a dataset of 500k+ tweets, with an accuracy of 79%. Implemented lemmatization, POS tagging, CountVectorizer, and TF-IDF Transformer for effective text cleaning and feature engineering.
Website Development: Powerslam
Webflow
Contact
I would love to hear from you! Whether you have questions, collaboration opportunities, or just want to connect, please feel free to reach out by email or on any social media platform. If you are near the location below, I would be happy to meet!
Location:
South San Francisco, California