Whether you’re a Data Analyst, Data Engineer, or Data Scientist, there’s just no running away from data cleaning. It’s an essential skill if you want to make a career in data. I am writing this blog to share 5 unique data formatting challenges that I faced in my last project for a client.
1. Creating tables from nested JSON files.
2. Extracting the month name directly from the month number.
3. Masking the confidential columns in your data.
4. Concatenating Excel sheets with sheet names as aliases for identification.
5. Formatting a complex dictionary to a required…
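As a small illustration of the second challenge in the list above, here is a minimal sketch that turns a month number into a month name with Python's standard `calendar` module. The helper name `month_name_from_number` is hypothetical, and this is only one possible approach, not necessarily the one used in the project.

```python
import calendar


def month_name_from_number(month_number: int) -> str:
    """Return the month name for a month number between 1 and 12.

    Hypothetical helper for illustration; the names come from
    calendar.month_name, which follows the active locale
    (English by default).
    """
    if not 1 <= month_number <= 12:
        raise ValueError(f"month_number must be in 1..12, got {month_number}")
    return calendar.month_name[month_number]


# Example with made-up month numbers
for number in (1, 3, 12):
    print(number, month_name_from_number(number))
```

Indexing `calendar.month_name` avoids building a throwaway date object just to format it.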
I’ve received a huge number of questions on LinkedIn about making the transition from a non-tech background to the Data Science industry. I graduated as a Civil Engineer and was able to make this transition. I have also coached more than 500 students, both offline and online, so I feel I am in a position to talk about this.
1. Learning Timeline
2. Data Science skillset
4. How to combat the inexperience
5. Building your own brand
6. Roadmap I recommend
7. Best Resources
The first question that I am always asked is how much time does it…
Often in our analysis we tend to group similar objects together and then apply different rules and validations to these groups instead of dealing with each data point separately.
In the Machine Learning world, this activity is called clustering. Many algorithms are used for clustering, such as K-Means, DBSCAN, and Hierarchical Clustering, but none of them works well out of the box if you have both numerical and categorical data in your dataset. Gower’s Measure aims to solve this problem.
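To make the idea concrete, here is a minimal sketch of a Gower-style distance for two records with mixed feature types. It assumes the common formulation: each numerical feature contributes its absolute difference divided by that feature's range, each categorical feature contributes 0 on a match and 1 on a mismatch, and the contributions are averaged with equal weights. The function name, feature names, and values are all hypothetical.

```python
def gower_distance(record_a, record_b, numeric_ranges):
    """Gower-style distance between two records given as dicts.

    numeric_ranges maps each numerical feature to (max - min) over the
    dataset. Features missing from numeric_ranges are treated as
    categorical. All features are weighted equally (an assumption).
    """
    total = 0.0
    for feature in record_a:
        if feature in numeric_ranges:
            value_range = numeric_ranges[feature]
            # Range-normalised absolute difference, in [0, 1]
            if value_range:
                total += abs(record_a[feature] - record_b[feature]) / value_range
        else:
            # Categorical: 0 if the categories match, 1 otherwise
            total += 0.0 if record_a[feature] == record_b[feature] else 1.0
    return total / len(record_a)


# Hypothetical customers with one numerical and one categorical feature
customer_a = {"balance": 10000, "account_type": "savings"}
customer_b = {"balance": 5000, "account_type": "savings"}
print(gower_distance(customer_a, customer_b, {"balance": 10000}))
```

The result always lies between 0 (identical records) and 1 (maximally different), which is what lets numerical and categorical features share one scale.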
How do you measure similarity?
How will you differentiate between customer A who has a bank balance of $10,000, average credit card…
Ever wondered how spellcheck and autocorrect on your mobile phone bail you out by automatically suggesting the words that you were just about to type?
I’ve come across so many amazing functionalities in Python, but the fuzzywuzzy package in particular caught my interest. So I decided to write a blog about this topic and share it with a wider network.
Below are the subtopics of this blog:
The concept of Levenshtein Distance, sometimes also called Minimum Edit Distance, is a popular metric used to measure the distance between two…
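For a feel of how the metric works, here is a minimal sketch of Levenshtein distance using the classic dynamic-programming approach, where insertions, deletions, and substitutions each cost 1. This is a plain-Python illustration, not the implementation used inside the fuzzywuzzy package, and the function name is hypothetical.

```python
def levenshtein_distance(source: str, target: str) -> int:
    """Minimum number of single-character edits turning source into target."""
    # previous_row[j] = distance between the processed prefix of source
    # and the first j characters of target
    previous_row = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current_row = [i]  # deleting i characters reaches the empty string
        for j, target_char in enumerate(target, start=1):
            substitution_cost = 0 if source_char == target_char else 1
            current_row.append(min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # substitution or match
            ))
        previous_row = current_row
    return previous_row[-1]


print(levenshtein_distance("kitten", "sitting"))  # 3
```

Keeping only two rows of the table means memory grows with the length of the target string rather than with the product of both lengths.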
Data Scientist at Deloitte