Power BI Cleaning and Transforming Data
Data cleaning is the process of finding and fixing problems in your data before you build any reports. Dirty data produces wrong charts and misleading insights. A sales report that shows incorrect totals because of formatting errors in a price column will lead to wrong decisions. Cleaning your data first prevents these problems.
Why Data Needs Cleaning
Data comes from many different people and systems. A salesperson enters a customer name as "john smith." Another enters it as "John Smith." A third types "JOHN SMITH." To a human, these all mean the same person. To Power BI, they are three different customers. Your sales report will split one customer's data into three separate entries.
Think of it like labeling boxes in a warehouse. If one box says "Red Shoes," another says "red shoes," and a third says "RED SHOES," a sorting machine treats them as three separate products. A person knows they are the same, but the machine does not. Power Query helps you standardize labels so the machine — Power BI — handles data correctly.
Handling Blank and Null Values
Blank cells appear in almost every real dataset. A customer record might be missing a phone number. A sales order might have no discount value. These blanks can cause problems in calculations.
Removing Rows with Blanks
In Power Query, click the arrow on a column header. In the filter dropdown, uncheck the "null" option. This removes all rows where that column is blank. Use this approach when a blank in a key column means the row is incomplete and not useful for analysis.
Replacing Blanks with a Default Value
Right-click the column header and select Replace Values. In the "Value to Find" box, leave it empty. In the "Replace With" box, enter your default value — such as 0 for a number column or "Unknown" for a text column. This keeps the row but fills the blank with something meaningful.
Removing Duplicates
Duplicate rows cause double-counting in reports. If a product record appears twice in your Products table, every calculation involving that product counts it twice.
To remove duplicates in Power Query, right-click the column or select multiple columns that together should be unique, then click Remove Duplicates. Power BI keeps the first occurrence of each unique combination and removes the rest.
For example, a Customer table should have one row per customer. If Customer ID 1005 appears three times, that is duplicate data. Removing duplicates on the Customer ID column keeps only the first row for ID 1005 and removes the other two.
Text Cleaning Operations
Text data often has hidden problems — extra spaces before or after words, inconsistent capitalization, or unwanted characters. Power Query has dedicated functions to fix all of these.
Trim
Select the text column, right-click the header, go to Transform, and choose Trim. This removes extra spaces from the beginning and end of every text value in that column. " Delhi " becomes "Delhi."
Clean
Also found under Transform, the Clean option removes non-printable characters — invisible characters that sometimes sneak into data exported from other systems and cause matching failures.
Change Case
Under Transform, you can set text to UPPERCASE, lowercase, or Capitalize Each Word. Standardizing text case ensures that "new delhi," "New Delhi," and "NEW DELHI" all become "New Delhi" — one consistent value.
Splitting Columns
Sometimes one column contains information that should be in separate columns. A "Full Name" column with "Riya Sharma" should become two columns — "First Name" with "Riya" and "Last Name" with "Sharma."
Right-click the column header and choose Split Column. Power Query asks how you want to split — by a specific character (like a space or comma), by number of characters, or by position. After splitting, two new columns appear with the name automatically adjusted to include "_1" and "_2" at the end. Rename them to something meaningful.
Merging Columns
The opposite of splitting — sometimes data arrives with information in too many columns that should be combined. For example, a "City" column and a "State" column might need to combine into one "Location" column showing "Pune, Maharashtra."
Select both columns by holding Ctrl and clicking each header. Right-click and choose Merge Columns. A dialog asks for a separator character. Type a comma and space. Power Query combines both columns with that separator between them.
Replacing Values
Replace Values is one of the most useful tools in Power Query. Use it to standardize inconsistent entries. If a Region column has "North," "north," "NORTH," and "N" all meaning the same region, use Replace Values to change all variations to "North."
You can also use it to fix common data entry errors — for example, replacing "Mubai" with "Mumbai" across thousands of rows in one step.
Changing Data Types
A column's data type tells Power BI what kind of data it contains. If a date column is stored as text, Power BI cannot calculate date differences or filter by year. If a price column is stored as text, Power BI cannot add up totals.
Click the data type icon on the left side of the column header. A dropdown shows available types. Select the correct type. If any values in the column cannot be converted to the new type, Power Query shows them as errors that you can then investigate and fix.
Filtering Rows to Keep Only Relevant Data
Not all rows in a dataset belong in your analysis. If you analyze completed orders only, filter out cancelled and pending orders. If you analyze data for the current financial year, filter out rows with dates outside that range.
Click the dropdown arrow on any column header to apply filters. You can filter by specific values, by ranges (greater than, less than), by date conditions (this year, last month, before a specific date), or by text conditions (begins with, contains, ends with).
Key Points
- Clean data produces accurate reports — always clean before building visuals.
- Use Trim and Clean to remove spaces and invisible characters from text columns.
- Replace Values standardizes inconsistent entries across an entire column in one step.
- Remove Duplicates on key columns prevents double-counting in calculations.
- Always verify data types — dates must be Date type, numbers must be Number type.
