Context Slice

CSV Transformation Best Practices

CSV files are simple but often messy. Good data transformation is about understanding the data first, then applying changes systematically.

Understanding the Data First

Before any transformation:

How many rows and columns?
What are the column names?
What data types should each column be?
Are there obvious issues (blanks, duplicates, inconsistencies)?
What's the desired end state?
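
The questions above can be answered with a quick profiling pass. This is a minimal sketch using only the Python standard library; the function name `profile_csv` and the shape of its result are illustrative assumptions, not part of any particular tool.

```python
import csv
import io
from collections import Counter


def profile_csv(text):
    """Return row count, column names, blank counts, and duplicate count."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])

    # Count blank (empty or whitespace-only) cells per column.
    blanks = Counter()
    for row in rows:
        for col in columns:
            if (row[col] or "").strip() == "":
                blanks[col] += 1

    # Rows that are identical across every column count as duplicates.
    distinct = {tuple(row[col] for col in columns) for row in rows}
    return {
        "rows": len(rows),
        "columns": columns,
        "blank_counts": dict(blanks),
        "duplicate_rows": len(rows) - len(distinct),
    }
```

Data types are not inferred here; every value is still a string at this point.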

Common Cleaning Operations

Missing Data

Identify: Which columns have blanks/nulls?
Options:
- Remove rows with missing data
- Fill with default value
- Fill with calculated value (mean, median)
- Leave as-is (if downstream can handle)
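
The first three options might look like the sketch below. It assumes rows are dictionaries of strings (as produced by `csv.DictReader`); the function name `fill_missing` and its `strategy` values are hypothetical.

```python
from statistics import mean


def fill_missing(rows, column, strategy="default", default=""):
    """Return new rows with blanks in `column` dropped or filled."""

    def is_blank(value):
        return value is None or value.strip() == ""

    if strategy == "drop":
        return [dict(r) for r in rows if not is_blank(r[column])]

    if strategy == "mean":
        # Assumes the non-blank values in this column are numeric.
        present = [float(r[column]) for r in rows if not is_blank(r[column])]
        fill = str(mean(present)) if present else default
    else:
        fill = default

    # Build new dictionaries so the input rows stay untouched.
    return [
        {**r, column: fill if is_blank(r[column]) else r[column]}
        for r in rows
    ]
```

Swapping `mean` for `statistics.median` gives the median variant.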

Duplicates

Define: What makes a row duplicate? (all columns? subset?)
Options:
- Remove all duplicates
- Keep first/last occurrence
- Merge duplicate rows
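
A minimal sketch of keeping the first or last occurrence, assuming rows are dictionaries and that the duplicate definition is a chosen subset of columns (all columns when none is given). The name `drop_duplicates` is illustrative.

```python
def drop_duplicates(rows, key_columns=None, keep="first"):
    """Keep one row per key; `keep` is "first" or "last"."""
    seen = {}
    for row in rows:
        cols = key_columns or list(row.keys())
        key = tuple(row[c] for c in cols)
        # With keep="last" the stored row is overwritten by later ones,
        # but it stays at the position where the key first appeared.
        if key not in seen or keep == "last":
            seen[key] = row
    return list(seen.values())
```

Merging duplicate rows needs a per-column rule (for example, prefer the non-blank value) and is not covered by this sketch.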

Formatting Issues

Whitespace: Trim leading/trailing spaces
Case: Standardize to upper/lower/title case
Dates: Convert to consistent format
Numbers: Remove currency symbols, standardize decimals
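
These fixes can be sketched as small string helpers. The accepted date formats and the assumption that numbers use a dot as the decimal separator are choices made for this example only; real data may need different rules.

```python
import re
from datetime import datetime

# Assumed input formats; extend to match the data at hand.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def clean_text(value, case=None):
    """Trim surrounding whitespace and optionally standardize case."""
    value = value.strip()
    if case == "upper":
        return value.upper()
    if case == "lower":
        return value.lower()
    if case == "title":
        return value.title()
    return value


def normalize_date(value):
    """Return an ISO date (YYYY-MM-DD), or the input if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return value


def normalize_number(value):
    """Strip currency symbols and thousands separators (dot-decimal assumed)."""
    return re.sub(r"[^0-9.\-]", "", value)
```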

Data Types

Convert strings to numbers where appropriate
Parse dates from text
Boolean standardization (yes/no → true/false)
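
A hedged sketch of type conversion that returns `None` instead of raising when a value does not fit. The sets of accepted true/false spellings are assumptions; adjust them to the vocabulary actually present in the file.

```python
TRUE_VALUES = {"yes", "y", "true", "1"}    # assumed spellings
FALSE_VALUES = {"no", "n", "false", "0"}   # assumed spellings


def to_bool(value):
    """Map yes/no style text to True/False; None when unrecognized."""
    text = value.strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    return None


def to_number(value):
    """Return an int when possible, else a float, else None."""
    text = value.strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return None
```

Returning `None` keeps bad values visible so they can be counted during validation rather than silently coerced.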

Common Transform Operations

Column Operations

Rename: Change column headers
Reorder: Rearrange column sequence
Add: Create new columns (calculated or constant)
Remove: Drop unnecessary columns
Split: Break one column into multiple (e.g., "John Smith" → "John", "Smith")
Combine: Merge multiple columns into one
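
Most column operations reduce to building a new dictionary per row. In this sketch the column names (`full_name`, `e-mail`, `city`, `country`, `notes`) are hypothetical and stand in for whatever the source file contains.

```python
def reshape_row(row):
    """Apply rename, reorder, add, remove, split, and combine to one row."""
    # Split: break "John Smith" at the first space.
    first, _, last = row["full_name"].partition(" ")
    # Key order in the new dict is the new column order (reorder).
    return {
        "last_name": last,
        "first_name": first,
        "email": row["e-mail"],                            # rename
        "location": f'{row["city"]}, {row["country"]}',    # combine
        "source": "import",                                # add a constant
        # "notes" is not copied over, which removes it.
    }
```

Splitting names on the first space is a simplification; multi-part names need a more careful rule.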

Row Operations

Filter: Keep rows matching criteria
Sort: Order by one or more columns
Sample: Take subset of rows
Aggregate: Group and summarize (count, sum, average)
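
Filtering, sorting, and aggregating can be sketched with plain list and dictionary operations. The column names `city` and `amount` are assumptions for the example, and values are assumed to be numeric strings.

```python
from collections import defaultdict


def filter_and_sort(rows, min_amount):
    """Keep rows with amount >= min_amount, sorted by city then amount (desc)."""
    kept = [r for r in rows if float(r["amount"]) >= min_amount]
    return sorted(kept, key=lambda r: (r["city"], -float(r["amount"])))


def summarize_by(rows, group_col, value_col):
    """Group rows by one column and compute count, sum, and average."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_col]].append(float(r[value_col]))
    return {
        key: {"count": len(vals), "sum": sum(vals), "average": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }
```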

Value Operations

Replace: Find and replace values
Map: Transform values using lookup
Calculate: Create derived values
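
A small sketch of lookup-based mapping and a derived value. The lookup table and the column names `qty`, `unit_price`, and `total` are hypothetical.

```python
# Hypothetical lookup table: abbreviation -> full name.
STATE_LOOKUP = {"ny": "New York", "ca": "California"}


def map_value(value, lookup, default=None):
    """Look up a normalized key; keep the original value when there is no match."""
    key = value.strip().lower()
    fallback = value if default is None else default
    return lookup.get(key, fallback)


def add_total(row):
    """Return a copy of the row with total = qty * unit_price (2 decimals)."""
    total = float(row["qty"]) * float(row["unit_price"])
    return {**row, "total": f"{total:.2f}"}
```

Keeping unmatched values as-is (rather than blanking them) makes lookup gaps easy to spot afterwards.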

Merge Operations

Join Types

Inner: Only rows whose key appears in both files
Left: All rows from the first file, plus matching data from the second
Right: All rows from the second file, plus matching data from the first
Outer: All rows from both files, matched where possible
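
The four join types can be sketched over lists of dictionaries with a single key column. This is an illustrative implementation, not how any specific tool does it; it assumes every row in a file has the same columns, fills non-matches with empty strings, and lets the second file win when both files share a non-key column name.

```python
from collections import defaultdict


def join_rows(left, right, key, how="inner"):
    """Join two row lists on `key`; `how` is inner, left, right, or outer."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)

    left_cols = [c for c in (left[0] if left else {}) if c != key]
    right_cols = [c for c in (right[0] if right else {}) if c != key]

    out, matched = [], set()
    for l in left:
        matches = index.get(l[key], [])
        if matches:
            matched.add(l[key])
            # Duplicate keys on the right produce one output row per match.
            for r in matches:
                out.append({**l, **{c: r[c] for c in right_cols}})
        elif how in ("left", "outer"):
            out.append({**l, **{c: "" for c in right_cols}})

    if how in ("right", "outer"):
        for r in right:
            if r[key] not in matched:
                blanks = {c: "" for c in left_cols}
                out.append({key: r[key], **blanks, **{c: r[c] for c in right_cols}})
    return out
```

A composite key could be supported by building the index on a tuple of several columns instead of a single value.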

Key Matching

Single column: Simple match on one field
Multiple columns: Composite key matching
Fuzzy matching: When exact match isn't possible

Common Issues

Duplicate keys: What happens when one file has multiple matches?
Missing keys: How to handle non-matches?
Column name conflicts: Both files have columns with same name

Format Conversions

CSV to JSON

name,age,city
John,30,NYC
→
[{"name":"John","age":"30","city":"NYC"}]
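
A minimal sketch of this conversion with the standard library. The function name `csv_to_json` is illustrative. Note that every value stays a string (`"30"`, not `30`) unless types are converted first.

```python
import csv
import io
import json


def csv_to_json(text):
    """Convert CSV text with a header row into a compact JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows, separators=(",", ":"))
```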

CSV to Markdown Table

| name | age | city |
|------|-----|------|
| John | 30  | NYC  |
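
A table like the one above could be produced by a sketch along these lines. It assumes the input has a header row and that all rows have the same number of fields; the function name `csv_to_markdown` is illustrative.

```python
import csv
import io


def csv_to_markdown(text):
    """Render CSV text as a pipe-delimited Markdown table with padded columns."""
    # Escape pipes so cell content cannot break the table layout.
    rows = [
        [cell.replace("|", "\\|") for cell in row]
        for row in csv.reader(io.StringIO(text))
    ]
    header, body = rows[0], rows[1:]
    # Each column is as wide as its longest cell, with a minimum of 3.
    widths = [max(3, *(len(row[i]) for row in rows)) for i in range(len(header))]

    def line(cells):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([line(header), separator] + [line(r) for r in body])
```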

Encoding

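UTF-8 is the preferred encoding and the default for most modern tools
Watch for garbled special characters (accents, symbols), which usually mean the file was read with the wrong encoding
Excel may save CSV files in a legacy encoding (such as Windows-1252) or add a byte order mark (BOM) to UTF-8 files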
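
One way to cope with this is to try UTF-8 first and fall back to a legacy encoding. In this sketch the fallback of `cp1252` is an assumption that fits many Western European Excel exports; it is not a general detection method.

```python
def decode_csv_bytes(data, fallback="cp1252"):
    """Decode raw CSV bytes as UTF-8 (dropping any BOM), else use `fallback`."""
    try:
        # "utf-8-sig" decodes UTF-8 and removes a leading BOM if present.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode(fallback)
```

When the fallback guess is wrong the text will still decode but show wrong characters, so a spot check of non-ASCII values remains worthwhile.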

Validation

After transformation, verify:

Row count (expected vs actual)
Column count
Sample values look correct
No unexpected nulls introduced
Data types are correct
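
The checks above can be collected into one function that returns a list of problems. This is a sketch under the assumption that rows are dictionaries of strings; the function name and message wording are illustrative.

```python
def validate_rows(rows, expected_rows, expected_columns, required=(), numeric=()):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(rows)}")

    for i, row in enumerate(rows, start=1):
        if list(row.keys()) != list(expected_columns):
            problems.append(f"row {i}: unexpected columns {list(row.keys())}")
        # Required columns must not be blank.
        for col in required:
            if (row.get(col) or "").strip() == "":
                problems.append(f"row {i}: blank value in '{col}'")
        # Numeric columns must parse as numbers when they are not blank.
        for col in numeric:
            value = (row.get(col) or "").strip()
            if value:
                try:
                    float(value)
                except ValueError:
                    problems.append(f"row {i}: '{col}' is not numeric: {value!r}")
    return problems
```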

Output Options

Full Data

Complete transformed dataset
Suitable for small to medium files

Summary

First N rows as preview
Row/column counts
Basic statistics

Sample

Random subset for verification
Useful for large files
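
For a file too large to load at once, a random subset can be drawn in a single pass with reservoir sampling. This is a sketch of that technique (Algorithm R), offered as one possible approach; the function name `sample_rows` and the `seed` parameter are illustrative.

```python
import csv
import random


def sample_rows(lines, k, seed=None):
    """Return up to k rows chosen uniformly at random from an iterable of CSV lines."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(csv.DictReader(lines)):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample
```

`lines` can be an open file object, so only `k` rows are held in memory at any time.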

Best Practices

Preview first: Look at a sample before transforming
Document changes: Track what was done
Preserve original: Don't modify source files
Validate output: Check that the results make sense
Handle errors: Decide what to do with problematic rows

Common Issues

Issue	Solution
Comma in values	Enclose the field in double quotes
Newlines in values	Enclose the field in double quotes; double any embedded quote characters
Different delimiters	Detect or specify
Header issues	Confirm whether the first row is a header
Encoding problems	Convert to UTF-8
Large files	Process in chunks
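
Several of these fixes can be combined in one reader. The sketch below detects the delimiter with `csv.Sniffer` unless one is specified, relies on the `csv` module to handle quoted commas and newlines, and yields rows in chunks. The function name `iter_chunks`, the candidate delimiter list, and the sample size are assumptions; sniffing is a heuristic and can guess wrong on unusual files.

```python
import csv
import io
from itertools import islice


def iter_chunks(stream, chunk_size=1000, delimiter=None):
    """Yield lists of row dicts from a seekable text stream, chunk_size rows at a time."""
    if delimiter is None:
        sample = stream.read(4096)
        stream.seek(0)
        try:
            delimiter = csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
        except csv.Error:
            delimiter = ","  # fall back to a plain comma
    reader = csv.reader(stream, delimiter=delimiter)
    header = next(reader, None)  # assumes the first row is a header
    if header is None:
        return
    while True:
        chunk = [dict(zip(header, row)) for row in islice(reader, chunk_size)]
        if not chunk:
            return
        yield chunk


# Usage: a semicolon-delimited file read two rows at a time.
demo = io.StringIO("a;b\n1;2\n3;4\n5;6\n", newline="")
chunks = list(iter_chunks(demo, chunk_size=2))
```

When reading from disk, opening the file with `newline=""` lets the `csv` module handle line endings inside quoted fields itself.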

Slice Info

Description

Best practices for cleaning, transforming, and merging CSV data

Tokens

873

Used By

CSV Transformer skill

Clean CSV Data task

Merge CSV Files task

slice:data.csv.guide