How to Merge Multiple CSV Files into One Big CSV File using Python

What will you learn?

In this tutorial, you will master the art of merging multiple CSV files into a single large CSV file using Python. This skill is essential when dealing with data from various sources that need to be consolidated for analysis.

Introduction to the Problem and Solution

When working with data scattered across several CSV files, consolidating them into one file simplifies the analysis process. By combining multiple small CSV files into a single large CSV file, you can efficiently handle and analyze the data as a whole.

To tackle this challenge, we will create a Python script that reads each small CSV file, extracts its contents, and appends them to a bigger CSV file.

Code

import os
import glob
import pandas as pd

# Path to directory containing small csv files
input_folder = 'path/to/csv/files/'

# Combine all csv files in the folder into one dataframe
all_files = glob.glob(os.path.join(input_folder, "*.csv"))
df_from_each_file = (pd.read_csv(f) for f in all_files)
big_df = pd.concat(df_from_each_file, ignore_index=True)

# Save the combined dataframe to a new csv file
output_file = 'path/to/output/big_file.csv'
big_df.to_csv(output_file, index=False)

# Visit our website PythonHelpDesk.com for more tips and tricks!

# Copyright PHD

Explanation

To merge multiple CSV files into one big file in Python: 1. Import necessary libraries like os, glob, and pandas. 2. Specify the path of the directory containing small CSV files. 3. Use glob to retrieve a list of filenames matching a specific pattern. 4. Read each small CSV file as a Pandas DataFrame using list comprehension. 5. Concatenate all DataFrames into one large DataFrame using pd.concat(). 6. Save the combined DataFrame as a new CSV file.

    How can I specify different column names while merging?

    You can define column names when concatenating DataFrames or rename columns before merging.

    Can I merge large-sized CSV files with this method?

    Yes, you can merge large-sized files; ensure your system has sufficient memory for processing.

    Is it possible to merge Excel (.xlsx) files instead of CSVs?

    Yes, modify the code by using read_excel() instead of read_csv() from Pandas library.

    How do I handle duplicate headers when combining multiple files?

    Choose one header row from any input dataset or skip headers during concatenation based on your needs.

    Can I merge non-CSV text-based data formats like JSON or XML?

    Yes, utilize appropriate libraries (e.g., json module for JSON) based on your input data format before merging.

    Conclusion

    Consolidating smaller datasets from separate CSV files streamlines data analysis by creating a comprehensive dataset for further examination or manipulation. Mastering these Python techniques enhances efficiency in managing such tasks, ultimately saving time and effort.

    Leave a Comment