python pandas csv

Before diving into this tutorial, make sure you’ve gone through the Getting Started with Python Pandas: Installation and Setup guide to install and set up Pandas on your system.

Working with Comma Separated Values (CSV) files is a common task in data analysis, and Python Pandas simplifies this process. In this article, we’ll discuss how to read csv file in python pandas and how to write csv file in python using pandas. By the end of this tutorial, you’ll be able to python pandas read csv file and manipulate it as a DataFrame with ease.

Reading CSV Files with Pandas

Pandas offers a simple function, read_csv(), to read csv file in python pandas. The basic syntax for python pandas read csv is as follows:

import pandas as pd
data = pd.read_csv("file_path.csv")

By executing the above code, you can read a CSV file and store its contents in a DataFrame. The file_path.csv is the path to your CSV file.

understanding dataframe

For example, let’s consider the following CSV file named tips.csv:

total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2

To read this CSV file and store it as a DataFrame, use the following code:

import pandas as pd
data = pd.read_csv("tips.csv")
print(data)

The output will be:

     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[244 rows x 7 columns]

Customizing CSV Import Options

The read_csv function offers various parameters to customize the import process. Some common parameters include:

  • sep: Specify the delimiter used in the CSV file (default is ,).
  • header: Indicate the row number(s) to use as the column names (default is 0, meaning the first row).
  • index_col: Specify the column(s) to set as the index or row labels of the DataFrame.
  • skiprows: Skip a specified number of rows while reading the CSV file.
  • nrows: Read only a specified number of rows from the CSV file.
  • na_values: Specify additional strings to recognize as NaN values.
  • encoding: Specify the character encoding of the CSV file.
  • comment: Specify a character that marks the beginning of a comment in the CSV file. Any lines that begin with this character will be ignored by Pandas when reading the file.

Specify Separator

Let’s consider a CSV file named custom.csv with the following content:

Name|Age|Gender
Alice|25|F
# This is a comment
Bob|30|M
Carol|35|F

To read this CSV file with a custom delimiter, ignoring the commented row, you can use the following code:

import pandas as pd
data = pd.read_csv("custom.csv", sep="|", comment="#")
print(data)

The output will be:

    Name  Age Gender
0  Alice   25      F
1    Bob   30      M
2  Carol   35      F

Select Columns

Often, you might only be interested in specific columns of a CSV file. To select only certain columns, use the usecols parameter:

data = pd.read_csv("tips.csv", usecols=["total_bill", "tip"])
     total_bill   tip
0         16.99  1.01
1         10.34  1.66
2         21.01  3.50
3         23.68  3.31
4         24.59  3.61
..          ...   ...
239       29.03  5.92
240       27.18  2.00
241       22.67  2.00
242       17.82  1.75
243       18.78  3.00

Skip Columns and Rows

Pandas allow you to skip rows while reading a CSV file using the skiprows parameter. This is useful if you want to ignore header rows or any other rows that aren’t relevant for your analysis.

# Skipping the first row (header row)
data = pd.read_csv("tips.csv", skiprows=[1])
     total_bill   tip     sex smoker   day    time  size
0         10.34  1.66    Male     No   Sun  Dinner     3
1         21.01  3.50    Male     No   Sun  Dinner     3
2         23.68  3.31    Male     No   Sun  Dinner     2
3         24.59  3.61  Female     No   Sun  Dinner     4
4         25.29  4.71    Male     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
238       29.03  5.92    Male     No   Sat  Dinner     3
239       27.18  2.00  Female    Yes   Sat  Dinner     2
240       22.67  2.00    Male    Yes   Sat  Dinner     2
241       17.82  1.75    Male     No   Sat  Dinner     2
242       18.78  3.00  Female     No  Thur  Dinner     2

[243 rows x 7 columns]

Skipping multiple rows

data = pd.read_csv("example.csv", skiprows=[1, 3, 5])
     total_bill   tip     sex smoker   day    time  size
0         10.34  1.66    Male     No   Sun  Dinner     3
1         23.68  3.31    Male     No   Sun  Dinner     2
2         25.29  4.71    Male     No   Sun  Dinner     4
3          8.77  2.00    Male     No   Sun  Dinner     2
4         26.88  3.12    Male     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
236       29.03  5.92    Male     No   Sat  Dinner     3
237       27.18  2.00  Female    Yes   Sat  Dinner     2
238       22.67  2.00    Male    Yes   Sat  Dinner     2
239       17.82  1.75    Male     No   Sat  Dinner     2
240       18.78  3.00  Female     No  Thur  Dinner     2

[241 rows x 7 columns]

Specify Data Types

Sometimes, it’s necessary to specify the data types of the columns in a CSV file, especially when Pandas cannot infer the correct data types automatically. To do this, use the dtype parameter:

data = pd.read_csv("tips.csv", dtype={"size": "float64"})
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner   2.0
1         10.34  1.66    Male     No   Sun  Dinner   3.0
2         21.01  3.50    Male     No   Sun  Dinner   3.0
3         23.68  3.31    Male     No   Sun  Dinner   2.0
4         24.59  3.61  Female     No   Sun  Dinner   4.0
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner   3.0
240       27.18  2.00  Female    Yes   Sat  Dinner   2.0
241       22.67  2.00    Male    Yes   Sat  Dinner   2.0
242       17.82  1.75    Male     No   Sat  Dinner   2.0
243       18.78  3.00  Female     No  Thur  Dinner   2.0

Encoding Issues

Occasionally, you may encounter encoding issues when reading CSV files containing non-ASCII characters. To handle such issues, use the encoding parameter and specify the appropriate encoding:

data = pd.read_csv("tips.csv", encoding="utf-8")

Read CSV as String

In certain situations, you may need to read the entire CSV file as a string, for example, to preprocess the data before converting it into a DataFrame. You can achieve this using the following approach:

with open("example.csv", "r") as file:
    csv_string = file.read()

Read Multiple CSV Files

When dealing with multiple CSV files, you can read and concatenate them into a single DataFrame using the following technique:

import glob

file_list = glob.glob("path/to/files/*.csv")
data_frames = [pd.read_csv(file) for file in file_list]
combined_data = pd.concat(data_frames)

Handling CSV Headers

By default, Pandas assumes that the first row of the CSV file contains the column names. However, this might not always be the case. You can use the header parameter to handle such situations.

No Header Row

If your CSV file doesn’t have a header row, set the header parameter to None. For example, let’s consider a CSV file named no_header.csv with the following content:

Alice,25,F
Bob,30,M
Carol,35,F

To read this CSV file without a header, use the following code:

import pandas as pd

data = pd.read_csv("no_header.csv", header=None)
print(data)

The output will be:

       0   1  2
0  Alice  25  F
1    Bob  30  M
2  Carol  35  F

You can also provide custom column names using the names parameter:

import pandas as pd

column_names = ["Name", "Age", "Gender"]
data = pd.read_csv("no_header.csv", header=None, names=column_names)
print(data)

The output will be:

    Name  Age Gender
0  Alice   25      F
1    Bob   30      M
2  Carol   35      F

Writing DataFrames to CSV Files

To write csv file python pandas, you can use the to_csv function. The basic syntax for writing a DataFrame to a CSV file is as follows:

data.to_csv("output_file_path.csv", index=False)

The index parameter is set to False to exclude the index column from the output file. If you want to include the index column, set it to True or remove the index parameter.

Example

Let’s create a DataFrame and write it to a CSV file named output.csv:

import pandas as pd

data = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Age": [25, 30, 35],
    "Gender": ["F", "M", "F"]
})

data.to_csv("output.csv", index=False)

The resulting output.csv file will have the following content:

Name,Age,Gender
Alice,25,F
Bob,30,M
Carol,35,F

Loading Large CSV Files

For large CSV files, it’s more efficient to read the data in chunks. You can achieve this by using the chunksize parameter in the read_csv function. The chunksize parameter specifies the number of rows to read at a time.

Example

Let’s read a large CSV file named large.csv in chunks of 1000 rows:

import pandas as pd

chunksize = 1000
chunks = []

for chunk in pd.read_csv("large.csv", chunksize=chunksize):
    # Process the chunk (e.g., filter rows, aggregate data, etc.)
    chunks.append(chunk)

data = pd.concat(chunks)

In this example, we read the CSV file in chunks and process each chunk separately. Finally, we concatenate the chunks into a single DataFrame using the pd.concat function.

Conclusion

In this article, we explored how to read and write CSV files using Python Pandas. We discussed how to customize the import process, handle headers, and efficiently load large CSV files. By following the examples provided, you can now effortlessly read and write CSV files using Pandas for your data analysis tasks. Make sure to refer to the official Pandas documentation for more advanced use cases and additional features available in the library.

By mastering the techniques presented in this tutorial, you’ll be able to:

  • Read and write csv file in python using pandas
  • Customize CSV import options such as delimiter, header, index, and more
  • Handle CSV headers with different configurations
  • Load and process large CSV files in chunks for efficient memory usage
  • Create DataFrames from CSV files and save DataFrames to CSV files

With this knowledge, you can confidently utilize Python Pandas for your data manipulation and analysis tasks involving CSV files.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *