In this tutorial, we will explore Python Pandas DataFrames in depth, covering essential concepts and techniques for efficient data manipulation and analysis. By understanding the core components of DataFrames, you’ll be able to unlock the full potential of Pandas in your data analysis projects.
Introduction to DataFrames

DataFrames are flexible data structures in the Pandas library used for handling and analyzing two-dimensional data, where information is organized in rows and columns. A Pandas DataFrame can store various data types and provide essential functionality for data science tasks.
To work with DataFrames, first import the Pandas library:
import pandas as pd
Throughout this tutorial, we will work through DataFrame examples and explore the main DataFrame features to help you become proficient with them.

Creating DataFrames

There are several ways to create a DataFrame in Pandas:
- From a Python dictionary
- From a NumPy array
- From a list of dictionaries
- From a list of lists
- From a CSV file
- From other DataFrames
In this section, we’ll explore each of these methods with a short example.
From a Python Dictionary
When you create a DataFrame from a dictionary, each key-value pair represents a column name and its corresponding data. The data can be a list, NumPy array, or other iterable.
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"]
}
df = pd.DataFrame(data)
print(df)
The output will be:
            Name  Auditory   Creator
0   Seter Design     10000  Vasiukou
1  Seter Develop    100000    Hontar
2    Seter Mngmt     50000    Hontar
From a NumPy Array
You can also create a DataFrame from a NumPy array. Pass the array as the data and, optionally, specify the index (row labels) and columns (column labels). If you omit them, Pandas uses default integer labels.
import pandas as pd
import numpy as np
data = np.random.randint(0, 100, size=(3, 3))
index = ["Row1", "Row2", "Row3"]
columns = ["Column1", "Column2", "Column3"]
df = pd.DataFrame(data, index=index, columns=columns)
print(df)
The output will look something like this:
      Column1  Column2  Column3
Row1       56       36       94
Row2       89       92       61
Row3       52       10       75
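The index and columns arguments are optional. As a minimal sketch (the array values below are arbitrary and chosen only for illustration), leaving them out gives default integer labels for both rows and columns:

```python
import numpy as np
import pandas as pd

# Illustrative 2x3 array; the values are arbitrary.
data = np.array([[1, 2, 3], [4, 5, 6]])

# Without index/columns, pandas falls back to 0-based integer labels.
df_default = pd.DataFrame(data)

print(df_default)
print(list(df_default.index))    # row labels
print(list(df_default.columns))  # column labels
```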
From a List of Dictionaries
A list of dictionaries can be used to create a DataFrame, with each dictionary representing a row of data. The keys in the dictionaries will be used as column names.
import pandas as pd
data = [
{"total_bill": 16.99, "smoker": "No", "time": "Dinner"},
{"total_bill": 14.29, "smoker": "Yes", "time": "Lunch"},
{"total_bill": 26.99, "smoker": "No", "time": "Dinner"},
]
df = pd.DataFrame(data)
print(df)
The output will be:
   total_bill  smoker    time
0       16.99      No  Dinner
1       14.29     Yes   Lunch
2       26.99      No  Dinner
From a List of Lists
Another way to create a DataFrame is from a list of lists, where each inner list represents a row of data.
import pandas as pd
data = [
["Sun", "Dinner", 2],
["Mon", "Dinner", 4],
["Tue", "Lunch", 1]
]
column_names = ["day", "time", "size"]
df = pd.DataFrame(data, columns=column_names)
print(df)
The output will be:
   day    time  size
0  Sun  Dinner     2
1  Mon  Dinner     4
2  Tue   Lunch     1
From a CSV File
You can create a DataFrame from a CSV file using the read_csv function, as demonstrated in our previous article, Effortlessly Import Data with Pandas: A Guide to CSV.
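For completeness, here is a small sketch of read_csv. To keep it self-contained, the CSV content is a made-up string held in memory via io.StringIO rather than a file on disk; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Hypothetical CSV content, kept in memory so the example needs no file.
csv_text = """day,time,size
Sun,Dinner,2
Mon,Dinner,4
Tue,Lunch,1
"""

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```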
From Other DataFrames
To create a DataFrame from another DataFrame, you can use the copy() method, which creates a new DataFrame with the same data and structure as the original. This is useful when you want to create a separate DataFrame for further manipulation without affecting the original data.
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"]
}
original_df = pd.DataFrame(data)
new_df = original_df.copy()
print(new_df)
The output will be:
            Name  Auditory   Creator
0   Seter Design     10000  Vasiukou
1  Seter Develop    100000    Hontar
2    Seter Mngmt     50000    Hontar
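The difference between copy() and plain assignment is easy to miss. The sketch below (variable names such as alias_df are illustrative) shows that assignment only creates a second name for the same object, while copy() produces independent data:

```python
import pandas as pd

original_df = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop"],
    "Auditory": [10000, 100000],
})

alias_df = original_df          # same object under a second name, not a copy
copied_df = original_df.copy()  # independent copy (deep by default)

# Changing the copy leaves the original untouched.
copied_df.loc[0, "Auditory"] = 0

print(alias_df is original_df)
print(original_df.loc[0, "Auditory"])
print(copied_df.loc[0, "Auditory"])
```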
Selecting Data in DataFrames
Pandas DataFrames provide various ways to access dataframe columns, row labels, and column labels. You can also select multiple rows or filter the data based on certain conditions.
You can learn more about selecting data in DataFrames in our dedicated article.
Selecting Columns
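To select a single column, you can use the column name in square brackets or dot notation. Dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.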
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Carol"],
"Age": [25, 30, 35],
"Gender": ["F", "M", "F"]
}
df = pd.DataFrame(data)
# Using square brackets
name_column = df["Name"]
# Using dot notation
age_column = df.Age
print("Name column:\n", name_column)
print("Age column:\n", age_column)
The output will be:
Name column:
 0    Alice
1      Bob
2    Carol
Name: Name, dtype: object
Age column:
 0    25
1    30
2    35
Name: Age, dtype: int64
To select multiple columns, pass a list of column names inside the square brackets.
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"]
}
df = pd.DataFrame(data)
selected_columns = df[["Name", "Creator"]]
print(selected_columns)
The output will be:
            Name   Creator
0   Seter Design  Vasiukou
1  Seter Develop    Hontar
2    Seter Mngmt    Hontar
Selecting Rows
To select rows based on their index, you can use the iloc and loc indexers. iloc accesses rows by integer position, while loc accesses rows by index label. With the default integer index used here, positions and labels happen to coincide.
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"]
}
df = pd.DataFrame(data)
# Selecting a row by integer index using iloc
row1 = df.iloc[1]
# Selecting a row by label-based index using loc
row2 = df.loc[2]
print("Row 1:\n", row1)
print("Row 2:\n", row2)
The output will be:
Row 1:
 Name        Seter Develop
Auditory           100000
Creator            Hontar
Name: 1, dtype: object
Row 2:
 Name        Seter Mngmt
Auditory          50000
Creator          Hontar
Name: 2, dtype: object
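The distinction between iloc and loc becomes visible once the index is not the default 0, 1, 2. The following sketch assumes hypothetical string labels ("design", "develop", "mngmt") that are not part of the examples above:

```python
import pandas as pd

df = pd.DataFrame(
    {"Auditory": [10000, 100000, 50000]},
    index=["design", "develop", "mngmt"],  # hypothetical string labels
)

by_position = df.iloc[0]      # first row, whatever its label is
by_label = df.loc["develop"]  # row whose label is "develop"

# Slices differ too: iloc excludes the end position, loc includes the end label.
first_two_by_position = df.iloc[0:2]
first_two_by_label = df.loc["design":"develop"]

print(by_position)
print(by_label)
print(first_two_by_label)
```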
Selecting Data Based on Conditions
You can select data from a DataFrame based on conditions. This is useful when you want to filter data based on specific criteria.
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"]
}
df = pd.DataFrame(data)
# Selecting rows where Auditory is greater than 20000
auditory_greater_than_20k = df[df["Auditory"] > 20000]
print(auditory_greater_than_20k)
The output will be:
            Name  Auditory  Creator
1  Seter Develop    100000   Hontar
2    Seter Mngmt     50000   Hontar
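You can also use multiple conditions by combining them with the & (and) or | (or) operators. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and ==.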
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"],
"Gender": ["M", "M", "F"]
}
df = pd.DataFrame(data)
# Selecting female rows where Auditory is greater than 30000
female_auditory_greater_than_30k = df[(df["Auditory"] > 30000) & (df["Gender"] == "F")]
print(female_auditory_greater_than_30k)
The output will be:
          Name  Auditory  Creator  Gender
2  Seter Mngmt     50000   Hontar       F
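A few other ways to express conditions are sketched below: the | operator, the isin method, and the string-based query method. The thresholds are arbitrary and chosen only to make each filter return something different.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
    "Auditory": [10000, 100000, 50000],
    "Creator": ["Vasiukou", "Hontar", "Hontar"],
})

# OR: rows with a small or a very large audience.
small_or_large = df[(df["Auditory"] < 20000) | (df["Auditory"] > 80000)]

# isin: rows whose Creator appears in a given collection.
by_creator = df[df["Creator"].isin(["Vasiukou"])]

# query: the same kind of filter written as a string expression.
mid_sized = df.query("Auditory > 20000 and Auditory < 80000")

print(small_or_large)
print(by_creator)
print(mid_sized)
```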
Modifying DataFrames
When working with DataFrames, you may encounter missing values. You can handle them by dropping the affected rows or columns, or by filling them with appropriate values.
We have written a separate article about data manipulation; check it out for more detail.
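As a brief sketch of dropping and filling, the example below deliberately sets one Auditory value to NaN (an assumption made for illustration; the data elsewhere in this tutorial has no missing values) and then applies dropna and fillna:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
    "Auditory": [10000, np.nan, 50000],  # one value made missing on purpose
})

# Count missing values per column.
missing_per_column = df.isnull().sum()

# Drop every row that contains at least one missing value.
dropped = df.dropna()

# Or fill missing values, here with 0 for the Auditory column.
filled = df.fillna({"Auditory": 0})

print(missing_per_column)
print(dropped)
print(filled)
```

Both dropna and fillna return new DataFrames, so df itself still contains the missing value afterwards.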
Adding Columns
To add a new column to a DataFrame, simply assign a new column name with the desired values.
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"],
}
df = pd.DataFrame(data)
# Adding a new column 'Gender'
df["Gender"] = ["M", "M", "F"]
print(df)
The output will be:
            Name  Auditory   Creator  Gender
0   Seter Design     10000  Vasiukou       M
1  Seter Develop    100000    Hontar       M
2    Seter Mngmt     50000    Hontar       F
Updating Columns
To update an existing column, simply reassign the values for that column.
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"],
}
df = pd.DataFrame(data)
# Updating the 'Auditory' column by adding 50000 to each value
df["Auditory"] = df["Auditory"] + 50000
print(df)
The output will be:
            Name  Auditory   Creator
0   Seter Design     60000  Vasiukou
1  Seter Develop    150000    Hontar
2    Seter Mngmt    100000    Hontar
Deleting Columns
To delete a column from a DataFrame, you can use the drop method with the axis parameter set to 1.
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"],
}
df = pd.DataFrame(data)
# Deleting the 'Creator' column
df = df.drop("Creator", axis=1)
print(df)
The output will be:
            Name  Auditory
0   Seter Design     10000
1  Seter Develop    100000
2    Seter Mngmt     50000
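The drop method also accepts a columns keyword, which some readers find easier to read than axis=1. A short sketch that removes two columns at once:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
    "Auditory": [10000, 100000, 50000],
    "Creator": ["Vasiukou", "Hontar", "Hontar"],
})

# columns=[...] is equivalent to passing the labels with axis=1.
df_small = df.drop(columns=["Auditory", "Creator"])

print(df_small)
print(df.columns.tolist())  # drop returns a new DataFrame; df is unchanged
```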
Adding Rows
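To add rows to a DataFrame, you can use the concat function, which takes a list of DataFrames to combine. The older append method was deprecated in pandas 1.4 and removed in pandas 2.0, so the example below wraps the new row in a DataFrame and concatenates it.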
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"],
}
df = pd.DataFrame(data)
# Adding a new row using a dictionary
new_row = {"Name": "Seter AI", "Auditory": 40000, "Creator": "Smith"}
df = pd.concat([df, pd.DataFrame(new_row, index=[0])], ignore_index=True)
print(df)
The output will be:
            Name  Auditory   Creator
0   Seter Design     10000  Vasiukou
1  Seter Develop    100000    Hontar
2    Seter Mngmt     50000    Hontar
3       Seter AI     40000     Smith
Updating Rows
To update a row in a DataFrame, you can use the loc indexer with the desired index label and reassign the values for that row.
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"],
}
df = pd.DataFrame(data)
# Updating the second row with new values
df.loc[1] = ["Seter AI", 40000, "Smith"]
print(df)
The output will be:
           Name  Auditory   Creator
0  Seter Design     10000  Vasiukou
1      Seter AI     40000     Smith
2   Seter Mngmt     50000    Hontar
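loc can also update a single cell or every row that matches a condition, rather than a whole row. A minimal sketch, with replacement values chosen purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
    "Auditory": [10000, 100000, 50000],
    "Creator": ["Vasiukou", "Hontar", "Hontar"],
})

# Update one cell: row label 1, column "Auditory".
df.loc[1, "Auditory"] = 120000

# Update a column only in the rows that satisfy a condition.
df.loc[df["Creator"] == "Hontar", "Creator"] = "Smith"

print(df)
```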
Deleting Rows
To delete a row from a DataFrame, you can use the drop method with the desired index label.
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"],
}
df = pd.DataFrame(data)
# Deleting the first row
df = df.drop(0)
print(df)
The output will be:
            Name  Auditory  Creator
1  Seter Develop    100000   Hontar
2    Seter Mngmt     50000   Hontar
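Two common variations are sketched here: dropping several rows at once by passing a list of labels, and removing rows by condition with a Boolean filter followed by reset_index so the remaining rows are renumbered from 0. The 50000 threshold is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
    "Auditory": [10000, 100000, 50000],
    "Creator": ["Vasiukou", "Hontar", "Hontar"],
})

# Drop several rows by label in one call.
without_first_and_last = df.drop([0, 2])

# Keep only rows that satisfy a condition, then renumber the index.
large_only = df[df["Auditory"] >= 50000].reset_index(drop=True)

print(without_first_and_last)
print(large_only)
```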
DataFrame Operations
Pandas DataFrames support various arithmetic operations and allow you to perform computations on data frame elements. You can also manipulate index values, find duplicate index values, and reshape the data by pivoting or melting. Check out our article for more information on sorting and manipulating Pandas DataFrames.
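To make these operations concrete, here is a small sketch on made-up data (the column names q1 and q2 and the repeated index label "a" are assumptions for illustration). It shows element-wise arithmetic, detecting duplicate index values, and reshaping from wide to long form with melt:

```python
import pandas as pd

df = pd.DataFrame(
    {"q1": [10, 20, 30], "q2": [1, 2, 3]},
    index=["a", "b", "a"],  # label "a" appears twice on purpose
)

# Arithmetic is applied element-wise.
total = df["q1"] + df["q2"]
doubled = df * 2

# True marks every repeated occurrence of an index label.
dup_mask = df.index.duplicated()

# melt turns the q1/q2 columns into (quarter, value) rows.
long_form = df.reset_index().melt(
    id_vars="index", var_name="quarter", value_name="value"
)

print(total)
print(dup_mask)
print(long_form)
```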
Sorting DataFrames
To sort a DataFrame, you can use the sort_values method with the by parameter specifying the column to sort by.
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"],
}
df = pd.DataFrame(data)
# Sorting by 'Auditory' in descending order
df_sorted = df.sort_values(by="Auditory", ascending=False)
print(df_sorted)
The output will be:
            Name  Auditory   Creator
1  Seter Develop    100000    Hontar
2    Seter Mngmt     50000    Hontar
0   Seter Design     10000  Vasiukou
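sort_values also accepts a list of columns, with a matching list of sort directions. The sketch below sorts by Creator alphabetically and, within each creator, by Auditory from largest to smallest:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
    "Auditory": [10000, 100000, 50000],
    "Creator": ["Vasiukou", "Hontar", "Hontar"],
})

# One direction per column: Creator ascending, Auditory descending.
df_sorted = df.sort_values(by=["Creator", "Auditory"], ascending=[True, False])

print(df_sorted)
```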
Aggregating DataFrames
To aggregate data in a DataFrame, you can use the groupby method with the desired columns to group by and the desired function to aggregate.
import pandas as pd
data = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"],
}
df = pd.DataFrame(data)
# Grouping by 'Creator' and aggregating by average 'Auditory'
df_grouped = df.groupby("Creator").agg({"Auditory": "mean"})
print(df_grouped)
The output will be:
          Auditory
Creator
Hontar     75000.0
Vasiukou   10000.0
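When you need several statistics at once, named aggregation is one option. In this sketch the output column names (mean_auditory, max_auditory, projects) are arbitrary labels chosen for the example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
    "Auditory": [10000, 100000, 50000],
    "Creator": ["Vasiukou", "Hontar", "Hontar"],
})

# Each keyword is an output column: (input column, aggregation function).
summary = df.groupby("Creator").agg(
    mean_auditory=("Auditory", "mean"),
    max_auditory=("Auditory", "max"),
    projects=("Name", "count"),
)

print(summary)
```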
Merging DataFrames
To merge two DataFrames, you can use the merge function with the columns to merge on and the type of merge. By default, merge performs an inner join, so only rows whose key appears in both DataFrames are kept.
import pandas as pd
data1 = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"],
}
data2 = {
"Name": ["Seter Design", "Seter Develop", "Seter AI"],
"City": ["New York", "Los Angeles", "Chicago"]
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# Merging on 'Name' column
df_merged = pd.merge(df1, df2, on="Name")
print(df_merged)
The output will be:
            Name  Auditory   Creator         City
0   Seter Design     10000  Vasiukou     New York
1  Seter Develop    100000    Hontar  Los Angeles
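The type of merge is controlled by the how parameter. As a sketch using the same two DataFrames, a left merge keeps every row of df1, while an outer merge keeps rows from both sides; indicator=True adds a _merge column showing where each row came from:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
    "Auditory": [10000, 100000, 50000],
    "Creator": ["Vasiukou", "Hontar", "Hontar"],
})
df2 = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop", "Seter AI"],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Left merge: all rows of df1; City is NaN where df2 has no match.
left = pd.merge(df1, df2, on="Name", how="left")

# Outer merge: all rows from both, with the source of each row recorded.
outer = pd.merge(df1, df2, on="Name", how="outer", indicator=True)

print(left)
print(outer)
```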
Join DataFrames

import pandas as pd
data1 = {
"Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
"Auditory": [10000, 100000, 50000],
"Creator": ["Vasiukou", "Hontar", "Hontar"],
}
data2 = {
"Name": ["Seter Design", "Seter Develop", "Seter AI"],
"City": ["New York", "Los Angeles", "Chicago"]
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# Joining on 'Name' column
df_joined = df1.set_index("Name").join(df2.set_index("Name"))
print(df_joined)
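This code performs a join operation on two DataFrames based on the “Name” column, which is first set as the index of each. The resulting DataFrame contains all the columns from both input DataFrames, with rows matched on “Name”. Because join performs a left join by default, every row of df1 is kept, and City is NaN for “Seter Mngmt”, which has no match in df2.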
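Like merge, join accepts a how parameter. The following sketch contrasts the default left join with an inner join on the same data:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
    "Auditory": [10000, 100000, 50000],
    "Creator": ["Vasiukou", "Hontar", "Hontar"],
})
df2 = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop", "Seter AI"],
    "City": ["New York", "Los Angeles", "Chicago"],
})

left_indexed = df1.set_index("Name")
right_indexed = df2.set_index("Name")

# Default (left) join: keeps all rows of the calling DataFrame.
left_join = left_indexed.join(right_indexed)

# Inner join: keeps only names present in both DataFrames.
inner_join = left_indexed.join(right_indexed, how="inner")

print(left_join)
print(inner_join)
```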
DataFrame Methods and Attributes:
NAME | DESCRIPTION |
index | Attribute that holds the index (row labels) of the DataFrame |
insert() | Method inserts a column into a DataFrame |
add() | Method returns addition of dataframe and other, element-wise (binary operator add) |
sub() | Method returns subtraction of dataframe and other, element-wise (binary operator sub) |
mul() | Method returns multiplication of dataframe and other, element-wise (binary operator mul) |
div() | Method returns floating division of dataframe and other, element-wise (binary operator truediv) |
unique() | Series method that returns the unique values in a column |
nunique() | Method returns the number of unique values in each column |
value_counts() | Method counts the number of times each unique value occurs within the Series |
columns | Attribute that holds the column labels of the DataFrame |
axes | Attribute that returns a list representing the axes of the DataFrame |
isnull() | Method returns a Boolean mask marking null values, which can be used to extract rows with null values |
notnull() | Method returns a Boolean mask marking non-null values, which can be used to extract rows with non-null values |
between() | Series method that tests whether each value falls within a given range, used to extract the matching rows |
isin() | Method extracts rows from a DataFrame where a column value exists in a predefined collection |
dtypes | Attribute that returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns |
astype() | Method converts the data types of a DataFrame or Series |
values | Attribute that returns a NumPy representation of the DataFrame, i.e. only the values in the DataFrame are returned and the axes labels are removed |
sort_values() | Method sorts a data frame in Ascending or Descending order of passed Column |
sort_index() | Method sorts a DataFrame by its index labels rather than by its values, which is useful after combining several DataFrames whose index is no longer in order |
loc[] | Indexer retrieves rows based on index label |
iloc[] | Indexer retrieves rows based on integer position |
ix[] | Deprecated indexer that retrieved rows by either index label or index position. It was removed in pandas 1.0; use .loc[] or .iloc[] instead |
rename() | Method is called on a DataFrame to change the names of the index labels or column names |
columns | Attribute that can also be assigned to in order to change the column names |
drop() | Method is used to delete rows or columns from a DataFrame |
pop() | Method removes a single column from a DataFrame and returns it as a Series |
sample() | Method pulls out a random sample of rows or columns from a DataFrame |
nsmallest() | Method pulls out the rows with the smallest values in a column |
nlargest() | Method pulls out the rows with the largest values in a column |
shape | Attribute that returns a tuple representing the dimensionality of the DataFrame |
ndim | Attribute that returns an ‘int’ representing the number of axes / array dimensions. Returns 1 for a Series and 2 for a DataFrame |
dropna() | Method drops rows or columns that contain null values, with options that control which ones are dropped |
fillna() | Method lets the user replace NaN values with a value of their own |
rank() | Values in a Series can be ranked in order with this method |
query() | Method is an alternate string-based syntax for extracting a subset from a DataFrame |
copy() | Method creates an independent copy of a pandas object |
duplicated() | Method creates a Boolean Series and uses it to extract rows that have duplicate values |
drop_duplicates() | Method is an alternative option to identifying duplicate rows and removing them through filtering |
set_index() | Method sets the DataFrame index (row labels) using one or more existing columns |
reset_index() | Method resets the index of a DataFrame to the default integer index, ranging from 0 to the number of rows minus one |
where() | Method checks a DataFrame against one or more conditions and returns the result accordingly. By default, entries that do not satisfy the condition are replaced with NaN |
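A few entries from the table are sketched below on the sample data used throughout this tutorial. Note that shape and ndim are attributes, so they are accessed without parentheses:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Seter Design", "Seter Develop", "Seter Mngmt"],
    "Auditory": [10000, 100000, 50000],
    "Creator": ["Vasiukou", "Hontar", "Hontar"],
})

# nlargest: the two rows with the largest Auditory values.
top2 = df.nlargest(2, "Auditory")

# value_counts / nunique: how often each creator appears, and how many there are.
creator_counts = df["Creator"].value_counts()
n_creators = df["Creator"].nunique()

# where: keep values that satisfy the condition, replace the rest with NaN.
masked = df["Auditory"].where(df["Auditory"] > 20000)

print(top2)
print(creator_counts)
print(masked)
print(df.shape, df.ndim)
```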
Conclusion
In this article, we have covered the basics of Pandas DataFrames, including creating, selecting, modifying, and aggregating data. We also discussed common DataFrame operations, such as sorting and merging. With these skills, you are ready to work with DataFrames in Pandas and explore the vast array of data manipulation and analysis techniques available in Python.