Python Pandas: Apply and Merge

Data transformation is a crucial step in any data analysis pipeline. It involves modifying and restructuring data to facilitate further analysis or produce more meaningful insights. In this tutorial, we will explore the various techniques for applying functions and mapping in Pandas, which are essential for transforming data in DataFrames.

Applying Functions with apply(), applymap(), and map()

Pandas provides three main methods for applying functions to DataFrames and Series: apply(), applymap(), and map(). Each method serves a specific purpose and works with different types of data.

Using apply()

The apply() method can be used on both Series and DataFrames. When used on a Series, it applies a function to each element in the Series. When used on a DataFrame, it applies a function along a specified axis (either rows or columns).

For example, let’s create a simple DataFrame:

import pandas as pd

data = {
    'Fruit': ['Apple', 'Banana', 'Orange', 'Grape', 'Watermelon'],
    'Price': [1.2, 0.5, 0.75, 2.0, 3.0],
    'Quantity': [50, 100, 75, 30, 10],
    'Discount': [0.1, 0.05, 0.2, 0.15, 0.25]
}

df = pd.DataFrame(data)
print(df)

Now, let’s apply a function that takes 10% off each element in the ‘Price’ column:

df['Discounted_Price'] = df['Price'].apply(lambda x: x * 0.9)
print(df)
        Fruit  Price  Quantity  Discount  Discounted_Price
0       Apple   1.20        50      0.10             1.080
1      Banana   0.50       100      0.05             0.450
2      Orange   0.75        75      0.20             0.675
3       Grape   2.00        30      0.15             1.800
4  Watermelon   3.00        10      0.25             2.700

To apply a function along the rows (axis=1), you can do the following:

numeric_columns = ['Price', 'Quantity', 'Discount']
row_sums = df[numeric_columns].apply(lambda x: x.sum(), axis=1)
print(row_sums)
0     51.30
1    100.55
2     75.95
3     32.15
4     13.25
dtype: float64

This returns a Series containing, for each row, the sum of the values in its numeric columns.
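
To make the axis argument more concrete, here is a minimal sketch that runs apply() in both directions on a cut-down copy of the fruit data. The variable names column_max and row_totals are illustrative and not part of the original example.

```python
import pandas as pd

# Cut-down copy of the tutorial's data (numeric columns only)
df = pd.DataFrame({
    'Price': [1.2, 0.5, 0.75, 2.0, 3.0],
    'Quantity': [50, 100, 75, 30, 10],
})

# axis=0 (the default): the function receives each column as a Series
column_max = df.apply(lambda col: col.max())

# axis=1: the function receives each row as a Series
row_totals = df.apply(lambda row: row['Price'] * row['Quantity'], axis=1)

print(column_max)
print(row_totals)
```

With axis=0 the result has one entry per column; with axis=1 it has one entry per row.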

Check out our tutorial on selecting and filtering data in Pandas to learn more about working with DataFrames and Series.

Using applymap()

The applymap() method applies a function element-wise to every element in a DataFrame. It is particularly useful when you need to apply the same transformation to the entire DataFrame. Note that applymap() is deprecated as of pandas 2.1 in favor of DataFrame.map(), which behaves the same way.

Let’s apply a function that multiplies every element of the ‘Price’ and ‘Discount’ columns by 0.9:

df_discounted = df[['Price', 'Discount']].applymap(lambda x: x * 0.9)
print(df_discounted)
   Price  Discount
0  1.080     0.090
1  0.450     0.045
2  0.675     0.180
3  1.800     0.135
4  2.700     0.225
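
Because applymap() is deprecated in pandas 2.1 and later, code that must run on both old and new versions may want to pick the method at runtime. The following is only a sketch of one way to do that. The helper name scale and the rounding to three decimals are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Price': [1.2, 0.5, 0.75],
    'Discount': [0.1, 0.05, 0.2],
})

def scale(x):
    """Hypothetical helper: multiply by 0.9 and round to 3 decimals."""
    return round(x * 0.9, 3)

# DataFrame.map is available from pandas 2.1; older versions only have applymap
if hasattr(pd.DataFrame, 'map'):
    df_scaled = df.map(scale)
else:
    df_scaled = df.applymap(scale)

print(df_scaled)
```

If you only target pandas 2.1 or newer, calling df.map(scale) directly is enough.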

Using map()

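The map() method applies a function or a mapping (a dictionary or another Series) to each element in a Pandas Series. It is similar to calling apply() on a Series, but it can also use a dictionary or Series as a lookup table. Historically map() existed only on Series; pandas 2.1 added DataFrame.map() as the replacement for applymap().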

Here’s an example of using map() to replace the elements in a Series:

fruit_colors = {
    'Apple': 'Red',
    'Banana': 'Yellow',
    'Orange': 'Orange',
    'Grape': 'Purple',
    'Watermelon': 'Green'
}
df['Color'] = df['Fruit'].map(fruit_colors)
print(df)
        Fruit  Price  Quantity  Discount  Discounted_Price   Color
0       Apple   1.20        50      0.10             1.080     Red
1      Banana   0.50       100      0.05             0.450  Yellow
2      Orange   0.75        75      0.20             0.675  Orange
3       Grape   2.00        30      0.15             1.800  Purple
4  Watermelon   3.00        10      0.25             2.700   Green

In this example, we’re using the map() function to map the fruit names in the ‘Fruit’ column to their corresponding colors. We create a dictionary called fruit_colors, where the keys are the fruit names and the values are the fruit colors. We then apply the map() function to the ‘Fruit’ column in the DataFrame, using the fruit_colors dictionary for the mapping. The result is a new column ‘Color’ in the DataFrame, which contains the corresponding color for each fruit.
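
One detail worth knowing when mapping with a dictionary: values that have no matching key become NaN. The sketch below shows this with a fruit that is deliberately missing from the dictionary. ‘Kiwi’ and the ‘Unknown’ placeholder are made-up values for illustration.

```python
import pandas as pd

fruits = pd.Series(['Apple', 'Banana', 'Kiwi'])
fruit_colors = {'Apple': 'Red', 'Banana': 'Yellow'}

# 'Kiwi' is not a key in the dictionary, so it maps to NaN
colors = fruits.map(fruit_colors)

# Replace the missing entries with a placeholder value
colors_filled = colors.fillna('Unknown')

print(colors)
print(colors_filled)
```

Checking the result with isna() after a dictionary lookup is a quick way to catch keys you forgot to include.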

Lambda Functions in Pandas

Lambda functions are anonymous functions in Python that can be defined using the lambda keyword. They are particularly useful in Pandas when you need to apply a simple function to a Series or DataFrame without defining a separate function.

Here’s an example of using a lambda function to take 10% off each element of the ‘Price’ Series:

df['Discounted_Price'] = df['Price'].apply(lambda x: x * 0.9)
print(df)

You can also use lambda functions with conditional expressions. The next example uses apply() with a lambda function to create a new column called ‘Price_Rounded’. The lambda checks each value in the ‘Price’ column: if the value is less than 1, it is rounded to the nearest integer with Python’s built-in round(); otherwise it is left unchanged. Keep in mind that round() rounds exact halves to the nearest even integer, so 0.5 becomes 0 while 0.75 becomes 1.

df['Price_Rounded'] = df['Price'].apply(lambda x: round(x) if x < 1 else x)
print(df)
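
When the condition grows beyond a single expression, a named function is often easier to read and test than a lambda. Here is a minimal sketch of the same rounding rule written that way. The function name round_small_price is an illustrative choice, not something from the original code.

```python
import pandas as pd

prices = pd.Series([1.2, 0.5, 0.75, 2.0, 3.0])

def round_small_price(price):
    """Round prices below 1 to the nearest integer; leave others unchanged."""
    return round(price) if price < 1 else price

rounded = prices.apply(round_small_price)
print(rounded)
```

The second value shows the half-to-even behavior of round(): 0.5 is rounded down to 0, whereas 0.75 is rounded up to 1.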

Lambda functions can be combined with other Pandas methods to perform more complex operations. The following example uses a lambda function to square each element in the ‘Price’ column and then calls mean() to calculate the average of the squared prices.

avg_squared = df['Price'].apply(lambda x: x**2).mean()
print(avg_squared)

Vectorized Operations

In addition to the apply(), applymap(), and map() methods, Pandas supports vectorized operations, which enable you to perform element-wise operations on Series and DataFrames without using explicit loops or functions. Vectorized operations in Pandas are built on top of NumPy’s array operations, providing high performance and ease of use.

Here’s an example of multiplying two columns in a DataFrame using a vectorized operation:

df['Total_Price'] = df['Price'] * df['Quantity']
print(df)

You can also perform arithmetic operations, such as addition, subtraction, multiplication, and division, directly on DataFrames and Series:

df['Total_Price_with_Discount'] = df['Price'] * (1 - df['Discount']) * df['Quantity']
print(df)
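
To see that a vectorized expression and a row-wise apply() produce the same numbers, the sketch below computes the discounted total both ways and compares them. The names vectorized, row_wise, and same are illustrative, and the 1e-9 tolerance is an arbitrary choice to allow for floating-point noise.

```python
import pandas as pd

df = pd.DataFrame({
    'Price': [1.2, 0.5, 0.75, 2.0, 3.0],
    'Quantity': [50, 100, 75, 30, 10],
    'Discount': [0.1, 0.05, 0.2, 0.15, 0.25],
})

# Vectorized: whole columns are combined in one expression
vectorized = df['Price'] * (1 - df['Discount']) * df['Quantity']

# Row-wise: the lambda is called once per row
row_wise = df.apply(
    lambda row: row['Price'] * (1 - row['Discount']) * row['Quantity'],
    axis=1,
)

# Compare with a small tolerance for floating-point rounding
same = (vectorized - row_wise).abs().max() < 1e-9
print(same)
```

Both approaches give the same values, but the vectorized form avoids a Python-level function call per row, so it is usually the better default for simple arithmetic.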

Learn more about working with DataFrames in our comprehensive guide on understanding Pandas DataFrames.

Conclusion

In this tutorial, we explored the various techniques for applying functions and mapping in Pandas, which are essential for transforming data in DataFrames. We learned how to use the apply(), applymap(), and map() methods, as well as how to work with lambda functions and perform vectorized operations.

With this knowledge, you can now effectively transform and manipulate your data using Pandas. To further expand your Pandas skillset, consider diving into topics like sorting, renaming, and merging DataFrames, grouping and aggregating data using GroupBy, and handling missing data in Pandas.
