How to Color DataFrame Cells Based on Percentiles

What will you learn?

In this tutorial, you will learn how to color DataFrame cells according to their percentile rank. By the end, you will be able to apply this kind of conditional formatting with the pandas Styler, so that high and low values stand out at a glance.

Introduction to Problem and Solution

In a large dataset, raw numbers alone make it hard to spot what matters. Color coding cells by percentile rank highlights extreme values and makes trends and outliers easier to see. For instance, giving the top 10% of values one color and the bottom 10% another draws attention to both ends of the distribution.

The solution uses the pandas style property, which returns a Styler object, together with its apply method. (Styler.applymap, renamed Styler.map in pandas 2.1, styles one cell at a time and is better suited to rules that do not depend on the rest of the data.) By calculating percentiles within the DataFrame and applying a color gradient or a discrete scheme, we can show each cell's position relative to those percentiles, which makes the data easier to read at a glance.
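
As a rough illustration of the gradient option, the sketch below ranks every cell against all other cells and maps that rank to the opacity of a single background color. The helper names percentile_ranks and gradient_css, the sample frame scores, and the choice of orange are assumptions made for this example, not part of pandas.

```python
import numpy as np
import pandas as pd

def percentile_ranks(data):
    """Rank every cell against all cells in the frame, scaled to (0, 1]."""
    flat = pd.Series(data.to_numpy().ravel())
    ranks = flat.rank(pct=True).to_numpy().reshape(data.shape)
    return pd.DataFrame(ranks, index=data.index, columns=data.columns)

def gradient_css(data):
    """Map each percentile rank to the opacity of an orange background."""
    ranks = percentile_ranks(data)
    css = [[f'background-color: rgba(255, 165, 0, {r:.2f})' for r in row]
           for row in ranks.to_numpy()]
    return pd.DataFrame(css, index=data.index, columns=data.columns)

# Hypothetical sample data for illustration
scores = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})
ranks = percentile_ranks(scores)
css = gradient_css(scores)
```

To render the gradient, pass the function to the Styler with scores.style.apply(gradient_css, axis=None), so that it receives the whole DataFrame in one call.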

Code

import pandas as pd
import numpy as np

# Create a sample DataFrame
np.random.seed(24)
df = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])

# Return a DataFrame of CSS strings with the same shape as the input
def percentile_color(data):
    values = data.to_numpy()
    flat = pd.Series(values.ravel())
    high = flat.quantile(0.9)  # 90th percentile across all cells
    low = flat.quantile(0.1)   # 10th percentile across all cells
    colors = np.where(values >= high, 'background-color: yellow',
                      np.where(values <= low, 'background-color: lightgreen', ''))
    return pd.DataFrame(colors, index=data.index, columns=data.columns)

# axis=None passes the whole DataFrame to percentile_color in one call
styled_df = df.style.apply(percentile_color, axis=None)
styled_df


Explanation

  1. Creating a Sample DataFrame: We generate a sample DataFrame called df containing random numbers for demonstration purposes.
  2. Defining Our Coloring Function: percentile_color() takes a DataFrame and computes two percentiles, the 90th (top 10%) and the 10th (bottom 10%). It returns a DataFrame of CSS strings with the same shape: yellow for values at or above the 90th percentile, light green for values at or below the 10th, and an empty string for everything else.
  3. Applying Styling: We pass the function to .style.apply() on df. Because the function expects and returns a whole DataFrame, the call needs axis=None; the default, axis=0, would hand it one column at a time.
  4. Output: The result is a Styler that renders df with cell backgrounds reflecting each value's position relative to the chosen percentiles.
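
A per-column variant is sometimes more useful, for example when the columns hold different units and each should be judged against its own distribution. The following is a minimal sketch under that assumption; the function name percentile_color_by_column, its low_q and high_q parameters, and the sample frame are hypothetical.

```python
import numpy as np
import pandas as pd

def percentile_color_by_column(col, low_q=0.1, high_q=0.9):
    """Return one CSS string per value in a single column (a Series)."""
    low, high = col.quantile(low_q), col.quantile(high_q)
    styles = np.where(col >= high, 'background-color: yellow',
                      np.where(col <= low, 'background-color: lightgreen', ''))
    return pd.Series(styles, index=col.index)

# Hypothetical sample data for illustration
sample = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [50, 40, 30, 20, 10]})

# Applying the function column by column yields the frame of CSS strings
# that the Styler would use for rendering
css = sample.apply(percentile_color_by_column)
```

To render it, call sample.style.apply(percentile_color_by_column, axis=0). With axis=0, which is the default, the Styler calls the function once per column.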

Frequently Asked Questions

How can I customize the colors used for styling?

Replace 'background-color: yellow' and 'background-color: lightgreen' inside the percentile_color function with your preferred CSS colors.

Can I retain this styling when exporting my dataframe?

Yes. Call .to_excel('filename.xlsx', engine='openpyxl') on the styled object rather than on df, and make sure openpyxl is installed (pip install openpyxl). Note that CSV files cannot store styling information.

Is it possible to highlight specific columns exclusively?

Yes. Pass subset=['column_name'] to .style.apply() so that the function only receives, and only styles, those columns. Selecting columns with .loc[:, ['column_name']] before calling .style also works, but the other columns are then dropped from the output.

Can I specify quantiles other than the top and bottom 10%?

Yes. Change the quantile arguments inside percentile_color. For instance, replace 0.9 and 0.1 with 0.75 and 0.25 to highlight the top and bottom quarters. Passing a list such as .quantile([0.25, 0.75]) returns several values at once rather than a single threshold, so each value has to be picked out before comparing.

Do all cells require a style string?

Yes, but it can be empty. Cells that should stay unstyled get an empty string '', so the function still returns one entry per cell while leaving those cells unaffected.

How does pandas compute these quantiles?

pandas uses linear interpolation by default. The interpolation parameter of the .quantile() method also accepts 'lower', 'higher', 'nearest', and 'midpoint'.
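
Several of the answers above (custom colors, other quantiles, the interpolation method) can be combined in one configurable function. The sketch below is one possible way to do that; the factory name make_percentile_styler, its parameters, the colors, and the sample frame sales are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def make_percentile_styler(low_q=0.25, high_q=0.75, interpolation='linear'):
    """Build a styling function for the given lower and upper quantiles."""
    def style(data):
        values = data.to_numpy()
        flat = pd.Series(values.ravel())
        low = flat.quantile(low_q, interpolation=interpolation)
        high = flat.quantile(high_q, interpolation=interpolation)
        css = np.where(values >= high, 'background-color: gold',
                       np.where(values <= low, 'background-color: lightblue', ''))
        return pd.DataFrame(css, index=data.index, columns=data.columns)
    return style

# Hypothetical sample data for illustration
sales = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})
quartile_style = make_percentile_styler(0.25, 0.75, interpolation='nearest')

# Frame of CSS strings for column A only
css = quartile_style(sales[['A']])
```

To style only column A in the rendered output while keeping column B visible, call sales.style.apply(quartile_style, axis=None, subset=['A']).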

Conclusion

Coloring DataFrame cells by percentile rank gives you a quick, intuitive way to spot significant data points in a large dataset. With a small styling function and the pandas Styler, the extremes of a distribution become visible at a glance, which supports the numerical analysis rather than replacing it.
