  • Post category:Pandas
  • Post last modified:March 27, 2024
Pandas Get Unique Values in Column

You can get unique values from a single column or from multiple columns of a pandas DataFrame using the Series.unique() or pandas.unique() functions. Series.unique() returns the unique values of a single column, and pandas.unique() can be used to get them from multiple columns.

The unique() function removes duplicate values from a column, so each value that occurs several times is returned only once.

Note that unique values are returned in order of appearance. If you want them sorted, pass the result to Python's sorted() or numpy.sort(), or call sort_values() on a Series or DataFrame.
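As a minimal sketch of sorting the result, the snippet below first collects the unique values and then orders them. The small DataFrame is a stand-in defined only for this illustration, and the variable names are hypothetical.

```python
import pandas as pd

# Illustrative data only (a shortened version of a Courses column)
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python", "pandas", "Python", "Spark"]})

# unique() keeps the order in which values first appear
unique_courses = df["Courses"].unique()

# Option 1: sort the unique values with Python's built-in sorted()
# (string comparison is case-sensitive, so uppercase letters sort first)
sorted_courses = sorted(unique_courses)

# Option 2: stay in pandas by dropping duplicates and sorting the Series
sorted_series = df["Courses"].drop_duplicates().sort_values()

print(list(unique_courses))
print(sorted_courses)
print(sorted_series.tolist())
```

Both options give the same ordering here; the second keeps the result as a Series, which may be convenient if you need the index.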


1. Quick Examples of Getting Unique Values in Columns

If you are in a hurry, below are some quick examples of how to get unique values in a single column and multiple columns in DataFrame.


# Below are some quick examples

# Example 1: Find unique values of a column
print(df['Courses'].unique())
print(df.Courses.unique())

# Example 2: Convert to List
print(df.Courses.unique().tolist())

# Example 3: Unique values with drop_duplicates
df.Courses.drop_duplicates()

# Example 4: Using pandas.unique() to get unique values in multiple columns
df2 = pd.unique(df[['Courses', 'Fee']].values.ravel('K'))

# Example 5: Using pandas.unique() to get unique values in a single column
df2 = pd.unique(df[['Courses']].values.ravel())

# Example 6: Find the unique values in multiple columns using numpy.unique()
df2 = np.unique(df[['Courses', 'Duration']].values)

# Example 7: Use numpy.unique() on an array of column values
column_values = df[['Courses', 'Duration']].values
df2 = np.unique(column_values)

# Example 8: Using set() on two columns combined with pd.concat()
# (Series.append() was removed in pandas 2.0)
df2 = set(pd.concat([df.Courses, df.Fee]).values)

# Example 9: Using set() union
df2 = set(df.Courses) | set(df.Fee)

# Example 10: To get unique values in one series/column
df2 = df['Courses'].unique()

# Example 11: Using pandas.concat() to combine multiple columns, then unique()
df2 = pd.concat([df['Courses'],df['Duration'],df['Fee']]).unique()

# Example 12: Use Series.drop_duplicates() to get unique values
print(df.Courses.drop_duplicates())

Now, let’s create a DataFrame with duplicate values, execute these examples and validate the results. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


# Create DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","pandas","Python","Spark","pandas"],
    'Fee' :[20000,25000,22000,30000,22000,20000,30000],
    'Duration':['30days','40days','35days','50days','40days','30days','50days'],
    'Discount':[1000,2300,1200,2000,2300,1000,2000]
              }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

This creates a DataFrame with seven rows and four columns. The Courses, Fee, Duration, and Discount columns all contain repeated values.

2. Pandas Get Unique Values in Column

Unique values are also referred to as distinct values. You can get the unique values in a column using the pandas Series.unique() function. Since this function is called on a Series object, select the column as a Series with df['column_name'] and call unique() on it.

Syntax:


# Syntax of unique() (it takes no arguments)
Series.unique()

Let’s see an example.


# Find unique values of a specified column
print("Get unique values from specified column:\n", df['Courses'].unique())

This returns a NumPy array (or a pandas ExtensionArray for extension dtypes) rather than a Series. It eliminates all duplicates and returns only the unique values from the Courses column.
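A short sketch of what Series.unique() hands back may help here. It assumes the same Courses and Fee data as the DataFrame above, rebuilt so the snippet runs on its own; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"],
    "Fee": [20000, 25000, 22000, 30000, 22000, 20000, 30000],
})

# Unique values come back in order of first appearance
courses = df["Courses"].unique()
fees = df["Fee"].unique()

# For a NumPy-backed numeric column the result is a NumPy array, not a Series
print(type(fees))

# tolist() converts the array to a plain Python list
print(fees.tolist())

# nunique() gives only the number of distinct values
print(df["Courses"].nunique())
```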

3. Find Unique Values in Multiple Columns

If you want to get unique values across multiple columns of a DataFrame, use the pandas.unique() function. You can also use it to get the unique values of a single column.

Syntax:


# Syntax
pandas.unique(values)

Let’s see an example. Since pandas.unique() expects a one-dimensional array of values, select the columns and flatten them with df[columns_list].values.ravel().


# Using pandas.unique() to get unique values in multiple columns
df2 = pd.unique(df[['Courses', 'Fee']].values.ravel())
print("Get unique values from multiple columns:\n",df2)

# Output:
# Get unique values from multiple columns
# ['Spark' 20000 'PySpark' 25000 'Python' 22000 'pandas' 30000]

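By default, ravel() flattens the array row by row ('C' order), so values from the two columns are interleaved. To get all unique values of the first column followed by those of the second, pass 'K' to ravel(). The 'K' argument flattens the array in the order the elements are laid out in memory, which for this DataFrame is column by column, and it can be faster than the default 'C' order because it can avoid rearranging the data. If you need column-by-column order regardless of memory layout, 'F' requests it explicitly.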


# Using pandas.unique() to get unique values column by column
df2 = pd.unique(df[['Courses', 'Fee']].values.ravel('K'))
print("Get unique values from multiple columns:\n", df2)

# Output:
# Get unique values from multiple columns
# ['Spark' 'PySpark' 'Python' 'pandas' 20000 25000 22000 30000]
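The difference between flattening orders can be sketched as follows. This is an illustration under the assumption that you want the ordering to be explicit, so it uses 'C' and 'F' rather than relying on memory layout; the data is rebuilt from the DataFrame above so the snippet is self-contained.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"],
    "Fee": [20000, 25000, 22000, 30000, 22000, 20000, 30000],
})

# Convert to an object array so strings and integers can sit side by side
values = df[["Courses", "Fee"]].to_numpy(dtype=object)

# 'C' order (the default) walks the array row by row: Courses, Fee, Courses, Fee, ...
row_wise = pd.unique(values.ravel())

# 'F' order walks the array column by column: all Courses first, then all Fee
column_wise = pd.unique(values.ravel("F"))

print(list(row_wise))
print(list(column_wise))
```

The set of unique values is the same in both cases; only the order in which they are reported changes.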

You can use the same approach to get the unique values of a single column.


# Using pandas.unique() to get unique values in a single column
df2 = pd.unique(df[['Courses']].values.ravel())
print("Get unique values from specified column:\n",df2)

# Output:
# Get unique values from specified column:
# ['Spark' 'PySpark' 'Python' 'pandas']

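4. Using numpy.unique()

numpy.unique() also eliminates duplicate values. Unlike pandas.unique(), it returns the unique values sorted, so the values must be comparable with each other; mixing strings and numbers, such as Courses and Fee, typically raises a TypeError.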


import numpy as np
# Find the unique values in multiple columns using numpy.unique()
df2 = np.unique(df[['Courses', 'Duration']].values)
print("Get unique values from specified columns:\n", df2)

# Use numpy.unique() on an array of column values
column_values = df[['Courses', 'Duration']].values
df2 = np.unique(column_values)
print("Get unique values from multiple columns:\n", df2)

# Output:
# Get unique values from multiple columns:
# ['30days' '35days' '40days' '50days' 'PySpark' 'Python' 'Spark' 'pandas']
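numpy.unique() can also report how often each value occurs through its return_counts parameter. The sketch below assumes the Courses data from the DataFrame above and converts the column to an object array first; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"],
})

# numpy.unique() sorts its result; return_counts=True adds the number of occurrences
values, counts = np.unique(df["Courses"].to_numpy(dtype=object), return_counts=True)

for value, count in zip(values, counts):
    print(value, count)
```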

5. Using set() to Eliminate Duplicates

Python's built-in set() also removes duplicate values and keeps only the unique ones. You can use it to get unique values from a single column or multiple columns of a DataFrame. A set is unordered, so the values may print in a different order from one run to the next.


# Using set() on two columns combined with pd.concat()
# (Series.append() was removed in pandas 2.0)
df2 = set(pd.concat([df.Courses, df.Fee]).values)
print("Get unique values from multiple columns:\n", df2)

# Using set() method
df2 = set(df.Courses) | set(df.Fee)
print("Get unique values from multiple columns:\n", df2)

# Output (a set is unordered, so the order may differ):
# Get unique values from multiple columns:
# {20000, 25000, 'pandas', 30000, 22000, 'PySpark', 'Python', 'Spark'}
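If you need a repeatable order from the set approach, one option is to sort the set afterwards. This sketch assumes both columns hold strings (Courses and Duration from the DataFrame above), since strings and integers cannot be sorted together.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"],
    "Duration": ["30days", "40days", "35days", "50days", "40days", "30days", "50days"],
})

# Union of the two columns' values; duplicates disappear automatically
unique_values = set(df["Courses"]) | set(df["Duration"])

# sorted() turns the unordered set into a list with a stable, repeatable order
ordered = sorted(unique_values)
print(ordered)
```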

6. Using pandas.concat() and unique() Methods

You can also combine pandas.concat() with unique() to get the unique values of multiple columns.


# Using pandas.concat() to stack several columns into one Series, then unique()
df2 = pd.concat([df['Courses'],df['Duration'],df['Fee']]).unique()
print(f"Get Unique Values from three Columns: {df2}")

Yields below output.


# Output:
# Get Unique Values from three Columns: ['Spark' 'PySpark' 'Python' 'pandas' '30days' '40days' '35days' '50days'
#  20000 25000 22000 30000]

7. Use Series.drop_duplicates()

Finally, you can get the unique values of a column using the drop_duplicates() method of the Series object. It returns a Series containing the unique values and keeps the original index labels of their first occurrences.


# Use Series.drop_duplicates() to get unique values
print("Get unique values from specified column:\n", df.Courses.drop_duplicates())

# Output:
# Get unique values from specified column:
# 0      Spark
# 1    PySpark
# 2     Python
# 3     pandas
# Name: Courses, dtype: object
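Two common variations on drop_duplicates() are sketched below, assuming the Courses data from the DataFrame above: resetting the index so it runs from 0 again, and keeping the last occurrence of each value instead of the first.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"],
})

# Default: keep the first occurrence of each value, along with its original index label
first_kept = df["Courses"].drop_duplicates()

# reset_index(drop=True) replaces the original labels with 0, 1, 2, ...
renumbered = first_kept.reset_index(drop=True)

# keep='last' keeps the last occurrence of each value instead
last_kept = df["Courses"].drop_duplicates(keep="last")

print(renumbered)
print(last_kept)
```

With keep='last', the index labels show where each value last appeared in the column.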

8. Complete Example of pandas Get Unique Values in Columns


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","pandas","Python","Spark","pandas"],
    'Fee' :[20000,25000,22000,30000,22000,20000,30000],
    'Duration':['30days','40days','35days','50days','40days','30days','50days'],
    'Discount':[1000,2300,1200,2000,2300,1000,2000]
              }
df = pd.DataFrame(technologies)
print(df)

# Find unique values of a column
print(df['Courses'].unique())
print(df.Courses.unique())

# Convert to List
print(df.Courses.unique().tolist())

# Unique values with drop_duplicates
print(df.Courses.drop_duplicates())

# Using pandas.unique() to get unique values in multiple columns
df2 = pd.unique(df[['Courses', 'Fee']].values.ravel('K'))
print(df2)

# Using pandas.unique() to get unique values in a single column
df2 = pd.unique(df[['Courses']].values.ravel())
print(df2)

# Find the unique values in multiple columns using numpy.unique()
df2 = np.unique(df[['Courses', 'Duration']].values)
print(df2)

# Use numpy.unique() on an array of column values
column_values = df[['Courses', 'Duration']].values
df2 = np.unique(column_values)
print(df2)

# Using set() on two columns combined with pd.concat()
# (Series.append() was removed in pandas 2.0)
df2 = set(pd.concat([df.Courses, df.Fee]).values)
print(df2)

# Using set() method
df2 = set(df.Courses) | set(df.Fee)
print(df2)

# To get unique values in one series/column
df2 = df['Courses'].unique()
print(df2)

# Using pandas.concat() to stack several columns into one Series, then unique()
df2 = pd.concat([df['Courses'],df['Duration'],df['Fee']]).unique()
print(df2)

# Use Series.drop_duplicates() to get unique values
print(df.Courses.drop_duplicates())

Frequently Asked Questions on Getting Unique Values From a DataFrame

How do I get unique values from a single column in a DataFrame?

To get unique values from a single column, select the column and call the .unique() method on it, for example df['column_name'].unique().

How can I get unique values from multiple columns in a DataFrame?

To get unique values from multiple columns, you can call .unique() on each column individually, or combine the columns first (for example with pandas.concat()) and then call .unique() on the result.

How can I handle missing values while getting unique values from a DataFrame?

Missing values (NaN) are included in the result of .unique() as a value of their own. If you don't want them, remove them with .dropna() before getting the unique values.
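A minimal sketch of the missing-value behavior, using a small made-up Series (the data and names here are illustrative and not part of the DataFrame above):

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries
s = pd.Series(["Spark", np.nan, "Python", "Spark", np.nan])

# unique() reports the missing value once, alongside the real values
with_missing = s.unique()

# dropna() removes missing entries before the unique values are collected
without_missing = s.dropna().unique()

print(len(with_missing))
print(list(without_missing))

# nunique() ignores missing values by default; dropna=False counts them
print(s.nunique())
print(s.nunique(dropna=False))
```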

How can I get unique values from all columns in a DataFrame?

To get unique values from all columns, you can iterate through the columns and call the .unique() method on each one.
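One way to sketch the per-column iteration is a dictionary comprehension that maps each column name to its unique values. The snippet rebuilds the article's DataFrame so it runs on its own, and uses drop_duplicates().tolist() so the values come back as plain Python objects; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"],
    "Fee": [20000, 25000, 22000, 30000, 22000, 20000, 30000],
    "Duration": ["30days", "40days", "35days", "50days", "40days", "30days", "50days"],
    "Discount": [1000, 2300, 1200, 2000, 2300, 1000, 2000],
})

# Map every column name to the list of its unique values, in order of appearance
unique_per_column = {col: df[col].drop_duplicates().tolist() for col in df.columns}

for col, values in unique_per_column.items():
    print(col, values)

# DataFrame.nunique() gives just the number of distinct values per column
print(df.nunique())
```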

How can I get a list of unique values and their counts in a DataFrame column?

You can use the value_counts() method to get the unique values and their respective counts in a DataFrame column. For example, value_counts = df['column_name'].value_counts()
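Applied to the Courses column used throughout this article, value_counts() can be sketched like this (the DataFrame is rebuilt here so the snippet is self-contained):

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"],
})

# The index holds the unique values, the data holds how often each one occurs
counts = df["Courses"].value_counts()
print(counts)

# to_dict() turns the result into a plain {value: count} mapping
print(counts.to_dict())
```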

Conclusion

In this article, you have learned how to get unique values from a single column or multiple columns of a DataFrame using Series.unique(), pandas.unique(), numpy.unique(), set(), pandas.concat(), and Series.drop_duplicates(), with examples.

Happy Learning!!


Malli

