Pandas Get Unique Values in Column

You can get the unique (distinct) values of one or more columns of a pandas DataFrame using the Series.unique() or pandas.unique() functions. Series.unique() returns the distinct values of a single column, while pandas.unique() accepts any one-dimensional array of values, so it can also be used on the flattened values of multiple columns.

unique() removes the duplicates from a column, so a value that occurs several times is returned only once.

Note that unique values are returned in order of appearance. If you want them sorted, sort the result yourself, for example with Python's sorted() or numpy.sort(), or sort the column first with sort_values().

Related: Find Duplicate Rows from pandas DataFrame

1. Quick Examples of Get Unique Values in Columns

If you are in a hurry, below are some quick examples of how to get unique values in a single column and multiple columns in DataFrame.


# Below are quick examples

# Find unique values of a column
print(df['Courses'].unique())
print(df.Courses.unique())

# Convert to List
print(df.Courses.unique().tolist())

# Get unique values with drop_duplicates()
df.Courses.drop_duplicates()

# Use pandas.unique() to get unique values across multiple columns
df2 = pd.unique(df[['Courses', 'Fee']].values.ravel('K'))

# Use pandas.unique() to get unique values of a single column
df2 = pd.unique(df[['Courses']].values.ravel())

# Find the unique values in multiple columns using numpy.unique()
df2 = np.unique(df[['Courses', 'Duration']].values)

# Same as above, with the column values stored in a variable first
column_values = df[['Courses', 'Duration']].values
df2 = np.unique(column_values)

# Use set() on the concatenated columns
# (Series.append() was removed in pandas 2.0, so use pd.concat())
df2 = set(pd.concat([df.Courses, df.Fee]).values)

# Using set() method
df2 = set(df.Courses) | set(df.Fee)

# To get unique values in one series/column
df2 = df['Courses'].unique()

# Use pandas.concat() to combine several columns, then get the unique values
df2 = pd.concat([df['Courses'],df['Duration'],df['Fee']]).unique()

# Use Series.drop_duplicates() to get unique values
print(df.Courses.drop_duplicates())

Now, let’s create a DataFrame with duplicate values, execute these examples and validate results. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark","PySpark","Python","pandas","Python","Spark","pandas"],
    'Fee': [20000,25000,22000,30000,22000,20000,30000],
    'Duration': ['30days','40days','35days','50days','40days','30days','50days'],
    'Discount': [1000,2300,1200,2000,2300,1000,2000]
}
df = pd.DataFrame(technologies)
print(df)

This yields the following output.


   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      1200
3   pandas  30000   50days      2000
4   Python  22000   40days      2300
5    Spark  20000   30days      1000
6   pandas  30000   50days      2000

2. pandas Get Unique Values in Column

Unique values are also referred to as distinct values. You can get the unique values of a column using the pandas Series.unique() function. Since this function is called on a Series object, use df['column_name'] to get the DataFrame column as a Series.

Syntax:


# Syntax (takes no arguments)
Series.unique()

Let’s see an example.


# Find unique values of a column
print(df['Courses'].unique())

# Output
# ['Spark' 'PySpark' 'Python' 'pandas']

This returns an array of values (a NumPy ndarray, or an ExtensionArray for extension dtypes), not a Series. It eliminates all duplicates and returns only the unique values from the Courses column.
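As a small follow-up sketch, the result of unique() can be sorted with Python's built-in sorted(), and Series.nunique() returns only the number of distinct values. The courses Series below is an assumed, shortened stand-in for the Courses column, not the article's full DataFrame.

```python
import pandas as pd

# Assumed sample data: a shortened stand-in for the Courses column
courses = pd.Series(
    ["Spark", "PySpark", "Python", "pandas", "Python", "Spark"],
    name="Courses",
)

# unique() keeps the order of first appearance
uniques = courses.unique()
print(list(uniques))

# sorted() returns a plain Python list in ascending order
sorted_uniques = sorted(uniques)
print(sorted_uniques)

# nunique() counts the distinct (non-null) values
distinct_count = courses.nunique()
print(distinct_count)
```

Note that uppercase letters sort before lowercase ones, so 'Spark' comes before 'pandas' in the sorted list.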

3. Find Unique Values in Multiple Columns

To get the unique values across multiple columns of a DataFrame, use the pandas.unique() function. It also works on a single column.

Syntax:


# Syntax
pandas.unique(values)

Let’s see an example. Since pandas.unique() expects a one-dimensional array of values, flatten the selected columns first using df[columns_list].values.ravel().


# Use pandas.unique() to get unique values across multiple columns
df2 = pd.unique(df[['Courses', 'Fee']].values.ravel())
print(df2)

# Outputs
# ['Spark' 20000 'PySpark' 25000 'Python' 22000 'pandas' 30000]

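If you want all the unique values of the first column followed by those of the second column, pass the argument 'K' to the ravel() function. 'K' flattens the array in the order its elements occur in memory, which for a DataFrame like this one typically means column by column. Because it follows the memory layout, it can avoid a copy and be faster than the default 'C' order, which flattens row by row. To request column-by-column order explicitly, regardless of memory layout, pass 'F' instead.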


# Use pandas.unique() with ravel('K') to get the values column by column
df2 = pd.unique(df[['Courses', 'Fee']].values.ravel('K'))
print(df2)

# Outputs
# ['Spark' 'PySpark' 'Python' 'pandas' 20000 25000 22000 30000]
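To make the effect of the flattening order concrete, here is a minimal sketch comparing the default 'C' (row by row) order with the explicit 'F' (column by column) order. The small DataFrame is assumed sample data, and 'F' is used instead of 'K' so the result does not depend on how the array happens to be laid out in memory.

```python
import pandas as pd

# Assumed sample data: a shortened version of the Courses and Fee columns
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "Spark"],
    "Fee": [20000, 25000, 22000, 20000],
})

# 2-D object array with one row per DataFrame row
values = df[["Courses", "Fee"]].to_numpy()

# 'C' order walks the array row by row, so course and fee alternate
row_wise = pd.unique(values.ravel("C"))
print(row_wise)

# 'F' order walks the array column by column, so all courses come first
column_wise = pd.unique(values.ravel("F"))
print(column_wise)
```

Both calls return the same set of values; only the order of first appearance differs.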

You can use the same approach to get the unique values of a single column.


# Use pandas.unique() to get unique values of a single column
df2 = pd.unique(df[['Courses']].values.ravel())
print(df2)

# Outputs
# ['Spark' 'PySpark' 'Python' 'pandas']

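4. Using numpy.unique()

If you are using NumPy, use its unique() function to eliminate duplicate values. Unlike pandas.unique(), which keeps the order of appearance, numpy.unique() returns the unique values sorted.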


import numpy as np
# Find the unique values in multiple columns using numpy.unique()
df2 = np.unique(df[['Courses', 'Duration']].values)
print(df2)

# Same as above, with the column values stored in a variable first
column_values = df[['Courses', 'Duration']].values
df2 = np.unique(column_values)
print(df2)

# Output
# ['30days' '35days' '40days' '50days' 'PySpark' 'Python' 'Spark' 'pandas']
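numpy.unique() can also report how often each value occurs through its return_counts parameter. The following is a minimal sketch on an assumed standalone Series that mirrors the Courses column.

```python
import numpy as np
import pandas as pd

# Assumed sample data mirroring the Courses column
courses = pd.Series(
    ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"]
)

# return_counts=True also returns the number of occurrences of each value
values, counts = np.unique(courses.to_numpy(dtype=object), return_counts=True)

# The values come back sorted, with counts in the matching positions
for value, count in zip(values, counts):
    print(value, count)
```

The two returned arrays line up by position, so zip() pairs each unique value with its count.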

5. Using set() to Eliminate Duplicates

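Python's built-in set() also removes duplicate values and keeps only the unique ones. You can use it to get the unique values of a single column or of multiple columns. Keep in mind that a set is unordered, so the order of the printed values may differ from the output shown here.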


# Use set() on the concatenated columns
# (Series.append() was removed in pandas 2.0, so use pd.concat())
df2 = set(pd.concat([df.Courses, df.Fee]).values)
print(df2)

# Using set() method
df2 = set(df.Courses) | set(df.Fee)
print(df2)

# Outputs
# {20000, 25000, 'pandas', 30000, 22000, 'PySpark', 'Python', 'Spark'}
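Because a set has no defined order and this one mixes strings with integers, calling sorted() on it directly would raise a TypeError. One possible workaround, sketched here on assumed sample data, is to sort by the string form of each value using key=str.

```python
import pandas as pd

# Assumed sample data: a shortened version of the Courses and Fee columns
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "Spark"],
    "Fee": [20000, 25000, 22000, 20000],
})

# Union of the unique values of both columns (unordered)
unique_values = set(df["Courses"]) | set(df["Fee"])

# key=str compares every element by its string form,
# which avoids comparing str with int directly
ordered = sorted(unique_values, key=str)
print(ordered)
```

Sorting by string form places the numbers before the course names here because digits sort before letters.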

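6. Using pandas.concat() and unique() Methods

You can also combine pandas.concat() with unique() to get the unique values of multiple columns: concat() stacks the columns into a single Series, and unique() then removes the duplicates.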


# Use pandas.concat() to combine several columns, then get the unique values
df2 = pd.concat([df['Courses'],df['Duration'],df['Fee']]).unique()
print(f"Unique Values from three Columns: {df2}")

This yields the following output.


Unique Values from three Columns: ['Spark' 'PySpark' 'Python' 'pandas' '30days' '40days' '35days' '50days'
20000 25000 22000 30000]

7. Use Series.drop_duplicates()

Finally, you can get the unique values of a column using the drop_duplicates() function of the Series object. It returns a Series that contains only the unique values and keeps their original index labels.


# Use Series.drop_duplicates() to get unique values
print(df.Courses.drop_duplicates())

# Output
# 0      Spark
# 1    PySpark
# 2     Python
# 3     pandas
# Name: Courses, dtype: object
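Since drop_duplicates() keeps the original index labels, it can be useful to see which occurrence is retained. The sketch below, on an assumed standalone Series mirroring the Courses column, uses the keep parameter to retain the last occurrence instead of the first, and reset_index(drop=True) to renumber the result.

```python
import pandas as pd

# Assumed sample data mirroring the Courses column
courses = pd.Series(
    ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"],
    name="Courses",
)

# Default keep='first' retains the first occurrence of each value
first_kept = courses.drop_duplicates()
print(first_kept.index.tolist())

# keep='last' retains the last occurrence, so the surviving labels differ
last_kept = courses.drop_duplicates(keep="last")
print(last_kept.index.tolist())

# reset_index(drop=True) discards the old labels and renumbers from 0
renumbered = last_kept.reset_index(drop=True)
print(renumbered)
```

With keep="last" the order of the values also changes, because each value now appears at the position of its final occurrence.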

8. Complete Example of pandas Get Unique Values in Columns


import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark","PySpark","Python","pandas","Python","Spark","pandas"],
    'Fee': [20000,25000,22000,30000,22000,20000,30000],
    'Duration': ['30days','40days','35days','50days','40days','30days','50days'],
    'Discount': [1000,2300,1200,2000,2300,1000,2000]
}
df = pd.DataFrame(technologies)
print(df)

# Find unique values of a column
print(df['Courses'].unique())
print(df.Courses.unique())

# Convert to List
print(df.Courses.unique().tolist())

# Get unique values with drop_duplicates()
# (it returns a new Series; df itself is not modified)
print(df.Courses.drop_duplicates())

# Use pandas.unique() to get unique values across multiple columns
df2 = pd.unique(df[['Courses', 'Fee']].values.ravel('K'))
print(df2)

# Use pandas.unique() to get unique values of a single column
df2 = pd.unique(df[['Courses']].values.ravel())
print(df2)

# Find the unique values in multiple columns using numpy.unique()
df2 = np.unique(df[['Courses', 'Duration']].values)
print(df2)

# Same as above, with the column values stored in a variable first
column_values = df[['Courses', 'Duration']].values
df2 = np.unique(column_values)
print(df2)

# Use set() on the concatenated columns
# (Series.append() was removed in pandas 2.0, so use pd.concat())
df2 = set(pd.concat([df.Courses, df.Fee]).values)
print(df2)

# Using set() method
df2 = set(df.Courses) | set(df.Fee)
print(df2)

# To get unique values in one series/column
df2 = df['Courses'].unique()
print(df2)

# Use pandas.concat() to combine several columns, then get the unique values
df2 = pd.concat([df['Courses'],df['Duration'],df['Fee']]).unique()
print(df2)

# Use Series.drop_duplicates() to get unique values
print(df.Courses.drop_duplicates())

Conclusion

In this article, you have learned how to get the unique values of a single column and of multiple columns in a DataFrame using Series.unique(), pandas.unique(), numpy.unique(), set(), pandas.concat() and Series.drop_duplicates(), with examples.

Happy Learning !!
