You can get the unique values of one or more columns of a pandas DataFrame using the pandas.unique() or Series.unique() functions. Series.unique() returns the unique values of a single column, while pandas.unique() accepts a one-dimensional array of values, so it can be applied to several columns once they are flattened.
Both functions drop repeated values and keep one copy of each value.
Note that unique values are returned in order of appearance. If you want them sorted, sort the result yourself (for example with Python's sorted()), or sort the DataFrame first with sort_values().
1. Quick Examples of Get Unique Values in Columns
If you are in a hurry, below are some quick examples of how to get unique values in a single column and multiple columns in DataFrame.
# Quick examples (assume pandas as pd, numpy as np, and the DataFrame df created below)
# Find unique values of a column
print(df['Courses'].unique())
print(df.Courses.unique())
# Convert the unique values to a list
print(df.Courses.unique().tolist())
# Unique values with drop_duplicates()
print(df.Courses.drop_duplicates())
# Use pandas.unique() to get unique values across multiple columns
df2 = pd.unique(df[['Courses', 'Fee']].values.ravel('K'))
# Use pandas.unique() on a single column
df2 = pd.unique(df[['Courses']].values.ravel())
# Use numpy.unique() to get sorted unique values across multiple columns
df2 = np.unique(df[['Courses', 'Duration']].values)
# Same result, with the values extracted into a variable first
column_values = df[['Courses', 'Duration']].values
df2 = np.unique(column_values)
# Use set() on the concatenated columns
# (Series.append() was removed in pandas 2.0, so use pd.concat() instead)
df2 = set(pd.concat([df.Courses, df.Fee]).values)
# Use a union of two sets
df2 = set(df.Courses) | set(df.Fee)
# Get unique values of one Series/column
df2 = df['Courses'].unique()
# Use pandas.concat() to stack several columns, then call unique()
df2 = pd.concat([df['Courses'],df['Duration'],df['Fee']]).unique()
# Use Series.drop_duplicates() to get unique values
print(df.Courses.drop_duplicates())
Now, let's create a DataFrame with duplicate values, run these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
# Create DataFrame
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Python","pandas","Python","Spark","pandas"],
'Fee' :[20000,25000,22000,30000,22000,20000,30000],
'Duration':['30days','40days','35days','50days','40days','30days','50days'],
'Discount':[1000,2300,1200,2000,2300,1000,2000]
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
# Output:
   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      1200
3   pandas  30000   50days      2000
4   Python  22000   40days      2300
5    Spark  20000   30days      1000
6   pandas  30000   50days      2000
2. pandas Get Unique Values in Column
Unique values are also referred to as distinct values. You can get the unique values of a column using the pandas Series.unique() function. Since this function is called on a Series object, use df['column_name'] to select the column as a Series first.
Syntax:
# Syntax of unique() (it takes no arguments)
Series.unique()
Let’s see an example.
# Find unique values of a column
print(df['Courses'].unique())
# Output:
# ['Spark' 'PySpark' 'Python' 'pandas']
This returns a NumPy array (or a pandas extension array for extension dtypes), not a Series. It eliminates all duplicates and returns only the unique values from the Courses column.
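As a minimal sketch of the behavior described above, the snippet below builds a small stand-in Series (the variable names are illustrative and not part of the article's DataFrame), confirms that unique() keeps the order of first appearance, and then sorts the result explicitly.

```python
import pandas as pd

# Hypothetical stand-in for the Courses column used in this article
courses = pd.Series(
    ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"]
)

# unique() keeps the order in which each value first appears
unique_courses = courses.unique()
print(unique_courses)

# Sort explicitly when a sorted result is needed
# (uppercase letters sort before lowercase ones)
sorted_courses = sorted(unique_courses)
print(sorted_courses)

# nunique() returns only the number of distinct values
n_courses = courses.nunique()
print(n_courses)
```

If you only need the count of distinct values, nunique() avoids building the array of values at all.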
3. Find Unique Values in Multiple Columns
If you want the unique values across multiple columns of a DataFrame, use the pandas.unique() function. You can also use it to get the unique values of a single column.
Syntax:
# Syntax
pandas.unique(values)
Let's see an example. Since pandas.unique() takes a one-dimensional array of values, flatten the selected columns first using df[columns_list].values.ravel().
# Use pandas.unique() to get unique values across multiple columns
df2 = pd.unique(df[['Courses', 'Fee']].values.ravel())
print(df2)
# Output:
# ['Spark' 20000 'PySpark' 25000 'Python' 22000 'pandas' 30000]
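If you want all unique values from the first column followed by those from the second column, pass an order argument to the ravel() function. The default order 'C' flattens the array row by row, 'F' flattens it column by column, and 'K' reads the elements in the order they are stored in memory. The example uses 'K', which gives column-by-column output only when the underlying array is stored column-major; use 'F' if you need that order regardless of memory layout. Reading in memory order can also avoid a copy, which may make it faster than the default 'C' order.
# Use pandas.unique() with ravel('K') to read the values in memory order
df2 = pd.unique(df[['Courses', 'Fee']].values.ravel('K'))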
print(df2)
# Output:
# ['Spark' 'PySpark' 'Python' 'pandas' 20000 25000 22000 30000]
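To make the effect of the order argument concrete, here is a small sketch on a hand-built array. The array arr is a hypothetical stand-in for df[['Courses', 'Fee']].values, and it uses 'C' and 'F' explicitly so the result does not depend on memory layout.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 object array standing in for df[['Courses', 'Fee']].values
arr = np.array(
    [["Spark", 20000], ["PySpark", 25000], ["Spark", 20000]],
    dtype=object,
)

# 'C' order walks the array row by row: course, fee, course, fee, ...
row_major = arr.ravel("C")

# 'F' order walks the array column by column: all courses, then all fees
col_major = arr.ravel("F")

# pandas.unique() keeps first-appearance order of whichever flattening it receives
unique_by_row = pd.unique(row_major)
unique_by_col = pd.unique(col_major)

print(unique_by_row)
print(unique_by_col)
```

Choosing 'F' here is a design choice for predictability: it always groups the output by column, whereas 'K' depends on how the array happens to be stored.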
To get the unique values of a single column, pass just that column:
# Use pandas.unique() on a single column
df2 = pd.unique(df[['Courses']].values.ravel())
# Output:
# ['Spark' 'PySpark' 'Python' 'pandas']
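4. Using numpy.unique()
If you are using NumPy, use its unique() function to eliminate duplicate values. Unlike pandas.unique(), numpy.unique() returns the unique values sorted rather than in order of appearance, so the selected columns must hold values that can be compared with each other (for example, all strings).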
import numpy as np
# Find the unique values in multiple columns using numpy.unique()
df2 = np.unique(df[['Courses', 'Duration']].values)
print(df2)
# Same result, with the values extracted into a variable first
column_values = df[['Courses', 'Duration']].values
df2 = np.unique(column_values)
print(df2)
# Output:
# ['30days' '35days' '40days' '50days' 'PySpark' 'Python' 'Spark' 'pandas']
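The following sketch contrasts the sorted result of numpy.unique() with the first-appearance order of Series.unique(), and also uses the return_counts parameter of numpy.unique() to count occurrences. The durations Series is a hypothetical stand-in for the Duration column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Duration column used in this article
durations = pd.Series(
    ["30days", "40days", "35days", "50days", "40days", "30days", "50days"]
)

# numpy.unique() sorts its result; return_counts=True also returns
# how many times each unique value occurs
values, counts = np.unique(durations.to_numpy(), return_counts=True)
print(values)
print(counts)

# Series.unique() keeps first-appearance order instead
first_seen = durations.unique()
print(first_seen)
```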
5. Using set() to Eliminate Duplicates
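The set() function also removes all duplicate values and keeps only the unique ones. We can use set() to get unique values from one or more DataFrame columns. Note that a set is unordered, so the order of the printed values may vary.
# Use set() on the concatenated columns
# (Series.append() was removed in pandas 2.0, so use pd.concat() instead)
df2 = set(pd.concat([df.Courses, df.Fee]).values)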
print(df2)
# Using set() method
df2 = set(df.Courses) | set(df.Fee)
print(df2)
# Output:
# {20000, 25000, 'pandas', 30000, 22000, 'PySpark', 'Python', 'Spark'}
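Because a set has no defined order, you may want a stable ordering before printing or comparing results. The sketch below is one possible approach, not the only one: it sorts the mixed string and integer values by their string form, since Python cannot compare str and int directly. The small DataFrame is a hypothetical subset of the article's data.

```python
import pandas as pd

# Hypothetical subset of the article's DataFrame
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "Spark"],
    "Fee": [20000, 25000, 22000, 20000],
})

# The union of two sets keeps one copy of every value from both columns
combined = set(df.Courses) | set(df.Fee)

# Sorting mixed str/int values directly would raise a TypeError,
# so sort by each value's string representation instead
ordered = sorted(combined, key=str)
print(ordered)
```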
6. Using pandas.concat() and unique() Methods
You can combine pandas.concat() and unique() to get the unique values of multiple columns: concat() stacks the columns into a single Series, and unique() then removes the duplicates.
# Use pandas.concat() to stack several columns, then call unique()
df2 = pd.concat([df['Courses'],df['Duration'],df['Fee']]).unique()
print(f"Unique Values from three Columns: {df2}")
Yields below output.
# Output:
Unique Values from three Columns: ['Spark' 'PySpark' 'Python' 'pandas' '30days' '40days' '35days' '50days'
20000 25000 22000 30000]
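The concat() approach merges all columns into one pool of values. If you would rather keep the unique values of each column separate, a dictionary comprehension is one simple option. This is a sketch on a hypothetical subset of the article's data, and unique_per_column is an illustrative name.

```python
import pandas as pd

# Hypothetical subset of the article's DataFrame
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"],
    "Fee": [20000, 25000, 22000, 30000, 22000, 20000, 30000],
})

# Build one list of unique values per column, keyed by column name
unique_per_column = {
    col: df[col].drop_duplicates().tolist() for col in df.columns
}
print(unique_per_column)
```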
7. Use Series.drop_duplicates()
Finally, you can get the unique values of a column using the drop_duplicates() function of the Series object. It returns a Series with the unique values, keeping the index label of each value's first occurrence.
# Use Series.drop_duplicates() to get unique values
print(df.Courses.drop_duplicates())
# Output:
# 0 Spark
# 1 PySpark
# 2 Python
# 3 pandas
# Name: Courses, dtype: object
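A related case is finding unique combinations of values across columns, rather than a flat pool of values. The sketch below uses DataFrame.drop_duplicates() on a two-column selection for that; it assumes the same Courses and Duration data as the article's DataFrame, and pairs is an illustrative name.

```python
import pandas as pd

# Courses and Duration columns as in the article's DataFrame
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"],
    "Duration": ["30days", "40days", "35days", "50days", "40days", "30days", "50days"],
})

# Keep the first occurrence of each (Courses, Duration) pair,
# then renumber the index from 0
pairs = df[["Courses", "Duration"]].drop_duplicates().reset_index(drop=True)
print(pairs)
```

Note that Python appears twice in the result because it is paired with two different durations.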
8. Complete Example of pandas Get Unique Values in Columns
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Python","pandas","Python","Spark","pandas"],
'Fee' :[20000,25000,22000,30000,22000,20000,30000],
'Duration':['30days','40days','35days','50days','40days','30days','50days'],
'Discount':[1000,2300,1200,2000,2300,1000,2000]
}
df = pd.DataFrame(technologies)
print(df)
# Find unique values of a column
print(df['Courses'].unique())
print(df.Courses.unique())
# Convert to List
print(df.Courses.unique().tolist())
# Unique values with drop_duplicates() (returns a new Series; df is unchanged)
print(df.Courses.drop_duplicates())
# Using pandas.unique() to unique values in multiple columns
df2 = pd.unique(df[['Courses', 'Fee']].values.ravel('K'))
print(df2)
# Using pandas.unique() to unique values
df2 = pd.unique(df[['Courses']].values.ravel())
print(df2)
# Find the unique values in multiple columns using numpy.unique()
df2 = np.unique(df[['Courses', 'Duration']].values)
print(df2)
# Use numpy.unique() to unique values in multiple columns
column_values = df[['Courses', 'Duration']].values
df2 = np.unique(column_values)
print(df2)
# Use set() on the concatenated columns
# (Series.append() was removed in pandas 2.0, so use pd.concat() instead)
df2 = set(pd.concat([df.Courses, df.Fee]).values)
print(df2)
# Using set() method
df2 = set(df.Courses) | set(df.Fee)
print(df2)
# To get unique values in one series/column
df2 = df['Courses'].unique()
print(df2)
# Using pandas.concat to extend one column to multiple columns
df2 = pd.concat([df['Courses'],df['Duration'],df['Fee']]).unique()
print(df2)
# Use Series.drop_duplicates() to get unique values
print(df.Courses.drop_duplicates())
Conclusion
In this article, you have learned how to get unique values from a single column and from multiple columns of a DataFrame using the pandas.unique(), pandas.concat(), Series.unique(), and numpy.unique() functions, with examples.
Happy Learning !!