We can convert a Pandas DataFrame column to a NumPy array by using the to_numpy() method or the values attribute. Calling to_numpy() on the DataFrame itself converts the whole DataFrame to a NumPy array. Pandas provides various functions to manipulate and analyze data, and some of them make it easy to convert one data structure into another.
In this article, I will explain how to convert a DataFrame column to a NumPy array using various methods and attributes, with examples.
Quick Examples to Convert Column to Array
If you are in a hurry, below are some quick examples of how to convert the DataFrame column to a NumPy array.
# Quick examples to convert column to array
# Example 1: Convert specific column
# Use to_numpy()
array = df['Courses'].to_numpy()
# Example 2: Convert all columns
# To numpy array
array = df.to_numpy()
# Example 3: Convert df column to array
# Using the values attribute
array = df['Fee'].values
# Example 4: Convert Pandas column
# To array use slicing
array = df[df.columns[3:]].to_numpy()
# Example 5: Convert column
# To NumPy array use iloc[]
array = df.iloc[:,-1:].values
# Example 6: Convert column name to array
array = (df.columns.to_numpy())
Now, let’s create a Pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
# Create Pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
This prints the DataFrame: four rows labeled r1 through r4 and the columns Courses, Fee, Duration, and Discount.
Convert Pandas DataFrame Column to NumPy Array
To transform a Pandas DataFrame column into a NumPy array, we can use the to_numpy() method. To convert one or more specific columns, first select the desired column(s) using bracket notation [], then call to_numpy(). Selecting a single column name returns a one-dimensional array, while selecting a list of column names returns a two-dimensional array.
# Convert specific column use to_numpy()
array = df['Courses'].to_numpy()
print(array)
# Convert specific columns
array = df[['Courses', 'Duration']].to_numpy()
print(array)
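# Output:
# ['Spark' 'PySpark' 'Python' 'pandas']
# [['Spark' '30days']
# ['PySpark' '40days']
# ['Python' '35days']
# ['pandas' '50days']]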
Moreover, calling to_numpy() on the DataFrame itself converts the whole Pandas DataFrame to a NumPy array. It returns a two-dimensional NumPy array.
# Convert all columns to numpy array
array = df.to_numpy()
print(array)
# Output:
# [['Spark' 20000 '30days' 1000]
# ['PySpark' 25000 '40days' 2300]
# ['Python' 22000 '35days' 1200]
# ['pandas' 30000 '50days' 2000]]
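The output above mixes strings and integers in one array, which has a consequence worth seeing: NumPy needs a single dtype for the whole array. The following is a minimal sketch, rebuilding the article's sample DataFrame, that compares the dtype of the full conversion with the dtype obtained from the numeric columns only. The variable names mixed and numeric are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Python", "pandas"],
        "Fee": [20000, 25000, 22000, 30000],
        "Duration": ["30days", "40days", "35days", "50days"],
        "Discount": [1000, 2300, 1200, 2000],
    },
    index=["r1", "r2", "r3", "r4"],
)

# Mixed string and integer columns collapse to a single object dtype
mixed = df.to_numpy()
print(mixed.shape, mixed.dtype)

# Selecting only the numeric columns keeps an integer dtype
numeric = df[["Fee", "Discount"]].to_numpy()
print(numeric.shape, numeric.dtype)
```

If the array is meant for numeric computation, selecting the numeric columns before converting avoids the slower object dtype.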
Convert Pandas Column to Array using values
In this section, we’ll convert the Pandas DataFrame column into a NumPy array using df['col_name'].values. Note that values is an attribute, not a method, so it is written without parentheses. It returns the NumPy array representation of the column, so the row labels (index) are not present in the result. For example,
# Convert df column to array using the values attribute
array = df['Fee'].values
print(array)
# Output:
# [20000 25000 22000 30000]
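For a plain numeric column, values and to_numpy() give the same array contents. The sketch below, on a small stand-alone DataFrame, checks that and then shows two options that only to_numpy() accepts: dtype and copy. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Fee": [20000, 25000, 22000, 30000]})

# Both approaches produce the same array contents
from_values = df["Fee"].values
from_method = df["Fee"].to_numpy()
print(np.array_equal(from_values, from_method))

# to_numpy() accepts options that the values attribute cannot take
as_float = df["Fee"].to_numpy(dtype="float64", copy=True)
as_float[0] = 0.0            # modifies only the copied array
print(as_float.dtype)
print(df["Fee"].iloc[0])     # the DataFrame is unchanged
```

Because values is an attribute, it cannot take arguments. When a specific dtype or an independent copy is needed, to_numpy() is the more flexible choice.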
Use Pandas Slicing with to_numpy() & Convert Array
Using Pandas slicing, we can select a particular portion of the rows or columns of a DataFrame. Here, I slice df.columns to select the columns to convert and then call to_numpy() on the selection. Because the selection is a DataFrame rather than a Series, the result is a two-dimensional array, even when it contains a single column.
# Convert Pandas column to array use slicing
array = df[df.columns[3:]].to_numpy()
print(array)
# Output:
# [[1000]
# [2300]
# [1200]
# [2000]]
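The slice can cover more than one column, and a single-column result can be flattened when a one-dimensional array is needed. This is a minimal sketch on the same sample data. The label-based df.loc slice is included only as an alternative for comparison, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Python", "pandas"],
        "Fee": [20000, 25000, 22000, 30000],
        "Duration": ["30days", "40days", "35days", "50days"],
        "Discount": [1000, 2300, 1200, 2000],
    }
)

# Positional slice of the column labels: Fee and Duration, shape (4, 2)
middle = df[df.columns[1:3]].to_numpy()

# Label-based slice with loc; the end label is included
by_label = df.loc[:, "Fee":"Duration"].to_numpy()

# A one-column slice is still two-dimensional, shape (4, 1)
last = df[df.columns[3:]].to_numpy()

# ravel() flattens it to shape (4,)
flat = last.ravel()

print(middle.shape, by_label.shape, last.shape, flat.shape)
```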
Use Pandas iloc[] Attribute to Convert Array
Alternatively, using the Pandas iloc[] indexer we can select columns by position and then read the values attribute. The slice -1: below selects the last column as a DataFrame, so the result is a two-dimensional NumPy array.
# Convert column to NumPy array use iloc[]
array = df.iloc[:,-1:].values
print(array)
print(type(array))
# Output:
# [[1000]
# [2300]
# [1200]
# [2000]]
# <class 'numpy.ndarray'>
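The form of the positional selector decides the shape of the result. As a rough illustration on a smaller stand-alone DataFrame, the sketch below compares an integer position, a slice, and a list of positions combined with a row slice. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Python"],
        "Fee": [20000, 25000, 22000],
        "Discount": [1000, 2300, 1200],
    }
)

# Integer position selects a Series, so the array is 1-D with shape (3,)
last_as_series = df.iloc[:, -1].to_numpy()

# Slice selects a DataFrame, so the array is 2-D with shape (3, 1)
last_as_frame = df.iloc[:, -1:].to_numpy()

# First two rows of the columns at positions 1 and 2, shape (2, 2)
picked = df.iloc[:2, [1, 2]].to_numpy()

print(last_as_series.shape, last_as_frame.shape, picked.shape)
```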
Convert Column Names to NumPy Array
Using the df.columns attribute along with the to_numpy() method, we can convert the column names of a Pandas DataFrame into a NumPy array. Let’s apply the syntax below.
# Convert column name to array
array = (df.columns.to_numpy())
print(array)
print(type(array))
# Output:
# ['Courses' 'Fee' 'Duration' 'Discount']
# <class 'numpy.ndarray'>
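The same call works on the row labels, since df.index is also a Pandas Index object. The sketch below, on a small stand-alone DataFrame, converts both the column names and the row labels, and shows tolist() for the case where a plain Python list is enough. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
    index=["r1", "r2"],
)

column_names = df.columns.to_numpy()   # NumPy array of column labels
row_labels = df.index.to_numpy()       # NumPy array of row labels
name_list = df.columns.tolist()        # plain Python list of column labels

print(column_names)
print(row_labels)
print(name_list)
```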
FAQ on Convert Pandas Column to NumPy Array
You can use the values attribute or the to_numpy() method of a Pandas Series to convert a DataFrame column to a NumPy array.
You can convert multiple columns to a NumPy array by selecting them with a list of column names and reading the values attribute of the resulting DataFrame. For example, array = df[['Column1', 'Column3']].values
You can convert the entire DataFrame to a NumPy array using the values attribute. For example, array = df.values
Converting to a NumPy array does not remove or fill missing values. If your DataFrame contains NaN values, they will be present in the NumPy array. You may want to handle missing values using methods like fillna() or dropna() before converting to a NumPy array.
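The options for missing values can be sketched as follows, assuming a single float column with one NaN. The fill value 0 and the na_value of -1.0 are arbitrary choices for illustration, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Fee": [20000.0, np.nan, 22000.0]})

# NaN is carried into the array unchanged
raw = df["Fee"].to_numpy()

# Replace NaN before converting
filled = df["Fee"].fillna(0).to_numpy()

# Or drop the rows with missing values before converting
dropped = df["Fee"].dropna().to_numpy()

# Or substitute a value during conversion with the na_value parameter
replaced = df["Fee"].to_numpy(na_value=-1.0)

print(raw, filled, dropped, replaced)
```

Which option fits depends on what the array is used for: dropping rows changes the length of the array, while filling keeps it aligned with the original rows.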
Conclusion
In this article, you have learned how to convert a Pandas DataFrame column to a NumPy array by using the to_numpy() method and the values attribute, with columns selected through bracket notation, slicing, and iloc[]. You have also learned how to convert column names into an array.