  • Post last modified:May 27, 2024
Pandas Convert Column to NumPy Array

We can convert a Pandas DataFrame column to a NumPy array by using the to_numpy() method or the values attribute. The to_numpy() method can also convert the whole DataFrame to a NumPy array. Pandas provides various functions to manipulate and analyze data, and some of them make it easy to convert one data structure to another.


In this article, I will explain how to convert a DataFrame column to a NumPy array using various methods and attributes, with examples.

Quick Examples to Convert Column to Array

If you are in a hurry, below are some quick examples of how to convert the DataFrame column to a NumPy array.


# Quick examples to convert column to array

# Example 1: Convert a specific column
# using to_numpy()
array = df['Courses'].to_numpy()

# Example 2: Convert all columns
# to a NumPy array
array = df.to_numpy()

# Example 3: Convert a column to an array
# using the values attribute
array = df['Fee'].values

# Example 4: Convert columns to an array
# using column slicing
array = df[df.columns[3:]].to_numpy()

# Example 5: Convert a column to a
# NumPy array using iloc[]
array = df.iloc[:,-1:].values

# Example 6: Convert column names to an array
array = df.columns.to_numpy()

Now, let’s create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.


# Create Pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

This yields a DataFrame with four rows labeled r1 through r4 and the columns Courses, Fee, Duration, and Discount.

Convert Pandas DataFrame Column to NumPy Array

To convert a Pandas DataFrame column into a NumPy array, select the column using bracket notation [] and call the to_numpy() method. Selecting a single column, such as df['Courses'], returns a one-dimensional array. Passing a list of column names, such as df[['Courses', 'Duration']], returns a two-dimensional array with one column per selected name.


# Convert a specific column using to_numpy()
array = df['Courses'].to_numpy()
print(array)

# Convert several specific columns
array = df[['Courses', 'Duration']].to_numpy()
print(array)

# Output:
# ['Spark' 'PySpark' 'Python' 'pandas']
# [['Spark' '30days']
#  ['PySpark' '40days']
#  ['Python' '35days']
#  ['pandas' '50days']]

Moreover, calling to_numpy() on the DataFrame itself converts the whole DataFrame to a NumPy array. It returns a two-dimensional NumPy array. Because our columns mix strings and integers, the resulting array has the object dtype.


# Convert all columns to numpy array
array = df.to_numpy()
print(array)

# Output:
# [['Spark' 20000 '30days' 1000]
#  ['PySpark' 25000 '40days' 2300]
#  ['Python' 22000 '35days' 1200]
#  ['pandas' 30000 '50days' 2000]]
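The effect of mixed column types on the resulting array can be checked directly. The following is a minimal sketch that rebuilds the sample DataFrame from this article and inspects the dtype of the converted arrays; the variable names mixed and numeric are only illustrative.

```python
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
}, index=['r1', 'r2', 'r3', 'r4'])

# Mixed string and integer columns fall back to the generic object dtype
mixed = df.to_numpy()
print(mixed.dtype)    # object
print(mixed.shape)    # (4, 4)

# Selecting only the numeric columns keeps an integer dtype
numeric = df[['Fee', 'Discount']].to_numpy()
print(numeric.dtype.kind)   # i (a signed integer type)
```

If you plan to do arithmetic on the array, selecting the numeric columns first is usually preferable, since an object array stores generic Python objects rather than a compact numeric type.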

Convert Pandas Column to Array using values

In this section, we’ll convert the pandas DataFrame column into a NumPy array using df['col_name'].values. Note that values is an attribute, not a function, so it is written without parentheses. It returns the NumPy array representation of the column, so the row labels (index) are not present in the result. For example,


# Convert df column to array using the values attribute
array = df['Fee'].values
print(array)

# Output:
# [20000 25000 22000 30000]
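To make the difference between the attribute and the method concrete, here is a small sketch that compares them on a standalone Series. The Series fee and the other variable names are assumptions for illustration. The pandas documentation recommends to_numpy() over values, partly because the method accepts options such as dtype and copy.

```python
import numpy as np
import pandas as pd

fee = pd.Series([20000, 25000, 22000, 30000], name='Fee')

via_values = fee.values        # attribute, no parentheses
via_method = fee.to_numpy()    # method, accepts options

print(np.array_equal(via_values, via_method))   # True

# Calling values like a function fails, because it is already an array
try:
    fee.values()
except TypeError as exc:
    print("TypeError:", exc)

# to_numpy() can also cast and force a copy in the same call
as_float = fee.to_numpy(dtype='float64', copy=True)
as_float[0] = 0.0          # modifies only the copy
print(fee.iloc[0])         # 20000
```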

Use Pandas Slicing with to_numpy() to Convert to Array

Using Pandas slicing, we can select a particular portion of the rows or columns of a DataFrame. Here, I slice df.columns to select the columns from position 3 onward (only Discount in our DataFrame) and then call the to_numpy() method. Because the selection is a DataFrame rather than a single column, the result is a two-dimensional array.


# Convert Pandas column to array using slicing
array = df[df.columns[3:]].to_numpy()
print(array)

# Output:
# [[1000]
#  [2300]
#  [1200]
#  [2000]]

Use Pandas iloc[] to Convert Column to Array

Alternatively, using the Pandas iloc[] indexer we can select a column by position and then read its values attribute. Here, the slice -1: selects the last column as a one-column DataFrame, so the result is a two-dimensional NumPy array.


# Convert column to NumPy array using iloc[]
array = df.iloc[:,-1:].values
print(array)
print(type(array))

# Output:
# [[1000]
#  [2300]
#  [1200]
#  [2000]]
# <class 'numpy.ndarray'>
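Whether the result is one-dimensional or two-dimensional depends on how the column is selected. The sketch below, which uses a reduced two-column version of the sample data and illustrative variable names, contrasts a slice with a single integer position in iloc[].

```python
import pandas as pd

df = pd.DataFrame({
    'Fee': [20000, 25000, 22000, 30000],
    'Discount': [1000, 2300, 1200, 2000],
})

# A slice (-1:) selects a one-column DataFrame, so the array is 2-D
two_d = df.iloc[:, -1:].to_numpy()
print(two_d.shape)    # (4, 1)

# A single integer position (-1) selects a Series, so the array is 1-D
one_d = df.iloc[:, -1].to_numpy()
print(one_d.shape)    # (4,)

# ravel() flattens the 2-D result when a 1-D array is needed
print(two_d.ravel().shape)   # (4,)
```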

Convert Column Names to NumPy Array

Using the df.columns attribute along with the to_numpy() method, we can convert the column names of a Pandas DataFrame into a NumPy array. Let’s apply the syntax below.


# Convert column names to array
array = df.columns.to_numpy()
print(array)
print(type(array))

# Output:
# ['Courses' 'Fee' 'Duration' 'Discount']
# <class 'numpy.ndarray'>

FAQ on Converting a Pandas Column to a NumPy Array

How do I convert a Pandas DataFrame column to a NumPy array?

You can call the to_numpy() method or use the values attribute of a Pandas Series to convert a DataFrame column to a NumPy array. For example, array = df['Fee'].to_numpy()

How can I convert multiple columns to NumPy arrays simultaneously?

Select the columns by passing a list of column names, which returns a DataFrame, and then use the values attribute or the to_numpy() method. The result is a single two-dimensional array. For example, array = df[['Column1', 'Column3']].values

How can I convert the entire DataFrame to a NumPy array?

You can convert the entire DataFrame to a NumPy array using the values attribute or the to_numpy() method. For example, array = df.values

How can I handle missing values when converting to NumPy arrays?

NumPy has no dedicated missing-value marker. If your DataFrame contains NaN values, they will be present in the NumPy array. You may want to handle missing values using methods like fillna() or dropna() before converting to a NumPy array.
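As a rough sketch of these options, the example below uses a hypothetical Fee column with one missing value, which is not part of the article's sample data. Besides fillna() and dropna(), it also shows the na_value parameter of to_numpy(), which substitutes a value during the conversion itself.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing fee, for illustration only
fee = pd.Series([20000, np.nan, 22000, 30000], name='Fee')

# The NaN survives the conversion, and the array is floating point
raw = fee.to_numpy()
print(raw.dtype)              # float64
print(np.isnan(raw).sum())    # 1

# Option 1: replace missing values first
filled = fee.fillna(0).to_numpy()

# Option 2: drop missing values first (the array becomes shorter)
dropped = fee.dropna().to_numpy()
print(len(dropped))           # 3

# Option 3: let to_numpy() substitute a value during conversion
substituted = fee.to_numpy(na_value=0)
```

Which option fits depends on the analysis: filling keeps the array aligned with the original rows, while dropping changes its length.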

Conclusion

In this article, you have learned how to convert a Pandas DataFrame column to a NumPy array by using the to_numpy() method and the values attribute, along with slicing and iloc[]. You also learned how to convert column names into an array.
