  • Post last modified:March 27, 2024
Pandas Convert Column to NumPy Array

We can convert a Pandas DataFrame column to a NumPy array by using the to_numpy() method or the values attribute. Calling to_numpy() on the DataFrame itself converts the whole DataFrame to a NumPy array. Pandas provides many functions for manipulating and analyzing data, and some of them make it easy to convert one data structure into another.

In this article, I will explain how to convert the DataFrame column to a Numpy array using various functions and attributes with examples.

1. Quick Examples to Convert DataFrame Column to Numpy Array

If you are in a hurry, below are some quick examples of how to convert the DataFrame column to a NumPy array.


# Below are the quick examples

# Example 1: Convert a specific column using to_numpy()
array = df['Courses'].to_numpy()

# Example 2: Convert all columns to a NumPy array
array = df.to_numpy()

# Example 3: Convert a column to an array using the values attribute
array = df['Fee'].values

# Example 4: Convert a column to an array using slicing
array = df[df.columns[3:]].to_numpy()

# Example 5: Convert a column to a NumPy array using iloc[]
array = df.iloc[:,-1:].values

# Example 6: Convert column names to an array
array = df.columns.to_numpy()

Now, let’s create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.


# Create Pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

This prints a DataFrame with four rows (r1 to r4) and the columns Courses, Fee, Duration, and Discount.

2. Convert Pandas DataFrame Column to NumPy Array

We can convert a pandas DataFrame column to a NumPy array by using the to_numpy() function. To convert specific (single or multiple) columns, first select them from the DataFrame using bracket notation [], then call to_numpy() on the result. A single column name returns a one-dimensional array, while a list of column names returns a two-dimensional array.


# Convert a specific column using to_numpy()
array = df['Courses'].to_numpy()
print(array)

# Convert specific columns 
array = df[['Courses', 'Duration']].to_numpy()
print(array)

The first call returns a one-dimensional array of the four Courses values. The second returns a two-dimensional array with four rows and two columns, one for Courses and one for Duration.
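to_numpy() also accepts optional dtype and copy arguments. The sketch below is only an illustration: it rebuilds a small stand-in for the Fee column, and the variable names fees_float and fees_copy are made up for this example.

```python
import pandas as pd

# Small stand-in for the article's DataFrame (only the Fee column is needed)
df = pd.DataFrame({'Fee': [20000, 25000, 22000, 30000]},
                  index=['r1', 'r2', 'r3', 'r4'])

# Request a specific dtype for the resulting array
fees_float = df['Fee'].to_numpy(dtype='float64')
print(fees_float.dtype)

# copy=True guarantees the array does not share memory with the DataFrame,
# so editing the array leaves the original column unchanged
fees_copy = df['Fee'].to_numpy(copy=True)
fees_copy[0] = 0
print(df['Fee'].iloc[0])
```

Asking for an explicit copy is a reasonable default whenever the array will be modified, since without it the array may share memory with the DataFrame.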

Moreover, calling to_numpy() on the DataFrame itself converts the whole pandas DataFrame to a NumPy array. It returns a two-dimensional NumPy array.


# Convert all columns to numpy array
array = df.to_numpy()
print(array)

# Output:
# [['Spark' 20000 '30days' 1000]
#  ['PySpark' 25000 '40days' 2300]
#  ['Python' 22000 '35days' 1200]
#  ['pandas' 30000 '50days' 2000]]
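Because the DataFrame mixes string and integer columns, the array above cannot use a numeric dtype. The following sketch, which uses a trimmed copy of the example data and illustrative variable names (mixed, numeric), shows one way to check the dtype and to keep a numeric dtype by selecting only numeric columns first.

```python
import pandas as pd

# Trimmed copy of the article's data, with one text and two numeric columns
df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark', 'Python', 'pandas'],
    'Fee': [20000, 25000, 22000, 30000],
    'Discount': [1000, 2300, 1200, 2000],
})

# Mixed string and integer columns fall back to the generic object dtype
mixed = df.to_numpy()
print(mixed.dtype, mixed.shape)

# Selecting only the numeric columns keeps an integer dtype
numeric = df[['Fee', 'Discount']].to_numpy()
print(numeric.dtype, numeric.shape)
```

An object array stores references to Python objects, so vectorized arithmetic on it is usually slower than on a numeric array.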

3. Convert Pandas Column to Array Using the values Attribute

In this section, we’ll convert a pandas DataFrame column into a NumPy array using df['col_name'].values. Note that values is an attribute, not a method, so it is written without parentheses. It returns the NumPy array representation of the column, so the row and column axes (labels) are not present. For example,


# Convert a column to an array using the values attribute
array = df['Fee'].values
print(array)

# Output:
# [20000 25000 22000 30000]
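For a plain numeric column, values and to_numpy() give the same result, and the pandas documentation recommends to_numpy() for new code. A minimal comparison, using a stand-in Fee column and illustrative variable names, might look like this:

```python
import numpy as np
import pandas as pd

# Stand-in for the article's Fee column
df = pd.DataFrame({'Fee': [20000, 25000, 22000, 30000]})

via_values = df['Fee'].values        # attribute: no parentheses
via_method = df['Fee'].to_numpy()    # method: parentheses required

print(type(via_values))
print(np.array_equal(via_values, via_method))
```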

4. Use Pandas Slicing with to_numpy() to Convert to an Array

Using Pandas slicing, we can select a particular portion of the rows or columns of a DataFrame. Here, I slice df.columns to select the columns from position 3 onward (only Discount in our DataFrame) and call to_numpy() on the selection. Because the selection is still a DataFrame, the result is a two-dimensional array with one column.


# Convert a Pandas column to an array using slicing
array = df[df.columns[3:]].to_numpy()
print(array)

# Output:
# [[1000]
#  [2300]
#  [1200]
#  [2000]]
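The output above is two-dimensional even though only one column was selected. If a flat array is needed, one option is NumPy's ravel(), as in this sketch (the names two_d and one_d are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark', 'Python', 'pandas'],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# Slicing df.columns yields a DataFrame, so the array is two-dimensional
two_d = df[df.columns[3:]].to_numpy()
print(two_d.shape)

# ravel() flattens the single-column result to one dimension
one_d = two_d.ravel()
print(one_d.shape)
```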

5. Use Pandas iloc[] to Convert a Column to an Array

Alternatively, we can select a column by position with the Pandas iloc[] indexer and then read its values attribute. Here the slice -1: selects the last column as a one-column DataFrame, so the result is a two-dimensional NumPy array.


# Convert a column to a NumPy array using iloc[]
array = df.iloc[:,-1:].values
print(array)
print(type(array))

# Output:
# [[1000]
#  [2300]
#  [1200]
#  [2000]]
# <class 'numpy.ndarray'>
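The shape of the result depends on how the column position is written inside iloc[]. As a rough illustration (variable names are made up for this example), an integer position gives a one-dimensional array, while a list of positions gives a two-dimensional one:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark', 'Python', 'pandas'],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# An integer position returns a Series, so .values is one-dimensional
last_col = df.iloc[:, -1].values
print(last_col.shape)

# A list of positions returns a DataFrame with just those columns
fee_discount = df.iloc[:, [1, 3]].values
print(fee_discount.shape)
```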

6. Convert Column Names to NumPy Array

Using the df.columns attribute along with the to_numpy() function, we can convert the column names of a Pandas DataFrame into a NumPy array. Let’s apply the syntax below.


# Convert column name to array
array = (df.columns.to_numpy())
print(array)
print(type(array))

# Output:
# ['Courses' 'Fee' 'Duration' 'Discount']
# <class 'numpy.ndarray'>
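If a plain Python list is more convenient than an array, df.columns also offers tolist(). The sketch below uses an empty DataFrame with the same column names purely for illustration:

```python
import pandas as pd

# Empty DataFrame that only carries the article's column names
df = pd.DataFrame(columns=['Courses', 'Fee', 'Duration', 'Discount'])

# NumPy array of column names
names = df.columns.to_numpy()
print(names)

# Plain Python list of column names
name_list = df.columns.tolist()
print(name_list)
```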

Frequently Asked Questions on Converting a Pandas Column to a NumPy Array

How do I convert a Pandas DataFrame column to a NumPy array?

You can call the to_numpy() method or use the values attribute on the selected column, which is a Pandas Series. For example, array = df['Fee'].to_numpy()

How can I convert multiple columns to NumPy arrays simultaneously?

Select the columns by passing a list of column names, then use to_numpy() or the values attribute on the resulting DataFrame. For example, array = df[['Column1', 'Column3']].values returns a single two-dimensional array with one column per selected name.

How can I convert the entire DataFrame to a NumPy array?

You can convert the entire DataFrame to a NumPy array using the to_numpy() method or the values attribute. For example, array = df.values

How can I handle missing values when converting to NumPy arrays?

NumPy arrays have no dedicated marker for missing values. If your DataFrame contains NaN values, they will be present in the NumPy array. You may want to handle missing values using methods like fillna() or dropna() before converting to a NumPy array.
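As a small illustration of these options, the sketch below uses a hypothetical Discount column with one missing entry (this data is not part of the article's DataFrame):

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing entry
s = pd.Series([1000.0, np.nan, 1200.0], name='Discount')

raw = s.to_numpy()                 # NaN is carried over into the array
filled = s.fillna(0).to_numpy()    # replace missing values first
dropped = s.dropna().to_numpy()    # or remove the entries that are missing

print(np.isnan(raw).sum())
print(filled)
print(dropped)
```

Which option fits best depends on the analysis: filling keeps the array length unchanged, while dropping shortens it.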

7. Conclusion

In this article, we have learned how to convert a Pandas DataFrame column to a NumPy array by using the to_numpy() method and the values attribute, combined with column slicing and iloc[]. We also learned how to convert column names into an array.

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen’s journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.