Convert Pandas DataFrame to NumPy Array

You can convert a pandas DataFrame to a NumPy array by using the to_numpy() and to_records() methods, or the values and index attributes. In this article, I will explain how to convert pandas DataFrame columns (all columns or a selected subset) to a NumPy array, with examples.

1. Quick Examples of Converting a DataFrame to a NumPy Array

If you are in a hurry, below are some quick examples of how to convert a pandas DataFrame to a NumPy array.


# Below are quick examples
# Convert all columns using the df.to_numpy() method.
result = df.to_numpy()

# Convert a single column to a NumPy array.
df2 = df['Courses'].to_numpy()

# Convert selected columns using the to_numpy() method.
df2 = df[['Courses', 'Duration']].to_numpy()

# Convert to a record array using DataFrame.to_records().
print(df.to_records())

# Convert the DataFrame using the df.values attribute (not a method call).
values_array = df.values
print(values_array)

# Convert the row index labels to a NumPy array.
df.index.to_numpy()

Now, let's create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

Yields below output.


    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  25000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  30000   50days      2000

2. Using DataFrame.to_numpy() Method

You can convert a DataFrame into a NumPy array by using the to_numpy() method. It returns a numpy.ndarray and accepts three optional parameters.

  • dtype – The data type of the values in the resulting array.
  • copy – Whether to ensure that the returned array is not a view of another array. The default is False, which may return a view when one is available but does not guarantee that no copy is made. copy=True always makes a new copy.
  • na_value – The value to use for any missing values in the resulting array.

For example:


# Using the df.to_numpy() method to convert all columns to a NumPy array
result = df.to_numpy()
print(result)

# Output
#[['Spark' 20000 '30days' 1000]
# ['PySpark' 25000 '40days' 2300]
# ['Python' 22000 '35days' 1200]
# ['pandas' 30000 '50days' 2000]]
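
The optional parameters described above can be tried on a small numeric DataFrame. This is only a minimal sketch: the fees DataFrame and its missing value are made up for illustration and are not part of the article's data.

```python
import numpy as np
import pandas as pd

# Illustrative numeric-only data with one missing Fee value (an assumption)
fees = pd.DataFrame({"Fee": [20000, 25000, None],
                     "Discount": [1000, 2300, 1200]})

# dtype requests float64; na_value replaces the missing entry with 0.0
filled = fees.to_numpy(dtype="float64", na_value=0.0)
print(filled)

# copy=True guarantees the result does not share memory with the DataFrame,
# so writing to the array leaves the DataFrame untouched
copied = fees.to_numpy(copy=True)
copied[0, 0] = -1.0
print(fees.loc[0, "Fee"])
```

Without copy=True, whether the array shares memory with the DataFrame depends on the column dtypes and the pandas version, so it is safer not to rely on either behavior.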

Let’s see how to convert specific (single or multiple) columns from DataFrame to NumPy array.


# Convert a single column using the to_numpy() method.
df2=df['Courses'].to_numpy()
print(df2)
# Outputs
# ['Spark' 'PySpark' 'Python' 'pandas']

# Convert specific columns using df.to_numpy() method.
result = df[['Courses', 'Duration']].to_numpy()
print(result)

# Output
#[['Spark' '30days']
# ['PySpark' '40days']
# ['Python' '35days']
# ['pandas' '50days']]
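
One detail worth checking is the shape of the result: selecting with single brackets gives a Series and therefore a one-dimensional array, while double brackets give a DataFrame and a two-dimensional array. A short sketch with a reduced two-row version of the data:

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})

one_d = df["Fee"].to_numpy()     # Series -> 1-D array
two_d = df[["Fee"]].to_numpy()   # single-column DataFrame -> 2-D array

print(one_d.shape)  # (2,)
print(two_d.shape)  # (2, 1)
```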

3. Using the DataFrame.values Attribute

In this section, you'll convert the DataFrame into a NumPy array using df.values. Note that values is an attribute, not a method, so it is accessed without parentheses. It returns the NumPy array representation of the DataFrame. In the result, notice that the row and column labels are not present. The pandas documentation recommends to_numpy() over values for new code.


# Convert a pandas DataFrame to a NumPy array using the df.values attribute.
values_array = df.values
print(values_array)

Yields below output.


[['Spark' 20000 '30days' 1000]
 ['PySpark' 25000 '40days' 2300]
 ['Python' 22000 '35days' 1200]
 ['pandas' 30000 '50days' 2000]]
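
Because the DataFrame above mixes strings and integers, the resulting array has to use a common type for every element. The following sketch, using a reduced version of the data, compares the dtype of a mixed selection with a numeric-only one:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark"],
                   "Fee": [20000, 25000],
                   "Discount": [1000, 2300]})

mixed = df.values                         # strings and integers together
numeric = df[["Fee", "Discount"]].values  # integer columns only

print(mixed.dtype)    # object
print(numeric.dtype)  # an integer dtype
```

Selecting only the numeric columns before converting keeps a numeric dtype, which is usually what NumPy computations expect.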

4. Convert DataFrame to NumPy Array Using to_records()

To keep the row labels (index) in the resulting array, use the DataFrame.to_records() method. It returns a NumPy record array in which the index is stored as the first field.


# Using DataFrame.to_records()
print(df.to_records())

Yields below output.


[('r1', 'Spark', 20000, '30days', 1000)
 ('r2', 'PySpark', 25000, '40days', 2300)
 ('r3', 'Python', 22000, '35days', 1200)
 ('r4', 'pandas', 30000, '50days', 2000)]
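
The fields of a record array can be accessed by name, and the index can be left out with index=False. A minimal sketch on a two-row version of the data:

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
                  index=["r1", "r2"])

records = df.to_records()
print(records.dtype.names)  # ('index', 'Courses', 'Fee')
print(records["Fee"])       # access one field by its column name

# index=False drops the row labels from the record array
no_index = df.to_records(index=False)
print(no_index.dtype.names)  # ('Courses', 'Fee')
```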

5. Using Index.to_numpy() to Convert Row Indices to NumPy

Use the Index.to_numpy() method to convert the DataFrame row labels to a NumPy array.


# Using DataFrame.index.to_numpy() method.
df.index.to_numpy()

Yields below output.


array(['r1', 'r2', 'r3', 'r4'], dtype=object)
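
The column labels are also stored in an Index object, so the same method should work on df.columns. A small sketch, assuming a one-row version of the data:

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark"], "Fee": [20000]}, index=["r1"])

row_labels = df.index.to_numpy()    # row labels
col_labels = df.columns.to_numpy()  # column labels

print(row_labels)
print(col_labels)
```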

6. Complete Example of Converting a Pandas DataFrame to a NumPy Array


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)


# Convert all columns using the df.to_numpy() method.
print(df.to_numpy())

# Convert selected columns using the to_numpy() method.
print(df[['Courses', 'Duration']].to_numpy())

# Convert the row index labels using Index.to_numpy().
print(df.index.to_numpy())

# Convert a single column using the to_numpy() method.
df2 = df['Courses'].to_numpy()
print(df2)

# Convert to a record array using DataFrame.to_records().
print(df.to_records())

# Convert the DataFrame using the df.values attribute.
values_array = df.values
print(values_array)

# Convert a single column, kept as a DataFrame, into a 2-D NumPy array.
Fee_array = df[['Fee']].to_numpy()
print(Fee_array)

Conclusion

In this article, you have learned how to convert a pandas DataFrame to a NumPy array by using the to_numpy() and to_records() methods and the values and index attributes.
