You can convert a pandas DataFrame to a NumPy array by using the to_numpy() and to_records() methods or the values attribute, and you can convert the row labels with index.to_numpy(). In this article, I will explain how to convert a DataFrame (all or selected columns) to a NumPy array with examples.
1. Quick Examples to Convert DataFrame to NumPy Array
If you are in a hurry, below are some quick examples of how to convert DataFrame to NumPy array.
# Below are quick examples
# Using df.to_numpy() method.
result = df.to_numpy()
# Convert a single column to a NumPy array.
df2 = df['Courses'].to_numpy()
# Convert specific columns using the to_numpy() method.
df2 = df[['Courses', 'Duration']].to_numpy()
# Using DataFrame.to_records()
print(df.to_records())
# Convert pandas DataFrame to NumPy array using the df.values attribute.
values_array = df.values
print(values_array)
# Convert the row index to a NumPy array.
df.index.to_numpy()
Now, let’s create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
Yields below output.
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  25000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  30000   50days      2000
2. Convert Pandas DataFrame to NumPy Array
You can convert a pandas DataFrame to a NumPy array by using the to_numpy() method. It is called on the DataFrame object, returns a NumPy ndarray, and accepts three optional parameters.
- dtype – The data type of the values in the resulting array.
- copy – copy=True always makes a new copy of the data. copy=False, the default, returns a view of the existing data when possible, although a copy may still be made.
- na_value – The value to use for any missing value in the array. You can pass any value here.
For example:
# Using df.to_numpy() method to convert all columns to a NumPy array
result = df.to_numpy()
print(result)
# Output
#[['Spark' 20000 '30days' 1000]
# ['PySpark' 25000 '40days' 2300]
# ['Python' 22000 '35days' 1200]
# ['pandas' 30000 '50days' 2000]]
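The three optional parameters of to_numpy() can be sketched as follows. This is a minimal illustration, not part of the original examples: the scores DataFrame and its missing value are made up here so that na_value has something to replace.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric DataFrame with one missing value (assumed for illustration)
scores = pd.DataFrame({"Fee": [20000, 25000, None], "Discount": [1000, 2300, 1200]})

# dtype: request a specific element type for the resulting array
as_float = scores.to_numpy(dtype="float64")

# na_value: replace missing entries while converting
filled = scores.to_numpy(dtype="float64", na_value=0.0)

# copy=True: the array never shares memory with the DataFrame
independent = scores.to_numpy(copy=True)
independent[0, 0] = -1           # changes only the array

print(as_float.dtype)            # float64
print(filled[2, 0])              # 0.0
print(scores.loc[0, "Fee"])      # 20000.0, the DataFrame is unchanged
```

Passing copy=True is the safer choice when you plan to modify the array in place and want the DataFrame left untouched.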
Let’s see how to convert specific (single or multiple) columns from a DataFrame to a NumPy array. First select the columns by using bracket notation [], then call to_numpy() on the result.
# Convert a single column using to_numpy() method.
df2=df['Courses'].to_numpy()
print(df2)
# Outputs
# ['Spark' 'PySpark' 'Python' 'pandas']
# Convert specific columns using df.to_numpy() method.
result = df[['Courses', 'Duration']].to_numpy()
print(result)
# Output
#[['Spark' '30days']
# ['PySpark' '40days']
# ['Python' '35days']
# ['pandas' '50days']]
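Single brackets and double brackets give arrays of different shapes, which is easy to miss in the printed output. The sketch below uses a small two-row DataFrame, assumed here for brevity, to show the difference.

```python
import pandas as pd

# Small stand-in DataFrame (assumed for illustration)
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})

one_d = df["Fee"].to_numpy()      # single brackets select a Series -> 1-D array
two_d = df[["Fee"]].to_numpy()    # double brackets select a DataFrame -> 2-D array

print(one_d.shape)                # (2,)
print(two_d.shape)                # (2, 1)

# Mixing string and numeric columns falls back to the generic object dtype
print(df.to_numpy().dtype)        # object
```

Selecting only the numeric columns before converting keeps a numeric dtype, which matters if the array is passed to numerical routines.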
3. Using DataFrame.values Attribute
In this section, you’ll convert the pandas DataFrame into a NumPy array using df.values. Note that values is an attribute, not a method, so it is used without parentheses. It returns the NumPy array representation of the DataFrame. As a result, the row and column labels are not present.
# Convert pandas DataFrame to NumPy array using the df.values attribute.
values_array = df.values
print(values_array)
Yields below output.
[['Spark' 20000 '30days' 1000]
['PySpark' 25000 '40days' 2300]
['Python' 22000 '35days' 1200]
['pandas' 30000 '50days' 2000]]
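Because values is an attribute, it is accessed without parentheses, and calling it like a method fails. The sketch below, using a small numeric DataFrame assumed for illustration, shows that it returns the same data as to_numpy() and what happens if parentheses are added.

```python
import numpy as np
import pandas as pd

# Small stand-in DataFrame (assumed for illustration)
df = pd.DataFrame({"Fee": [20000, 25000], "Discount": [1000, 2300]})

from_attribute = df.values        # attribute access, no parentheses
from_method = df.to_numpy()       # method call

print(np.array_equal(from_attribute, from_method))   # True

# Calling values like a method tries to call the returned ndarray
call_failed = False
try:
    df.values()
except TypeError:
    call_failed = True
print(call_failed)                # True
```

The pandas documentation recommends to_numpy() over values for new code, so values is mostly seen in older scripts.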
4. Convert DataFrame to NumPy Array using to_records()
To keep the row labels (index) in the NumPy array, use the DataFrame.to_records() method. It returns a NumPy record array whose first field holds the index.
# Using DataFrame.to_records()
print(df.to_records())
Yields below output.
[('r1', 'Spark', 20000, '30days', 1000)
('r2', 'PySpark', 25000, '40days', 2300)
('r3', 'Python', 22000, '35days', 1200)
('r4', 'pandas', 30000, '50days', 2000)]
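The record array returned by to_records() keeps the column names as field names, so columns can still be read by name. Here is a minimal sketch with a two-row DataFrame assumed for illustration. It also shows the index parameter, which drops the row labels when set to False.

```python
import pandas as pd

# Small stand-in DataFrame with row labels (assumed for illustration)
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
    index=["r1", "r2"],
)

records = df.to_records()              # row labels go into a field named "index"
print(records.dtype.names)             # ('index', 'Courses', 'Fee')
print(records["Fee"].tolist())         # [20000, 25000]

no_index = df.to_records(index=False)  # leave the row labels out
print(no_index.dtype.names)            # ('Courses', 'Fee')
```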
5. Using Index.to_numpy() to Convert Row Indices to NumPy
Use the Index.to_numpy() method to convert the DataFrame row labels to a NumPy array.
# Using DataFrame.index.to_numpy() method.
df.index.to_numpy()
Yields below output.
array(['r1', 'r2', 'r3', 'r4'], dtype=object)
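The column labels are also stored in an Index object, so the same call works on df.columns. The sketch below uses a small DataFrame assumed for illustration and also shows the default integer labels you get when no custom index is set.

```python
import pandas as pd

# Small stand-in DataFrame with row labels (assumed for illustration)
df = pd.DataFrame({"Fee": [20000, 25000]}, index=["r1", "r2"])

row_labels = df.index.to_numpy()        # custom row labels
column_labels = df.columns.to_numpy()   # column names

print(row_labels.tolist())              # ['r1', 'r2']
print(column_labels.tolist())           # ['Fee']

# Without custom labels the index is a simple 0..n-1 range
default_labels = df.reset_index(drop=True).index.to_numpy()
print(default_labels.tolist())          # [0, 1]
```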
6. Complete Example
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Using df.to_numpy() method.
print(df.to_numpy())
# Convert specific columns using df.to_numpy() method.
print(df[['Courses', 'Duration']].to_numpy())
# Convert the row index using DataFrame.index.to_numpy().
print(df.index.to_numpy())
# Convert a single column using to_numpy() method.
df2 = df['Courses'].to_numpy()
print(df2)
# Using DataFrame.to_records()
print(df.to_records())
# Convert pandas DataFrame to NumPy array using the df.values attribute.
values_array = df.values
print(values_array)
# Convert a selected column into a 2-D NumPy array.
Fee_array = df[['Fee']].to_numpy()
print(Fee_array)
Conclusion
In this article, you have learned how to convert a pandas DataFrame to a NumPy array by using the to_numpy() and to_records() methods and the values attribute. To convert selected columns, first select them from the DataFrame by using bracket notation [], then call to_numpy() on the result. You have also learned how to get the row index as an array with index.to_numpy().