Creating dataframes

What are the different ways in which you can create a dataframe?

  • Using pd.DataFrame() constructor.
  • Zip lists to build a DataFrame.
  • Building dataframes with broadcasting.

Using pd.DataFrame() constructor

We can create a dataframe by using the pd.DataFrame() constructor. When given a dictionary of lists, the constructor uses the keys as column labels and the lists as column values; the row labels default to an ascending count from 0 (0, 1, 2, 3, …).

# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

# Import pandas as pd
import pandas as pd

# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }

# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)

# Print cars
print(cars)
   cars_per_cap        country  drives_right
0           809  United States          True
1           731      Australia         False
2           588          Japan         False
3            18          India         False
4           200         Russia          True
5            70        Morocco          True
6            45          Egypt          True
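Column order can differ between pandas versions: older releases sorted dictionary keys alphabetically (as in the output above), while recent ones keep insertion order. A minimal sketch, assuming you want an explicit order, passes the columns argument to the constructor. The shortened lists here are illustrative only.

```python
import pandas as pd

# Shortened, illustrative versions of the lists above
names = ['United States', 'Australia', 'Japan']
dr = [True, False, False]
cpc = [809, 731, 588]

my_dict = {'country': names, 'drives_right': dr, 'cars_per_cap': cpc}

# columns= selects the dict keys to use and fixes their order
cars = pd.DataFrame(my_dict, columns=['country', 'cars_per_cap', 'drives_right'])

print(list(cars.columns))
# Row labels still default to 0, 1, 2, ...
print(list(cars.index))
```

Fixing the order this way keeps the result independent of the pandas version in use.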

Another simple example is shown below. It uses the same approach as above, a dictionary of lists, and additionally shows how to change the row index after the dataframe has been created.

# Column data and row labels
Apples = [35, 41]
Bananas = [21, 34]
rows = ['2017 Sales', '2018 Sales']

df = pd.DataFrame({'Apples': Apples, 'Bananas': Bananas})
df.index = rows
df.head()
            Apples  Bananas
2017 Sales      35       21
2018 Sales      41       34

Instead of setting the index after the dataframe exists, you can pass it to the constructor through the index argument when creating the dataframe.

# Column data and row labels
Apples = [35, 41]
Bananas = [21, 34]
rows = ['2017 Sales', '2018 Sales']

df = pd.DataFrame({'Apples': Apples, 'Bananas': Bananas}, index=rows)
df.head()
            Apples  Bananas
2017 Sales      35       21
2018 Sales      41       34
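As a quick check, the sketch below builds the frame both ways and compares the results, then looks up a value by its row label with .loc. The names df_after and df_direct are made up for this illustration.

```python
import pandas as pd

rows = ['2017 Sales', '2018 Sales']
data = {'Apples': [35, 41], 'Bananas': [21, 34]}

# Approach 1: assign the index after construction
df_after = pd.DataFrame(data)
df_after.index = rows

# Approach 2: pass the index to the constructor
df_direct = pd.DataFrame(data, index=rows)

# Compare the two frames (values, index and columns)
same = df_after.equals(df_direct)
print(same)

# Row labels allow lookups by name with .loc
apples_2018 = df_direct.loc['2018 Sales', 'Apples']
print(apples_2018)
```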

Using Zip Lists to build DataFrames

Suppose list_keys and list_values are two separate lists, and you want to use the entries of list_keys as the column names and the entries of list_values as the column data. You can then use the zip and list functions to build the desired dataframe.

list_keys = ['Country', 'GDP']
list_values = [['US', 'AUS', 'IND'], [400, 300, 100]]
zipped_lists = zip(list_keys, list_values)

# A zip object is a lazy iterator, so printing it shows only its repr
print(zipped_lists)

# convert to a list of tuples
zipped_tuples = list(zipped_lists)

# list of tuples 
print(zipped_tuples)
<zip object at 0x11493ae08>
[('Country', ['US', 'AUS', 'IND']), ('GDP', [400, 300, 100])]

We can convert the list of tuples above into a dictionary, which is one of the input types that pd.DataFrame() accepts.

# convert to a dictionary
dict_lists = dict(zipped_tuples)

print(dict_lists)

{'Country': ['US', 'AUS', 'IND'], 'GDP': [400, 300, 100]}

# Finally create a dataframe.
df1 = pd.DataFrame(dict_lists)

# print
df1.head()
  Country  GDP
0      US  400
1     AUS  300
2     IND  100
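The intermediate steps above can also be collapsed into a single expression, since dict() accepts a zip object directly. The sketch below shows that shortcut, and also illustrates that a zip object can be consumed only once. The names first_pass and second_pass are illustrative.

```python
import pandas as pd

list_keys = ['Country', 'GDP']
list_values = [['US', 'AUS', 'IND'], [400, 300, 100]]

# Pair each column name with its data and build the dict in one step
df1 = pd.DataFrame(dict(zip(list_keys, list_values)))
print(df1)

# A zip object is a one-shot iterator: after the first pass it is empty
zipped = zip(list_keys, list_values)
first_pass = list(zipped)
second_pass = list(zipped)
print(len(first_pass), len(second_pass))
```

If the zipped pairs are needed more than once, keep the list or the dict rather than the zip object itself.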

Using broadcasting

You can implicitly use ‘broadcasting’, a feature of NumPy, when creating pandas DataFrames. For example, here we create a DataFrame of cities in Pennsylvania that contains the city name in one column and the state name in the second. Because state is a single string rather than a list, pandas repeats it for every row.

cities = ['Manheim',
 'Preston park',
 'Biglerville',
 'Indiana',
 'Curwensville',
 'Crown',
 'Harveys lake',
 'Mineral springs',
 'Cassville',
 'Hannastown',
 'Saltsburg',
 'Tunkhannock',
 'Pittsburgh',
 'Lemasters',
 'Great bend']

# Make a string with the value 'PA': state
state = 'PA'

# Construct a dictionary: data
data = {'state':state, 'city':cities}

# Construct a DataFrame from dictionary data: df
df = pd.DataFrame(data)

# Print the DataFrame
print(df)
               city state
0           Manheim    PA
1      Preston park    PA
2       Biglerville    PA
3           Indiana    PA
4      Curwensville    PA
5             Crown    PA
6      Harveys lake    PA
7   Mineral springs    PA
8         Cassville    PA
9        Hannastown    PA
10        Saltsburg    PA
11      Tunkhannock    PA
12       Pittsburgh    PA
13        Lemasters    PA
14       Great bend    PA
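Broadcasting needs at least one list-like column (or an explicit index) to determine the number of rows. The sketch below, using a shortened city list and a hypothetical 'country' column, shows that a dictionary containing only scalars raises a ValueError unless an index is supplied.

```python
import pandas as pd

# The scalar 'PA' is repeated to match the length of the list
df = pd.DataFrame({'state': 'PA', 'city': ['Manheim', 'Indiana', 'Pittsburgh']})
print(df)

# With only scalars, pandas cannot infer the row count
try:
    pd.DataFrame({'state': 'PA', 'country': 'US'})
    all_scalar_failed = False
except ValueError:
    all_scalar_failed = True
print(all_scalar_failed)

# Supplying an index tells pandas how many rows to create
df_scalars = pd.DataFrame({'state': 'PA', 'country': 'US'}, index=[0, 1, 2])
print(df_scalars)
```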
