使用Excel文件创建数据框完整指南

2021年5月12日21:39:38 发表评论 946 次浏览

让我们来看看如何使用Excel将Excel文件读取到Pandas数据框对象大熊猫.

代码1:使用读取Excel文件read_excel()熊猫的方法。

# import pandas lib as pd
import pandas as pd
  
# read by default 1st sheet of an excel file
dataframe1 = pd.read_excel( 'SampleWork.xlsx' )
  
print (dataframe1)

输出:

Name  Age    Stream  Percentage
0      Ankit   18      Math          95
1      Rahul   19   Science          90
2    Shaurya   20  Commerce          85
3  Aishwarya   18      Math          80
4   Priyanka   19   Science          75

代码2:使用read_excel()方法的'sheet_name'读取特定表。

# import pandas lib as pd
import pandas as pd
  
# read 2nd sheet of an excel file
dataframe2 = pd.read_excel( 'SampleWork.xlsx' , sheet_name = 1 )
  
print (dataframe2)

输出:

Name  Age    Stream  Percentage
0     Priya   18      Math          95
1  shivangi   19   Science          90
2      Jeet   20  Commerce          85
3    Ananya   18      Math          80
4   Swapnil   19   Science          75

代码3:使用read_excel()方法的'usecols'参数读取特定列。

# import pandas lib as pd
import pandas as pd
  
require_cols = [ 0 , 3 ]
  
# only read specific columns from an excel file
required_df = pd.read_excel( 'SampleWork.xlsx' , usecols = require_cols)
  
print (required_df)

输出:

Name  Percentage
0      Ankit          95
1      Rahul          90
2    Shaurya          85
3  Aishwarya          80
4   Priyanka          75

代码4:使用read_excel()方法的'na_values'参数处理丢失的数据。

# import pandas lib as pd
import pandas as pd
  
# Handling missing values of 3rd sheet of an excel file.
dataframe = pd.read_excel( 'SampleWork.xlsx' , na_values = "Missing" , sheet_name = 2 )
  
print (dataframe)

输出:

Name  Age   Stream  Percentage
0     Priya   18     Math          95
1  shivangi   19  Science          90
2      Jeet   20      NaN          85
3    Ananya   18     Math          80
4   Swapnil   19  Science          75

代码5:使用read_excel()方法的'skiprows'参数读取Excel文件时,跳过开始行。

# import pandas lib as pd
import pandas as pd
  
# read 2nd sheet of an excel file after
# skipping starting two rows 
df = pd.read_excel( 'SampleWork.xlsx' , sheet_name = 1 , skiprows = 2 )
  
print (df)

输出:

shivangi  19   Science  90
0     Jeet  20  Commerce  85
1   Ananya  18      Math  80
2  Swapnil  19   Science  75

代码6:使用read_excel()方法的'header'参数,将标题设置为任意一行,并从该行开始读取。

# import pandas lib as pd
import pandas as pd
  
# setting the 3rd row as header.
df = pd.read_excel( 'SampleWork.xlsx' , sheet_name = 1 , header = 2 )
  
print (df)

输出:

shivangi  19   Science  90
0     Jeet  20  Commerce  85
1   Ananya  18      Math  80
2  Swapnil  19   Science  75

代码7:使用read_excel()方法的'sheet_name'参数读取多个Excel表。

# import pandas lib as pd
import pandas as pd
  
# read both 1st and 2nd sheet.
df = pd.read_excel( 'SampleWork.xlsx' , na_values = "Mssing" , sheet_name = [ 0 , 1 ])
  
print (df)

输出:

OrderedDict([(0, Name  Age    Stream  Percentage
0      Ankit   18      Math          95
1      Rahul   19   Science          90
2    Shaurya   20  Commerce          85
3  Aishwarya   18      Math          80
4   Priyanka   19   Science          75), (1, Name  Age    Stream  Percentage
0     Priya   18      Math          95
1  shivangi   19   Science          90
2      Jeet   20  Commerce          85
3    Ananya   18      Math          80
4   Swapnil   19   Science          75)])

代码8:使用read_excel()方法的'sheet_name'参数一起读取excel文件的所有工作表。

# import pandas lib as pd
import pandas as pd
  
# read all sheets together.
all_sheets_df = pd.read_excel( 'SampleWork.xlsx' , na_values = "Missing" , sheet_name = None )
  
print (all_sheets_df)

输出:

OrderedDict([('Sheet1', Name  Age    Stream  Percentage
0      Ankit   18      Math          95
1      Rahul   19   Science          90
2    Shaurya   20  Commerce          85
3  Aishwarya   18      Math          80
4   Priyanka   19   Science          75), ('Sheet2', Name  Age    Stream  Percentage
0     Priya   18      Math          95
1  shivangi   19   Science          90
2      Jeet   20  Commerce          85
3    Ananya   18      Math          80
4   Swapnil   19   Science          75), ('Sheet3', Name  Age   Stream  Percentage
0     Priya   18     Math          95
1  shivangi   19  Science          90
2      Jeet   20      NaN          85
3    Ananya   18     Math          80
4   Swapnil   19  Science          75)])

首先, 你的面试准备可通过以下方式增强你的数据结构概念:Python DS课程。


木子山

发表评论

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: