Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

prostudy 2022. 9. 9. 09:11

I have a large spreadsheet file (.xlsx) that I'm processing with Python pandas. I need data from two tabs (sheets) in that large file. One of the tabs holds a large amount of data and the other is just a few square cells.

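When I use pd.read_excel() on any worksheet, it looks as if the whole file is loaded, not just the worksheet I'm interested in. So when I call it twice (once for each sheet), I effectively have to read the whole workbook twice, even though I only use the specified sheet each time.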

How can I load only specific sheet(s) with pd.read_excel()?

Try pd.ExcelFile:

xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')

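As noted by @HaPsantran, the entire Excel file is read in during the ExcelFile() call. This approach merely saves you from having to read the same file again each time you access a new sheet.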
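The pattern above can be tried end to end with a throwaway workbook. This is a minimal sketch, assuming openpyxl is installed as the .xlsx engine; the file name demo.xlsx and the sheet contents are made up for illustration. It opens the workbook once with pd.ExcelFile (used as a context manager so the file handle is closed afterwards) and reads both sheets from that single object.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a small two-sheet workbook to experiment with (hypothetical data).
path = Path(tempfile.mkdtemp()) / "demo.xlsx"
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2, 3]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [10]}).to_excel(writer, sheet_name="Sheet2", index=False)

# Open the workbook once, then read each sheet from the same ExcelFile object.
with pd.ExcelFile(path) as xls:
    df1 = pd.read_excel(xls, sheet_name="Sheet1")
    df2 = pd.read_excel(xls, sheet_name="Sheet2")

print(df1.shape, df2.shape)
```

Whether this is noticeably faster than two separate pd.read_excel(path, ...) calls depends on the file and the engine, so it is worth timing on the real workbook.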

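Note: the sheet_name argument to pd.read_excel() can be the name of a sheet (as above), an integer giving the sheet number (0, 1, etc.), a list of sheet names or indices, or None. If a list is provided, a dictionary is returned where the keys are the sheet names/indices and the values are the data frames. The default is to return only the first sheet (sheet_name=0).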

If None is specified, all sheets are returned as a {sheet_name: dataframe} dictionary.
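The different forms of sheet_name can be compared side by side. The sketch below assumes openpyxl is available and uses a made-up workbook with sheets named "big" and "small"; only the pd.read_excel calls reflect the behaviour described above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical workbook with two sheets, "big" and "small".
path = Path(tempfile.mkdtemp()) / "forms.xlsx"
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"v": range(5)}).to_excel(writer, sheet_name="big", index=False)
    pd.DataFrame({"v": [42]}).to_excel(writer, sheet_name="small", index=False)

first = pd.read_excel(path, sheet_name=0)            # int: a single DataFrame
by_name = pd.read_excel(path, sheet_name="small")    # str: a single DataFrame
some = pd.read_excel(path, sheet_name=[0, "small"])  # list: dict keyed by the given items
everything = pd.read_excel(path, sheet_name=None)    # None: dict keyed by sheet name

print(type(first).__name__, list(some), list(everything))
```

Note that with a list, the dictionary keys are exactly the items passed in, so an index stays an int key rather than being turned into the sheet's name.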

There are a few options:

Read all sheets directly into an ordered dictionary.

import pandas as pd

# for pandas version >= 0.21.0
sheet_to_df_map = pd.read_excel(file_name, sheet_name=None)

# for pandas version < 0.21.0
sheet_to_df_map = pd.read_excel(file_name, sheetname=None)

Read the first sheet directly into a dataframe.

df = pd.read_excel('excel_file_path.xls')
# this will read the first sheet into df

Read the Excel file and get the list of sheets. Then choose which sheets to load.

xls = pd.ExcelFile('excel_file_path.xls')

# Now you can list all sheets in the file
xls.sheet_names
# ['house', 'house_extra', ...]

# to read just one sheet to dataframe:
df = pd.read_excel(xls, sheet_name="house")

Read all sheets and store them in a dictionary. This is the same as the first option, but more explicit.

# to read all sheets to a map
sheet_to_df_map = {}
for sheet_name in xls.sheet_names:
    sheet_to_df_map[sheet_name] = xls.parse(sheet_name)
    # you can also use sheet_index [0,1,2..] instead of sheet name.

Thanks to @ihightower for pointing out a way to read all sheets, and to @toto_tico and @red-headphone for pointing out the version issue.

sheetname : string, int, mixed list of strings/ints, or None, default 0. Deprecated since version 0.21.0: use sheet_name instead.

You can also use the index of the sheet:

xls = pd.ExcelFile('path_to_file.xls')
sheet1 = xls.parse(0)

This gives the first worksheet. For the second worksheet:

sheet2 = xls.parse(1)

You can also specify the sheet name as a parameter:

data_file = pd.read_excel('path_to_file.xls', sheet_name="sheet_name")

This reads only the sheet named "sheet_name".

Option 1

If you don't know the sheet names

# Read all sheets in your File
df = pd.read_excel('FILENAME.xlsm', sheet_name=None)
    
# Prints all the sheets name in an ordered dictionary
print(df.keys())

Then, depending on the sheet you want to read, you can load each one into its own dataframe, for example:

sheet1_df = pd.read_excel('FILENAME.xlsm', sheet_name=SHEET1NAME)
sheet2_df = pd.read_excel('FILENAME.xlsm', sheet_name=SHEET2NAME)

Option 2

If the names don't matter and you only care about the position of the sheets. Say you want only the first sheet:

# Read all sheets in your File
df = pd.read_excel('FILENAME.xlsm', sheet_name=None)

sheet1 = list(df.keys())[0]

Then use that sheet name to load the sheet into its own dataframe, for example:

sheet1_df = pd.read_excel('FILENAME.xlsm', sheet_name=sheet1)

pd.read_excel('filename.xlsx')

reads the first sheet of the workbook by default.

pd.read_excel('filename.xlsx', sheet_name = 'sheetname')

reads the specified sheet of the workbook, and

pd.read_excel('filename.xlsx', sheet_name = None)

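reads all the worksheets from Excel into pandas dataframes. The result is an OrderedDict (a plain dict in recent pandas versions) that maps each sheet name to its dataframe, so every worksheet is collected as a dataframe inside one dictionary.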

If you want to read all the sheets and combine them, a quick way to do it is:

sheet_to_df_map = pd.read_excel('path_to_file.xls', sheet_name=None)
mdf = pd.concat(sheet_to_df_map, axis=0, ignore_index=True)

This converts all the sheets into a single data frame, mdf.
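With ignore_index=True the combined frame no longer records which sheet each row came from. If that matters, one possible variation (a sketch, not part of the original answer) tags each frame with its sheet name before concatenating. It assumes openpyxl is installed; the workbook, the sheet names "jan" and "feb", and the column name "sheet" are invented for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical workbook whose sheets share the same column layout.
path = Path(tempfile.mkdtemp()) / "monthly.xlsx"
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="jan", index=False)
    pd.DataFrame({"x": [3]}).to_excel(writer, sheet_name="feb", index=False)

# Read every sheet into a dict, then add the sheet name as a column before combining.
sheet_to_df_map = pd.read_excel(path, sheet_name=None)
combined = pd.concat(
    [df.assign(sheet=name) for name, df in sheet_to_df_map.items()],
    ignore_index=True,
)

print(combined)
```

This only makes sense when the sheets have compatible columns; sheets with different layouts produce NaN-filled columns after the concat.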

You can read all the sheets using the following lines:

import pandas as pd
file_instance = pd.ExcelFile('your_file.xlsx')

main_df = pd.concat([pd.read_excel(file_instance, sheet_name=name) for name in file_instance.sheet_names], axis=0)

df = pd.read_excel('FileName.xlsx', 'SheetName')

This reads the sheet SheetName from the file FileName.xlsx.

If:

you want multiple, but not all, worksheets, and
you want a single df as output,

then you can pass a list of worksheet names, which you can enter manually:

import pandas as pd
    
path = "C:\\Path\\To\\Your\\Data\\"
file = "data.xlsx"
sheet_lst_wanted = ["01_SomeName","05_SomeName","12_SomeName"] # tab names from Excel

### import and compile data ###
    
# read all sheets from list into an ordered dictionary    
dict_temp = pd.read_excel(path+file, sheet_name= sheet_lst_wanted)

# concatenate the ordered dict items into a dataframe
df = pd.concat(dict_temp, axis=0, ignore_index=True)

Or

you can automate this a little if the worksheets you want share a naming convention that distinguishes them from the unwanted sheets:

# substitute the following block for the sheet_lst_wanted line in the block above

# string common only to the worksheets you want
str_like = "SomeName"

### create list of sheet names in Excel file ###
# pd.ExcelFile is used here instead of xlrd because xlrd 2.0+ no longer reads .xlsx files
xls = pd.ExcelFile(path+file)
sheet_lst = xls.sheet_names

### create list of sheets meeting criteria ###
# note: this condition assumes the wanted sheet names end with the string in str_like
sheet_lst_wanted = [s for s in sheet_lst if s.endswith(str_like)]
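Putting the naming-convention filter and the list form of sheet_name together might look like the sketch below. It assumes openpyxl is installed; the workbook and its sheet names are invented to mirror the "SomeName" convention used above, and the workbook is opened once so the filter and the read share the same ExcelFile object.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical workbook: two wanted sheets and one unrelated sheet.
path = Path(tempfile.mkdtemp()) / "data.xlsx"
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"n": [1]}).to_excel(writer, sheet_name="01_SomeName", index=False)
    pd.DataFrame({"n": [99]}).to_excel(writer, sheet_name="02_Other", index=False)
    pd.DataFrame({"n": [5, 6]}).to_excel(writer, sheet_name="05_SomeName", index=False)

str_like = "SomeName"  # suffix shared only by the wanted sheets

with pd.ExcelFile(path) as xls:
    # Keep only the sheet names that end with the shared suffix.
    sheet_lst_wanted = [s for s in xls.sheet_names if s.endswith(str_like)]
    # Passing a list returns a dict of DataFrames keyed by sheet name.
    dict_temp = pd.read_excel(xls, sheet_name=sheet_lst_wanted)

df = pd.concat(dict_temp, axis=0, ignore_index=True)
print(sheet_lst_wanted, len(df))
```

If no sheet matches the suffix, the list is empty and pd.concat raises an error, so a real script would probably check for that case first.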

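Yes, unfortunately the full file is always loaded. If you do this repeatedly, it is probably best to extract the sheets to separate CSV files and load those instead. You can automate this process with d6tstack, which also adds features such as checking whether all columns are the same across all sheets or across multiple Excel files.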

import d6tstack
c = d6tstack.convert_xls.XLStoCSVMultiSheet('multisheet.xlsx')
c.convert_all() # ['multisheet-Sheet1.csv','multisheet-Sheet2.csv']

See the d6tstack Excel examples.
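If adding a dependency is not an option, the same one-time extraction to CSV can be approximated with pandas alone. This is a rough sketch, assuming openpyxl is installed; the helper name sheets_to_csv and the "<workbook>-<sheet>.csv" naming scheme are choices made for this example, not part of d6tstack or pandas.

```python
import tempfile
from pathlib import Path

import pandas as pd


def sheets_to_csv(xlsx_path, out_dir):
    """Write every sheet of a workbook to its own CSV and return the paths (hypothetical helper)."""
    xlsx_path = Path(xlsx_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Open the workbook once and export each sheet in turn.
    with pd.ExcelFile(xlsx_path) as xls:
        for name in xls.sheet_names:
            csv_path = out_dir / f"{xlsx_path.stem}-{name}.csv"
            xls.parse(name).to_csv(csv_path, index=False)
            written.append(csv_path)
    return written


# Demo with a made-up two-sheet workbook.
work_dir = Path(tempfile.mkdtemp())
workbook = work_dir / "multisheet.xlsx"
with pd.ExcelWriter(workbook) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [3]}).to_excel(writer, sheet_name="Sheet2", index=False)

csv_files = sheets_to_csv(workbook, work_dir / "csv")
small = pd.read_csv(csv_files[1])  # later runs can load just the sheet they need
print([p.name for p in csv_files])
```

Sheet names containing characters that are not valid in file names would need sanitizing before being used this way.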

If the Excel file is saved in the same folder as your Python program (a relative path), you only need to give the file name along with the sheet name or number.

Example:

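import pandas as pd
import matplotlib.pyplot as plt

# read the sheet named "Sheet2" from a workbook in the current folder
data = pd.read_excel("wt_vs_ht.xlsx", "Sheet2")
print(data)
x = data.Height
y = data.Weight
plt.plot(x, y, 'x')
plt.show()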

Source URL: https://stackoverflow.com/questions/26521266/using-pandas-to-pd-read-excel-for-multiple-worksheets-of-the-same-workbook
