[DAY 5] Python: Lecture 11. IO

욤미의 개발일지

욤미 2022. 11. 3. 15:49

[Lecture 11] IO

Covers Python input/output, including JSON, YAML, and plain-text files.

File I/O is managed by the operating system.

Standard Input&Output

Unless redirected, standard input and output go through the console.

# Standard output (stdout)
print("This", "Sentence")  # This Sentence (arguments are joined with a space by default)
print("This", "Sentence", sep=", ")  # This, Sentence (the separator can be changed)

# Standard input (stdin)
var = input()

Redirect with > (standard output) and < (standard input)
- python test.py > output.txt
- python test.py < input.txt
| pipeline
- Feeds the standard output of one program into the standard input of another.
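
A minimal sketch of a script body that behaves the same whether its input comes from the console, a redirected file, or a pipe. The function name `shout` and the upper-casing are hypothetical, chosen only for illustration, and in-memory streams stand in for stdin/stdout so the example runs anywhere.

```python
import io


def shout(in_stream, out_stream):
    """Copy in_stream to out_stream line by line, upper-casing each line."""
    count = 0
    for line in in_stream:              # each line keeps its trailing "\n"
        out_stream.write(line.upper())
        count += 1
    return count


# Stand-ins for redirected input and output (assumption for the demo).
fake_stdin = io.StringIO("hello\nworld\n")
fake_stdout = io.StringIO()
n_lines = shout(fake_stdin, fake_stdout)
```

In a real script the last lines would be replaced by `shout(sys.stdin, sys.stdout)` (after `import sys`), and the file could then be run as `python test.py < input.txt > output.txt`.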

File Open

f = open("filename", "access mode", encoding="utf8")  # open the file
f.close()  # close the file; an open file holds on to OS resources

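Python opens a file with the built-in open function, which returns a file object wrapping the OS file descriptor.
- r : read mode, reads the file as text
- rb : binary read mode, reads the file as bytes
- w : write mode, writes text (overwrites existing content)
- wb : binary write mode, writes bytes
- a : append mode, adds new text at the end of the file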
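
The modes above can be compared side by side with a small sketch. The file name `modes.txt` and the temporary directory are assumptions so that no real file is touched; `with` is used so each file is closed automatically.

```python
import os
import tempfile

# Work in a throwaway directory (assumption for the demo).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes.txt")

with open(path, "w", encoding="utf8") as f:   # w: create or overwrite
    f.write("first\n")

with open(path, "w", encoding="utf8") as f:   # w again: old content is gone
    f.write("second\n")

with open(path, "a", encoding="utf8") as f:   # a: append at the end
    f.write("third\n")

with open(path, "r", encoding="utf8") as f:   # r: read as text (str)
    text = f.read()

with open(path, "rb") as f:                   # rb: read as raw bytes
    raw = f.read()
```

After this runs, `text` holds only the second and third lines because the second `w` open discarded the first, and `raw` is a `bytes` object rather than a `str`.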

File Read

f = open("test.txt", "r")  # open the file
contents = f.read()  # read the whole file
f.close()  # close the file; an open file holds on to OS resources

The read method reads the file.
It is easy to forget to close the file descriptor.
- Use a context manager → the file is closed automatically when the block ends
- with [ContextManager] as [ReturnValue]
```
with open("test.txt", "r") as f:  # open the file
    contents = f.read()  # read the whole file
# no f.close() needed: the with block has already closed the file
```
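
To check that the context manager really closes the file, the `closed` attribute of the file object can be inspected. This is a small sketch; the temporary path and the simulated error are assumptions for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "w", encoding="utf8") as f:
    f.write("hello\n")
    closed_inside = f.closed      # still open inside the block

closed_after = f.closed           # closed as soon as the block ends

# The file is closed even when the block exits through an exception.
try:
    with open(path, "r", encoding="utf8") as g:
        raise ValueError("simulated failure while reading")
except ValueError:
    closed_after_error = g.closed
```

The second half is the main reason to prefer `with` over a manual `close()`: a manual call is skipped if an exception is raised before it.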

File Read Lines

Iterating over the file with a for loop reads it line by line; the trailing \n of each line is kept.

content = []

with open("test.txt", "r") as f:  # open the file
    for sentence in f:
        content.append(sentence)  # readline() can be used as well

readlines reads the whole file and returns it split by line as a list of strings.

with open("test.txt", "r") as f:  # open the file
    content = f.readlines()  # read everything, split into lines
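
A short sketch contrasting readline, readlines, and plain iteration on the same file. The sample content and the temporary path are assumptions for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w", encoding="utf8") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(path, "r", encoding="utf8") as f:
    first = f.readline()          # one line, newline included
    rest = f.readlines()          # remaining lines as a list of str

with open(path, "r", encoding="utf8") as f:
    stripped = [line.rstrip("\n") for line in f]   # drop the newlines
```

Note that readlines continues from the current position, so after one readline call it returns only the remaining lines.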

File Write

The write method writes to a file.
- Unlike print, write does not append \n automatically. → You have to add it yourself.

with open("test.txt", "w") as f:  # open the file
    for i in range(10):
        f.write(f"Sentence {i+1}\n")  # add the newline character

The writelines method writes several lines at once.

with open("test.txt", "w") as f:  # open the file
    f.writelines(f"Sentence {i+1}\n" for i in range(10))  # write an iterable of strings

a mode appends to the end of the file.

start = 10
with open("test.txt", "a") as f:  # open the file in append mode
    f.write("Appending content.\n")
    f.writelines(f"Sentence {i+1}\n" for i in range(start, start+10))  # write an iterable of strings
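
Despite its name, writelines does not add line separators either. The following sketch makes that visible; the word list and the temporary path are assumptions for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")
lines = ["one", "two", "three"]

with open(path, "w", encoding="utf8") as f:
    f.writelines(lines)                       # no separators are added
with open(path, encoding="utf8") as f:
    glued = f.read()

with open(path, "w", encoding="utf8") as f:
    f.write("\n".join(lines) + "\n")          # add the newlines explicitly
with open(path, encoding="utf8") as f:
    separated = f.read()
```

Here `glued` ends up as one run of text, while `separated` has one word per line.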

Listing Directory

listdir : lists the files and subfolders in a folder

import os
print(*os.listdir('test'))  # unpack the list into print

glob library: applies Unix-style pathname pattern expansion

import glob
print(*glob.glob('test/*.txt'))
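
The difference between the two is easier to see on a known folder. This sketch builds a temporary directory with hypothetical file names, then lists it both ways.

```python
import glob
import os
import tempfile

root = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.csv"):
    with open(os.path.join(root, name), "w", encoding="utf8") as f:
        f.write("")
os.mkdir(os.path.join(root, "sub"))

# listdir returns bare names of every entry, files and folders alike.
everything = sorted(os.listdir(root))

# glob returns paths (including the folder part) that match the pattern.
txt_paths = sorted(glob.glob(os.path.join(root, "*.txt")))
txt_names = [os.path.basename(p) for p in txt_paths]
```

Both functions return entries in arbitrary order, which is why the results are sorted here.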

Pickle

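Stores Python objects as they are.
Serializes the object and saves it to a file.
Pros: easy to use, and Python objects are stored as-is.
Cons: readable only from Python, and it has security problems (loading a pickle can execute arbitrary code, so never load untrusted files).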

import pickle

with open("test.pkl", "wb") as f:
    pickle.dump(obj, f)  # obj is the object to save

with open("test.pkl", "rb") as f:
    seq = pickle.load(f)  # load takes only the file object
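
A runnable round trip, as a minimal sketch: the sample dictionary and the temporary path are assumptions for the demo.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.pkl")
original = {"name": "sample", "scores": [1, 2.5, None], "pair": (1, 2)}

with open(path, "wb") as f:       # pickle needs binary mode
    pickle.dump(original, f)

with open(path, "rb") as f:
    restored = pickle.load(f)     # only load files you trust
```

The restored object is an equal but separate copy, and Python-only types such as tuple survive the trip unchanged.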

Class Pickling

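Instances of your own classes can also be saved with pickle.
To serialize an instance, its class must be serializable.
- Every attribute must be serializable.
To load a pickled object, the class must already be defined (or importable).
- Without the class definition, deserialization fails.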
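
Both conditions can be sketched with a small class. `Vocab` and its attributes are hypothetical, made up for illustration, and the sketch assumes it is run as a script so that the class can be found by name when loading.

```python
import pickle


class Vocab:
    """Hypothetical example class; every attribute is itself picklable."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.index = {tok: i for i, tok in enumerate(self.tokens)}


vocab = Vocab(["<pad>", "hello", "world"])
payload = pickle.dumps(vocab)          # the class is stored by name, not by code
restored = pickle.loads(payload)       # works because Vocab is defined here

# An attribute that cannot be serialized makes the whole object unpicklable.
vocab.tokenize = lambda text: text.split()
try:
    pickle.dumps(vocab)
    lambda_failed = False
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_failed = True
```

Because only the class name is stored, loading `payload` in a program where `Vocab` is not defined or importable would fail.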

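CSV(Comma-Separated Values)

A data format for using tabular data independently of any particular program
- A text file whose fields are separated by commas (,)
- Tabs (TSV), spaces (SSV), and other delimiters are also used

It can be read with readlines, but implementing the parsing yourself is tedious.

→ The csv library makes this easy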

Reading CSV

import csv

with open('test.csv', 'r', newline='') as f:
    reader = csv.reader(
        f,                          # file object (required)
        delimiter=',',              # field separator, default: ,
        quotechar='"',              # character that wraps text, default: "
        quoting=csv.QUOTE_MINIMAL,  # quoting rule, default: quote only when needed
    )

    for entry in reader:
        print(entry)  # each row is printed as a list

Writing CSV

import csv

with open('test.csv', 'w', newline='') as f:
    writer = csv.writer(
        f,                          # file object (required)
        delimiter=',',              # field separator, default: ,
        quotechar='"',              # character that wraps text, default: "
        quoting=csv.QUOTE_MINIMAL,  # quoting rule, default: quote only when needed
    )

    writer.writerow(['id', 'label'])  # write one row
    writer.writerows([i, f'label{i}'] for i in range(10))  # write several rows
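
Why a plain split on commas is not enough becomes clear once a field itself contains a comma or a quote. This sketch does a write/read round trip in memory; the sample rows are made up, and `io.StringIO` stands in for a file opened with `newline=''`.

```python
import csv
import io

rows = [["id", "label"], [1, "plain"], [2, "has, comma"], [3, 'has "quote"']]

buffer = io.StringIO(newline="")       # stands in for open(..., newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()               # fields with , or " are quoted here

parsed = list(csv.reader(io.StringIO(text, newline="")))
```

The reader undoes the quoting, so the awkward fields come back intact. Every value comes back as a string, including the ids, so numeric columns need an explicit conversion.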

CSV struggles to represent structured data such as dictionaries or nested lists → use JSON

JSON(JavaScript Object Notation)

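The data object notation of JavaScript, the language of the web; similar in form to a Python dictionary
Represents data structures as a string
Compact, so it is easy for both people and computers to read
Easy to load from code, and file sizes tend to be small
A widely used data format in recent years

Even so, writing a parser by hand is tedious, so the json library is used.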

Reading JSON

import json

with open('test.json', 'r') as f:
    data = json.load(f)

print(data['key1'])  # use it like a Python dictionary
print(data['key1'][0]['key2'][2])

Writing JSON

import json

with open('test.json', 'w') as f:
    json.dump(obj, f)  # obj is the object to save

Only serializable objects can be saved
- Primitive types: str, int, float, bool, None
- Data structures: list, dictionary
- Anything else requires writing a custom encoder
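
One way to handle the "anything else" case is the `default` parameter of `json.dumps`, sketched below. The helper name `encode_extra` and the sample record are hypothetical; `default` itself is part of the standard json module.

```python
import json
from datetime import date

record = {"name": "sample", "tags": ["a", "b"], "created": date(2022, 11, 3)}


def encode_extra(obj):
    """Called by json.dumps only for objects it cannot serialize itself."""
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


try:
    json.dumps(record)                 # date is not a JSON type
    plain_failed = False
except TypeError:
    plain_failed = True

text = json.dumps(record, default=encode_extra)
loaded = json.loads(text)              # the date comes back as a plain str
```

The conversion is one-way: turning the string back into a date when loading needs separate code.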

XML(eXtensible Markup Language)

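A language that uses tags to describe the structure and meaning of data
Values appear between <tag> and </tag>
Everything is handled as a string
Attributes can be attached to a tag in the form <tag attribute=value>
HTML is a closely related tag-based markup for displaying web pages (strictly, only XHTML is XML).
- XML is a data storage format, so the tag names themselves carry no fixed meaning.
- HTML assigns fixed meanings to its tags and attributes.
Simple cases can be parsed with regular expressions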
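
To make the tag, value, and attribute vocabulary concrete, here is a small sketch. It uses the standard library's `xml.etree.ElementTree`, an assumption made so the example runs with nothing installed. The XML document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, made up for this example.
xml_text = """<note priority="high">
    <to>Alice</to>
    <from>Bob</from>
    <cites attr="name">
        <cite>first</cite>
        <cite>second</cite>
    </cites>
</note>"""

root = ET.fromstring(xml_text)
to_text = root.find("to").text                  # value between the tags
priority = root.attrib["priority"]              # attribute value, always a str
cites = [c.text for c in root.iter("cite")]     # every <cite> in the tree
```

Every value read this way is a string, so numbers or dates stored in XML have to be converted by the caller.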

Beautiful Soup

Python's built-in XML parser is somewhat inconvenient, so an external library is used.
It is somewhat slow but simple to use.

Reading XML

from bs4 import BeautifulSoup

with open('test.xml', 'r') as f:
    soup = BeautifulSoup(
        f.read(),       # string to parse
        'html.parser',  # parser to use
    )

to_tag = soup.find(name='to')  # find the 'to' tag
print(to_tag.string)  # print the text inside the 'to' tag

for cite_tag in soup.find_all(name='cite'):  # find every 'cite' tag
    print(cite_tag.string)

cites_tag = soup.find(name='cites')  # find the 'cites' tag
print(cites_tag.attrs)  # all attributes of the 'cites' tag
print(cites_tag['attr'])  # value of the 'attr' attribute

cites_tag = soup.find(attrs={'attr': 'name'})  # find a tag by attribute
for cite_tag in cites_tag.find_all(name='cite'):  # search inside the tag
    print(cite_tag.string)

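YAML(YAML Ain’t Markup Language)

A data serialization format that takes its concepts from the layout of e-mail
- Indentation marks structure
- Uses key: value mappings (hashes) and - item lists
- Text without spaces can be written without quotes
- .yaml / .yml file extensions
- In Python, use the pyyaml library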

Reading YAML

import yaml
import pprint  # pretty-prints nested data

with open('test.yaml') as f:
    data = yaml.load(f, Loader=yaml.FullLoader)

pprint.pprint(data)

License: Attribution, NonCommercial, NoDerivatives
