
[Troubleshooting] Error tokenizing data. C error: Expected * fields in line *, saw *

센터디 2023. 5. 9. 16:30

- Symptom

Occurs when reading a file with Pandas read_csv
(Error tokenizing data. C error: Expected 4 fields in line 2, saw 7)

- ์˜ค๋ฅ˜ ์›์ธ

๋ฐ์ดํ„ฐ ์ค‘๊ฐ„์— ์ฒซ ํ–‰๊ณผ ๋‹ค๋ฅธ ๊ธธ์ด๋ฅผ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ ๋ฐœ์ƒ

- Solutions

1) ๋ฐ์ดํ„ฐ ์ˆ˜์ •ํ•˜๊ธฐ : ๋ณดํ†ต ๋ฐ์ดํ„ฐ ์˜ค๋ฅ˜์ธ ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์œผ๋‹ˆ, ๋‹ค๋ฅธ ๊ธธ์ด๋ฅผ ๊ฐ€์ง€๋Š” ํ–‰์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ •ํ•œ๋‹ค.

2) ๋ฌธ์ œ๊ฐ€ ๋˜๋Š” ํ–‰์„ ๊ฑด๋„ˆ๋›ฐ๊ธฐ : error_bad_lines ์˜ต์…˜์„ False๋กœ ๋ฐ”๊พธ๋ฉด, ๋ฌธ์ œ๊ฐ€ ๋˜๋Š” ํ–‰์„ ๊ฑด๋„ˆ๋›ด๋‹ค.

pd.read_csv(path, on_bad_lines='skip')  # pandas >= 1.3; on older versions: error_bad_lines=False
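Skipping rows silently can hide data loss, so it may help to see what was dropped. This sketch uses made-up in-memory data and shows two options. First, on_bad_lines="skip" (pandas 1.3 or later) drops the bad row. Second, a callable (pandas 1.4 or later, with engine="python") receives each bad row as a list of strings, and returning None drops that row. The names log_and_skip and bad_rows are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data: line 3 has 7 fields instead of 4.
raw = "a,b,c,d\n1,2,3,4\n5,6,7,8,9,10,11\n12,13,14,15\n"

# pandas >= 1.3: the bad row is dropped without an error.
# ("warn" would also drop it, but emit a warning for each skipped line.)
df_skipped = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df_skipped)

# pandas >= 1.4: with the python engine, a callable sees each bad row as a
# list of strings; returning None drops it, so rows can be logged first.
bad_rows = []


def log_and_skip(fields):
    bad_rows.append(fields)  # keep a copy of the dropped row for inspection
    return None


df_logged = pd.read_csv(io.StringIO(raw), on_bad_lines=log_and_skip, engine="python")
print(bad_rows)
```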


3)  ํŒŒ์ผ ์ „์ฒด ์Šค์บ” ํ›„, ๋‹ค์‹œ Pandas ๋กœ ์ฝ์–ด์ฃผ๊ธฐ : ํŠน์ • Row๋ฅผ ๊ฑด๋„ˆ๋›ฐ๋ฉด ์•ˆ๋˜๋Š”, ์ฆ‰ ์ฒซ๋ฒˆ์งธ ํ–‰์ด ์—†๋Š” ๋ฐ์ดํ„ฐ๋ผ๋„ ์‚ด๋ ค์•ผํ•˜๋Š” ๊ฒฝ์šฐ ํ™œ์šฉํ•œ๋‹ค.

# ํŒŒ์ผ ์ „์ฒด ์Šค์บ”ํ•˜๋ฉด์„œ Row๋ณ„ Column	 ๊ฐœ์ˆ˜ ์„ธ๊ธฐ
delm = ','
with open("sample.csv", "r") as f:
	count_columns = [len(l.split(delm)) for l in f.readlines()]
# ๊ฐ€์žฅ ํฐ Column ๊ฐœ์ˆ˜ ๊ธฐ์ค€ Columns๋ช… ๋ฆฌ์ŠคํŠธ ๋งŒ๋“ค์–ด์ฃผ๊ธฐ 
columns = [i for i in range(0, max(count_columns)]

df = pd.read_csv("sample.csv", names=columns, delimiter=',')
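A variant of the same idea keeps the real header names instead of turning the header into a data row. This is a sketch under stated assumptions: the in-memory data stands in for sample.csv, and the "extra_<position>" names for the additional columns are an arbitrary choice. Field counting uses the csv module so that a quoted comma is not miscounted. Passing names= with the maximum width makes pandas accept the wider rows and pad the shorter ones with NaN.

```python
import csv
import io

import pandas as pd

# Hypothetical data standing in for sample.csv: line 3 is wider than the header
# and contains a quoted comma that a plain str.split(",") would miscount.
raw = 'a,b,c,d\n1,2,3,4\n5,"6,5",7,8,9,10,11\n'

# Count fields with the csv module, which respects quoting.
rows = list(csv.reader(io.StringIO(raw)))
header = rows[0]
max_fields = max(len(row) for row in rows)

# Keep the real header names and add placeholder names for the extra columns.
names = header + [f"extra_{i}" for i in range(len(header), max_fields)]

# skiprows=1 drops the original header line; names= makes pandas expect
# max_fields columns, and shorter rows are padded with NaN.
df = pd.read_csv(io.StringIO(raw), names=names, header=None, skiprows=1)
print(df)
```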