
Hello creators 🙌

[1-5] Handling Excel with Python using openpyxl and Workbook (feat. Sparta Coding Club data analysis course)

부시매나_HA 2022. 12. 10. 10:37

 

 

Study Colab URL 

https://colab.research.google.com/drive/1coeNsJQ7EH-J-ZxL3XCECYPFGqkVZ1a3#scrollTo=P4L2IHWQjB_z

 


 


 

Installing the openpyxl library

!pip install openpyxl
  # Install the library.
    # The library's documentation presumably shows this as the way to install it.
    # This goes into the notebook as a "code snippet".

 

 

Writing a greeting to cell A1

from openpyxl import Workbook   # Import the Workbook class from the openpyxl library

wb = Workbook()    # Create a new Workbook object and name it wb
sheet = wb.active   # wb.active is a property (not a function call) that returns the active worksheet; keep it in sheet

sheet['A1'] = '안녕하세요!'   # Put '안녕하세요!' ("Hello!") in cell A1 of that sheet

wb.save("샘플파일.xlsx")    # Save the workbook as 샘플파일.xlsx ("sample file")
wb.close()

# In Colab, open the Files tab on the left and double-click the file to download it.
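The cell-writing step above can also be tried without saving a file. The following is a minimal sketch, assuming openpyxl is installed; the values "hello" and "world" are illustrative and not from the lesson. It shows two ways of addressing a cell: the A1-style name and 1-based row/column numbers.

```python
from openpyxl import Workbook

wb = Workbook()      # new in-memory workbook with one default worksheet
sheet = wb.active    # .active is a property that returns that worksheet

sheet["A1"] = "hello"                        # address a cell by its A1-style name
sheet.cell(row=1, column=2, value="world")   # same idea with 1-based row and column numbers

a1_value = sheet["A1"].value   # read the contents back through .value
b1_value = sheet["B1"].value
print(a1_value, b1_value)
```

Nothing is written to disk until wb.save(...) is called, so this is a convenient way to experiment.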

https://colab.research.google.com/drive/1coeNsJQ7EH-J-ZxL3XCECYPFGqkVZ1a3#scrollTo=NuRX9uUNigfz&line=3&uniqifier=1

 


 

 

 

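Download the sample file and edit it. Judging by the code that follows, the edited sheet has a header row and data rows with three columns, the third being the price.

Upload the edited file (the code below opens it as 샘플파일임.xlsx)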

 

 

Reading the Excel file

# Read the Excel file

import openpyxl   # Import the library
wb = openpyxl.load_workbook('샘플파일임.xlsx')    # load_workbook opens the Excel file and returns a Workbook, stored in wb
sheet = wb['Sheet']   # Look up the worksheet titled 'Sheet' (the default title openpyxl gives a new workbook's sheet); nothing is converted

print(sheet['A1'].value)     # Print the value stored in cell A1
print(sheet['B1'].value)     # Print the value stored in cell B1
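The notes wonder what wb['Sheet'] does. A small sketch, assuming openpyxl is installed, suggests it is a lookup by worksheet title: a freshly created workbook has a single worksheet titled 'Sheet', and wb.sheetnames lists the titles that can be used as keys.

```python
from openpyxl import Workbook

wb = Workbook()
names = wb.sheetnames    # list of worksheet titles, in order
sheet = wb[names[0]]     # look a worksheet up by its title
print(names, sheet.title)
```

If a file was created or renamed in Excel itself, the title may differ (for example a localized default name), so printing wb.sheetnames first is a quick way to find the right key.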


 

 

Reading more conveniently with rows

# Read more conveniently with rows

import openpyxl   # Import the library
wb = openpyxl.load_workbook('샘플파일임.xlsx')    # Open the Excel file and store the Workbook in wb
sheet = wb['Sheet']   # Look up the worksheet titled 'Sheet'

rows = sheet.rows     # sheet.rows yields every row of the sheet, each as a tuple of Cell objects ⭐
# print(rows)

# [Take the rows out one at a time and print them]
for row in rows:
  print(row[0].value, row[1].value, row[2].value)   # .value gives the contents of each cell in the row

 

 

Using list to skip the header row

# Use list so that the output starts from the first data row (the header in the top row is skipped)
import openpyxl
wb = openpyxl.load_workbook('샘플파일임.xlsx')    # ✅ Check the file name here; a wrong name is a likely source of errors
sheet = wb['Sheet']

rows = list(sheet.rows)[1:]   # Turn the rows into a list, then slice from index 1 to drop the header row
                              # This pattern is used very often ⭐

for row in rows:
  print(row[0].value, row[1].value, row[2].value) # Ask each cell for .value; without it you get the Cell object, not its contents
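As an alternative to list(sheet.rows)[1:], openpyxl worksheets also offer iter_rows, which can start at a given row and return plain values instead of Cell objects. The sketch below builds a small sheet in memory; the header names and items are made up for illustration and are not the contents of the lesson's file.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.append(["name", "category", "price"])    # hypothetical header row
sheet.append(["pencil", "stationery", 200])
sheet.append(["notebook", "stationery", 1500])

# min_row=2 skips the header; values_only=True yields tuples of values, so no .value is needed
data_rows = list(sheet.iter_rows(min_row=2, values_only=True))
for name, category, price in data_rows:
    print(name, category, price)
```

Either approach gives the same rows; iter_rows simply avoids building the full list and then slicing it.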

 

 

Showing only the prices below 300 won

# Show only the prices below 300 won

import openpyxl
wb = openpyxl.load_workbook('샘플파일임.xlsx')    # ✅ Check the file name here; a wrong name is a likely source of errors
sheet = wb['Sheet']

rows = list(sheet.rows)[1:]   # Turn the rows into a list, then slice from index 1 to drop the header row
                              # This pattern is used very often ⭐

for row in rows:
  if row[2].value < 300: # The price is in row[2]; take its .value and, if it is less than 300, run the line below
                            # [What I noticed] The price lives in row[2]! ⭐
                            # Reading the code as a sequence, asking "what's next?", makes it easy to follow ⭐
    print(row[2].value)   # Prints the price whenever it is below 300 won
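One thing to watch with the filter above: an empty cell has the value None, and comparing None with 300 raises a TypeError. A minimal sketch of a guarded version follows, using an in-memory sheet with made-up data (the column layout is an assumption based on the code above).

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.append(["name", "category", "price"])     # hypothetical header row
sheet.append(["pencil", "stationery", 200])
sheet.append(["eraser", "stationery", None])    # a row whose price cell is blank
sheet.append(["notebook", "stationery", 1500])

cheap = []
for row in list(sheet.rows)[1:]:                # skip the header, as in the lesson
    price = row[2].value
    # only compare when the cell really holds a number
    if isinstance(price, (int, float)) and price < 300:
        cheap.append((row[0].value, price))
print(cheap)
```

The isinstance check also skips prices that were typed as text, which would otherwise fail the comparison in the same way.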

 

 

 

⭐ Applying the above: scraping '삼성전자' (Samsung Electronics) news and saving it to an Excel file, part 1

# Apply the above: scrape '삼성전자' news and save it to an Excel file

import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook

def get_news(keyword):
  wb = Workbook()     # Create a new Workbook object from openpyxl and name it wb
  sheet = wb.active     # wb.active is a property that returns the active Worksheet object; sheet now refers to that worksheet

  headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
      # The User-Agent header makes the request identify itself as a browser; some sites refuse requests that do not look like one
  data = requests.get(f'https://search.naver.com/search.naver?where=news&ie=utf8&sm=nws_hty&query={keyword}',headers=headers)
      # Send an HTTP GET request for the Naver news search results for keyword; the Response object is stored in data

  soup = BeautifulSoup(data.text, 'html.parser')    # data.text is the response body (the page's HTML) as a string; BeautifulSoup parses it into a searchable tree stored in soup

  lis = soup.select('#main_pack > section > div > div.group_news > ul > li')
      # soup.select picks out every element matching the CSS selector and returns them as a list, stored in lis
          # So lis holds the matching li elements, each one a whole chunk of markup ⭐
          # [Thought] Because select returns the li elements as a list, a for loop can walk through them one by one

  # Fetch each item with a for loop
  for li in lis:    # Take the li elements out one at a time
    a = li.select_one('a.news_tit')   # Within this li, find the first a element whose class is news_tit and store it in a
    print(a.text, a['href'])    # Print the link text and the link address (href) of a

  wb.save(f'{keyword}.xlsx')    # Save the workbook as an Excel file (still empty in part 1, since nothing is written to sheet yet)
  wb.close()    # Closes the workbook; openpyxl documents this as only affecting read-only and write-only modes, so here it is harmless but not required

get_news('삼성전자')
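The request URL above places the keyword directly in the query string. To see what actually travels over the wire, here is a small standard-library sketch; build_search_url is a hypothetical helper, not part of the lesson. It builds the same URL with urlencode, which percent-encodes the Korean keyword, and then decodes it again.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://search.naver.com/search.naver"

def build_search_url(keyword):
    """Hypothetical helper: assemble the news-search URL with an encoded query string."""
    query = urlencode({"where": "news", "ie": "utf8", "sm": "nws_hty", "query": keyword})
    return f"{BASE_URL}?{query}"

url = build_search_url("삼성전자")
print(url)   # the keyword appears percent-encoded, not as Hangul

# Decoding the query string recovers the original keyword
decoded_keyword = parse_qs(urlsplit(url).query)["query"][0]
print(decoded_keyword)
```

With requests, passing the same dictionary as params= to requests.get lets the library do this encoding itself.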

https://colab.research.google.com/drive/1coeNsJQ7EH-J-ZxL3XCECYPFGqkVZ1a3#scrollTo=2TS81VDcwGK4&line=15&uniqifier=1

 


 

 

 

 

⭐ Applying the above: scraping '삼성전자' (Samsung Electronics) news and saving it to an Excel file, part 2

# Apply the above: scrape '삼성전자' news and save it to an Excel file, part 2

import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook

def get_news(keyword):
  wb = Workbook()     # Create a new Workbook object from openpyxl and name it wb
  sheet = wb.active     # wb.active is a property that returns the active Worksheet object; sheet now refers to that worksheet

  headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
      # The User-Agent header makes the request identify itself as a browser; some sites refuse requests that do not look like one
  data = requests.get(f'https://search.naver.com/search.naver?where=news&ie=utf8&sm=nws_hty&query={keyword}',headers=headers)
      # Send an HTTP GET request for the Naver news search results for keyword; the Response object is stored in data

  soup = BeautifulSoup(data.text, 'html.parser')    # data.text is the response body (the page's HTML) as a string; BeautifulSoup parses it into a searchable tree stored in soup

  lis = soup.select('#main_pack > section > div > div.group_news > ul > li')
      # soup.select picks out every element matching the CSS selector and returns them as a list, stored in lis
          # So lis holds the matching li elements, each one a whole chunk of markup ⭐
          # [Thought] Because select returns the li elements as a list, a for loop can walk through them one by one

  # Fetch each item with a for loop
  for li in lis:    # Take the li elements out one at a time
    a = li.select_one('a.news_tit')   # Within this li, find the first a element whose class is news_tit and store it in a
    row = [a.text, a['href']]    # Put the link text and the link address into a list / ⭐ this is the part that changed
    sheet.append(row)   # Append row as the next row of the sheet created at the top of the function ⭐
                          # Part 1 only printed; this is the moment the data goes into the Excel sheet ⭐

  wb.save(f'{keyword}.xlsx')    # Save everything collected so far as an Excel file
  wb.close()    # Closes the workbook; openpyxl documents this as only affecting read-only and write-only modes, so here it is harmless but not required


get_news('삼성전자')     # Run the function defined above
get_news('현대자동차')
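The select / select_one / append loop can be tried offline on a fixed piece of HTML, which makes it easier to see what each step returns. The markup below is hypothetical and only shaped like a result list; it is not copied from Naver. The sketch also adds a guard that is not in the lesson code: select_one returns None when a li has no matching link, and a.text would then raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like a list of news results
html = """
<ul>
  <li><a class="news_tit" href="https://example.com/1">First headline</a></li>
  <li><span>item without a title link</span></li>
  <li><a class="news_tit" href="https://example.com/2">Second headline</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for li in soup.select("ul > li"):      # select returns a list of matching elements
    a = li.select_one("a.news_tit")    # first matching element inside this li, or None
    if a is None:                      # skip items that have no title link
        continue
    rows.append([a.text, a["href"]])   # same [title, link] shape that sheet.append receives
print(rows)
```

If the real page's structure differs from the selector, lis is simply an empty list and the saved workbook stays empty, so printing len(lis) is a quick sanity check.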

 

 

 

Using the datetime library to build date-stamped names

# Use the datetime library to build a date string for file and folder names

from datetime import datetime   # From the datetime module, import the datetime class
a = datetime.today().strftime("%Y-%m-%d")   # datetime.today() returns the current local date and time; strftime formats it as a string
                                              # The separator can be - or /, but / acts as a folder separator inside a file path
print(a)

# cf. using _ as the separator
b = datetime.today().strftime("%Y_%m_%d")   # ✅ Underscores are used here
print(b)
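To make the format codes concrete, here is a short sketch using a fixed datetime instead of today(), so the result is the same on every run. The date chosen is just an example.

```python
from datetime import datetime

fixed = datetime(2022, 12, 10, 10, 37)   # a fixed example moment instead of datetime.today()

dashed = fixed.strftime("%Y-%m-%d")       # %Y = 4-digit year, %m = 2-digit month, %d = 2-digit day
underscored = fixed.strftime("%Y_%m_%d")
slashed = fixed.strftime("%Y/%m/%d")
print(dashed, underscored, slashed)
```

The slashed form is fine for display, but inside a file name each / is read as a folder boundary, which is why the underscore form is the safer choice for saving files.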

 

 

⭐⭐⭐ Applying the above: saving the scraped '삼성전자' (Samsung Electronics) news to a date-stamped Excel file

# Apply the above: scrape '삼성전자' news and save it to a date-stamped Excel file

import requests   # Library for sending HTTP requests and fetching a page's HTML
from bs4 import BeautifulSoup   # Library that parses the fetched HTML into something that can be searched
from openpyxl import Workbook   # Library for creating Excel files
from datetime import datetime   # Library for getting the date

def get_news(keyword):
  wb = Workbook()     # Create a new Workbook object from openpyxl and name it wb
  sheet = wb.active     # wb.active is a property that returns the active Worksheet object; sheet now refers to that worksheet

  headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
      # The User-Agent header makes the request identify itself as a browser; some sites refuse requests that do not look like one
  data = requests.get(f'https://search.naver.com/search.naver?where=news&ie=utf8&sm=nws_hty&query={keyword}',headers=headers)
      # Send an HTTP GET request for the Naver news search results for keyword; the Response object is stored in data

  soup = BeautifulSoup(data.text, 'html.parser')    # data.text is the response body (the page's HTML) as a string; BeautifulSoup parses it into a searchable tree stored in soup

  lis = soup.select('#main_pack > section > div > div.group_news > ul > li')
      # soup.select picks out every element matching the CSS selector and returns them as a list, stored in lis
          # So lis holds the matching li elements, each one a whole chunk of markup ⭐
          # [Thought] Because select returns the li elements as a list, a for loop can walk through them one by one

  # Fetch each item with a for loop
  for li in lis:    # Take the li elements out one at a time
    a = li.select_one('a.news_tit')   # Within this li, find the first a element whose class is news_tit and store it in a
    row = [a.text, a['href']]    # Put the link text and the link address into a list
    sheet.append(row)   # Append row as the next row of the sheet created at the top of the function ⭐
                          # This is the moment the data goes into the Excel sheet ⭐
  today = datetime.today().strftime("%Y_%m_%d")   # Get today's date, format it as a string, and store it in today

  wb.save(f'{today}_{keyword}.xlsx')    # Save everything collected so far as an Excel file
                                        # The date string built just above goes into the name through the f-string
                                        # ✅ The function argument keyword goes into the name as well

  wb.close()    # Closes the workbook; openpyxl documents this as only affecting read-only and write-only modes, so here it is harmless but not required


get_news('삼성전자')     # Run the function defined above
get_news('현대자동차')

 

 

⭐⭐⭐ Collecting the files in a news folder

# Make the files pile up in a folder called news!


import os   # Library for working with folders and paths
import requests   # Library for sending HTTP requests and fetching a page's HTML
from bs4 import BeautifulSoup   # Library that parses the fetched HTML into something that can be searched
from openpyxl import Workbook   # Library for creating Excel files
from datetime import datetime   # Library for getting the date

def get_news(keyword):
  wb = Workbook()     # Create a new Workbook object from openpyxl and name it wb
  sheet = wb.active     # wb.active is a property that returns the active Worksheet object; sheet now refers to that worksheet

  headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
      # The User-Agent header makes the request identify itself as a browser; some sites refuse requests that do not look like one
  data = requests.get(f'https://search.naver.com/search.naver?where=news&ie=utf8&sm=nws_hty&query={keyword}',headers=headers)
      # Send an HTTP GET request for the Naver news search results for keyword; the Response object is stored in data

  soup = BeautifulSoup(data.text, 'html.parser')    # data.text is the response body (the page's HTML) as a string; BeautifulSoup parses it into a searchable tree stored in soup

  lis = soup.select('#main_pack > section > div > div.group_news > ul > li')
      # soup.select picks out every element matching the CSS selector and returns them as a list, stored in lis
          # So lis holds the matching li elements, each one a whole chunk of markup ⭐
          # [Thought] Because select returns the li elements as a list, a for loop can walk through them one by one

  # Fetch each item with a for loop
  for li in lis:    # Take the li elements out one at a time
    a = li.select_one('a.news_tit')   # Within this li, find the first a element whose class is news_tit and store it in a
    row = [a.text, a['href']]    # Put the link text and the link address into a list
    sheet.append(row)   # Append row as the next row of the sheet created at the top of the function ⭐
                          # This is the moment the data goes into the Excel sheet ⭐
  today = datetime.today().strftime("%Y_%m_%d")   # Get today's date, format it as a string, and store it in today

  os.makedirs('news', exist_ok=True)    # Create the news folder if it does not exist yet
  wb.save(f'news/{today}_{keyword}.xlsx')    # Save everything collected so far as an Excel file
                                        # The date string built just above goes into the name through the f-string
                                        # ✅ The function argument keyword goes into the name as well
                                        # ✅ news/ in the path puts the file inside the news folder; wb.save does not create the folder itself, which is why os.makedirs runs first

  wb.close()    # Closes the workbook; openpyxl documents this as only affecting read-only and write-only modes, so here it is harmless but not required


get_news('삼성전자')     # Run the function defined above
get_news('현대자동차')
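On the question of why a folder name in the path matters: a path such as news/name.xlsx only says where the file should go. Writing to it fails if the folder is missing. The standard-library sketch below demonstrates this in a temporary directory, using a plain text file as a stand-in for the workbook; the file name is made up.

```python
import os
import tempfile

base = tempfile.mkdtemp()   # scratch directory so nothing is left in the working folder
target = os.path.join(base, "news", "2022_12_10_example.txt")   # hypothetical file inside a not-yet-existing news folder

try:
    open(target, "w").close()          # the news folder does not exist yet
    failed_without_folder = False
except FileNotFoundError:
    failed_without_folder = True       # writing into a missing folder fails

os.makedirs(os.path.dirname(target), exist_ok=True)   # create the folder; no error if it already exists
with open(target, "w", encoding="utf-8") as f:
    f.write("saved")
saved_ok = os.path.exists(target)
print(failed_without_folder, saved_ok)
```

In Colab the same effect can be had by creating the news folder by hand in the Files tab before running the cell.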

 


Previous Sparta Coding Club lesson

✅ 2022.12.08 - [[AI]/Python] - Fetching news article titles [crawling with python beautifulsoup request] (Feat. Sparta Coding Club data analysis course)


Lecture URL https://bit.ly/3W2CYsK
Colab URL https://bit.ly/3Ph9e9p

 
