
Hello creators 🙌

3-13_Saving Web Scraping Results


부시매나_HA 2023. 11. 27. 08:34

Ⅰ. Learning goal for this session (goal)


1. What I want to learn

  • Put the crawled movie rank, title, and star rating into the DB



 


Ⅱ. Trying it out


1. Bring in the crawling code

😥 Note: this code does not work as-is.
⭐ Still, the comments should help in understanding it.

import requests
from bs4 import BeautifulSoup

# Read the URL and fetch the HTML
headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.naver?sel=pnt&date=20210829', headers=headers)

# Use the BeautifulSoup library to turn the HTML into an easily searchable form
soup = BeautifulSoup(data.text, 'html.parser')

# Use select to get the tr elements
movies = soup.select('#old_content > table > tbody > tr')

# Loop over movies (the tr elements)
for movie in movies:
    # If the row contains an a tag,
    a_tag = movie.select_one('td.title > div > a')
    if a_tag is not None:
        rank = movie.select_one('td:nth-child(1) > img')['alt']  # get the alt attribute of the img tag
        title = a_tag.text  # get the text inside the a tag
        star = movie.select_one('td.point').text  # get the text inside the td tag
        print(rank, title, star)
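
Since the code above does not run against the live page, here is a minimal sketch that applies the same selectors to a small inline HTML sample instead. The `SAMPLE_HTML` markup and the `parse_movies` helper are my own assumptions for illustration; the real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the ranking table (not the real page markup).
SAMPLE_HTML = """
<div id="old_content">
  <table>
    <tbody>
      <tr><td colspan="3">divider row without a link</td></tr>
      <tr>
        <td><img alt="01" src="rank1.gif"></td>
        <td class="title"><div><a href="#">Movie A</a></div></td>
        <td class="point">9.50</td>
      </tr>
      <tr>
        <td><img alt="02" src="rank2.gif"></td>
        <td class="title"><div><a href="#">Movie B</a></div></td>
        <td class="point">9.40</td>
      </tr>
    </tbody>
  </table>
</div>
"""


def parse_movies(html):
    """Return a list of (rank, title, star) tuples using the same selectors as above."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = soup.select('#old_content > table > tbody > tr')
    results = []
    for row in rows:
        a_tag = row.select_one('td.title > div > a')
        if a_tag is None:
            continue  # skip rows that have no title link
        rank = row.select_one('td:nth-child(1) > img')['alt']  # alt attribute of the img tag
        title = a_tag.text                                     # text inside the a tag
        star = row.select_one('td.point').text                 # text inside the td tag
        results.append((rank, title, star))
    return results


for rank, title, star in parse_movies(SAMPLE_HTML):
    print(rank, title, star)
```

Returning the rows as a list, rather than only printing them, makes it easier to pass them to the DB step later.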

2. Add the DB upload to the crawling code

  1. Add the library and the base code
from pymongo import MongoClient
# This part has to be copied from the MongoDB website ✅
client = MongoClient('mongodb+srv://test:sparta@cluster0.qsaohhz.mongodb.net/?retryWrites=true&w=majority')
db = client.dbsparta
  2. Write the pymongo insert code below
# Goes inside the `if a_tag is not None:` block, where title, rank, and star are defined
doc = {
    'title' : title,
    'rank' : rank,
    'star' : star
}

db.movies.insert_one(doc)
  • Reading through it

We are going to insert this document.
It has to be written as a 'dictionary'.
'title' : title means the value of the title variable is stored in MongoDB under the name title.

db.movies.insert_one(doc) inserts it into a collection named movies! A fresh collection this time.

  • Result

Oh, the data goes in!
Amazing.
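
The insert step described above can also be tried without a database connection. Below is a minimal sketch, not the course's code: `to_doc`, `save_movies`, and `FakeCollection` are hypothetical names of my own, and converting rank to `int` and star to `float` is an optional choice (the original stores them as strings).

```python
def to_doc(rank, title, star):
    """Build the dictionary (document) to store; the keys become the field names."""
    return {
        'title': title.strip(),
        'rank': int(rank),      # assumption: rank strings look like '01'
        'star': float(star),    # assumption: star strings look like '9.50'
    }


def save_movies(collection, rows):
    """Insert one document per (rank, title, star) row; return how many were inserted."""
    count = 0
    for rank, title, star in rows:
        collection.insert_one(to_doc(rank, title, star))
        count += 1
    return count


class FakeCollection:
    """Hypothetical in-memory stand-in that only offers insert_one."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)


fake = FakeCollection()
inserted = save_movies(fake, [('01', 'Movie A', '9.50'), ('02', ' Movie B ', '9.40')])
print(inserted, fake.docs)
```

With a real connection, passing `db.movies` in place of `fake` should work the same way, since `save_movies` only calls `insert_one` on whatever collection it receives.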
