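1. Motivation

These days a wide variety of topics come in through the requests I receive on Soomgo. I am registered under the development/lesson categories, so I keep getting notifications for those. One of the many requests asked me to find trending keywords using Python. While working on it, I suddenly became curious about the following:

Which keywords appear most often in the Soomgo requests I receive?
What do people mostly ask for?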

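2. Analyzing the Soomgo API

When you log in to Soomgo and click the quote requests menu, the site calls the Soomgo API to fetch the request data. I will use Python to fetch the same request data from that API.

The Soomgo API address, as seen in Chrome DevTools

Python code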

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
    # Input your own auth key
    'Authorization': '<your own auth key>',
}
url = "https://api.soomgo.com/v1/me/requests/received"
# Call the API and decode the JSON body
res = requests.get(url, headers=headers)
res = res.json()
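
The snippet above stops at the decoded JSON. As a minimal sketch of the next step, the function below pulls the request text out of such a response. The key names (`response`, `items`, `request`, `requestContent`) are the ones used in the full source code at the end of this post; the `extract_contents` helper and the `sample` payload are hypothetical and only illustrate the shape, they are not real API output.

```python
def extract_contents(payload):
    """Return the free-text part of each received request."""
    # Key names follow the full source code in this post.
    items = payload.get("response", {}).get("items", [])
    contents = []
    for item in items:
        text = item["request"]["requestContent"]
        # Same rule as the full source: keep only the text after the last ", "
        contents.append(text.split(", ")[-1])
    return contents


# Hypothetical payload, made up for illustration only
sample = {
    "response": {
        "pagination": {"after": -1},
        "items": [
            {"request": {"id": 1, "requestContent": "Python, 초급, 파이썬을 배우고 싶습니다"}},
            {"request": {"id": 2, "requestContent": "코딩 레슨 희망"}},
        ],
    }
}

print(extract_contents(sample))
```

Using `.get()` with defaults for the outer keys means an empty or unexpected response yields an empty list rather than a `KeyError`.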

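3. Analyzing the requests

The contents received from the API in step 2 are parsed into individual words. The Okt() module would give a richer parse that separates out particles, conjunctions, and so on, but I kept it simple: 1) remove special characters, 2) split words on spaces.

Python code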

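results = []
# content_list holds the request texts collected from the API response
output = ", ".join(content_list)
# Remove special characters
special_chr = ["‘", "’", "!", ",", ".", "(", ")", "?", "&", "'", "”", "\"", "~"]
# A few common particles and endings, also replaced with a space
etc = ["을", "를", "와", "으로", "하는"]
for char in special_chr + etc:
    output = output.replace(char, " ")
print(output)
# Split on spaces and keep only the non-empty words
temp = output.split(" ")
for i in temp:
    if i != "":
        results.append(i)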
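
One side effect of `str.replace` is that a particle such as "을" is also removed from the middle of a word (for example "마을에서" becomes "마 에서"). As a rough alternative sketch, not the approach used in this post, the version below removes punctuation with a regular expression and strips a particle only when it appears at the end of a word. The `tokenize` function and the `PARTICLES` name are hypothetical; the particle list is the same as `etc` above.

```python
import re

# Same entries as the `etc` list in the post; longer ones are checked first
PARTICLES = ("으로", "하는", "을", "를", "와")


def tokenize(text, particles=PARTICLES):
    """Split text into words, dropping punctuation and trailing particles."""
    # Replace every character that is neither a word character nor whitespace
    cleaned = re.sub(r"[^\w\s]", " ", text)
    tokens = []
    for word in cleaned.split():
        for particle in particles:
            # Strip at most one particle, and never reduce a word to nothing
            if word.endswith(particle) and len(word) > len(particle):
                word = word[: -len(particle)]
                break
        tokens.append(word)
    return tokens


print(tokenize("파이썬을 배우고 싶습니다!"))
print(tokenize("마을에서 코딩(기초)"))
```

This is still a heuristic: a word that merely happens to end in one of these syllables is shortened too, which is the kind of case a morphological analyzer such as Okt is meant to handle.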

4. Final results

Words listed in order of frequency

Count: 13, Keyword: 싶습니다
Count: 10, Keyword: 배우고
Count: 8, Keyword: 수
Count: 7, Keyword: 도
Count: 6, Keyword: 싶어요
Count: 5, Keyword: 파이썬
Count: 5, Keyword: 코딩
Count: 4, Keyword: 등
Count: 3, Keyword: 희망
Count: 3, Keyword: 제작
...
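
Several of the top entries above ("수", "도", "등") are one-syllable function words rather than topics. As a small sketch of one way to reduce that noise, the counting step can skip very short tokens before calling `Counter.most_common`. The `top_keywords` helper and its `min_length` threshold are assumptions for illustration, and the word list is made up; a length filter would also drop legitimate one-syllable words, so it is a trade-off rather than a fix.

```python
from collections import Counter


def top_keywords(words, n=10, min_length=2):
    """Count words of at least min_length characters and return the n most common."""
    counts = Counter(word for word in words if len(word) >= min_length)
    return counts.most_common(n)


# Made-up word list for illustration only
words = ["파이썬", "수", "코딩", "파이썬", "도", "수", "코딩", "파이썬"]
print(top_keywords(words, n=2))
```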

Trend keywords rendered as an image

Full source code

import requests
from wordcloud import WordCloud
from collections import Counter
import matplotlib.pyplot as plt

def get_soomgo_data():
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
        # Input your own auth key
        'Authorization': '<your own auth key>',
    }
    after_page = 0
    results = []
    # Fetch at most 20 pages; stop early once the cursor becomes -1
    for _ in range(20):
        print(after_page)
        if after_page == -1:
            break
        elif after_page == 0:
            url = "https://api.soomgo.com/v1/me/requests/received"
        else:
            url = "https://api.soomgo.com/v1/me/requests/received?after={0}".format(after_page)

        res = requests.get(url, headers=headers)
        res = res.json()
        response = res["response"]
        # Cursor for the next page
        after_page = response['pagination']['after']
        items = response['items']
        content_list = []
        for _item in items:
            _content = _item['request']['requestContent']
            # Keep only the text after the last ", "
            _content = _content.split(", ")[-1]
            content_list.append(_content)

        output = ", ".join(content_list)
        # Remove special characters
        special_chr = ["‘", "’", "!", ",", ".", "(", ")", "?", "&", "'", "”", "\"", "~"]
        # A few common particles and endings, also replaced with a space
        etc = ["을", "를", "와", "으로", "하는"]
        for char in special_chr + etc:
            output = output.replace(char, " ")
        print(output)
        # Split on spaces and keep only the non-empty words
        temp = output.split(" ")
        for i in temp:
            if i != "":
                results.append(i)

    return results

results = get_soomgo_data()

# Keep the 50 most frequent words.
counts = Counter(results)
tags = counts.most_common(50)
for tag in tags:
    print("Count: {0}, Keyword: {1}".format(tag[1], tag[0]))

_path = r'/Users/user/Downloads/NanumGothicCoding-2.5/NanumGothicCoding.ttf'
wc = WordCloud(font_path=_path, background_color="white", max_font_size=60)
cloud = wc.generate_from_frequencies(dict(tags))

# Save the generated WordCloud to test.jpg.
#cloud.to_file('test.jpg')

plt.figure(figsize=(10, 8))
plt.axis('off')
plt.imshow(cloud)
plt.show()
