20260113 TIL

[Data Analysis] Bootcamp TIL

myun0506 2026. 1. 13. 21:25

Today I Learned

: SQL code kata, class-day review, and practice / advanced problems from problem-solving day

- SQL code kata

- Problem 1

1. Problem link: https://school.programmers.co.kr/learn/courses/30/lessons/157342

2. Solution code:

select 
    car_id,
    round(avg(datediff(end_date,start_date)+1),1) as average_duration
from car_rental_company_rental_history
group by car_id
having average_duration >= 7
order by average_duration desc, car_id desc

with days as (
select 
    car_id,
    datediff(end_date,start_date)+1 as duration
from car_rental_company_rental_history
)
select 
    car_id, 
    round(avg(duration),1) as average_duration
from days
group by car_id
having round(avg(duration),1) >= 7
order by average_duration desc, car_id desc

- Problem 2

1. Problem link: https://school.programmers.co.kr/learn/courses/30/lessons/77487

2. Solution code:

select id, name, host_id
from places 
where host_id in (
        select host_id 
        from places
        group by host_id
        having count(*) >= 2)
order by id

select id, name, host_id
from places pl
where exists (
    select 1
    from places p
    where p.host_id = pl.host_id
    group by host_id
    having count(*) >= 2)
order by id

with counted as (
    select id, name, host_id, count(*) over (partition by host_id) as cnt
    from places
)
select id, name, host_id 
from counted
where cnt >= 2
order by id

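Time-complexity optimization
- With the window function, the query references the places table only once, instead of once in the outer query and again in the subquery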
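
To double-check that the window-function version returns the same rows as the IN-subquery version, here is a small sketch using Python's sqlite3 module. The sample rows are made up, SQLite stands in for MySQL, and window functions need SQLite 3.25 or newer. Whether the table is really scanned only once depends on the optimizer, so it is worth confirming with EXPLAIN on the actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table places (id integer, name text, host_id integer)")
# Hypothetical sample rows: hosts 10 and 30 own two places each, host 20 owns one.
conn.executemany(
    "insert into places values (?, ?, ?)",
    [(1, "A", 10), (2, "B", 20), (3, "C", 10), (4, "D", 30), (5, "E", 30)],
)

subquery_sql = """
select id, name, host_id
from places
where host_id in (
    select host_id from places group by host_id having count(*) >= 2)
order by id
"""

window_sql = """
with counted as (
    select id, name, host_id, count(*) over (partition by host_id) as cnt
    from places
)
select id, name, host_id
from counted
where cnt >= 2
order by id
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
window_rows = conn.execute(window_sql).fetchall()

print(window_rows)
print(window_rows == subquery_rows)  # True
```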

- Problem 3

1. Problem link: https://school.programmers.co.kr/learn/courses/30/lessons/62284

2. Solution code:

select distinct cart_id 
from cart_products c
where exists (
    select 1
    from cart_products c2
    where c.cart_id = c2.cart_id
    and name = 'Milk') and
      exists (
    select 1
    from cart_products c3
    where c.cart_id = c3.cart_id
    and name = 'Yogurt')
order by cart_id

select cart_id 
from cart_products
where name in ('Milk','Yogurt')
group by cart_id
having count(distinct name) = 2
order by cart_id

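The where name in (...) condition first narrows the rows down to the ones that matter, and a single group by then finishes the job
- This can cut I/O cost considerably compared with the two correlated exists subqueries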
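
A quick sketch to confirm that both queries agree, again with Python's sqlite3 module and made-up rows (SQLite standing in for MySQL). Cart 2 contains Milk twice and no Yogurt, which shows why the having clause needs count(distinct name) rather than count(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table cart_products (id integer, cart_id integer, name text)")
# Hypothetical rows: carts 1 and 3 hold both items, cart 2 holds Milk twice.
conn.executemany(
    "insert into cart_products values (?, ?, ?)",
    [
        (1, 1, "Milk"), (2, 1, "Yogurt"), (3, 1, "Bread"),
        (4, 2, "Milk"), (5, 2, "Milk"),
        (6, 3, "Yogurt"), (7, 3, "Milk"),
    ],
)

exists_sql = """
select distinct cart_id
from cart_products c
where exists (select 1 from cart_products c2
              where c.cart_id = c2.cart_id and c2.name = 'Milk')
  and exists (select 1 from cart_products c3
              where c.cart_id = c3.cart_id and c3.name = 'Yogurt')
order by cart_id
"""

group_sql = """
select cart_id
from cart_products
where name in ('Milk', 'Yogurt')
group by cart_id
having count(distinct name) = 2
order by cart_id
"""

exists_carts = [row[0] for row in conn.execute(exists_sql)]
group_carts = [row[0] for row in conn.execute(group_sql)]

print(exists_carts)  # [1, 3]
print(group_carts)   # [1, 3]
```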

- Problem 4

1. Problem link: https://school.programmers.co.kr/learn/courses/30/lessons/133027

2. Solution code:

with july_icecream as (
select flavor, sum(total_order) as total_order
from july 
group by flavor
)
select fh.flavor 
from first_half fh
join july_icecream ji
on fh.flavor = ji.flavor 
order by fh.total_order + ji.total_order desc
limit 3

Some flavors may have been ordered in first_half but not in july, so
- use left join instead of inner join

with july_icecream as (
select flavor, sum(total_order) as total_order
from july 
group by flavor
)
select fh.flavor 
from first_half fh
left join july_icecream ji # also keep flavors that are missing from july
on fh.flavor = ji.flavor 
order by fh.total_order + ifnull(ji.total_order,0) desc
limit 3

If a flavor was absent from first_half and first appeared in july,
- a left join can drop it.
- → Both tables have the same column types, so union all retrieves every row without losing any.

select flavor
from (
    select flavor, total_order from first_half
    union all
    select flavor, total_order from july
) as combined
group by flavor
order by sum(total_order) desc
limit 3;
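
To see which flavors each approach keeps, here is a small sketch with Python's sqlite3 module. The data is invented for illustration: 'mint' exists only in first_half and 'mango' only in july. The limit 3 is left out on purpose so the full ranking is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table first_half (flavor text, total_order integer)")
conn.execute("create table july (flavor text, total_order integer)")
# Hypothetical data: 'mint' is only in first_half, 'mango' is only in july.
conn.executemany(
    "insert into first_half values (?, ?)",
    [("chocolate", 100), ("vanilla", 80), ("mint", 50)],
)
conn.executemany(
    "insert into july values (?, ?)",
    [("chocolate", 30), ("vanilla", 10), ("vanilla", 20), ("mango", 500)],
)

inner_sql = """
select fh.flavor
from first_half fh
join (select flavor, sum(total_order) as total_order
      from july group by flavor) ji
  on fh.flavor = ji.flavor
order by fh.total_order + ji.total_order desc
"""

union_sql = """
select flavor
from (
    select flavor, total_order from first_half
    union all
    select flavor, total_order from july
) as combined
group by flavor
order by sum(total_order) desc
"""

inner_flavors = [row[0] for row in conn.execute(inner_sql)]
union_flavors = [row[0] for row in conn.execute(union_sql)]

print(inner_flavors)  # ['chocolate', 'vanilla']
print(union_flavors)  # ['mango', 'chocolate', 'vanilla', 'mint']
```

With the inner join, both 'mint' and 'mango' disappear. The union all version keeps every flavor, so the july-only 'mango' can take first place.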

- Problem 5

1. Problem link: https://school.programmers.co.kr/learn/courses/30/lessons/131537

2. Solution code:

select *
from (
select date_format(sales_date,'%Y-%m-%d') as sales_date, product_id, user_id, sales_amount
from online_sale 
where sales_date >= '2022-03-01' and sales_date < '2022-04-01'
union all 
select date_format(sales_date,'%Y-%m-%d'), product_id, NULL, sales_amount
from offline_sale
where sales_date >= '2022-03-01' and sales_date < '2022-04-01'
) as combined
order by sales_date asc, product_id asc, user_id asc

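The where sales_date like '2022-03%' filter I had been using performs string pattern matching on a date column, so it may not execute efficiently
- So it is important to specify the range precisely,
- but between '2022-03-01' and '2022-03-31' on a timestamp or datetime column only covers up to 2022-03-31 00:00:00, so it misses the rest of the last day
- → A 'half-open interval' covers the whole month completely!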
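
The same boundary issue can be reproduced with Python's datetime, as a minimal sketch. The sale time below is a made-up value on the afternoon of March 31.

```python
from datetime import datetime

# A hypothetical sale on the afternoon of the last day of March.
sale_time = datetime(2022, 3, 31, 15, 30)

start = datetime(2022, 3, 1)
# between '2022-03-01' and '2022-03-31' stops at midnight at the start of 3/31.
closed_end = datetime(2022, 3, 31)
# Half-open interval: everything strictly before the first instant of April.
open_end = datetime(2022, 4, 1)

in_between = start <= sale_time <= closed_end
in_half_open = start <= sale_time < open_end

print(in_between)    # False
print(in_half_open)  # True
```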

- Problem 6

1. Problem link: https://leetcode.com/problems/recyclable-and-low-fat-products/description/

2. Solution code:

select product_id
from products 
where low_fats = 'Y' and recyclable = 'Y'

- Problem 7

1. Problem link: https://leetcode.com/problems/find-customer-referee/submissions/1883754448/

2. Solution code:

select name
from Customer
where referee_id != 2 or referee_id is null

- Problem 8

1. Problem link: https://leetcode.com/problems/big-countries/

2. Solution code:

select name, population, area
from World
where area >= 3000000 or population >= 25000000

- Problem 9

1. Problem link: https://leetcode.com/problems/article-views-i/

2. Solution code:

select distinct author_id as id
from Views
where author_id = viewer_id
order by id

- Problem 10

1. Problem link: https://leetcode.com/problems/invalid-tweets/submissions/1883760494/

2. Solution code:

select tweet_id
from Tweets
where length(content) > 15

- Problem 11

1. Problem link: https://leetcode.com/problems/not-boring-movies/description/

2. Solution code:

select id, movie, description, rating
from Cinema
where id % 2 = 1 and description != "boring"
order by rating desc

- Problem 12

1. Problem link: https://leetcode.com/problems/triangle-judgement/submissions/1883771646/

2. Solution code:

select 
    x, y, z, 
    case when x >= y+z or y >= x+z or z >= x+y then 'No'
    else 'Yes' end as triangle
from Triangle

- Problem 13

1. Problem link: https://leetcode.com/problems/fix-names-in-a-table/description/

2. Solution code:

select 
    user_id,
    concat(upper(substring(name,1,1)), lower(substring(name,2))) as name
from Users
order by user_id

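3. Error encountered: the string concatenation function is 'concat', but I used '+' the way Python joins strings, which failed!

4. What I tried: asked Gemini... and googled the upper and lower functions!

If the last argument of substring is omitted, it automatically runs 'to the end' of the string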
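
For comparison, the same name fix in Python, where '+' does join strings and an open-ended slice plays the role of substring without its last argument. This is a sketch only, and fix_name is a name invented here for illustration.

```python
def fix_name(name: str) -> str:
    # name[:1] mirrors substring(name, 1, 1); name[1:] mirrors substring(name, 2),
    # where leaving out the end means "through the end of the string".
    return name[:1].upper() + name[1:].lower()


print(fix_name("aLICE"))  # Alice
print(fix_name("bOB"))    # Bob
```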

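- Class-day review (functions | modules)

- Parameter vs argument

Parameter: a variable that receives a value passed into the function, declared when the function is defined
- The name written in the function's declaration
Argument: the actual value passed when the function is called
- At call time, the real data matching each parameter is passed as an argument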
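
A tiny sketch of the distinction. The greet function and its values are made up for illustration.

```python
def greet(name, greeting="Hello"):  # name and greeting are parameters
    return f"{greeting}, {name}!"


message = greet("Eve")               # "Eve" is an argument
custom = greet("Eve", greeting="Hi")  # "Hi" is passed as a keyword argument

print(message)  # Hello, Eve!
print(custom)   # Hi, Eve!
```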

- Variable-length arguments (*args): when the number of arguments is not known in advance, *args collects the positional arguments into a tuple

def sum_all(*args):
    total = 0
    for num in args:
        total += num
    return total

print(sum_all(1, 2, 3))    # 6
print(sum_all(10, 20))     # 30
print(sum_all())           # 0

- Keyword variable-length arguments (**kwargs): collects keyword arguments into a dictionary

def print_info(**kwargs):
    for key, value in kwargs.items():
        print(key, ":", value)

print_info(name="Eve", age=22, hobby="reading")
# name : Eve
# age : 22
# hobby : reading

- Lambda function: defines a function in a single line with the lambda keyword

# Regular function
def add(a, b):
    return a + b

# Lambda function
add_lambda = lambda a, b: a + b

print(add(3, 5))         # 8
print(add_lambda(3, 5))  # 8

- Docstring: written inside """ (triple quotes) to record how a function is used and what it is for
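
A minimal docstring sketch. The mean_score function is invented for illustration; the docstring is the first statement in the function body and can be read back through __doc__ or help().

```python
def mean_score(scores):
    """Return the arithmetic mean of a list of numbers.

    Returns 0.0 for an empty list so that callers never divide by zero.
    """
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


# The docstring is stored on the function object itself.
print(mean_score.__doc__.splitlines()[0])
print(mean_score([80, 70, 90]))  # 80.0
```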

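- Local variable vs global variable:

A variable declared inside a function is a local variable
A variable declared outside a function is a global variable

x = 10  # global variable

def my_func():
    x = 5  # local variable
    print("Inside function:", x)

my_func()         # Inside function: 5
print("Outside function:", x)  # Outside function: 10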

- Practice problem 2: word frequency counting function

def word_count(sentence):
    words = sentence.lower()
    words = words.split()
    cnt = {}
    for w in words:
        if w not in cnt:
            cnt[w] = 1
        else:
            cnt[w] += 1
    return cnt

print(word_count("Apple banana apple Orange orange banana apple"))

The if-else statement can be reduced to a single line with the dict get method!

def word_count(sentence):
    words = sentence.lower()
    words = words.split()
    cnt = {}
    for w in words:
        cnt[w] = cnt.get(w, 0) + 1
    return cnt

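- random module

A Python standard-library module that generates random values
Used for games, drawings, simulations, and generating test data
random.randint(a,b) : returns one random integer from a to b (both included)
random.randrange(a,b) : returns one random integer from a to b-1
random.random() : returns one random float, 0 or greater and less than 1
random.choice(list) : picks one random item from the list
random.shuffle(list) : shuffles the order of the list in place
random.sample(list, count) : picks several items from the list without duplicates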
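
A short sketch that calls each function listed above. The seed and the sample list are arbitrary choices for illustration; the seed is there only so that repeated runs give the same values.

```python
import random

random.seed(42)  # fixed seed only so that repeated runs give the same values

numbers = [1, 2, 3, 4, 5]

dice = random.randint(1, 6)          # 1..6, both ends included
index = random.randrange(0, 5)       # 0..4, the end value 5 is excluded
ratio = random.random()              # float in [0.0, 1.0)
picked = random.choice(numbers)      # one element of the list
sampled = random.sample(numbers, 3)  # 3 items from distinct positions

shuffled = numbers.copy()
random.shuffle(shuffled)             # shuffles in place and returns None

print(dice, index, ratio, picked, sampled, shuffled)
```

Note that shuffle changes the list it is given, so the sketch shuffles a copy to keep numbers intact.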

- Advanced problems from problem-solving day

- Advanced problem 1: online course student management system

students = {
    "민수": [80, 70, 90],
    "영희": [40, 55, 60],
    "지수": [100, 95, 90],
    "철수": [30, 45, 20]
}

def average(scores):
    try:
        return sum(scores) / len(scores)
    except ZeroDivisionError:
        print("Cannot divide by zero.")
        return 0.0  # fallback so the later round() call does not receive None


def sixty(dctn):
    newdict = {}
    for student, score in dctn.items():
        if score >= 60:
            newdict[student] = score
    return newdict

def highest(dctn):
    maxscore = 0
    maxstudent = ""
    for student in dctn:
        temp = dctn.get(student, 0)
        if maxscore < temp:
            maxstudent = student
            maxscore = temp
    return maxstudent

def highest2(dctn):
  return max(dctn, key=dctn.get)

avgdict = {student:round(average(students[student]),2) for student in students}


print(f"Average per student: {avgdict}")

score_sixty = sixty(avgdict)
print(f"Passed: {score_sixty}")

highest_student = highest(avgdict)
print(f"Top-average student: {highest_student}")

highest2 : written in one line with the max function instead of a for-loop / if statement (key = dctn.get)

- Advanced problem 2: bank account simulator

# Problem 2
accounts = {
    "A001": 50000,
    "A002": 120000,
    "A003": 30000
}

def deposit(accdict, acc, money):
    if acc in accdict:
        accdict[acc] += money
        print(f"{acc} balance: {accdict.get(acc)}")

def withdrawal(accdict, acc, money):
    if acc in accdict:
        temp = accdict.get(acc, 0)
        if money > temp:
            print(f"{acc} withdrawal failed")
        else:
            accdict[acc] -= money
            print(f"{acc} balance: {accdict.get(acc)}")

def whole(accdict):
    return sum(accdict.values())
  
def danger(accdict):
    dndict = [acc for acc, money in accdict.items() if money < 50000]
    return dndict
  
deposit(accounts,"A001",20000)
withdrawal(accounts, "A003", 50000)
res = whole(accounts)
print(f"Total bank assets: {res}")
res = danger(accounts)
print(f"At-risk accounts: {res}")

whole : written in one line with sum() and dict.values() instead of a for-loop / if statement

- Advanced problem 3

# Problem 3
log = "error login success login error logout success login error"

from collections import defaultdict

def frequency(s):
    s = s.lower()
    words = s.split()
    freqdict = defaultdict(int)
    for w in words:
        freqdict[w] += 1
    return freqdict

def mostfreq(freqdict):
    freq = 0
    freqword = ""
    for w, c in freqdict.items():
        if c > freq:
            freq = c
            freqword = w
    return freqword

def mostfreq2(freqdict):
    return max(freqdict, key=freqdict.get)

def calculate_ratio(freqdict, word):
    total = sum(freqdict.values())
    if total == 0: return -1

    freq = freqdict.get(word,0)
    return round(freq / total * 100,2)

def twotimes(freqdict):
    newdict = {w:c for w,c in freqdict.items() if c>=2}
    return newdict

result = dict(frequency(log))
print(f"Word frequency: {result}")
result1 = mostfreq(result)
print(f"Most frequent word: {result1}")
result2 = calculate_ratio(result, result1)
if result2 == -1:
    print("Cannot divide by zero.")
else:
    print(f"{result1} ratio: {result2}%")
result3 = twotimes(result)
print(f"Important words: {result3}")

mostfreq2 : written in one line with max() instead of a for-loop / if statement (key=freqdict.get)
calculate_ratio : handles the zero-total case with an if statement, which is more intuitive than try-except
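
As a possible alternative (not what the solution above uses), collections.Counter from the standard library can do the counting and the most-frequent lookup together. A sketch using the same log string:

```python
from collections import Counter

log = "error login success login error logout success login error"

# Counter builds the word -> count mapping in one call.
counts = Counter(log.lower().split())

# most_common(1) returns a list with the single (word, count) pair that ranks first;
# words with equal counts keep the order in which they first appeared.
top_word, top_count = counts.most_common(1)[0]

total = sum(counts.values())
# Same zero check as calculate_ratio, written as a conditional expression.
ratio = round(top_count / total * 100, 2) if total else None

print(top_word, top_count, ratio)  # error 3 33.33
```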

- Fun tutoring time~~~