20260115 TIL

[Data Analysis] Bootcamp TIL

myun0506 2026. 1. 15. 21:04

Today I Learned

: SQL code kata, Python practice problems

- SQL code kata

- Problem 1: LeetCode #1661. Average Time of Process per Machine

1. Problem link: https://leetcode.com/problems/average-time-of-process-per-machine/

2. Solution code:

with selected as (
select machine_id, max(timestamp)-min(timestamp) as diff
from activity
group by machine_id, process_id
) 
select machine_id, round(avg(diff),3) as processing_time
from selected 
group by machine_id

The query above uses GROUP BY twice
- ↓ Using CASE WHEN reduces this to a single GROUP BY
- Inside the CASE WHEN, the start timestamp is negated so that the values can be summed directly

select
    machine_id, 
    round(sum(case activity_type 
            when 'start' then -(timestamp)
            when 'end' then timestamp
            end) / count(distinct process_id),3) as processing_time
from activity
group by machine_id
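
As a quick check of the sign trick, here is a minimal sketch that runs both queries above against an in-memory SQLite database through Python's sqlite3 module. The sample rows are made up for illustration, and SQLite is only an assumption standing in for the engine the queries were written for; the SQL itself is the same as above.

```python
import sqlite3

# Hypothetical sample rows: (machine_id, process_id, activity_type, timestamp)
rows = [
    (0, 0, "start", 0.712), (0, 0, "end", 1.520),
    (0, 1, "start", 3.140), (0, 1, "end", 4.120),
    (1, 0, "start", 0.550), (1, 0, "end", 1.550),
    (1, 1, "start", 0.430), (1, 1, "end", 1.420),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table activity "
    "(machine_id int, process_id int, activity_type text, timestamp real)"
)
conn.executemany("insert into activity values (?, ?, ?, ?)", rows)

# Version 1: per-process duration first, then the average per machine.
two_group_by = """
with selected as (
    select machine_id, max(timestamp) - min(timestamp) as diff
    from activity
    group by machine_id, process_id
)
select machine_id, round(avg(diff), 3) as processing_time
from selected
group by machine_id
"""

# Version 2: negate start timestamps so a single sum gives the total duration.
one_group_by = """
select machine_id,
       round(sum(case activity_type
                     when 'start' then -(timestamp)
                     when 'end' then timestamp
                 end) / count(distinct process_id), 3) as processing_time
from activity
group by machine_id
"""

result_two = sorted(conn.execute(two_group_by).fetchall())
result_one = sorted(conn.execute(one_group_by).fetchall())
print(result_two)
print(result_one)
```

On this sample both versions report about 0.894 for machine 0 and 0.995 for machine 1.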

- Problem 2: LeetCode #577. Employee Bonus

1. Problem link: https://leetcode.com/problems/employee-bonus/submissions/1885334938/

2. Solution code:

select e.name, b.bonus
from employee e 
left join bonus b 
on e.empid = b.empid
where b.bonus < 1000
or b.bonus is null

- Problem 3: LeetCode #1075. Project Employees I

1. Problem link: https://leetcode.com/problems/project-employees-i/

2. Solution code:

select  
    p.project_id, 
    round(avg(e.experience_years),2) as average_years
from project p 
join employee e 
on p.employee_id = e.employee_id
group by p.project_id

- Problem 4: LeetCode #1633. Percentage of Users Attended a Contest

1. Problem link: https://leetcode.com/problems/percentage-of-users-attended-a-contest/description/

2. Solution code:

select 
    r.contest_id, 
    round(count(distinct r.user_id)/(select count(*) as cnt from users)*100,2)
        as percentage
from register r 
group by r.contest_id
order by percentage desc, r.contest_id asc

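Scalar subquery
- A subquery that returns exactly one row and one column (a single value)
- In theory it is evaluated once per row, but
  - modern SQL optimizers usually optimize this internally
  - Caching: when the subquery does not reference any column of the main query (an uncorrelated subquery), the engine can typically compute the value once, keep it in memory (cache), and reuse it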

select 
    r.contest_id, 
    round(count(distinct r.user_id)/u.cnt*100,2)
        as percentage
from 
    register r,
    (select count(*) as cnt from users) as u
group by r.contest_id
order by percentage desc, r.contest_id asc

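cross join
- Attaches the total user count (u.cnt) to every row
- The derived table u has exactly one row, so the join does not multiply the number of rows
  - the engine can typically compute that single value once and reuse it, rather than physically copying it onto every row
  - For the optimizer, the cross join structure can sometimes make it more obvious that this value is a fixed constant.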

with total as (select count(*) as cnt from users)
select 
    r.contest_id, 
    round(count(distinct r.user_id)/t.cnt*100,2)
        as percentage
from 
    register r,
    total t
group by r.contest_id
order by percentage desc, r.contest_id asc

With a CTE,
- the flow is clearer, since it reads as if the total CTE is computed first, and
- reusing t.cnt several times within the query becomes much easier
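
To see that the scalar subquery, cross join, and CTE versions are interchangeable, the sketch below runs all three on a small in-memory SQLite database. The rows are hypothetical, and one adaptation is an assumption made for SQLite only: it divides integers as integers, so the count is multiplied by 100.0 before dividing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table users (user_id int);
    create table register (contest_id int, user_id int);
    insert into users values (2), (6), (7);
    insert into register values
        (215, 6), (215, 7),
        (209, 2), (209, 6), (209, 7),
        (207, 2);
""")

# 1) Uncorrelated scalar subquery in the select list.
scalar_subquery = """
select r.contest_id,
       round(count(distinct r.user_id) * 100.0
             / (select count(*) from users), 2) as percentage
from register r
group by r.contest_id
order by percentage desc, r.contest_id asc
"""

# 2) Cross join against a one-row derived table.
cross_join = """
select r.contest_id,
       round(count(distinct r.user_id) * 100.0 / u.cnt, 2) as percentage
from register r, (select count(*) as cnt from users) as u
group by r.contest_id
order by percentage desc, r.contest_id asc
"""

# 3) Same one-row table, named up front as a CTE.
cte = """
with total as (select count(*) as cnt from users)
select r.contest_id,
       round(count(distinct r.user_id) * 100.0 / t.cnt, 2) as percentage
from register r, total t
group by r.contest_id
order by percentage desc, r.contest_id asc
"""

results = [conn.execute(q).fetchall() for q in (scalar_subquery, cross_join, cte)]
print(results[0])  # [(209, 100.0), (215, 66.67), (207, 33.33)]
print(results[0] == results[1] == results[2])  # True
```

The choice between the three is then mostly about readability, since the result is the same on this data.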

- Problem 5: LeetCode #1211. Queries Quality and Percentage

1. Problem link: https://leetcode.com/problems/queries-quality-and-percentage/description/

2. Solution code:

with poor as ( 
    select query_name, count(*) as cnt 
    from queries q2
    where rating < 3 
    group by query_name
) 
select 
    q.query_name,
    round(avg(q.rating/q.position),2) as quality,
    round(p.cnt/count(*)*100,2) as poor_query_percentage
from queries q
join poor p
on q.query_name = p.query_name
group by q.query_name

select 
    q.query_name,
    round(avg(q.rating/q.position),2) as quality,
    round(
        (select count(*) as cnt 
        from queries q2
        where rating < 3 and q2.query_name = q.query_name
        group by query_name)
        /count(*)*100,2) as poor_query_percentage
from queries q
group by query_name

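The (select count(*) ... where q2.query_name = q.query_name) part is correlated with the main query's query_name, so it fetches the right count for each group
- Logical flaw: what if a query_name has no rows at all with rating < 3?
  - The WHERE clause removes every row, the inner GROUP BY produces no groups, and the subquery returns no rows, so its value is NULL
  - NULL / count(*) is then NULL, so the output risks being empty instead of 0%.
- Fix: wrap the subquery in IFNULL(..., 0), or drop the inner GROUP BY, since count(*) without GROUP BY returns 0 for an empty set
A more efficient approach: aggregating a CASE WHEN or a comparison
- When computing 'the share of rows that meet a condition', as in this problem, it is best to finish in a single scan with no subquery or JOIN
- Key idea: the average of a Boolean value
  - In MySQL, the condition rating < 3 evaluates to 1 when true and 0 when false
  - Taking its average (AVG) or sum (SUM) gives the ratio without a subquery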

select 
    query_name,
    round(avg(rating/position),2) as quality,
    round(
        sum(case when rating < 3 then 1 else 0 end)
        /count(*)*100,2) as poor_query_percentage
from queries
group by query_name

sum(case when ...) / count(*) is ultimately the same thing as avg(case when ...)
- Averaging a set made up of 1s and 0s
  - naturally yields 'the share of 1s in the whole'

select 
    query_name,
    round(avg(rating/position),2) as quality,
    round(
        avg(case when rating < 3 then 1 else 0 end)*100,2) as poor_query_percentage
from queries
group by query_name
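
The NULL pitfall and the boolean-average alternative discussed above can be reproduced with a small sketch. It assumes an in-memory SQLite database and made-up rows in which 'Cat' has no rating below 3; only the poor_query_percentage column is computed, and the correlated version multiplies by 100.0 first because SQLite divides integers as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table queries (query_name text, rating int);
    insert into queries values
        ('Dog', 5), ('Dog', 5), ('Dog', 1),
        ('Cat', 4), ('Cat', 5);
""")

# Correlated subquery with an inner GROUP BY: yields NULL when no row matches.
correlated = """
select q.query_name,
       round((select count(*)
              from queries q2
              where q2.rating < 3 and q2.query_name = q.query_name
              group by q2.query_name) * 100.0 / count(*), 2)
           as poor_query_percentage
from queries q
group by q.query_name
order by q.query_name
"""

# Single scan: the average of a 1/0 flag is the share of matching rows.
boolean_avg = """
select query_name,
       round(avg(case when rating < 3 then 1 else 0 end) * 100, 2)
           as poor_query_percentage
from queries
group by query_name
order by query_name
"""

with_subquery = conn.execute(correlated).fetchall()
with_avg = conn.execute(boolean_avg).fetchall()
print(with_subquery)  # [('Cat', None), ('Dog', 33.33)]
print(with_avg)       # [('Cat', 0.0), ('Dog', 33.33)]
```

The None for 'Cat' is the NULL described above, while the CASE WHEN version reports 0.0 without any extra handling.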

- Python class-day practice problems (classes)

- Practice problem 1

class Student:
  def __init__(self, name, age):
    if not isinstance(age, int):
      raise TypeError("Age must be an integer.")
    self.name = name
    self.age = age

  def print_info(self):
    print(f"Name: {self.name}, Age: {self.age}")

minsu = Student("민수",15)
minsu.print_info()

- Practice problem 2

class Rectangle:
  def __init__(self, width, height):
    if not (isinstance(width, int) and isinstance(height, int)):
      raise TypeError("Width and height must be integers.")

    if width <= 0 or height <= 0:
      raise ValueError("Width and height must be greater than 0.")

    self.__width = width
    self.__height = height

  def get_area(self):
    return self.__width * self.__height

rect = Rectangle(5,3)
print(f"Area: {rect.get_area()}")

- Practice problem 3

class Dog:
    def __init__(self, name: str, age: int):
      if not isinstance(age, int):
        raise TypeError("Age must be an integer.")
      if age <= 0:
        raise ValueError(f"Age must be greater than 0. Got: {age}")

      self.name = name
      self.age = age

dog = Dog("초코", 3)
print(dog.name)

- Practice problem 4

class Counter:
    count = 0
    def add(self):
        self.count += 1

a = Counter()
b = Counter()

a.add()
a.add()
b.add()

print(a.count) # 2
print(b.count) # 1

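Initial state:
- Neither instance a nor instance b has a count attribute of its own
- When a.count is looked up, Python searches the instance first,
- and if it is not there, goes up to the class namespace and finds the shared variable count (0)
self.count += 1 (self.count = self.count + 1)
- self.count on the right-hand side reads the class variable 0 and adds 1 to it
- The moment the left-hand side self.count = ... runs,
  a new instance variable named count is created in a's instance namespace and assigned the result
As a result, a and b no longer look at the class variable; each has its own count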
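
The lookup-then-assign behavior described above can be observed directly with vars(), which shows an instance's own namespace. This is a small sketch that reuses the same Counter class; the intermediate print calls are added only for illustration.

```python
class Counter:
    count = 0  # class variable, found through the class namespace

    def add(self):
        # Reads count (instance first, then class), then assigns on the instance.
        self.count += 1


a = Counter()
b = Counter()

print(vars(a))        # {} : no instance attribute yet
a.add()
print(vars(a))        # {'count': 1} : the assignment created an instance attribute
print(vars(b))        # {} : b still falls back to the class variable
print(Counter.count)  # 0 : the class variable itself was never modified
```

If the intent were a counter shared by all instances, the method would have to assign to Counter.count explicitly instead of self.count.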

- Practice problem 5

class Box:
    def __init__(self, items):
        self.items = items

box1 = Box([1, 2, 3])
box2 = box1 # box2 now refers to the same object as box1

box2.items.append(4)
# box1 and box2 are the same object,
# so appending through box2
# appends to the one list they share

print(box1.items) # [1,2,3,4]

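When working with mutable objects such as lists or dictionaries in Python,
assigning an argument from outside as-is with self.items = items is quite risky!
In the code above, box2 = box1 makes both names refer to the same Box, so copying inside __init__ would not change that output; the copy guards against a different case, where the caller later modifies the list it passed in.
To keep the class's internal data protected even when that list is modified outside,
use a 'defensive copy' at assignment time, such as items[:] or list(items) (shallow copies) or copy.deepcopy() (a deep copy)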
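
A minimal sketch of the defensive copy mentioned above follows. SafeBox is a hypothetical name used only for this illustration; the second half shows why a shallow copy is not enough when the list contains other mutable objects.

```python
import copy


class SafeBox:  # hypothetical class, not part of the exercise
    def __init__(self, items):
        self.items = list(items)  # shallow defensive copy of the caller's list


original = [1, 2, 3]
box = SafeBox(original)
original.append(4)   # the caller mutates its own list afterwards
print(box.items)     # [1, 2, 3] : the box kept its own copy

# A shallow copy still shares nested mutable elements.
nested = [[1, 2], [3]]
shallow_box = SafeBox(nested)
deep_items = copy.deepcopy(nested)
nested[0].append(99)
print(shallow_box.items)  # [[1, 2, 99], [3]] : inner list is shared
print(deep_items)         # [[1, 2], [3]] : fully independent
```

For flat lists of immutable values list(items) is usually sufficient; copy.deepcopy() costs more and is only needed for nested mutable data.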

- Practice problem 6

class Score:
    total = 0

    def __init__(self, value):
        self.value = value # value is an instance variable
        Score.total += value # total is a class variable

    def add(self, x):
        self.value += x 
        Score.total += x

a = Score(10) # a.value = 10, Score.total = 10
b = Score(20) # b.value = 20, Score.total = 30

a.add(5) # a.value = 15, Score.total = 35
b.add(3) # b.value = 23, Score.total = 38

print(a.value) # 15
print(b.value) # 23
print(Score.total) # 38

- Practice problem 7

class Student:
    scores = []

    def add_score(self, score):
        self.scores.append(score)

s1 = Student()
s2 = Student()

s1.add_score(90) # s1.scores = [90]
s2.add_score(80) # s2.scores = [90, 80]

print(s1.scores)
print(s2.scores)

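Initial state: scores is a class variable, not an instance variable
add_score method:
- s1 reaches the class variable scores and calls append on that list object
- s2 has no instance variable either, so it also reaches the class variable scores and appends to the same object
- In the end, s2.scores prints [90, 80] (and so does s1.scores, since it is the same list)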

class Student:
    def __init__(self, initial_scores=None):
      if initial_scores is None:
        self.scores = []
      else:
        self.scores = list(initial_scores)

    def add_score(self, score):
        self.scores.append(score)

s1 = Student([90,80,70])
s2 = Student()

s1.add_score(90) # s1.scores = [90, 80, 70, 90]
s2.add_score(80) # s2.scores = [80]

print(s1.scores)
print(s2.scores)

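What if it had been written as def __init__(self, scores=[])?
- The [] is created only once in memory and shared by every instance!
The default value is evaluated once, when the function is defined, so Python keeps a single list for that default
and reuses that same object every time the argument is omitted
So the standard Python idiom is to use the immutable None as the default
and create a new list per instance at call time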
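
The shared default list can be seen in a short sketch. BadStudent is a hypothetical name for the scores=[] variant described above; the default object is inspected through the function's __defaults__ attribute.

```python
class BadStudent:  # hypothetical class showing the pitfall
    def __init__(self, scores=[]):  # the [] is evaluated once, at definition time
        self.scores = scores

    def add_score(self, score):
        self.scores.append(score)


s1 = BadStudent()
s2 = BadStudent()
s1.add_score(90)
s2.add_score(80)

print(s1.scores)                         # [90, 80]
print(s1.scores is s2.scores)            # True : both use the one default list
print(BadStudent.__init__.__defaults__)  # ([90, 80],) : the default itself was mutated
```

With the None default used in the corrected Student class, each call without an argument builds a fresh list, so this sharing does not occur.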