[Big Data Analysis Engineer] Practical Exam Mock Test 2 - 1/2

혁이e 2022. 6. 19.

Study material for mock test 2.
Keep rereading it on my phone, even while walking, to study.

(1) Short-answer type
Association rule analysis / Parameter / Semi-supervised learning / Data lake / LOD (Linked Open Data)
ETL (Extract, Transform, Load) / K-means clustering / Descriptive statistics / Data profiling / Ensemble methods

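(2) Hands-on type
1. Using the Carseats dataset from the ISLR package, select as training data the rows that remain after excluding outliers in Sales, then find the standard deviation of Age. (An outlier is a value that lies 1.5 standard deviations or more below or above the mean.)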

# Load and inspect the data
> library(ISLR)
> library(dplyr)
> data(Carseats)
> ds <- Carseats
> str(ds)
'data.frame': 400 obs. of  11 variables:
$ Sales      : num  9.5 11.22 10.06 7.4 4.15 ...
$ CompPrice  : num  138 111 113 117 141 124 115 136 132 132 ...
$ Income     : num  73 48 35 100 64 113 105 81 110 113 ...
$ Advertising: num  11 16 10 4 3 13 0 15 0 0 ...
$ Population : num  276 260 269 466 340 501 45 425 108 131 ...
$ Price      : num  120 83 80 97 128 72 108 120 124 124 ...
$ ShelveLoc  : Factor w/ 3 levels "Bad","Good","Medium": 1 2 3 3 1 1 3 2 3 3 ...
$ Age        : num  42 65 59 55 38 78 71 67 76 76 ...
$ Education  : num  17 10 12 14 13 16 15 10 10 17 ...
$ Urban      : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 2 2 1 1 ...
$ US         : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 2 1 2 1 2 ...
> head(ds)
  Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban  US
1  9.50       138     73          11        276   120       Bad  42        17   Yes Yes
2 11.22       111     48          16        260    83      Good  65        10   Yes Yes
3 10.06       113     35          10        269    80    Medium  59        12   Yes Yes
4  7.40       117    100           4        466    97    Medium  55        14   Yes Yes
5  4.15       141     64           3        340   128       Bad  38        13   Yes  No
6 10.81       124    113          13        501    72       Bad  78        16    No Yes
> summary(ds)
     Sales          CompPrice       Income        Advertising       Population        Price        ShelveLoc
Min.   : 0.000   Min.   : 77   Min.   : 21.00   Min.   : 0.000   Min.   : 10.0   Min.   : 24.0   Bad   : 96
1st Qu.: 5.390   1st Qu.:115   1st Qu.: 42.75   1st Qu.: 0.000   1st Qu.:139.0   1st Qu.:100.0   Good  : 85
Median : 7.490   Median :125   Median : 69.00   Median : 5.000   Median :272.0   Median :117.0   Medium:219
Mean   : 7.496   Mean   :125   Mean   : 68.66   Mean   : 6.635   Mean   :264.8   Mean   :115.8
3rd Qu.: 9.320   3rd Qu.:135   3rd Qu.: 91.00   3rd Qu.:12.000   3rd Qu.:398.5   3rd Qu.:131.0
Max.   :16.270   Max.   :175   Max.   :120.00   Max.   :29.000   Max.   :509.0   Max.   :191.0
      Age          Education    Urban       US
Min.   :25.00   Min.   :10.0   No :118   No :142
1st Qu.:39.75   1st Qu.:12.0   Yes:282   Yes:258
Median :54.50   Median :14.0
Mean   :53.32   Mean   :13.9
3rd Qu.:66.00   3rd Qu.:16.0
Max.   :80.00   Max.   :18.0

# Check for missing values
> colSums(is.na(ds))
      Sales   CompPrice      Income Advertising  Population       Price   ShelveLoc         Age   Education       Urban
          0           0           0           0           0           0           0           0           0           0
         US
          0
> sum(is.na(ds))
[1] 0

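# Define the outlier bounds (names chosen so they do not shadow mean(), sd(), min(), max())
> sales_mean <- mean(ds$Sales)
> sales_sd <- sd(ds$Sales)
> upper <- sales_mean + 1.5 * sales_sd
> lower <- sales_mean - 1.5 * sales_sd

# Build a dataset that excludes the outliers
> ds2 <- ds %>% filter(Sales > lower, Sales < upper)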

# Compute the standard deviation of Age
> result <- sd(ds2$Age)
> print(result)
[1] 16.05213
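
The same filtering idea can be sketched in base R without dplyr. This is only an illustration under stated assumptions: the data frame `toy` and its values are made up and stand in for Carseats, so the number it prints is not the exam answer.

```r
# Toy data standing in for Carseats (values are invented for illustration)
toy <- data.frame(
  Sales = c(2, 4, 4, 4, 5, 5, 7, 9),
  Age   = c(30, 40, 50, 60, 35, 45, 55, 65)
)

# Outlier bounds: mean +/- 1.5 standard deviations of Sales
sales_mean <- mean(toy$Sales)
sales_sd   <- sd(toy$Sales)
lower <- sales_mean - 1.5 * sales_sd
upper <- sales_mean + 1.5 * sales_sd

# Keep only rows strictly inside the bounds (logical indexing instead of filter())
keep    <- toy$Sales > lower & toy$Sales < upper
trimmed <- toy[keep, ]

# Standard deviation of Age on the trimmed data
age_sd <- sd(trimmed$Age)
print(nrow(trimmed))
print(age_sd)
```

Logical indexing gives the same rows as `filter()` here and can be handy if dplyr is not available in the exam environment.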

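2. The Cars93 dataset in the MASS package. Replace the missing values of Luggage.room with the median, then find the difference between the mean before and after the replacement.
* The MASS package is available on goorm, so practice in the goorm environment.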

https://dataq.goorm.io/exam/116674/%EC%B2%B4%ED%97%98%ED%95%98%EA%B8%B0/quiz/2

# Load the data
library(MASS)
library(dplyr)
data(Cars93)
ds <- Cars93

# Inspect the data and check for missing values
str(ds)
head(ds)
colSums(is.na(ds))

# Compute the median, ignoring missing values
med <- median(ds$Luggage.room, na.rm=TRUE)
ds2 <- ds

# Replace the missing values with the median
ds2$Luggage.room <- ifelse(is.na(ds2$Luggage.room), med, ds2$Luggage.room )

# Compute the mean before and after the replacement
mean1 <- mean(ds$Luggage.room, na.rm=TRUE)
mean2 <- mean(ds2$Luggage.room)

# Take the absolute value of the difference and print it
result <- mean1 - mean2
result_abs <- abs(result)
print(result_abs)

Output
Process started. (Please enter input values directly.)
> 'data.frame': 93 obs. of  27 variables:
$ Manufacturer      : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
$ Model             : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
$ Type              : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
$ Min.Price         : num  12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
$ Price             : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
$ Max.Price         : num  18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
$ MPG.city          : int  25 18 20 19 22 22 19 16 19 16 ...
$ MPG.highway       : int  31 25 26 26 30 31 28 25 27 25 ...
$ AirBags           : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
$ DriveTrain        : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
$ Cylinders         : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
$ EngineSize        : num  1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
$ Horsepower        : int  140 200 172 172 208 110 170 180 170 200 ...
$ RPM               : int  6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
$ Rev.per.mile      : int  2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
$ Man.trans.avail   : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
$ Fuel.tank.capacity: num  13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
$ Passengers        : int  5 5 5 6 4 6 6 6 5 6 ...
$ Length            : int  177 195 180 193 186 189 200 216 198 206 ...
$ Wheelbase         : int  102 115 102 106 109 105 111 116 108 114 ...
$ Width             : int  68 71 67 70 69 69 74 78 73 73 ...
$ Turn.circle       : int  37 38 37 37 39 41 42 45 41 43 ...
$ Rear.seat.room    : num  26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
$ Luggage.room      : int  11 15 14 17 13 16 17 21 14 18 ...
$ Weight            : int  2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
$ Origin            : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
$ Make              : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...
[1] 0.0129819
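
To see why the mean shifts after median imputation, here is a minimal sketch on a made-up vector. The vector `x` is hypothetical and is not taken from Cars93. Filling the NAs with the median pulls the mean toward the median, so the size of the shift depends on how far apart the two are and on how many values were missing.

```r
# Hypothetical vector with two missing values (not Cars93 data)
x <- c(10, 12, NA, 14, 20, NA)

# Median of the observed values
med <- median(x, na.rm = TRUE)

# Fill the missing entries with the median
x_filled <- x
x_filled[is.na(x_filled)] <- med

# Mean before (NAs ignored) and after imputation
mean_before <- mean(x, na.rm = TRUE)
mean_after  <- mean(x_filled)

diff_abs <- abs(mean_before - mean_after)
print(diff_abs)
```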

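3. The TimeAge dataset from the Covid19 data. Find the difference between the mean number of confirmed cases (confirmed) for people in their 20s (age "20s") and the mean for people in their 50s.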

Attached file: TimeAge.csv (0.03MB)

# Load the data (dplyr is needed for %>% and filter below)
> library(dplyr)
> a <- read.csv("c:/Data/TimeAge.csv")

# Inspect the data and check for missing values
> head(a)
        date time age confirmed deceased
1 2020-03-02    0  0s        32        0
2 2020-03-02    0 10s       169        0
3 2020-03-02    0 20s      1235        0
4 2020-03-02    0 30s       506        1
5 2020-03-02    0 40s       633        1
6 2020-03-02    0 50s       834        5
> str(a)
'data.frame': 1089 obs. of  5 variables:
$ date     : chr  "2020-03-02" "2020-03-02" "2020-03-02" "2020-03-02" ...
$ time     : int  0 0 0 0 0 0 0 0 0 0 ...
$ age      : chr  "0s" "10s" "20s" "30s" ...
$ confirmed: int  32 169 1235 506 633 834 530 192 81 34 ...
$ deceased : int  0 0 0 1 1 5 6 6 3 0 ...
> colSums(is.na(a))
     date      time       age confirmed  deceased
        0         0         0         0         0

# Build the subsets for the 20s and the 50s
> ds20s <- a %>% filter(age == "20s")
> ds50s <- a %>% filter(age == "50s")

# Compute the mean of each subset
> mean20s <- mean(ds20s$confirmed)
> mean50s <- mean(ds50s$confirmed)

# Print the absolute difference
> result <- mean20s - mean50s
> result_abs <- abs(result)
> print(result_abs)
[1] 957
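
As an alternative to building one subset per age group, the group means can be computed in a single step with base R's `tapply()`. The sketch below uses a small made-up data frame (`toy`, with invented counts) in place of TimeAge.csv, so it only illustrates the pattern.

```r
# Made-up data in the same shape as the age/confirmed columns of TimeAge.csv
toy <- data.frame(
  age       = c("20s", "50s", "20s", "50s", "20s", "50s"),
  confirmed = c(100, 40, 120, 50, 140, 60)
)

# Mean of confirmed per age group, returned as a named vector
group_means <- tapply(toy$confirmed, toy$age, mean)

# Absolute difference between the two groups of interest
diff_abs <- abs(group_means[["20s"]] - group_means[["50s"]])
print(diff_abs)
```

With more age groups in the data, the same `group_means` vector holds every group's mean, so other comparisons need no extra filtering.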
