1. Creating a data frame
students = data.frame(
  hours = c(2, 3, 5, 6, 8, 10, 12),
  score = c(50, 55, 65, 70, 75, 85, 90)
)

str(students)

'data.frame':  7 obs. of  2 variables:
 $ hours: num  2 3 5 6 8 10 12
 $ score: num  50 55 65 70 75 85 90
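A few other base R functions report the same structural facts that str() prints (size, column names, column types). A minimal sketch, re-creating the students data frame from step 1 so it runs on its own:

```r
# Re-create the data frame from step 1 so this runs on its own
students <- data.frame(
  hours = c(2, 3, 5, 6, 8, 10, 12),
  score = c(50, 55, 65, 70, 75, 85, 90)
)

print(dim(students))            # number of rows and columns
print(names(students))          # column names
print(sapply(students, class))  # storage class of each column
```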

2. Summarizing the data frame
summary(students)
     hours            score
 Min.   : 2.000   Min.   :50
 1st Qu.: 4.000   1st Qu.:60
 Median : 6.000   Median :70
 Mean   : 6.571   Mean   :70
 3rd Qu.: 9.000   3rd Qu.:80
 Max.   :12.000   Max.   :90
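summary() reports the minimum, the quartiles, the mean and the maximum of each column. A minimal sketch reproducing the hours column with quantile() and mean(). quantile()'s default method (type 7) interpolates between neighbouring sorted values, which is why the 1st quartile of 2, 3, 5, 6, 8, 10, 12 comes out as 4, halfway between 3 and 5:

```r
students <- data.frame(
  hours = c(2, 3, 5, 6, 8, 10, 12),
  score = c(50, 55, 65, 70, 75, 85, 90)
)

# Min, 1st Qu., Median, 3rd Qu., Max (default type 7 interpolation)
q <- quantile(students$hours)
print(q)

# Mean: (2 + 3 + 5 + 6 + 8 + 10 + 12) / 7 = 46 / 7
m <- mean(students$hours)
print(round(m, 3))
```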
3. Getting the first n / last n records
# get first 2 records
head(students, 2)
# get last 3 records
tail(students, 3)
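head() and tail() on a data frame amount to selecting a range of rows with the [rows, columns] indexing used in the next step. A minimal sketch of the equivalent row ranges:

```r
students <- data.frame(
  hours = c(2, 3, 5, 6, 8, 10, 12),
  score = c(50, 55, 65, 70, 75, 85, 90)
)

# head(students, 2): rows 1..2, all columns
first_two <- students[1:2, ]

# tail(students, 3): the last 3 rows, all columns
n <- nrow(students)
last_three <- students[(n - 2):n, ]

print(first_two)
print(last_three)
```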
4. Getting the 3rd and 5th rows, columns 1 and 2
students[c(3,5), c(1,2)]
  hours score
3     5    65
5     8    75
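Columns can also be selected by name, and rows by a logical condition instead of by position. A minimal sketch; the condition hours > 6 is only an illustrative choice:

```r
students <- data.frame(
  hours = c(2, 3, 5, 6, 8, 10, 12),
  score = c(50, 55, 65, 70, 75, 85, 90)
)

# Same selection as students[c(3,5), c(1,2)], with columns named explicitly
by_name <- students[c(3, 5), c("hours", "score")]

# Logical row selection: every student who studied more than 6 hours
long_study <- students[students$hours > 6, ]

print(by_name)
print(long_study)   # rows 5, 6 and 7 (hours 8, 10 and 12)
```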
5. Plotting graphs
plot(students$hours, students$score, main="Study hours vs Exam score",
xlab = "hours studied",
ylab="exam score",
pch=20)

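# barplot()'s second argument is the bar width, so pass the scores as the bar heights
# and label each bar with the hours studied
barplot(students$score, names.arg = students$hours, main="hours studied vs exam score", xlab="hours studied", ylab="exam score")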
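A common addition to the scatter plot is the least-squares line. A minimal sketch using lm() and abline(); pdf(NULL) opens a throwaway device so the sketch also runs non-interactively (drop the pdf(NULL) and dev.off() lines to draw on screen):

```r
students <- data.frame(
  hours = c(2, 3, 5, 6, 8, 10, 12),
  score = c(50, 55, 65, 70, 75, 85, 90)
)

pdf(NULL)  # discard the drawing; remove to plot on screen

plot(students$hours, students$score,
     main = "Study hours vs Exam score",
     xlab = "hours studied", ylab = "exam score", pch = 20)

# Fit score = intercept + slope * hours by least squares
fit <- lm(score ~ hours, data = students)
abline(fit, col = "red")   # overlay the fitted line

invisible(dev.off())
print(coef(fit))           # intercept about 43.6, slope about 4.01
```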

Exercise

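find.package('readxl')                 # errors if readxl is not installed; run install.packages("readxl") first
library(readxl)
data_file = file.choose()              # pick the Excel file interactively
student_marks = read_excel(data_file)  # readxl's reader function is read_excel(), not readxl()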
student_marks
# A tibble: 100 × 5
   RollNo Maths History Physics  Arts
    <dbl> <dbl>   <dbl>   <dbl> <dbl>
 1      1    47       9      57    51
 2      2    21      78      25    62
 3      3    70      45      80    28
 4      4    66      42      68    22
 5      5     7      35      12    87
 6      6    69      29      77    14
 7      7    32      74      34    48
 8      8    73       3      81    20
 9      9    62      80      71    24
10     10    68      41      78    14
# ℹ 90 more rows
# ℹ Use `print(n = ...)` to see more rows
str(student_marks)
tibble [100 × 5] (S3: tbl_df/tbl/data.frame)
 $ RollNo : num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
 $ Maths  : num [1:100] 47 21 70 66 7 69 32 73 62 68 ...
 $ History: num [1:100] 9 78 45 42 35 29 74 3 80 41 ...
 $ Physics: num [1:100] 57 25 80 68 12 77 34 81 71 78 ...
 $ Arts   : num [1:100] 51 62 28 22 87 14 48 20 24 14 ...

Plotting graphs

a. Math vs Physics

plot(student_marks$Maths, student_marks$Physics, main="Maths vs Physics", xlab="maths marks", ylab="physics marks")

cor(student_marks$Maths, student_marks$Physics)

[1] 0.9911593

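The correlation coefficient is close to 1, so Maths and Physics marks are strongly positively correlated: students who do well in Maths tend to do well in Physics.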
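By default cor() computes Pearson's correlation coefficient, r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2)). A minimal sketch checking that formula against cor(), using the small students data frame from the first part so it runs without the Excel file:

```r
students <- data.frame(
  hours = c(2, 3, 5, 6, 8, 10, 12),
  score = c(50, 55, 65, 70, 75, 85, 90)
)
x <- students$hours
y <- students$score

# Pearson's r from its definition
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(x, y)   # method = "pearson" is the default
print(c(manual = r_manual, builtin = r_builtin))
```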

b. Math vs Arts

plot(student_marks$Maths, student_marks$Arts, main="Maths vs Arts", xlab="maths marks", ylab="Arts marks")

cor(student_marks$Maths, student_marks$Arts)

[1] -0.9648045

The correlation coefficient is close to -1, so Maths and Arts marks are strongly negatively correlated: students who do well in Maths tend to do poorly in Arts.

c. Maths vs History

plot(student_marks$Maths, student_marks$History, main="Maths vs History", xlab="maths marks", ylab="history marks")

cor(student_marks$Maths, student_marks$History)

[1] 0.01833364

The correlation coefficient is close to 0, so there is essentially no linear correlation between Maths and History marks.
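A coefficient near 0 rules out a linear relationship only; a strong curved pattern can still give r close to 0. A minimal sketch with hypothetical values (not taken from the marks data):

```r
# Hypothetical values: y is completely determined by x, but not linearly
x <- -3:3
y <- x^2

r <- cor(x, y)
print(r)  # essentially 0: the symmetric U shape has no linear trend
```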

Prediction models

For Physics

physics_model = lm(Physics ~ Maths, data = student_marks)
summary(physics_model)
# predicted Physics marks for the Maths marks of the first three students
predict(physics_model, data.frame(Maths = c(47, 21, 70)))
  1        2        3
53.19789 26.62476 76.70490 
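Actual marks of the first three students, for comparison:
   RollNo Maths History Physics  Arts
 1      1    47       9      57    51
 2      2    21      78      25    62
 3      3    70      45      80    28
The predicted Physics marks (53.2, 26.6, 76.7) are close to the actual ones (57, 25, 80).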
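For a model with one predictor, predict() evaluates intercept + slope × predictor. A minimal sketch showing that equivalence on the students data from the first part (the marks file is not needed); the new_hours values are illustrative only:

```r
students <- data.frame(
  hours = c(2, 3, 5, 6, 8, 10, 12),
  score = c(50, 55, 65, 70, 75, 85, 90)
)
fit <- lm(score ~ hours, data = students)

new_hours <- c(4, 7, 11)   # illustrative inputs
b <- coef(fit)             # named: "(Intercept)" and "hours"

manual  <- b[["(Intercept)"]] + b[["hours"]] * new_hours
builtin <- predict(fit, data.frame(hours = new_hours))

print(cbind(manual, builtin))
```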

For History

history_model = lm(History ~ Maths, data = student_marks)
summary(history_model)
predict(history_model, data.frame(Maths = c(47, 21, 70)))
  1        2        3
41.35981 40.85491 41.80646
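Actual marks of the first three students, for comparison:
   RollNo Maths History Physics  Arts
 1      1    47       9      57    51
 2      2    21      78      25    62
 3      3    70      45      80    28
Because Maths and History marks are essentially uncorrelated, the model predicts almost the same History mark (about 41) for every student regardless of the Maths mark. The predictions are far from the actual marks (9, 78, 45), so this model is not useful.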
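One way to put a number on how useful each model is: summary(model)$r.squared, the share of the variance the model explains. With a single predictor it equals the squared correlation coefficient, so r ≈ 0.99 (Maths and Physics) gives R² ≈ 0.98, while r ≈ 0.018 (Maths and History) gives R² ≈ 0.0003. A minimal sketch checking that identity on the students data from the first part:

```r
students <- data.frame(
  hours = c(2, 3, 5, 6, 8, 10, 12),
  score = c(50, 55, 65, 70, 75, 85, 90)
)
fit <- lm(score ~ hours, data = students)

r2_model <- summary(fit)$r.squared                       # from the fitted model
r2_cor   <- cor(students$hours, students$score)^2        # squared Pearson r

print(c(model = r2_model, squared_r = r2_cor))
```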