Books dataset with Quarto
Investigating the books dataset within Quarto
Loading the required packages
library(ggplot2)
library(dplyr)
Loading the data
books <- read.csv("../../../../data/books.csv")
The books dataset includes the following fields:
[1] "X" "bookID" "title"
[4] "authors" "average_rating" "isbn"
[7] "isbn13" "language_code" "num_pages"
[10] "ratings_count" "text_reviews_count" "publication_date"
[13] "publisher"
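The code that produced this listing is not shown; `names(books)` would print it in this form. The leading "X" field is probably not part of the original data: one plausible explanation, which the source does not confirm, is that the CSV was written with row names included. The sketch below reproduces that effect on a small stand-in data frame (`toy` and its values are hypothetical).

```r
# Toy stand-in for the books data (values are illustrative only)
toy <- data.frame(bookID = c(1, 2), title = c("A", "B"))

# write.csv() writes row names by default, under an empty column header
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp)

# read.csv() names that unnamed first column "X"
round_trip <- read.csv(tmp)
names(round_trip)

# Writing with row.names = FALSE avoids the extra column
write.csv(toy, tmp, row.names = FALSE)
clean <- read.csv(tmp)
names(clean)
```

If that is what happened here, the column could be dropped after loading, or avoided by re-exporting the file without row names.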
This graph shows the volume of publications by language code.
books |>
  ggplot(mapping = aes(x = language_code)) +
  geom_bar()
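The same tally can also be read as a table, which makes the gap between language codes easier to quantify than bar heights alone. This is a minimal sketch using `dplyr::count()` on a made-up miniature of the data; `toy_books` and its rows are hypothetical, and with the real data `books` would take its place.

```r
library(dplyr)

# Hypothetical miniature of the books data; values are made up
toy_books <- data.frame(
  title = c("A", "B", "C", "D", "E"),
  language_code = c("eng", "eng", "en-US", "fre", "spa")
)

# Count books per language code, most frequent first
language_counts <- toy_books |>
  count(language_code, sort = TRUE)

language_counts
```

The result has one row per language code and a column `n` holding the number of books.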
Because English-language books far outnumber the rest, we will create a subset of the data that excludes the codes ‘eng’ and ‘en-US’.
non_english_books <- filter(books, !language_code %in% c("en-US", "eng"))
non_english_books |>
  ggplot(mapping = aes(x = language_code)) +
  geom_bar()
#| fig-cap: "Number of Non-English Books Published"
ggplot(data = non_english_books,
       mapping = aes(x = num_pages,
                     y = publisher)) +
  geom_point()