An R package to retrieve Vietnam stock data

Posted by Hai Vo on Friday, August 6, 2021

Update 10-2021

I just added a new function, get_cafeF2(), which uses async requests (from the {crul} package). It appears to be much faster than the old get_cafeF(): about 1.8 seconds versus 14.7 seconds elapsed in the single run below.

# New
system.time(vnstockr::get_cafeF2("ACB", "1/1/2020", "1/1/2021"))
##    user  system elapsed 
##   1.246   0.013   1.832
# Old
system.time(vnstockr::get_cafeF("ACB", "1/1/2020", "1/1/2021"))
##    user  system elapsed 
##   1.162   0.146  14.703
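
For readers curious about where the speed-up comes from, here is a minimal sketch of the general async pattern with {crul}. It is not the actual code inside get_cafeF2(). The base URL, the query parameter names, and the helpers build_page_urls() and fetch_pages_async() are hypothetical placeholders.

```r
# Sketch only: the endpoint and query parameters are hypothetical,
# not the real cafef.vn interface used by vnstockr.
build_page_urls <- function(symbol, n_pages,
                            base_url = "https://example.com/prices") {
  sprintf("%s?symbol=%s&page=%d", base_url, symbol, seq_len(n_pages))
}

fetch_pages_async <- function(urls) {
  # crul::Async issues all requests concurrently instead of one by one
  client <- crul::Async$new(urls = urls)
  responses <- client$get()
  # Return each response body as text; parsing is left to the caller
  vapply(responses, function(r) r$parse("UTF-8"), character(1))
}

urls <- build_page_urls("ACB", n_pages = 3)
# pages <- fetch_pages_async(urls)  # needs {crul} and network access
```

The point of the pattern is that the total waiting time is closer to that of the slowest single request than to the sum of all requests.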

Why

It is convenient to import stock data directly into R, because R has many advanced tools for analyzing time series or running machine learning models on them, such as {modeltime} and {fable}.

The {tidyquant} package makes this simple by letting users pull stock data from Yahoo Finance into R in the “tidy” format. However, most Vietnamese companies are not listed on Yahoo Finance, so users must manually download the data and import it into R before they can take advantage of these packages.

To automate this boring task, I wrote a simple package that lets users get Vietnam stock data from cafef.vn and vndirect.com.vn. The package is a wrapper around {httr}: it sends requests to these two data sources and parses the responses into an R data frame (tibble).
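
To make the "request, then parse" idea concrete, here is a minimal sketch of that kind of wrapper. It is an illustration under assumptions, not the package's real code: the URL, the query parameter name, the payload shape (a top-level "data" array), and the helpers parse_prices() and fetch_prices() are all hypothetical.

```r
library(httr)
library(jsonlite)
library(tibble)

# Turn a JSON response body into a tibble.
# Assumes a (hypothetical) payload with a top-level "data" array of records.
parse_prices <- function(json_text) {
  payload <- fromJSON(json_text)
  as_tibble(payload$data)
}

# Hypothetical fetcher: `url` and the query name are placeholders.
fetch_prices <- function(symbol, url = "https://example.com/api/prices") {
  resp <- GET(url, query = list(code = symbol))
  stop_for_status(resp)  # fail loudly on HTTP errors
  parse_prices(content(resp, as = "text", encoding = "UTF-8"))
}

# Offline demonstration with a small hand-written payload
sample_json <- '{"data": [{"code": "ACB", "date": "2021-04-29", "open": 33.8},
                          {"code": "ACB", "date": "2021-04-28", "open": 33.5}]}'
prices <- parse_prices(sample_json)
prices$date <- as.Date(prices$date)
```

Keeping the parsing step in its own function means it can be tested without touching the network.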

You can browse the package source code on GitHub at https://github.com/vohai611/vnstockr

Install

To install this package, use:

# install.packages("devtools")  # if {devtools} is not installed yet
devtools::install_github("vohai611/vnstockr")

Usage

The package has two main functions, get_cafeF() and get_vndirect(). You only need to specify the stock code (symbol) and the time frame. For example:

library(vnstockr)
acb <- get_vndirect('ACB', '1/5/2020', '1/5/2021')
acb |>
  head(5) 
## # A tibble: 5 × 25
##   code  date       time     floor type  basicPrice ceilingPrice floorPrice  open
##   <chr> <date>     <chr>    <chr> <chr>      <dbl>        <dbl>      <dbl> <dbl>
## 1 ACB   2021-04-29 15:04:05 HOSE  STOCK       33.8         36.2       31.4  33.8
## 2 ACB   2021-04-28 15:04:02 HOSE  STOCK       34           36.4       31.6  33.5
## 3 ACB   2021-04-27 15:04:02 HOSE  STOCK       33.3         35.6       31    33  
## 4 ACB   2021-04-26 15:04:04 HOSE  STOCK       33.4         35.7       31.1  33.4
## 5 ACB   2021-04-23 15:04:04 HOSE  STOCK       32.5         34.8       30.2  32.4
## # … with 16 more variables: high <dbl>, low <dbl>, close <dbl>, average <dbl>,
## #   adOpen <dbl>, adHigh <dbl>, adLow <dbl>, adClose <dbl>, adAverage <dbl>,
## #   nmVolume <dbl>, nmValue <dbl>, ptVolume <dbl>, ptValue <dbl>, change <dbl>,
## #   adChange <dbl>, pctChange <dbl>

The retrieved data are in the same “tidy” format that {tidyquant} returns, so they integrate easily with other R packages. For example, the data can be visualized with {ggplot2}:

library(ggplot2)
library(plotly)
acb_plot <- acb |>
  ggplot(aes(date, close))+
  geom_line()+
  scale_x_date(date_breaks = "4 weeks")+
  theme_light()+
  theme(axis.text.x = element_text(angle = 45, hjust =1))+
  expand_limits(y = 0)+
  labs(title = "ACB stock price from 1/5/2020 to 1/5/2021\n", x= NULL)

ggplotly(acb_plot) |>
  config(displayModeBar = FALSE)
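
Because the result is an ordinary tibble, it also works with {dplyr}. The sketch below is a small illustration, not part of the package: it rebuilds the five opening prices printed earlier by hand (the names acb_sample and acb_changes are made up for this example) and computes the day-over-day change. Note that the rows came back newest first, so they are sorted by date before using lag().

```r
library(dplyr)
library(tibble)

# Opening prices copied from the five rows printed earlier (newest first)
acb_sample <- tibble(
  date = as.Date(c("2021-04-29", "2021-04-28", "2021-04-27",
                   "2021-04-26", "2021-04-23")),
  open = c(33.8, 33.5, 33, 33.4, 32.4)
)

acb_changes <- acb_sample |>
  arrange(date) |>                                 # oldest first
  mutate(open_change = open - dplyr::lag(open))    # NA for the first row
```

Skipping the arrange() step would compute each change against the following trading day instead of the previous one.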

Note

get_cafeF() has to send multiple requests to cafef.vn, and the number depends on the time frame the user specifies. Although I use {furrr} to send the requests in parallel, it is still much slower than get_vndirect(), which sends only one request.
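
To illustrate why a longer time frame means more requests, here is a rough sketch that splits a date range into monthly windows and fetches each one separately. This is an assumption about the general approach, not a description of how get_cafeF() actually pages through cafef.vn. split_into_windows() and fetch_window() are hypothetical helpers, and fetch_window() is only a stub standing in for one HTTP request.

```r
# Split [from, to] into consecutive windows, one per request
split_into_windows <- function(from, to, by = "1 month") {
  starts <- seq(from, to, by = by)
  ends <- c(starts[-1] - 1, to)
  data.frame(start = starts, end = ends)
}

# Stub: a real implementation would request the rows for [start, end] here
fetch_window <- function(start, end) {
  data.frame(start = start, end = end)
}

windows <- split_into_windows(as.Date("2020-01-01"), as.Date("2020-03-31"))
fetch_row <- function(i) fetch_window(windows$start[i], windows$end[i])

# With {furrr} installed, the windows could be fetched in parallel:
# future::plan(future::multisession)
# results <- furrr::future_map(seq_len(nrow(windows)), fetch_row)
results <- lapply(seq_len(nrow(windows)), fetch_row)  # sequential fallback
combined <- do.call(rbind, results)
```

Even when the windows are fetched in parallel, each one is still a separate round trip to the server, which a single-request source avoids entirely.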