Gaussian Processes (GP) are widely used to model spatial dependency in geostatistical data, yet the exact Bayesian inference has an intractable time complexity of $O(n^3)$. Vecchia approximation has become a popular solution to this computational issue, where spatial dependency is characterized by a sparse directed acyclic graph (DAG) that allows scalable Bayesian inference. Despite the popularity in practice, little is understood about its theoretical properties. In this paper, we systematically study the posterior contraction rates of Vecchia approximations of GP. Under minimal regularity conditions, we prove that by appropriate selection of the underlying DAG, the Vecchia approximated GP possess the same posterior contraction rates as the mother GP. Therefore, by optimal choices of the tunning hyper-parameters, the Vecchia approximation can achieve the minimax contraction rate, providing strong frequentist guarantees to the procedure. Our theoretical findings are demonstrated numerically as well using synthetic and real world data sets.