Every day, millions of traders around the world trade stocks to make money. However, stock trading has never been easy. Stock prices depend on many factors, and it is very difficult to develop a good strategy and decide when to buy, when to hold, and when to sell. Recently, deep reinforcement learning (DRL) agents have shown great promise in games such as chess and Go. However, designing a profitable strategy in a complex and dynamic stock market remains a major challenge. This paper explores the application of Trust Region Policy Optimization (TRPO), a policy gradient method, to learn an optimal strategy for automated stock trading over a large portfolio. We model the stock trading process as a Markov Decision Process (MDP) and formulate our trading goal as a maximization problem. The agent learns to position itself automatically to beat the market: specifically, it decides where to trade, at what price, and in what quantity. We also train three other reinforcement learning algorithms, Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Twin Delayed DDPG (TD3), as baselines for performance comparison. In our experiments, we use the 30 constituent stocks of the Dow Jones Industrial Average (DJIA), as they are among the most popular stocks for portfolio allocation. Our results demonstrate the credibility and advantages of TRPO for strategic decision making in portfolio allocation in financial markets.