Penetration testing is a security exercise aimed at assessing the security of a system by simulating attacks against it. So far, penetration testing has been carried out mainly by trained human attackers, and its success has depended critically on the available expertise. Automating this practice is a non-trivial problem, as the range of actions that a human expert may attempt against a system and the range of knowledge she relies on to make her decisions are hard to capture. In this paper, we focus on simplified penetration testing problems expressed in the form of capture-the-flag hacking challenges, and we apply reinforcement learning algorithms to try to solve them. In modeling these capture-the-flag competitions as reinforcement learning problems, we highlight the specific challenges that characterize penetration testing. We observe these challenges experimentally across a set of varied simulations, and we study how different reinforcement learning techniques may help us address them. In this way, we show the feasibility of tackling penetration testing using reinforcement learning, and we highlight the challenges that must be taken into consideration, along with possible directions for solving them.