Efficient and scalable radio resource allocation is essential to the success of wireless cellular networks. This paper presents a fully scalable multi-agent reinforcement learning (MARL) framework in which each agent manages spectrum and power allocation as well as scheduling within its cell, using only locally available information. The objective is to minimize packet delays under stochastic traffic arrivals, and the framework applies to both conflict graph models and cellular network configurations. The problem is formulated as a distributed learning task and solved with a multi-agent proximal policy optimization (MAPPO) algorithm. This traffic-driven MARL approach enables fully decentralized training and execution, ensuring scalability to arbitrarily large networks. Extensive simulations demonstrate that the proposed methods achieve quality-of-service (QoS) performance comparable to centralized algorithms that require global information, while the trained policies show robust scalability across diverse network sizes and traffic conditions.
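To make the decentralized setup concrete, the sketch below shows one possible per-cell agent in a MAPPO-style training loop: an actor-critic that sees only local observations (e.g., queue backlogs and channel estimates) and outputs a joint sub-band/power scheduling action, trained with the standard clipped PPO objective. This is a minimal illustration, not the authors' implementation; the network sizes, observation layout, and action factorization are assumptions made for this example.

```python
# Minimal sketch of a decentralized per-cell agent (assumed structure, not the paper's code).
import torch
import torch.nn as nn


class LocalAgent(nn.Module):
    """Per-cell actor-critic that uses only locally available information."""

    def __init__(self, obs_dim: int, n_subbands: int, n_power_levels: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
        )
        # Joint discrete action: which sub-band to schedule and at which power level.
        self.policy_head = nn.Linear(64, n_subbands * n_power_levels)
        self.value_head = nn.Linear(64, 1)

    def forward(self, obs: torch.Tensor):
        h = self.backbone(obs)
        dist = torch.distributions.Categorical(logits=self.policy_head(h))
        value = self.value_head(h).squeeze(-1)
        return dist, value


def ppo_loss(agent, obs, actions, old_log_probs, advantages, returns,
             clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """Clipped PPO objective evaluated on one agent's local rollout batch."""
    dist, values = agent(obs)
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    return policy_loss + value_coef * value_loss - entropy_coef * dist.entropy().mean()


if __name__ == "__main__":
    # Dummy local rollout: 32 timesteps of an 8-dimensional local observation.
    agent = LocalAgent(obs_dim=8, n_subbands=4, n_power_levels=3)
    obs = torch.randn(32, 8)
    with torch.no_grad():
        dist, _ = agent(obs)
        actions = dist.sample()
        old_log_probs = dist.log_prob(actions)
    advantages = torch.randn(32)  # in practice: GAE computed from delay-based rewards
    returns = torch.randn(32)     # in practice: discounted returns from the same rollout
    loss = ppo_loss(agent, obs, actions, old_log_probs, advantages, returns)
    loss.backward()
    print(f"PPO loss: {loss.item():.4f}")
```

Because each agent's observations, rewards, and updates are local under this sketch, the same policy can be instantiated in every cell, which is what allows training and execution to scale to arbitrarily large networks.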