In this project, we explore how a team of diverse robots can collaboratively monitor complex environments, such as bustling seaports or major city events. Each robot is equipped with different sensors, such as cameras and microphones, and gathers data from its own vantage point. While progress has been made on enabling robots to reach consensus on basic features, integrating high-level reasoning with such diverse sensing remains an open challenge. Our goal is to bridge this gap by drawing on advanced control and recognition techniques. Our algorithms will allow robots not only to identify key elements and events, but also to optimize data acquisition and inter-robot communication, so that the team maintains a consistent understanding of the scene while relying only on local sensing and minimal robot-to-robot communication.
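To make the idea of reaching agreement through local sensing and sparse communication concrete, the sketch below shows a standard linear average-consensus update, which is one common baseline for this kind of problem; the ring topology, measurements, and gain are hypothetical placeholders and do not represent the algorithms developed in this project.

```python
import numpy as np

# Illustrative sketch only: distributed average consensus over a sparse
# communication graph. Each robot starts from its own local measurement of a
# shared quantity (e.g., a confidence score for a scene event) and repeatedly
# averages with its neighbors, converging to a team-wide estimate without any
# central coordinator.

# Hypothetical 4-robot ring topology: robot i communicates only with two neighbors.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

# Hypothetical local measurements (each robot observes the scene differently).
x = np.array([0.9, 0.4, 0.7, 0.2])

step = 0.25  # consensus gain; kept below 1 / max_degree for stability
for _ in range(50):
    x = x + step * np.array(
        [sum(x[j] - x[i] for j in neighbors[i]) for i in range(len(x))]
    )

print(x)  # all entries approach the team-wide average (0.55)
```

In this toy example, each robot only ever exchanges values with its immediate neighbors, yet the whole team converges to a common estimate, which is the basic behavior the project builds on and extends to richer, high-level scene understanding.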