Multi-Arm Multi-Player Bandit for Wireless Ad-Hoc Networks


S SATAPATHY, S SINGH, SJ DARAK

Abstract:

Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by the same applications, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing work assumes that sensing information is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound on the regret of any decentralized algorithm and introduce two algorithms, MCTopM-Static and MCTopM-Dynamic, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We consider a variant of the stochastic multi-armed bandit problem in which multiple players simultaneously choose from a set of arms, and we address several drawbacks of existing approaches, such as collisions, reliance on prior knowledge of the users, and quasi-stationary reward distributions (handled through change detection).
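The idea of pairing a UCB index with change detection for quasi-stationary rewards can be illustrated with a small sketch. This is not MCTopM-Static or MCTopM-Dynamic, and it is not the detector used in this work: it is a single-player UCB1 policy with a simple two-window mean-shift test, chosen only for illustration. The class name `ChangeDetectingUCB`, the `window` and `threshold` parameters, and the Bernoulli arm means in `simulate` are all assumptions introduced here.

```python
import math
import random
from collections import deque


class ChangeDetectingUCB:
    """UCB1 plus a per-arm two-window mean-shift test (illustrative sketch only)."""

    def __init__(self, n_arms, window=50, threshold=0.4):
        self.n_arms = n_arms
        self.window = window        # size of each half-window (assumed value)
        self.threshold = threshold  # alarm level for the mean shift (assumed value)
        self.restarts = 0
        self._reset_all()

    def _reset_all(self):
        # Forget everything learned so far; called at start and after an alarm.
        self.t = 0
        self.counts = [0] * self.n_arms
        self.sums = [0.0] * self.n_arms
        self.recent = [deque(maxlen=2 * self.window) for _ in range(self.n_arms)]

    def select_arm(self):
        # Play every arm once before trusting the index.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[arm])
            return mean + bonus

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.sums[arm] += reward
        buf = self.recent[arm]
        buf.append(reward)
        # Compare the older and newer halves of the arm's recent rewards.
        if len(buf) == 2 * self.window:
            samples = list(buf)
            old_mean = sum(samples[:self.window]) / self.window
            new_mean = sum(samples[self.window:]) / self.window
            if abs(new_mean - old_mean) > self.threshold:
                self.restarts += 1
                self._reset_all()


def simulate(horizon=4000, change_at=2000, seed=0):
    """Two Bernoulli arms whose means swap at `change_at` (hypothetical setup)."""
    rng = random.Random(seed)
    means_before = [0.9, 0.1]
    means_after = [0.1, 0.9]
    policy = ChangeDetectingUCB(n_arms=2)
    pulls_after = [0, 0]
    for step in range(horizon):
        means = means_before if step < change_at else means_after
        arm = policy.select_arm()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        policy.update(arm, reward)
        if step >= change_at:
            pulls_after[arm] += 1
    return policy, pulls_after
```

Calling `simulate()` returns the policy and the per-arm pull counts after the change point, so one can check whether the policy restarted and moved to the newly better arm. Resetting all arms on an alarm is the simplest choice; a multi-player version would additionally need a collision-handling rule, which this sketch leaves out.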

Key terms:

Multi-Armed Bandits; Decentralized algorithms; Reinforcement learning; Cognitive Radio; Opportunistic Spectrum Access.

What are we up to?

  • We took a variant of the stochastic multi-armed bandit problem, in which multiple players simultaneously choose from a set of arms, and addressed several drawbacks such as collisions, reliance on prior knowledge of the users, and quasi-stationary reward distributions (change detection).
  • By connecting change detection techniques with classic UCB algorithms, we proposed a learning algorithm for this scenario that can detect and adapt to changes.
  • Furthermore, we considered two variants in this work: a static setting and a dynamic setting, in which players may enter and exit throughout the game.
  • We presented simulation results to numerically evaluate the performance of our algorithm.
  • To the best of our knowledge, these are the first......
  • About my co-authors:

    Shivani Singh

    Department of Computer Engineering, IIIT-Bhubaneswar, India

    Shivani Singh is currently in the 3rd year of her BTech degree in computer engineering at IIIT Bhubaneswar, India. Her research interests include algorithm design for next-generation advanced wireless networks and cyber-physical IoT systems.

    Sumit J Darak

    Department of Electronics and Telecommunications Engineering, Indraprastha Institute of Information Technology, Delhi, India.

    He is currently an Assistant Professor at Indraprastha Institute of Information Technology, Delhi, India. Prior to that, he worked as an Assistant System Engineer at Tata Consultancy Services (TCS), Pune, India, from September 2007 to December 2008. From August 2011 to November 2011, he was a visiting research student at Massey University, Auckland, New Zealand. From March 2013 to November 2014, he pursued postdoctoral research at the CominLabs Excellence Center, Université Européenne de Bretagne (UEB) and Supélec, Rennes, France, for the project GREAT: Green Cognitive Radio for Energy-Aware Wireless Communication Technologies Evolution. His research interests include the design and implementation of multistandard wireless communication receivers, as well as the application of machine learning algorithms and decision-making policies to various wireless communication applications.


    Similar Projects

    Stable Matching Based Resource Allocation to Maximize Throughput and Minimize Interference in 5G Networks

    Winter research project, IIT-P, 2017

    Modelled the resource allocation problem in 5G heterogeneous multi-tier networks using the concepts of stable matching and graph theory.


    Contact.

    shaswat221b@gmail.com skype: +918839718453 +91 8839718453
    • Shaswat Satapathy
    • B.Tech (3rd Year)
    • Computer Engineering Department,
    • International Institute of Information Technology, Bhubaneswar