At its simplest, an RTOS kernel comprises a scheduler and a number of task synchronisation and IPC mechanisms. Since these are usually provided as libraries, if you use only the scheduler component of such a library, none of the rest will be linked into your code. Moreover, having the option to add synchronisation, timers, and IPC later will more truly "future proof" the application, or at least its architecture.
It is not clear what you mean by a "simple scheduler", but most RTOS schedulers support round-robin scheduling (uC/OS-II being a notable exception) for tasks at the same priority; if you make all your tasks the same priority, it does not get much simpler than that.
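To make "round-robin at a single priority" concrete, here is a minimal host-side sketch in C. It is cooperative (each task runs to completion and returns), whereas an RTOS would typically time-slice equal-priority tasks pre-emptively, so treat it as an illustration of the rotation order only. All names (`rr_sched`, `rr_add`, `rr_step`, `count_task`) are hypothetical and do not belong to any particular RTOS.

```c
#include <assert.h>
#include <stddef.h>

#define RR_MAX_TASKS 8  /* arbitrary limit chosen for this sketch */

typedef void (*rr_task_fn)(void *ctx);

typedef struct {
    rr_task_fn run;  /* task body; must return so the next task can run */
    void *ctx;       /* per-task context pointer */
} rr_task;

typedef struct {
    rr_task tasks[RR_MAX_TASKS];
    size_t count;    /* number of registered tasks */
    size_t next;     /* index of the task to run on the next step */
} rr_sched;

void rr_init(rr_sched *s)
{
    s->count = 0;
    s->next = 0;
}

/* Register a task; returns 0 on success, -1 if the table is full. */
int rr_add(rr_sched *s, rr_task_fn run, void *ctx)
{
    assert(run != NULL);
    if (s->count >= RR_MAX_TASKS) {
        return -1;
    }
    s->tasks[s->count].run = run;
    s->tasks[s->count].ctx = ctx;
    s->count++;
    return 0;
}

/* Run exactly one task, then advance to the next in circular order. */
void rr_step(rr_sched *s)
{
    if (s->count == 0) {
        return;
    }
    rr_task *t = &s->tasks[s->next];
    s->next = (s->next + 1) % s->count;
    t->run(t->ctx);
}

/* Hypothetical example task: increments the counter it is given. */
void count_task(void *ctx)
{
    unsigned *counter = ctx;
    (*counter)++;
}
```

In use, the main loop would call `rr_step()` forever after registering the tasks with `rr_add()`. Every task has the same standing, so no task can starve another as long as each one returns promptly.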
Note also that the ARM CMSIS includes an RTOS API that is very low-level and intended as the basis for higher-level operating systems, but which provides all the building blocks you need to create a kernel for your precise needs, including thread control (i.e. scheduling).
If you really want to avoid an RTOS altogether, then a state-machine architecture such as that supported by Quantum Leaps' QP Frameworks is an alternative.
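For a flavour of the state-machine approach, the following is a minimal event-driven sketch in plain C. It does not use the QP API; the states, events, and function names (`fsm`, `fsm_dispatch`, `EV_BUTTON`, `EV_TIMEOUT`) are all invented for illustration, and a real framework adds event queues, hierarchical states, and timers on top of this idea.

```c
#include <assert.h>

/* Hypothetical events and states for this sketch. */
typedef enum { EV_BUTTON, EV_TIMEOUT } fsm_event;
typedef enum { ST_IDLE, ST_ACTIVE } fsm_state;

typedef struct {
    fsm_state state;
    unsigned activations;  /* counts IDLE -> ACTIVE transitions */
} fsm;

void fsm_init(fsm *m)
{
    m->state = ST_IDLE;
    m->activations = 0;
}

/* Handle one event to completion; events not relevant to the
   current state are ignored. */
void fsm_dispatch(fsm *m, fsm_event ev)
{
    switch (m->state) {
    case ST_IDLE:
        if (ev == EV_BUTTON) {
            m->state = ST_ACTIVE;
            m->activations++;
        }
        break;
    case ST_ACTIVE:
        if (ev == EV_TIMEOUT) {
            m->state = ST_IDLE;
        }
        break;
    default:
        assert(0);  /* no other states exist in this sketch */
        break;
    }
}
```

Each call to `fsm_dispatch()` runs to completion and never blocks, so several such machines can be serviced from a single loop, or from interrupt-fed event queues, without needing per-task stacks or a context switch.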