You will be subject to race conditions in either case: a stap probe cannot take locks on kernel structures, and that would be required to guarantee that the task list does not change while it's being counted. This is especially true in general SystemTap probe context, such as in the middle of a kprobe handler.
For the first approach, you could add a bit of embedded-C code that iterates over the task list from a "probe begin {}" handler to prime the initial thread counts. One challenge is setting SystemTap script globals from the embedded-C code (there's no documented API for that), but if you look at the C code the translator generates (stap -p3), it should be doable.
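A minimal sketch of the first approach follows. It assumes that only a system-wide total is needed, and that you have a reasonably recent kernel and SystemTap (it uses for_each_process_thread() and STAP_RETVALUE). It sidesteps the globals problem: the embedded-C function returns the count, and the begin probe assigns it to the global. Per-process counts kept in an associative array would bring that problem back. The names count_threads and nthreads are illustrative only, and the count is still subject to the races described above.

```systemtap
# Sketch only: prime a system-wide thread count once at startup.
# Embedded C requires guru mode:  stap -g count_threads.stp
# count_threads() and nthreads are hypothetical names for illustration.

%{
/* Assumes kernel 4.11+, where for_each_process_thread() lives in
 * <linux/sched/signal.h>; older kernels define it in <linux/sched.h>. */
#include <linux/sched/signal.h>
#include <linux/rcupdate.h>
%}

global nthreads

function count_threads:long () %{
	struct task_struct *p, *t;
	long n = 0;

	/* RCU keeps the walk from touching freed task_structs, but it does
	 * not stop tasks from being created or exiting meanwhile, so the
	 * result is only a snapshot. */
	rcu_read_lock();
	for_each_process_thread(p, t)
		n++;
	rcu_read_unlock();

	STAP_RETVALUE = n;
%}

probe begin {
	# The C function returns the count and the script assigns the global,
	# so no script global is written from C directly.
	nthreads = count_threads()
	printf("initial thread count: %d\n", nthreads)
}

# Track changes after startup. Both probes come from the kprocess tapset
# and fire once per task (thread), not once per process.
probe kprocess.create { nthreads++ }
probe kprocess.exit   { nthreads-- }

probe end {
	printf("final thread count: %d\n", nthreads)
}
```

Run it with stap -g, since embedded-C functions are only allowed in guru mode.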
The second approach would be to do the same iteration, but for the locking reasons above, this is not generally safe.