Several algorithms exist for decomposing strongly connected components (SCCs). To accommodate recent non-reversible trends in hardware we focus on multi-core architectures. Specifically, we consider SCC algorithms in the setting of an on-the-fly implementation (to be able to detect SCCs while constructing the graph - which is particularly useful for several verification techniques). We show that the current solutions are not capable of scaling efficiently and we propose a new algorithm that is able to do so. Our technique is (in contrast to the existing approaches) specifically designed to communicate partially discovered SCCs between workers. This is achieved by using a shared Union-Find structure. This structure has been extended to efficiently keep track of the search paths for each worker in combination with means to iterate and communicate fully explored vertices. We show that the designed algorithm is provably correct and performs in linear time. Experiments show that it outperforms existing techniques