The idea of integrating different devices with a single USB connection is nice, but there are some inherent problems with this.
Two of those were already raised as questions by @Stone262:
The answer to the first question is simple: it probably won’t.
If a device is not class compliant, there is no easy way to access the audio streams without drivers.
Your best bet for building an integrated product like this is Linux.
Since most companies don’t provide Linux drivers, though, integration seems unlikely unless everything comes from the same company.
Maybe Windows IoT?
The answer to the second question is a bit more difficult, but in short:
The same way Overbridge does it: resampling.
To sync one audio device to another without a shared clock source like Wordclock, ADAT or S/PDIF, variable-rate resampling is used. There are several papers on the subject if you google it.
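To make "variable-rate resampling" a bit more concrete, here is a minimal Python sketch. It is not what Overbridge or any other product actually does: it uses plain linear interpolation, and the class name `VariableRateResampler` and the `ratio` parameter (input rate divided by output rate) are my own for illustration. Real implementations tend to use much better filters (windowed-sinc or polyphase), since linear interpolation rolls off the top end and lets some aliasing through.

```python
class VariableRateResampler:
    """Linear-interpolation resampler whose ratio may change between blocks.

    ratio = input_rate / output_rate, e.g. 48010 / 48000 if the incoming
    device runs slightly fast. Hypothetical illustration, mono floats only.
    """

    def __init__(self):
        self.pos = 0.0   # fractional read position; index 0 == self.prev
        self.prev = 0.0  # last input sample of the previous block

    def process(self, block, ratio):
        if ratio <= 0:
            raise ValueError("ratio must be positive")
        # Prepend the previous block's last sample so we can interpolate
        # across the boundary between blocks.
        data = [self.prev] + list(block)
        out = []
        # Each output sample needs data[i] and data[i + 1].
        while self.pos + 1 < len(data):
            i = int(self.pos)
            frac = self.pos - i
            out.append(data[i] * (1.0 - frac) + data[i + 1] * frac)
            self.pos += ratio  # advance through the input by the rate ratio
        # Carry the leftover fractional position into the next block.
        self.pos -= len(data) - 1
        self.prev = data[-1]
        return out


r = VariableRateResampler()
print(r.process([1.0, 2.0, 3.0], 1.0))  # ratio 1.0: passes through, one sample late
```

Because the ratio is passed in per block, a drift estimate can nudge it while audio keeps flowing, which is the whole trick. The one sample of delay at the start comes from needing the next input sample before you can interpolate.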
Features like Aggregate Device on macOS use the same technique, as do "Listen to this device" on Windows and software like VoiceMeeter, Overbridge and the Virus TI.
The idea is simple:
Figure out the sample rate of the incoming device relative to the outgoing device.
This can be done, for instance, by monitoring how fast the incoming device fills a buffer while the outgoing device empties a buffer of the same size.
Once you know the difference, you can resample the input to match the output.
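The "figure out the difference" step could look something like the sketch below. Rather than literally timing how fast one buffer fills and another empties, it counts the frames each device's callback delivers or consumes over a window and divides the two, which measures the same thing. `DriftEstimator`, `on_input`, `on_output`, the one-second window and the smoothing factor are all assumptions for illustration; the smoothing is there because callback timing jitters and you don't want the resampling ratio jumping around.

```python
class DriftEstimator:
    """Estimate input_rate / output_rate by counting frames per window.

    Hypothetical helper: on_input() / on_output() would be called from the
    two devices' audio callbacks with the number of frames each one
    delivered / consumed.
    """

    def __init__(self, window_frames=48000, smoothing=0.1):
        self.window_frames = window_frames  # output frames per measurement
        self.smoothing = smoothing          # weight of each new measurement
        self.in_count = 0
        self.out_count = 0
        self.ratio = 1.0                    # start by assuming equal clocks

    def on_input(self, frames):
        self.in_count += frames

    def on_output(self, frames):
        self.out_count += frames
        if self.out_count >= self.window_frames:
            measured = self.in_count / self.out_count
            # Exponential moving average: move only part of the way toward
            # the new measurement so jitter doesn't cause sudden jumps.
            self.ratio += self.smoothing * (measured - self.ratio)
            self.in_count = 0
            self.out_count = 0
        return self.ratio


# Simulate an input clock running at 48048 Hz against a 48000 Hz output.
est = DriftEstimator()
for _ in range(200):
    est.on_input(48048)
    ratio = est.on_output(48000)
print(ratio)  # converges toward 48048 / 48000
```

The resulting ratio is what you would hand to the resampler. Many real implementations instead (or additionally) watch the fill level of the buffer between the two devices and steer the ratio to keep it near a target, so small estimation errors don't slowly drain or overflow it.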
In practice this is hard to get right and to keep stable, as you’ve probably noticed from all the Overbridge delays. Some of you might own a Virus TI, where it’s completely unpredictable on which systems the TI function will actually work.
And the whole process adds latency, which is yet another downside.
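To put a rough number on that latency: the stage needs a buffer between the two devices with some slack in it, so neither side runs dry while the ratio adapts, and every frame sitting in that buffer is delay. A back-of-the-envelope helper, assuming the buffer hovers around a target fill level and ignoring the interfaces' own driver buffers (the function name and parameters are just for illustration):

```python
def added_latency_ms(target_fill_frames, sample_rate_hz, filter_delay_frames=0):
    """Rough extra latency of a drift-compensation stage, in milliseconds.

    Assumes the buffer between the two devices is kept around
    target_fill_frames, plus whatever delay the resampling filter adds
    (0 unless given). The devices' own driver buffers come on top of this.
    """
    return 1000.0 * (target_fill_frames + filter_delay_frames) / sample_rate_hz


# e.g. 256 frames of slack kept in the buffer at 48 kHz
print(f"{added_latency_ms(256, 48000):.2f} ms")  # prints "5.33 ms"
```

So even a modest safety margin costs a few milliseconds before the rest of the chain is counted.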
There is this project, though:
Something like that could make it possible, but it still introduces latency.
I would go with Windows + VoiceMeeter and call it a day; it also takes care of the driver problem.
Mini PCs are cheap and portable.