
@MiaAI_lab The way I understand it, "Sequences" means concurrent requests.
So your recipe can serve 6 requests at 1M context concurrently.
Under full/heavy load, t/s can spike and drop
because the longest 1M session has to finish generating before the other concurrent requests can continue.