Profiled DDL RAMP.zip (1.27 GB)
Profiled DDL RAMP
Collection of profiled models used to estimate the distributed training time of different Transformer Encoder models, partitioned using the Megatron partitioning strategy, for different target losses.