
1/16/2024: TIES-Merging

TheBloke's Discord community is actively discussing Mixture of Experts (MoE) models, focusing on random gate routing layers, which are seen as useful during training but poorly suited to immediate inference. There is a robust debate over quantization methods, comparing GPTQ and EXL2 quants, with EXL2 noted for faster execution on specialized hardware. A new model, Nous Hermes 2, based on Mixtral 8x7B and trained with RLHF, claims benchmark superiority but shows some inconsistencies. The Frontier supercomputer at Oak Ridge National Laboratory is highlighted for training a trillion-parameter LLM with 14TB of RAM, sparking discussion on open-sourcing government-funded AI research. Additionally, the application of ghost attention in the academicat model is explored, with mixed reactions from the community. *"Random gate layer is good for training but not for immediate use,"* and *"EXL2 might offer faster execution on specialized hardware,"* are key insights shared.
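To make the random-gate remark concrete, here is a minimal NumPy sketch contrasting a random gate (input-independent expert selection) with a standard learned softmax gate. All function names, shapes, and parameters are illustrative assumptions for this post, not code from the discussion or from any Mixtral implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_gate_route(x, num_experts, top_k=2):
    """Pick top_k experts uniformly at random, ignoring the input x.

    Random routing spreads tokens evenly across experts during training
    (good load balance, no gate to learn), but it discards all input
    information, which is why it is a poor choice at inference time.
    """
    chosen = rng.choice(num_experts, size=top_k, replace=False)
    weights = np.full(top_k, 1.0 / top_k)  # uniform mixing weights
    return chosen, weights

def learned_gate_route(x, gate_w, top_k=2):
    """Standard learned gate: softmax over a linear projection of x."""
    logits = x @ gate_w                        # shape: (num_experts,)
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    chosen = np.argsort(probs)[-top_k:]        # indices of top-k experts
    weights = probs[chosen] / probs[chosen].sum()  # renormalize over top-k
    return chosen, weights

# Toy usage: 8 experts (as in Mixtral 8x7B), hypothetical hidden size 16.
x = rng.standard_normal(16)
gate_w = rng.standard_normal((16, 8))
print(random_gate_route(x, num_experts=8))
print(learned_gate_route(x, gate_w))
```

The sketch shows why the two behave differently: the random gate returns a fresh expert pair on every call regardless of `x`, while the learned gate deterministically routes the same input to the same experts.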
