NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
Kavli Affiliate: Xiang Zhang | First 5 Authors: Han Han, Tong Zhu, Xiang Zhang, Mengsong Wu, Hao Xiong | Summary: Large language models (LLMs) combined with tool learning have gained impressive results in real-world applications. During tool learning, LLMs may call multiple tools in nested orders, where the latter tool call may take the former […]
Causal Image Modeling for Efficient Visual Understanding
Kavli Affiliate: Feng Wang | First 5 Authors: Feng Wang, Timing Yang, Yaodong Yu, Sucheng Ren, Guoyizhe Wei | Summary: In this work, we present a comprehensive analysis of causal image modeling and introduce the Adventurer series models where we treat images as sequences of patch tokens and employ uni-directional language models to learn visual […]
Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency
Kavli Affiliate: Feng Wang | First 5 Authors: Feng Wang, Timing Yang, Yaodong Yu, Sucheng Ren, Guoyizhe Wei | Summary: In this work, we introduce the Adventurer series models where we treat images as sequences of patch tokens and employ uni-directional language models to learn visual representations. This modeling paradigm allows us to process images […]
Ion-Assisted Nanoscale Material Engineering in Atomic Layers
Kavli Affiliate: Xiang Zhang | First 5 Authors: Hossein Taghinejad, Mohammad Taghinejad, Sajjad Abdollahramezani, Qitong Li, Eric V. Woods | Summary: Achieving deterministic control over the properties of low-dimensional materials with nanoscale precision is a long-sought goal. Mastering this capability has a transformative impact on the design of multifunctional electrical and optical devices. Here, we […]
How to evaluate your medical time series classification?
Kavli Affiliate: Xiang Zhang | First 5 Authors: Yihe Wang, Taida Li, Yujun Yan, Wenzhan Song, Xiang Zhang | Summary: Medical time series (MedTS) play a critical role in many healthcare applications, such as vital sign monitoring and the diagnosis of brain and heart diseases. However, the existence of subject-specific features poses unique challenges in […]
Repurposing Foundation Model for Generalizable Medical Time Series Classification
Kavli Affiliate: Xiang Zhang | First 5 Authors: Nan Huang, Haishuai Wang, Zihuai He, Marinka Zitnik, Xiang Zhang | Summary: Medical time series (MedTS) classification is critical for a wide range of healthcare applications such as Alzheimer’s Disease diagnosis. However, its real-world deployment is severely challenged by poor generalizability due to inter- and intra-dataset heterogeneity […]
Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling
Kavli Affiliate: Xiang Zhang | First 5 Authors: Yuguang Yang, Yu Pan, Jixun Yao, Xiang Zhang, Jianhao Ye | Summary: Zero-shot voice conversion (VC) aims to transform the source speaker timbre into an arbitrary unseen one without altering the original speech content. While recent advancements in zero-shot VC methods have shown remarkable progress, there still remains […]
Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
Kavli Affiliate: Xiang Zhang | First 5 Authors: Yuguang Yang, Yu Pan, Jixun Yao, Xiang Zhang, Jianhao Ye | Summary: Expressive zero-shot voice conversion (VC) is a critical and challenging task that aims to transform the source timbre into an arbitrary unseen speaker while preserving the original content and expressive qualities. Despite recent progress in […]