Despite unprecedented advances in Natural Language Understanding (NLU), our models still largely lack the ability to generalize to conditions that differ from those encountered during training. Such adverse conditions range from learning from noisy domains to the extreme case of adaptation to entirely new languages. Recent work on transfer learning, particularly Multi-Task Learning (MTL), offers great promise to remedy this problem. MTL has been applied successfully across NLU, but most of this work has limited scope: it shares across only a few tasks or domains, and it typically considers a single language. Little is known about when and what kind of sharing is most beneficial, especially if we want to scale NLU to dozens of languages or customer-specific domains. In this project, we focus on a core NLU problem, sequence tagging, and ask: how can we build the best sequence labelers at scale, under adverse conditions, when little to no annotated data exists? We propose to combine diverse sources of supervision to bridge this gap, while also learning what and how to share successfully in MTL, in order to derive a set of best practices and models that scale quickly to new conditions.
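To make the notion of sharing concrete, the sketch below shows one common instantiation of MTL for sequence labeling: hard parameter sharing, with a single encoder reused across tasks and a separate tagging head per task. This is purely illustrative; the module choices, dimensions, and the two example tasks are assumptions for the sketch, not a description of the proposed system.

```python
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    """Hard parameter sharing: one shared encoder, one tagging head per task."""

    def __init__(self, vocab_size, emb_dim, hidden_dim, task_label_sizes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Shared BiLSTM encoder reused by all tasks (and potentially languages).
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # One linear tagging head per task, e.g. "pos" and "ner" (assumed tasks).
        self.heads = nn.ModuleDict({
            task: nn.Linear(2 * hidden_dim, n_labels)
            for task, n_labels in task_label_sizes.items()
        })

    def forward(self, token_ids, task):
        embedded = self.embedding(token_ids)   # (batch, seq, emb_dim)
        encoded, _ = self.encoder(embedded)    # (batch, seq, 2 * hidden_dim)
        return self.heads[task](encoded)       # per-token label scores


# Illustrative usage: a single training step on a dummy batch for one task.
model = MultiTaskTagger(vocab_size=10000, emb_dim=100, hidden_dim=128,
                        task_label_sizes={"pos": 17, "ner": 9})
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
optimizer = torch.optim.Adam(model.parameters())

tokens = torch.randint(1, 10000, (8, 20))      # dummy token ids
labels = torch.randint(0, 17, (8, 20))         # dummy POS labels
logits = model(tokens, task="pos")
loss = loss_fn(logits.view(-1, logits.size(-1)), labels.view(-1))
loss.backward()
optimizer.step()
```

In practice, training alternates batches between tasks so that gradients from all tasks update the shared encoder, while each head only receives updates from its own task.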