vectorizer classes can take a custom tokenizer callable. Develop or use a tokenizer function for Swiss German / German mixed texts. then words as features instead of char ngrams would make sense