Split on Whitespace in Python

Whitespace is any character, or run of characters, that represents horizontal or vertical space, such as a space, a tab (\t), or a newline (\n).

The split method takes two optional arguments: a separator (sep) and a maximum number of splits (maxsplit). If you call it without a separator, it splits the string on any run of one or more whitespace characters and ignores whitespace at the beginning and end of the string.

my_str = ' Text    separated  \n by multiple    whitespaces    '
print(my_str.split())

In our case, the string contains spaces as well as a newline character (\n). The split method treats them all the same way.

['Text', 'separated', 'by', 'multiple', 'whitespaces']
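
To make the default behavior more concrete, here is a minimal sketch. The sample strings are made up for illustration. It shows that tabs and newlines count as whitespace too, and that a string containing only whitespace gives an empty list.

```python
# Hypothetical sample strings, chosen only to illustrate the default behavior.
mixed = 'one\ttwo\n three \r\n four'
blank = '   \n\t  '

# With no argument, any run of whitespace acts as one separator,
# and leading and trailing whitespace is ignored.
mixed_words = mixed.split()
blank_words = blank.split()

print(mixed_words)  # ['one', 'two', 'three', 'four']
print(blank_words)  # []
```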

Split on single or multiple whitespaces

Called without an argument, split treats a run of whitespace of any length as one separator. If you supply an argument, split uses exactly that string as the separator instead: a single character or a fixed sequence of characters.

my_str = ' Text    separated  \n by multiple    whitespaces    '
print(my_str.split(' '))
print(my_str.split('  '))
print(my_str.split('\n'))

Here’s how the same string is split when a single space, a double space, and a newline character are passed as arguments.

['', 'Text', '', '', '', 'separated', '', '\n', 'by', 'multiple', '', '', '', 'whitespaces', '', '', '', '']
[' Text', '', 'separated', '\n by multiple', '', 'whitespaces', '', '']
[' Text    separated  ', ' by multiple    whitespaces    ']
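
The empty strings in the first two lists appear because, with an explicit separator, every occurrence of the separator produces a split, even when two occurrences are adjacent. A minimal sketch with a made-up two-word string shows the difference:

```python
# Hypothetical short string with two spaces between the words.
sample = 'a  b'

# Explicit separator: each single space splits, so the gap between
# the two adjacent spaces becomes an empty string.
explicit = sample.split(' ')

# No separator: the run of spaces is treated as one delimiter.
default = sample.split()

print(explicit)  # ['a', '', 'b']
print(default)   # ['a', 'b']
```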

Split on whitespaces using regex

You can also split on whitespace with a regular expression. To do so, import the re module and split on the pattern \s+, which matches one or more whitespace characters.

import re

my_str = ' Text    separated  \n by multiple    whitespaces    '
print(re.split(r'\s+', my_str))

The problem with this code is that it adds empty elements to the beginning and end of the list, because the string starts and ends with whitespace.

['', 'Text', 'separated', 'by', 'multiple', 'whitespaces', '']

You can remove the first and last elements of the list to get rid of them. The problem is that if there is no whitespace character at the beginning or end of the string, there won’t be an empty string at that position in the list, and you would remove a real word. For this reason, we would have to check whether the first and last elements are empty before removing them.
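
The check described above could be sketched as follows. The helper name split_trimmed is hypothetical and used only for this illustration.

```python
import re

def split_trimmed(text):
    """Split on whitespace runs, dropping an empty first or last element."""
    parts = re.split(r'\s+', text)
    # Remove the first element only if it is an empty string.
    if parts and parts[0] == '':
        parts = parts[1:]
    # Remove the last element only if it is an empty string.
    if parts and parts[-1] == '':
        parts = parts[:-1]
    return parts

print(split_trimmed(' Text    separated  \n by multiple    whitespaces    '))
print(split_trimmed('no padding here'))
```

The second call has no whitespace at either end, so nothing is removed and all three words are kept.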

The filter function

But there is a better way to do it. Let’s use the filter function that will filter out empty elements from the list.

import re

my_str = ' Text    separated  \n by multiple    whitespaces    '
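# Split on runs of whitespace; the list may start and end with ''
parts = re.split(r'\s+', my_str)
# filter(None, ...) keeps only truthy items, so empty strings are dropped
str_list = list(filter(None, parts))
print(str_list)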

The result is a list of strings without empty elements.

['Text', 'separated', 'by', 'multiple', 'whitespaces']
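
Passing None as the first argument makes filter keep only truthy items, and an empty string is falsy, which is why the empty elements disappear. The same selection can be written as a list comprehension. This is an alternative sketch, not part of the original example:

```python
import re

my_str = ' Text    separated  \n by multiple    whitespaces    '
parts = re.split(r'\s+', my_str)

# filter(None, ...) drops every falsy item; '' is falsy.
with_filter = list(filter(None, parts))

# The same selection written as a list comprehension.
with_comprehension = [part for part in parts if part]

print(with_filter == with_comprehension)  # True
```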

The strip function

Another way to deal with empty elements at the beginning or the end of the list is to strip whitespace from both ends of the string before splitting. The strip method does just that. Similar to split, it also takes an optional argument, but we are not going to use it, because the default removes every kind of whitespace character.

import re

my_str = ' Text    separated  \n by multiple    whitespaces    '
# Remove leading and trailing whitespace before splitting
stripped = my_str.strip()
str_list = re.split(r'\s+', stripped)
print(str_list)

The result is also a list without empty elements.

['Text', 'separated', 'by', 'multiple', 'whitespaces']
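
The approaches differ on input that contains no words at all. A small sketch, assuming a made-up string that holds only whitespace, shows what each one returns:

```python
import re

blank = '   \n  '  # hypothetical input containing only whitespace

# strip() leaves an empty string, and re.split returns that
# empty string as the single element of the list.
stripped_then_split = re.split(r'\s+', blank.strip())

# Filtering removes the empty strings, so nothing is left.
split_then_filtered = list(filter(None, re.split(r'\s+', blank)))

# The built-in split() without arguments also returns an empty list.
builtin_split = blank.split()

print(stripped_then_split)  # ['']
print(split_then_filtered)  # []
print(builtin_split)        # []
```

If the input may be empty or blank, the filter version or the plain split() call avoids the leftover empty string.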