Split on Whitespace in Python

Whitespace is any character or sequence of characters that represents horizontal or vertical space, such as a space, a tab, or a newline.

The split method takes two optional arguments: the separator (sep) and the maximum number of splits (maxsplit). If you call it without a separator, it splits the string on runs of one or more consecutive whitespace characters and ignores whitespace at the beginning and end of the string.

In our case, the string contains spaces as well as a newline character (\n). The split method treats them all the same way.
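The call described above can be sketched as follows. The sample string is not shown in the text, so the one used here is an assumption reconstructed from the outputs in this article, and the names text and words are only illustrative.

```python
# Assumed sample string, reconstructed from the outputs shown in this article.
text = ' Text    separated  \n by multiple    whitespaces    '

# With no separator, split() treats every run of whitespace (spaces, newline)
# as one delimiter and ignores leading and trailing whitespace.
words = text.split()
print(words)
```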

['Text', 'separated', 'by', 'multiple', 'whitespaces']

Split on single or multiple whitespaces

Without an argument, split treats a whitespace run of any length as a single separator. If you supply a separator, split matches exactly that string instead, whether it is one character or several, and adjacent separators produce empty strings in the result.

Here’s how the same string is split when a single space, a double space, and a newline character are passed as the separator.
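A minimal sketch of the three calls, assuming the same sample string as before (reconstructed from the outputs, not taken from the original code). The variable names are hypothetical.

```python
# Assumed sample string, reconstructed from the outputs shown in this article.
text = ' Text    separated  \n by multiple    whitespaces    '

# Each call matches the given separator literally, so consecutive
# separators leave empty strings in the resulting list.
single_space = text.split(' ')
double_space = text.split('  ')
newline = text.split('\n')

print(single_space)
print(double_space)
print(newline)
```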

['', 'Text', '', '', '', 'separated', '', '\n', 'by', 'multiple', '', '', '', 'whitespaces', '', '', '', '']
[' Text', '', 'separated', '\n by multiple', '', 'whitespaces', '', '']
[' Text    separated  ', ' by multiple    whitespaces    ']

Split on whitespaces using regex

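You can also use regular expressions to achieve a similar result. To do so, import the re module and call re.split, for example with the pattern \s+, which matches one or more whitespace characters.

The problem with this approach is that, when the string starts or ends with whitespace, the result contains empty strings at the beginning and end of the list.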
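One way to write this is sketched below. The pattern \s+ is an assumption, chosen because it reproduces the output shown in this section. The sample string is likewise reconstructed from the article's outputs.

```python
import re

# Assumed sample string, reconstructed from the outputs shown in this article.
text = ' Text    separated  \n by multiple    whitespaces    '

# \s+ matches one or more whitespace characters, so each run is one delimiter.
parts = re.split(r'\s+', text)
print(parts)
```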

['', 'Text', 'separated', 'by', 'multiple', 'whitespaces', '']

You can remove the first and last elements of the list to get rid of them. However, if the string has no whitespace at the beginning or end, the list won’t contain an empty string at that position. For this reason, you would first have to check whether the first and last elements are empty.
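The check described above might look like the sketch below. The helper name split_and_trim is hypothetical, and the pattern \s+ and the sample string are assumptions rather than the article's original code.

```python
import re

def split_and_trim(text):
    """Hypothetical helper: split on whitespace runs, then drop empty end elements."""
    parts = re.split(r'\s+', text)
    # Remove the first element only if it is an empty string.
    if parts and parts[0] == '':
        parts = parts[1:]
    # Remove the last element only if it is an empty string.
    if parts and parts[-1] == '':
        parts = parts[:-1]
    return parts

print(split_and_trim(' Text    separated  \n by multiple    whitespaces    '))
print(split_and_trim('no padding'))
```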

The filter function

But there is a better way to do it. Let’s use the filter function to remove the empty elements from the list. In Python 3, filter returns an iterator, so wrap it in list() to get a list back.
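A short sketch of this approach, again assuming the pattern \s+ and the sample string reconstructed from the outputs. Passing None as the first argument makes filter drop every falsy element, which includes empty strings.

```python
import re

# Assumed sample string, reconstructed from the outputs shown in this article.
text = ' Text    separated  \n by multiple    whitespaces    '

# filter(None, ...) keeps only truthy elements, so '' is removed.
# list() turns the resulting iterator into a list.
words = list(filter(None, re.split(r'\s+', text)))
print(words)
```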

The result is a list of strings without empty elements.

['Text', 'separated', 'by', 'multiple', 'whitespaces']

The strip function

Another way to deal with empty elements at the beginning or end of the list is to strip the whitespace from the start and end of the string before splitting it. The strip method does just that. Like split, it takes an optional argument, but we are not going to use it, because we want to remove all leading and trailing whitespace.
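This could be written as in the sketch below, which assumes the same reconstructed sample string and the pattern \s+ as the other examples.

```python
import re

# Assumed sample string, reconstructed from the outputs shown in this article.
text = ' Text    separated  \n by multiple    whitespaces    '

# strip() with no argument removes leading and trailing whitespace,
# so re.split has no empty strings to emit at either end.
words = re.split(r'\s+', text.strip())
print(words)
```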

The result is also a list without empty elements.

['Text', 'separated', 'by', 'multiple', 'whitespaces']