A string is a sequence of characters that represents text. In programming, strings are often used to store text that will be displayed to the user or sent to another computer.
Each character represents a symbol, but computers can only store binary data: 1s and 0s. To show a character as a symbol on the screen, the program must decode those 1s and 0s. In Python 3, any string declared with quotes is a Unicode string.
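As a small illustration of the paragraph above, the sketch below shows the number behind a character and the bytes produced when text is encoded. The sample word and the choice of UTF-8 are assumptions made for this example; other encodings exist.

```python
letter = "A"
code_point = ord(letter)  # Unicode code point of the character
print(code_point)  # 65
print(chr(code_point))  # back from code point to character: A

encoded = "héllo".encode("utf-8")  # convert text to bytes
print(encoded)  # b'h\xc3\xa9llo'
decoded = encoded.decode("utf-8")  # convert bytes back to text
print(decoded)  # héllo
```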
To easily split a string in Python, just use the split() method. Here is a simple example that splits a sentence into a list, with each word becoming a separate item in the list.
txt = "This is a string" #declare string value
print(txt) #print the string
x = txt.split() #assign variable x to separated string value
print(x) #print the split string to the console
The console output would be as follows:
This is a string
['This', 'is', 'a', 'string']
Let’s take a closer look at how to split a string in Python and see more examples like this one.
The split() method takes a string and splits it into a list of substrings at each occurrence of a separator. It is a built-in method of Python's str type and is easy to use. It has two parameters, sep and maxsplit:
str.split(sep=None, maxsplit=-1)
Here, “sep” stands for separator or delimiter. If it is omitted or set to None, the string is split on runs of whitespace. The delimiter is the string that separates the substrings. It can be a single character or a sequence of several characters, but split() accepts only one separator per call.
On the other hand, “maxsplit” specifies how many splits are set to occur. If maxsplit is not set, it defaults to -1. This value means that there are no limits to the number of splits that are set to occur.
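The behavior of the two parameters can be sketched with a few short calls. The sample strings below are made up for illustration. Note the difference between the default whitespace split and an explicit single-space separator.

```python
spaced = "a  b   c"  # note the runs of multiple spaces

by_whitespace = spaced.split()  # default: runs of whitespace count as one separator
by_single_space = spaced.split(" ")  # explicit separator: every space splits
print(by_whitespace)  # ['a', 'b', 'c']
print(by_single_space)  # ['a', '', 'b', '', '', 'c']

date_parts = "2024--01--15".split("--")  # a separator can be several characters long
print(date_parts)  # ['2024', '01', '15']

limited = "a,b,c,d".split(",", maxsplit=2)  # stop after two splits
print(limited)  # ['a', 'b', 'c,d']
```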
Splitting text strings is useful in data analysis to help users analyze collected data. Commas, colons, spaces, and quotation marks are often selected as delimiters.
There are various types of splits that can be performed in Python, and there are different methods aside from split() that can split a string. The most common cases are covered below.
A Comma Separated Values file, more commonly known as a .csv file, is a plain-text file that contains data separated by commas. These types of files are commonly seen in data aggregation and collection.
Splitting each value delimited by a comma can help analyze these data sets. Use the below example code to declare a string and separate each value in the list at each comma.
txt = "Abc,De,F" #declare string value
print(txt) #print the string before separating
x = txt.split(",") #assign variable x to separated string value and set sep to ','
print(x) #print the comma separated list
Here is the console output:
Abc,De,F
['Abc', 'De', 'F']
Please note, the commas in the console output are not part of the values in the string. The commas have been removed from the string and each value is contained separately in the list denoted by the variable x.
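One caveat worth sketching: split(",") treats every comma as a separator, including commas inside quoted values. For real .csv data, Python's standard csv module handles quoting. The sample record below is hypothetical and only meant to show the difference.

```python
import csv

line = 'Smith,"Portland, OR",42'  # hypothetical record with a quoted comma

naive = line.split(",")  # splits inside the quoted field as well
print(naive)  # ['Smith', '"Portland', ' OR"', '42']

row = next(csv.reader([line]))  # csv.reader accepts any iterable of lines
print(row)  # ['Smith', 'Portland, OR', '42']
```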
A list is a sequence type in Python. The other two basic sequence types are tuple and range. According to the Python documentation, the principal built-in types are numerics, sequences, mappings, classes, instances, and exceptions.
However, the main types used in string manipulation are string, list, and tuple.
If no parameters or arguments have been specified when calling the split() method, the parameters will default to sep=None and maxsplit=-1. This means that the separator or delimiter will be set to whitespace and splits will occur until the end of the string. See the below sample code:
txt = "This is a sentence." #declare string value
print(txt) #print the string
x = txt.split() #assign variable x to separated string value, sep and maxsplit are not assigned values
print(x) #print the list separated at each whitespace
Console output will be as follows:
This is a sentence.
['This', 'is', 'a', 'sentence.']
Notice how each word is separated in the list when a space is encountered in the original string. Also, the period stays attached to the last list item because no whitespace separates it from the word and split() was not instructed to remove periods.
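If the trailing period is unwanted, one possible approach is to strip punctuation from each item after splitting. This is a minimal sketch that assumes only leading and trailing punctuation should be removed. It uses string.punctuation from the standard library.

```python
import string

txt = "This is a sentence."
# strip() removes the listed characters from both ends of each word only
words = [word.strip(string.punctuation) for word in txt.split()]
print(words)  # ['This', 'is', 'a', 'sentence']
```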
The other parameter accepted by split() is maxsplit. It sets the maximum number of splits to perform, so the resulting list has at most maxsplit + 1 items. See the below example with maxsplit set to 1 and the delimiter left at its default value.
txt = "This is a sentence." #declare string value
print(txt) #print the string
x = txt.split(None,1) #assign variable x to separated string value, sep is assigned to None and maxsplit is assigned to 1
print(x) #print the list separated at each whitespace
The console output is below:
This is a sentence.
['This', 'is a sentence.']
Notice how only 1 split occurred, so there are only two items in the list. “This” is one value in the list and “is a sentence.” is one value in the list. Let’s test the maxsplit set to a value of 2.
txt = "This is a sentence." #declare string value
print(txt) #print the string
x = txt.split(None,2) #assign variable x to separated string value, sep is set to None and maxsplit to 2
print(x) #print the list separated at each whitespace
Console output:
This is a sentence.
['This', 'is', 'a sentence.']
Notice that we now have 3 items in the list because 2 splits occurred. What if we set maxsplit to a value greater than the number of splits the string allows?
txt = "This is a sentence." #declare string value
print(txt) #print the string
x = txt.split(None,10) #assign variable x to separated string value, sep is set to None and maxsplit to 10
print(x) #print the list separated at each whitespace
Console output:
This is a sentence.
['This', 'is', 'a', 'sentence.']
Notice, we do not receive an error if maxsplit is greater than the number of splits available for the inputted string.
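A common practical use of maxsplit is separating a label from the rest of a line when the rest may contain the separator too. The log line and file name below are hypothetical. The sketch also shows rsplit(), the related str method that counts splits from the right.

```python
entry = "ERROR: disk full: /dev/sda1"  # hypothetical log line
level, message = entry.split(": ", 1)  # split only at the first ": "
print(level)  # ERROR
print(message)  # disk full: /dev/sda1

filename = "archive.tar.gz"  # hypothetical file name
name_and_ext = filename.rsplit(".", 1)  # split only at the last "."
print(name_and_ext)  # ['archive.tar', 'gz']
```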
While Python has the split() method available to quickly split strings, there are a variety of other ways to split strings and different use cases for the split() method.
To create a word counter in Python, combine split() and len(). The built-in function len() returns the number of items in an object, such as the number of elements in a list.
string = "count the words in this sentence using len() and split()" #declare the string variable
print(string) #print the string to the console
wordcount = len(string.split()) #use len() after using split() on the string
print("Word count: " +str(wordcount)) #print the word count to the console
Console output:
count the words in this sentence using len() and split()
Word count: 10
Note how you can call len() and split() on the same line. These methods can be called separately but calling both on the same line uses fewer lines of code and still provides readability.
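The same idea can be wrapped in a small reusable function. The name word_count is an assumption chosen for this sketch. Because split() with no arguments ignores repeated and surrounding whitespace, the count stays correct for padded or empty input.

```python
def word_count(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())

print(word_count("one two  three"))  # 3
print(word_count("  padded   text  "))  # 2
print(word_count(""))  # 0
```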
To split lines from a text file, open the file in Python and read its contents. For this example, our .txt file contains the below random car data corresponding to make, model, year, and a randomly generated VIN.
Volkswagen,Cabriolet,1991,JH4DC53874S439387
Honda,Civic,1992,3GTU2YEJ9DG224736
BMW,8 Series,1993,3N1CN7AP5EL329941
Pontiac,Vibe,2007,1G6DE5E54D0848444
Save this as a .txt file named car_data.txt, then use the code below to open the file and call splitlines().
with open("car_data.txt", "r") as data: #open the file in read mode, selected by the "r" argument
    file = data.read().splitlines() #call read() to get the contents of car_data.txt, then splitlines() to create a new list item for each line
print(file) #print the list of lines to the console
Console output:
['Volkswagen,Cabriolet,1991,JH4DC53874S439387', 'Honda,Civic,1992,3GTU2YEJ9DG224736', 'BMW,8 Series,1993,3N1CN7AP5EL329941', 'Pontiac,Vibe,2007,1G6DE5E54D0848444']
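splitlines() also works on a string declared directly in the code, as shown below. The spaces around each \n are part of the string, so they remain in the resulting list items.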
car_data = ("Volkswagen,Cabriolet,1991,JH4DC53874S439387 \n Honda,Civic,1992,3GTU2YEJ9DG224736 \n BMW,8 Series,1993,3N1CN7AP5EL329941 \n Pontiac,Vibe,2007,1G6DE5E54D0848444") #declare string using \n to signify a line break
x = car_data.splitlines() #declare the variable x as the result of calling splitlines()
print(x)
Console output:
['Volkswagen,Cabriolet,1991,JH4DC53874S439387 ', ' Honda,Civic,1992,3GTU2YEJ9DG224736 ', ' BMW,8 Series,1993,3N1CN7AP5EL329941 ', ' Pontiac,Vibe,2007,1G6DE5E54D0848444']
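To go from lines to individual fields, splitlines() and split(",") can be combined. The sketch below uses a shortened copy of the car data. It assumes every line has exactly four comma-separated fields, and it calls strip() to discard any stray spaces around each line.

```python
car_data = (
    "Volkswagen,Cabriolet,1991,JH4DC53874S439387 \n"
    " Honda,Civic,1992,3GTU2YEJ9DG224736"
)

# one inner list of fields per line
rows = [line.strip().split(",") for line in car_data.splitlines()]
print(rows)

for make, model, year, vin in rows:  # unpack the four fields of each row
    print(make, model, int(year), vin)
```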
To split a string on multiple delimiters, import the re module, which provides regular expression operations, and use re.split(). More information can be found in the Python documentation.
import re
text = "Split, this sentence. with; the chosen, characters." #declare the string variable
print(re.split("[;,.] ", text)) #print the split string
Console output:
['Split', 'this sentence', 'with', 'the chosen', 'characters.']
Notice the square brackets are used to denote the set of characters that will be used to split. There is a trailing space as well, so the sentence will be split only when either a semicolon, comma, or period with a trailing space is encountered.
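When the spacing around delimiters is inconsistent, the pattern can be loosened. The variation below is a sketch with a made-up sample string. It allows optional whitespace on either side of the delimiter and then drops the empty string that re.split() returns when the text ends with a delimiter.

```python
import re

text = "Split,this sentence.  with;the chosen , characters."
# \s* matches zero or more whitespace characters around the delimiter
parts = re.split(r"\s*[;,.]\s*", text)
print(parts)  # ['Split', 'this sentence', 'with', 'the chosen', 'characters', '']

cleaned = [part for part in parts if part]  # remove empty strings
print(cleaned)  # ['Split', 'this sentence', 'with', 'the chosen', 'characters']
```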
To split a string into a list of individual characters, pass it to the built-in list() constructor.
text = "Python"
char_list = list(text)
print(char_list)
Console output:
['P', 'y', 't', 'h', 'o', 'n']
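One reason to do this is that strings are immutable, while a list of characters can be edited in place. As a brief sketch, the characters can be changed and then put back together with str.join(). The replacement letter is arbitrary.

```python
text = "Python"
chars = list(text)  # ['P', 'y', 't', 'h', 'o', 'n']
chars[0] = "J"  # lists are mutable, so a single character can be replaced
rejoined = "".join(chars)  # join the characters back into one string
print(rejoined)  # Jython
```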
We can use slice notation to split a string in Python. Slice notation uses the following parameters:
a[start:stop:step]
To get the first two characters of a string, use the following method:
text = "Python"
print(text[:2]) #indicate the stop as 2 to get the first two characters of a string
Console output:
Py
To get the last two characters of a string, use the following method:
text = "Python"
print(text[-2:])
Console output:
on
Slice notation is a concise, built-in way to extract part of a string when you already know the positions you need, and it returns the substring directly without building an intermediate list.
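The start, stop, and step values can also be combined. The sketch below shows a few variations and then uses slicing in a loop to cut a string into fixed-width pieces. The chunk size of 4 is an arbitrary choice for illustration.

```python
text = "Python"
print(text[1:4])  # characters at positions 1 to 3: yth
print(text[::2])  # every second character: Pto
print(text[::-1])  # a step of -1 reverses the string: nohtyP

vin = "JH4DC53874S439387"
size = 4  # arbitrary chunk width
chunks = [vin[i:i + size] for i in range(0, len(vin), size)]
print(chunks)  # ['JH4D', 'C538', '74S4', '3938', '7']
```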
Splitting a string in Python is a great way to learn about programming in that language. There are also real-world applications for splitting a string, primarily in data analysis where it is useful to extract information from .csv or .txt files.
Python can be used for a variety of purposes, from web development to scientific programming. While other languages may be better suited for specific tasks, Python is a jack-of-all-trades that can do almost anything.