How to convert string to Unicode characters in Python? Strings in Python are sequences of characters that are used to represent text. Unicode is a character encoding standard that aims to cover all characters from all human languages and various symbols. It allows computers to handle text in a uniform and represent and standardized way. Python includes built-in support for Unicode strings, and there are different techniques to create Unicode strings.
You can convert string to Unicode characters in Python using many ways, for example, by using the re.sub()
, ord()
, lambda, join()
, and format()
functions. In this article, I will explain how to convert strings to Unicode characters by using all these functions with examples.
1. Quick Examples of Python String Unicode
If you are in a hurry, below are some quick examples of converting strings to Unicode characters.
# Quick examples of converting string to unicode characters
import re
# Initializing string
string = "Sparkbyexamples"
# Example 1: Using re.sub() and ord() method
def unicode_replacement(match):
char = match.group()
return r'\u{:04X}'.format(ord(char))
result = re.sub(r'.', unicode_replacement, string)
# Example 2: Convert string to Unicode
# Using re.sub() and ord() method
result = (re.sub('.', lambda x: r'\u % 04X' % ord(x.group()), string))
# Example 3: Convert string to Unicode
# Using join(), format(), and ord() method
string = "Python"
unicode_result = ' '.join([r'\u{:04X}'.format(ord(ele)) for ele in string])
2. Convert String to Unicode Using re.sub() and ord() Method
You can convert a string to its Unicode representation using the re.sub()
function and the ord()
function. For example, you can import the re
module to work with regular expressions and define the string containing the characters you want to convert to Unicode. Then you can define a function named unicode_replacement
that takes a match object as an argument.
This function extracts the matched character using match.group()
, converts it to its Unicode code point using ord()
, and formats the Unicode code point as a four-digit hexadecimal representation with leading zeros using '{:04X}'.format(ord(char))
. The re.sub()
function is then used to replace each character in the string with its Unicode representation. The regular expression matches each individual character. The unicode_replacement
function is used as the replacement function within re.sub()
. Finally, you print the original string and the resulting string with Unicode representations.
import re
# Initializing string
string = "Sparkbyexamples"
print("Original string:", string)
# Using re.sub() and ord() method
def unicode_replacement(match):
char = match.group()
return r'\u{:04X}'.format(ord(char))
result = re.sub(r'.', unicode_replacement, string)
print("After converting the string to unicode:\n", result)
Yields below output.
To convert a string to its Unicode use the re.sub() and ord(), is another approach. When using the \u
escape sequence in a regular expression, ensure it is followed by four hexadecimal digits representing the Unicode code point. However, there’s a space in the \u % 04X
part, which should be removed.
In the below example. It takes the input string "Sparkbyexamples"
and converts each character in the string to its Unicode representation using the re.sub()
function and the ord()
function. The regular expression matches each character in the string and the lambda function within re.sub()
converts each matched character to its Unicode code point using the %04X
format specifier for four-digit hexadecimal representation.
import re
# Initializing string
string = "Sparkbyexamples"
print("Original string:", string)
# Convert string to Unicode
# Using re.sub() and ord() method
result = (re.sub('.', lambda x: r'\u % 04X' % ord(x.group()), string))
print("After converting the string to unicode:\n", result)
# Output:
# Original string: Sparkbyexamples
# After converting the string to unicode: \u 053\u 070\u 061\u 072\u 06B\u 062\u 079\u 065\u 078\u 061\u 06D\u 070\u 06C\u 065\u 073
3. Convert String to Unicode Using join(), format(), and ord()
You can also use the join() method along with the format() method and the ord()
function to convert a string to its Unicode representation.
In the below example, you use a list comprehension to iterate through each character in the string. For each character, you can use the ord()
function to get its Unicode code point, and then you use the format()
method to create the Unicode escape sequence using the hexadecimal value of the code point. The resulting Unicode escape sequences are joined using the join()
method, and each Unicode code point is separated by a space.
# Initializing string
string = "Python"
print("Original string:", string)
# Convert string to Unicode
# Using join(), format(), and ord() method
unicode_result = ' '.join([r'\u{:04X}'.format(ord(ele)) for ele in string])
print("After converting the string to Unicode:", unicode_result)
Yields below output.
Conclusion
In this article, I have explained some of the methods of Python like re.sub()
, ord()
, lambda, join()
, and format()
functions and using these methods how you can convert string to its Unicode representation with well-defined examples.
Happy Learning !!
Related Articles
- Python Convert String to Bytes
- Python Hex to String
- Python Join List to String by Delimiter
- Remove Substring From Python String
- Split the Python String Based on Multiple Delimiters
- Python String Copy
- Python String Concatenation
- Python String Sort Alphabetically
- Python String Formatting Explained
- Python String Methods
- Python String Explain with Examples
- Pad String with Spaces in Python
- Python Count Specific Character in String
- Python Convert String to Float
- Python Count Characters in String