
3 posts tagged with "unicode"

Posts related to the Unicode standard or related experiments


Unicode audio analyzer

· 4 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

What do Unicode, audio processing and admittedly bad early-2000s Internet memes have to do with one another?

In the previous post in the deep dive into Unicode series we explored how combining characters, like diacritics, work. One interesting property of Unicode is that multiple combining characters can be applied to the same base character.

Stacked characters: ç̰̀́̂

The example above shows the letter c with several combining characters, and as we can see they stack up quite nicely. This is the basis for an early-2000s Internet meme called Zalgo text1. We can take this to the next level with a "Winamp-style" analyzer bar, rendered entirely in (Zalgo) text for an extra metal look and feel. 🤘 🤘

----------------------------------------------------------------

The reality is that this was mostly an excuse to play around with the Web Audio API2 and some modern React (I hadn't touched front-end development in a while), and there were a few learnings along the way.

Implementation and technicalities

From an implementation perspective the first challenge was to understand what the Web Audio API offers in terms of digital signal processing and how to use it. The documentation is excellent, and the gist of it is that audio operations happen inside an AudioContext, which represents an audio processing graph built from several AudioNodes linked together so that the output of one node serves as the input of the next. Because I wanted to extract the frequency domain representation of the audio signal in order to render it on screen, I used an AnalyserNode3, which doesn't modify the audio but returns data about the frequency domain, computed with a trusty old FFT4.

The following code example puts all of these concepts together:

// audioNode is a React ref pointing at the <audio> element that plays the track.
const context = new AudioContext();
const theAnalyser = context.createAnalyser();
const source = context.createMediaElementSource(audioNode.current);
// build the audio processing graph connecting the input source
// to the analyser node, and the output of the analyser to the
// output of the Audio Context.
source.connect(theAnalyser);
theAnalyser.connect(context.destination);

Another interesting learning was about the advantages of requestAnimationFrame5 (RAF) over a plain old setInterval for rendering. Since I wanted smooth, performant updates, RAF was a good fit: its callback rate tries to match the display's refresh rate, and calls are paused when the tab is hidden or in the background - meaning better performance and battery life.
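Putting the two together, here is a minimal sketch of what the render loop can look like. It reuses theAnalyser from the snippet above; renderBars is a hypothetical helper that maps the frequency bins to the stacked-character bars:

const bins = new Uint8Array(theAnalyser.frequencyBinCount);

function draw() {
  // fills bins with the current magnitude (0-255) of each frequency bin
  theAnalyser.getByteFrequencyData(bins);
  renderBars(bins);            // hypothetical: update the text-based bars on screen
  requestAnimationFrame(draw); // schedule the next frame
}

requestAnimationFrame(draw);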

Finally, why not put everything together in a nice NPM package? Since I don't usually work in the JS ecosystem it was a nice opportunity to get some hands-on experience with this. The npmjs6 documentation is very good and the setup was straightforward, especially if you've published packages in Maven Central, Artifactory or equivalent. Top marks there. You can find the package here: https://www.npmjs.com/package/@felix.bruno/zalgo-player and installation is of course super easy:

$ npm install @felix.bruno/zalgo-player

This being the JavaScript/TypeScript ecosystem, not everything was smooth sailing: I discovered that Create React App7 still doesn't support TypeScript 5, and its GitHub activity seems a bit dead, which is a bit of a bummer. After spending some time looking around, Vite8 seemed like a decent choice to set up a basic React library with properly configured TypeScript support.

In this case, since I wanted to publish only a React component and not a full-blown web application, I had to make some changes to what Vite offers out-of-the-box9, but I am quite happy with the end result. The npm module is less than 15KB uncompressed and has no dependencies (since this is a React component it can only be used in that context, and thus we don't need to ship React with the package).
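For reference, here is a minimal sketch of the kind of Vite library-mode configuration this implies - the entry path, file name and formats below are assumptions for illustration, not the project's actual setup:

// vite.config.js - a sketch of library mode with React externalized.
import { defineConfig } from "vite";

export default defineConfig({
  build: {
    lib: {
      entry: "src/index.ts", // assumed entry point
      formats: ["es"],
      fileName: "zalgo-player",
    },
    rollupOptions: {
      // don't bundle React: the host application provides it
      external: ["react", "react/jsx-runtime"],
    },
  },
});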

The code is available in Github: https://github.com/felix19350/zalgo-player

Note: in a future iteration I will work on making the component responsive, so if you are viewing this on a mobile phone it may not render very well.


Footnotes

  1. Zalgo - Wikipedia

  2. Web Audio API docs

  3. Web audio visualizations and AnalyserNode docs

  4. FFT - Fast Fourier Transform. This video provides a nice intuition for how Fourier Transforms work in general, so go watch it!

  5. requestAnimationFrame documentation

  6. npmjs documentation

  7. Create React App

  8. Vite

  9. This article was quite helpful to get me up-to-speed on the changes that I needed to make in order to publish the ZalgoPlayer component as a library.

A deep dive into unicode and string matching - II

· 8 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

In the previous entry of this series I went through a lightning tour of what Unicode is and provided some details about the various encodings that are part of the standard (UTF-8/16/32). This serves as the baseline knowledge for further exploration of how Unicode strings work and some of the interesting problems that arise in this space.

Codespace organization

Before we proceed with more practical aspects of Unicode string matching, I would like to take a brief tangent, for completeness' sake, and touch upon how code points are organized.

As we previously discussed, the Unicode standard allows for more than a million code points1 (1,114,112 to be precise). The next question is, of course: how is this code point space (codespace in Unicode parlance) organized internally? And why does that organization matter?

The codespace is not just a linear collection of code points. Characters are grouped by their attributes, such as script or writing system, and the highest-level grouping is the "plane", each of which corresponds to 64K (65,536) code points.

Plane 0, or the Basic Multilingual Plane (BMP), encodes most characters in current use (as well as some historical characters) in the first 64K code points. A nice side effect of this is that it is possible to effectively support all current languages with a fixed 16-bit character size - although forgetting that UTF-162 is a variable-length encoding can land you in trouble!
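To see how that trouble shows up in practice, here is a small sketch you can run in a browser console: characters outside the BMP take two UTF-16 code units, so JavaScript's length property, which counts code units, no longer matches the number of characters a reader perceives.

const gClef = "\u{1D11E}";                       // MUSICAL SYMBOL G CLEF, a code point outside the BMP
console.log(gClef.length);                       // 2 - two UTF-16 code units (a surrogate pair)
console.log([...gClef].length);                  // 1 - iterating by code points
console.log(gClef.codePointAt(0).toString(16));  // "1d11e"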

Beyond the BMP there are several other planes. The Supplementary Multilingual Plane (SMP, or Plane 1) encodes seldomly used characters that didn't fit into the BMP, historical scripts and pictographic symbols. Beyond Plane 1 we find the Supplementary Ideographic Plane (Plane 2) and the Tertiary Ideographic Plane (Plane 3), which encode the less frequent Chinese, Japanese and Korean (CJK) characters that don't fit in the BMP. Then there is the Supplementary Special-purpose Plane (SSP, Plane 14), used as a spillover for format control characters that don't fit in the BMP, and finally two Private Use planes (Planes 15 and 16), which are allocated for private use and expand on the private-use characters located in the BMP.

Internally, each plane is arranged into several blocks. For instance, in the BMP the area from 0x0000 to 0x00FF (the first 256 code points) matches the ISO Latin-1 and ASCII encodings for backward compatibility.

Diacritics and other "strange" markings

Okay, back to the regularly scheduled content: an aspect to consider is how Unicode deals with diacritics (and other marks and symbols). For the Latin alphabet alone this would probably be trivial (as we've seen, Unicode is compatible with the ISO 8859-1 / Latin-1 encoding), but that is far from an extensible mechanism, so Unicode introduces the concept of combining characters, which are essentially marks that are placed relative to a base character. The convention is that the combining characters are applied after the base character.

An interesting fact is that more than one combining character may be applied to a single base character. This opens the door to some very creative uses, like building an audio spectrum analyzer bar out of "stacked" combining characters. The code needs some tweaking and I will update it later:

----------------------------------------------------------------
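In the meantime, here is a minimal sketch of the stacking idea behind it (the choice of combining mark and the scaling are arbitrary):

// Build a "bar" by stacking the same combining mark (U+030D, COMBINING
// VERTICAL LINE ABOVE) on top of a base character.
function bar(base, height) {
  return base + "\u030D".repeat(height);
}

// One bar per frequency bin: louder bins (0-255) become taller stacks (0-8).
function analyzerRow(bins) {
  return [...bins].map((value) => bar("|", Math.round((value / 255) * 8))).join(" ");
}

console.log(analyzerRow([0, 64, 128, 255]));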

There are exceptions to this principle, mostly for backward-compatibility reasons, which means that there can be different but equivalent sequences.

For instance, the character ç can be represented by the single code point U+00E7, or by U+0063 (the c) followed by U+0327 (the combining cedilla).

Now this poses an interesting question: are vanilla string classes in popular programming languages aware of this when making string comparisons?

Let's start with a basic example to see how this actually works (the content below is rendered in a React component):

Encoding using a single code point: ç

Encoding using combining characters: ç

Comparison using === : False

If you want to test it yourself, you can paste the following code in your browser's debug console:

const singleCodePoint = String.fromCodePoint(0xE7);
const combiningCharacters = String.fromCodePoint(0x63, 0x327);

console.log("Single code input: " + singleCodePoint);
console.log("Combining characters: " + combiningCharacters);
console.log(singleCodePoint === combiningCharacters);

One could say this is a JavaScript quirk, but that is not the case. If you have Python installed on your system (please use Python 3) you can test the following code:

singleCodePoint = chr(0xE7)
combiningCharacters = chr(0x63) + chr(0x327)
print("Single code input: " + singleCodePoint)
print("Combining characters: " + combiningCharacters)
print(singleCodePoint == combiningCharacters)

Clearly vanilla string comparison fails for strings that are visually and semantically equivalent3, which is not good and may break applications in weird and wonderful ways (e.g. think about the effects of this in data structures like sets or maps/dictionaries).

And if you think this is exclusive to those pesky interpreted languages, well, even the trusty old compareToIgnoreCase in Java fails this test:

void main() {
    // Build both strings directly from code points.
    String singleCodePoint = new String(new int[]{0xE7}, 0, 1);
    String combiningCharacters = new String(new int[]{0x63, 0x327}, 0, 2);

    System.out.println("Single code input: " + singleCodePoint);
    System.out.println("Combining characters: " + combiningCharacters);
    System.out.println(singleCodePoint.compareToIgnoreCase(combiningCharacters) == 0);
}

To run this you can simply paste the code above into a .java file, in this case Main.java, and compile it:

$ javac --source 21 --enable-preview Main.java
$ java --enable-preview Main

Unsurprisingly at this point, the last line outputs false, meaning the two strings are not considered equal. So what can be done about this?

Normalization

Clearly comparing Unicode strings is not as straightforward as one may think, especially when dealing with strings that can be considered to be equivalent (as in the examples above). Fortunately the Unicode standard defines algorithms to create normalized forms that eliminate unwanted distinctions.

In order to understand how Unicode normalization works it's important to understand the concepts of canonical equivalence and compatibility equivalence.

Canonical equivalence: Two strings are said to be canonical equivalents if their full canonical decompositions are identical. For example:

  • Combining sequences: U+00E7 is equivalent to U+0063, U+0327
  • Ordering of combining marks: q+◌̇+◌̣ is equivalent to q+◌̣+◌̇
  • Singleton equivalence: U+212B (Angstrom Sign) is equivalent to U+00C5 (Latin Capital Letter A with Ring Above). In the normalization process singletons will be replaced.
  • Hangul & conjoining jamo

Note that language specific rules for matching and ordering may treat letters differently from the canonical equivalence (more on that in a later post).

Compatibility equivalence: Two character sequences are said to be compatibility equivalents if their full compatibility decompositions are identical. This is a weaker type of equivalence, so greater care should be taken to ensure the equivalence is appropriate. A compatibility decomposition is an algorithm that maps an input character using both the canonical mappings and the compatibility mappings found in the Unicode Character Database. For example4:

  • Font variants
  • Linebreaking differences
  • Positional forms
  • Circled variants
  • Width variants
  • Rotated variants
  • Superscripts/Subscripts
  • Squared characters
  • Fractions

Unicode offers four normalization forms (NF)5, which either break apart composite characters (decomposition) or convert to composite characters (composition):

Normalization form | Type | Description | Example
NFD | Decomposition | Canonical decomposition of a string | U+00C5 is equivalent to U+0041, U+030A
NFKD | Decomposition | Compatibility decomposition of a string (in many cases this will yield results similar to NFD) | U+FB01 is equivalent to U+0066, U+0069
NFC | Composition | Canonical composition after the canonical decomposition of a string | U+0041, U+030A is equivalent to U+00C5
NFKC | Composition | Canonical composition after the compatibility decomposition of a string | U+1E9B, U+0323 is equivalent to U+1E69

The following example (rendered in a React component) shows the normalization forms in action6:

NFD of Å (U+212B) = Å (U+0041, U+030A)

NFKD of ﬁ (U+FB01) = fi (U+0066, U+0069)

NFC of Å (U+0041, U+030A) = Å (U+00C5)

NFKC of ẛ̣ (U+1E9B, U+0323) = ṩ (U+1E69)
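In JavaScript the normalization forms are exposed through the normalize method of String, which makes it easy to fix the comparison from the earlier ç example - a small sketch:

const singleCodePoint = String.fromCodePoint(0xe7);            // "ç" as U+00E7
const combiningCharacters = String.fromCodePoint(0x63, 0x327); // "c" + combining cedilla

console.log(singleCodePoint === combiningCharacters);                                   // false
console.log(singleCodePoint.normalize("NFC") === combiningCharacters.normalize("NFC")); // true
console.log(singleCodePoint.normalize("NFD") === combiningCharacters.normalize("NFD")); // true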

What does this mean in practice?

The normalization forms modify the text and may result in the loss of important semantic information, so they are best treated like the typical uppercase and lowercase transformations, i.e. definitely very useful, but not always appropriate depending on the context.

In the next post in this series we're going to apply normalization, plus a few other tricks for more realistic scenarios such as matching of names, so stay tuned!


Footnotes

  1. Recall that code points correspond to characters and: "Characters are the abstract representations of the smallest components of written language that have semantic value. They represent primarily, but not exclusively, the letters, punctuation, and other signs that constitute natural language text and technical notation". Chapter 2 of the Unicode standard offers an interesting discussion of the underlying design philosophy of the standard, as well as some notable situations where deviations from key principles were required in order to ensure backward compatibility (section 2.3, Compatibility Characters).

  2. Here is a very interesting design document from around the time the Java Platform added support for characters in the SMP and beyond (requiring more than 16 bits per char).

  3. Malicious actors can take this one step further and craft payloads that leverage non-printable or graphically similar characters. See this technical report, in particular the section about "confusables" for further detail.

  4. See Annex 15 of the Unicode standard

  5. Check chapter 3, section 11 of the unicode standard for more details on normalization forms.

  6. Check the normalize method of String.

A deep dive into unicode and string matching - I

· 8 min read
Bruno Felix
Digital plumber, organizational archaeologist and occasional pixel pusher

"The ecology of the distributed high-tech workplace, home, or school is profoundly impacted by the relatively unstudied infrastructure that permeates all its functions" - Susan Leigh Star

Representing, processing, sending and receiving text (also known as strings in computer-speak) is one of the most common things computers do. Text representation and manipulation, in a broad sense, has a quasi-infrastructural1 quality to it: we all use it in one form or another, and it generally works really well - so well, in fact, that we often don't pay attention to how it all works behind the scenes.

As software developers it is key to have a good grasp of how computers represent text, especially since in certain application domains even supposedly "simple" operations, like assessing whether two strings are the same, have a surprising depth to them. After all, should the string "Bruno Felix" be considered the same as the string "Bruno Félix" (note the acute "e" in the last name)? The answer, of course, is: "it depends".

But let's start from the beginning. In this series I am going to explore the Unicode standard, going a bit beyond the bare minimum every developer needs to know and into some aspects that are not widely talked about, in particular how characters with diacritics are represented and how this impacts common operations like string comparison and ordering.

The briefest history of text representations ever

There is already a lot of good material out there about how computers represent text2, which I highly recommend. For the sake of this article I'm going to speed-run through the history of how computers represent text, in order to dive in more detail into the internals of Unicode.

So, in a very abbreviated manner: computers work with numbers, so the obvious thing to do to represent text is to assign a number to each letter. This is basically true to this day. Since the initial developments in digital computing were done in the USA and the UK, English became (and still is) the lingua franca of computing. It was straightforward to come up with a mapping between every letter in the English alphabet, digits, common punctuation marks plus some control characters, and a number. This mapping is called an encoding, and probably the oldest encoding you will find out there in the wild is ASCII, which does exactly this for the English alphabet - using only 7 bits.

We can actually see this in action if we save the string Hello in ASCII in a file and dump its hexadecimal content.

$ hexdump -C hello-ascii.txt
00000000 48 65 6c 6c 6f |Hello|
00000005

Of course this was far from perfect, especially if you're not an English speaker. What about all the other languages out there in the world? And how could they be supported in a way that maintains backward compatibility with ASCII?

Since computers work in multiples of 2, and thus with 8-bit bytes, ASCII leaves one bit available. This makes the 7-bit ASCII encoding quite easy to extend by adding an additional bit (nice, as it maintains backward compatibility), doubling the number of available characters and still making everything fit in a single byte. Amazing! This is actually what vendors did, and it eventually got standardized in encodings like ISO 8859-1. Of course 256 characters are not enough to fit every character for every writing system out there, so the way this was initially approached was to have several "code pages" that map the same one-byte numbers to different alphabets (e.g. the Latin alphabet has one code page, the Greek alphabet has another). A consequence of this is that in order to make sense of a piece of text one needs additional meta-information about which code page to use: after all, character 233 will change depending on the code page (and of course this still doesn't work for alphabets with thousands of characters).

If we have the string Félix written in ISO-8859-1 (Latin) and read the same string in ISO-8859-7 (Greek) we get Fιlix, despite the fact that the bytes are exactly the same!

$ hexdump -C felix-code-page-confusion.txt
00000000 46 e9 6c 69 78 |F.lix|
00000005
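The same confusion can be reproduced in a browser console with TextDecoder, which still understands these legacy encodings - a small sketch using the exact bytes from the dump above:

const bytes = new Uint8Array([0x46, 0xe9, 0x6c, 0x69, 0x78]);
console.log(new TextDecoder("iso-8859-1").decode(bytes)); // "Félix"
console.log(new TextDecoder("iso-8859-7").decode(bytes)); // "Fιlix"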

A brief intro to Unicode and character encodings

The example above is not only an interoperability nightmare, but it still doesn't cover all languages in use, so it's not a sustainable solution to the issue of text representation. Unicode3 tries to address these issues by starting from a simple idea: each character is assigned its own number. That is, é and ι are assigned different numbers (code points in Unicode terminology). Code points are typically written as U+ followed by the hexadecimal value of the character, so for instance é is represented as U+00E9 and ι as U+03B9. The code points were also chosen in such a way that backward compatibility is maintained with ISO-8859-1 and ASCII.
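A small sketch to see this from a browser console - codePointAt returns the code point, regardless of how the string happens to be stored internally:

console.log("é".codePointAt(0).toString(16));   // "e9"  -> U+00E9
console.log("ι".codePointAt(0).toString(16));   // "3b9" -> U+03B9
console.log(String.fromCodePoint(0xe9, 0x3b9)); // "éι"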

Currently Unicode has defined code points for more than 149,186 characters, and this covers not only languages that are in active use today, but also historical scripts (e.g. Cuneiform) and fictional languages (e.g. Tengwar4) - although it is important to note that most characters in common use are encoded in the first 65,536 code points. In total Unicode allows the definition of up to 1,114,112 code points, so it is quite future-proof.

An important thing to note is that code points don't specify anything at all about how they are converted to actual bytes in memory or on disk - they are abstract ideas. This is one of the key things to keep in mind when thinking about Unicode: there is, by design, a clear difference between identifying a character, representing it in a way that a computer can process, and rendering a glyph on a screen.

The good thing about standards is that you get to choose - Unknown

So if code points are notional, abstract ideas, how can computers make use of them? This is where encodings come into the picture. The Unicode Standard offers several different options to represent characters: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8). The key idea here is that a code point (which is an integer at the end of the day) is represented by one or more code units, and UTF-32/16/8 offer different sizes for these code units.

When working with UTF-32 and UTF-16, the endianness - that is, the order in which the most and least significant bytes are stored - needs to be considered.

UTF-32 provides fixed-length code units, making it simple to process. This comes at the cost of increased memory or disk storage space per character.

$ hexdump -C hello-utf32-little-endian.txt
00000000 48 00 00 00 65 00 00 00 6c 00 00 00 6c 00 00 00 6f 00 00 00 |H...e...l...l...o...|
00000014

$ hexdump -C hello-utf32-big-endian.txt
00000000 00 00 00 48 00 00 00 65 00 00 00 6c 00 00 00 6c 00 00 00 6f |...H...e...l...l...o|
00000014

UTF-16 provides a balance between processing efficiency and storage requirements. This is because all the commonly used characters fit into a single 16-bit code unit, but it is important to keep in mind that this is still a variable length encoding (a code-point may span one or two code units). Fun fact: the JVM and the CLR use UTF-16 strings internally.

$ hexdump -C hello-utf16-little-endian.txt
00000000 48 00 65 00 6c 00 6c 00 6f 00 |H.e.l.l.o.|
0000000a

$ hexdump -C hello-utf16-big-endian.txt
00000000 00 48 00 65 00 6c 00 6c 00 6f |.H.e.l.l.o|
0000000a

Finally, UTF-8 is a byte-oriented, variable-length encoding (so be careful with the assumption that each character is one byte - that is not what the 8 in UTF-8 means!). Since it is byte oriented, and the code points have been carefully chosen, this encoding is backward compatible with ASCII (note the example below). On the other hand, a code point may be anywhere from one to four 8-bit code units long, so processing is more complex.

$ hexdump -C hello-utf8.txt
00000000 48 65 6c 6c 6f |Hello|
00000005
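The variable-length nature of UTF-8 is easy to observe with TextEncoder (which always encodes to UTF-8) - a small sketch:

const encoder = new TextEncoder();           // TextEncoder always produces UTF-8
console.log(encoder.encode("Hello").length); // 5 - one byte per ASCII character
console.log(encoder.encode("Félix").length); // 6 - "é" takes two bytes (0xc3 0xa9)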

This essentially allows programmers to trade off processing simplicity against resources (memory or storage space) and backward-compatibility requirements, according to the specific needs of their application. The following picture (directly from Chapter 2 of the Unicode Standard5) may further clarify things:

UTF-32, UTF-16 and UTF-8 and the respective code units

Hopefully this brief intro provides a good foundation as to why Unicode has become the de facto way to represent text, and clarifies the key difference between code points and encodings. It serves as a stepping stone to further explore the Unicode Standard: in the next post in this series I will dive a bit deeper into how code points are structured, the types of characters that exist, and how they can be combined and normalized.


Footnotes

  1. The Ethnography of Infrastructure

  2. The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

  3. Unicode technical guide

  4. Tengwar and Unicode

  5. Unicode standard - Chapter 2