Dummy guide to Emoji: History, Nature and Usage
Well, brace yourselves, because you're about to get these questions answered and then some.
Emoji and Emoticons – no difference there, right?
iOS, Google, Microsoft and Samsung (top to bottom)
Emoji, on the other hand, usually consist of a single Unicode character. More on Unicode (and why I say “usually”) later. The smiley face emoticon from above can be expressed with the “Slightly Smiling Face” Emoji, which is depicted on the right. Do note that this Emoji will look different on different devices, unlike the smiley face Emoticon above, which is made of plain text characters and looks the same everywhere.
And there are also these things called Stickers, introduced in modern IM platforms. They are a whole different subject, but some IM developers insist on calling them Emoji for reasons completely unknown to mankind.
Now, let's forget about Emoticons and Stickers for the time being and focus on the task at hand.
Where did these Emoji come from?
Okay, I promise I will get to the origin of Emoji soon. Bear with me for a minute, because there will be a short prelude first.
Once upon a time, the Tower of Babel was a thing, then it suddenly wasn't and people started speaking different languages. Whether you believe the previous sentence or not, it's a fact that with the dawn of the digital age, different languages gave birth to different keyboards and different character encoding standards. These encoding standards had the sole purpose of transforming letters, numbers and other characters into binary code (ones and zeroes) for computers to understand, then reversing the process on the other end.
But, if you lived in Russia, and you wanted to send a message to your friend in Afghanistan, you had a problem. And a serious one at that. The difference in encoding standards meant that your Afghan friend might see complete gibberish appear on their screen, because the bits generated by your Russian encoding would mean something different in your friend's Afghan encoding.
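To see that gibberish for yourself, here is a minimal Python sketch. It assumes the sender's device uses the Windows Cyrillic code page (cp1251) and the receiver's device assumes the Windows Arabic-script one (cp1256). Both choices are mine, picked only to stand in for the two "local" standards in the story.

```python
# Hypothetical setup: sender encodes with a Cyrillic code page,
# receiver decodes with an Arabic-script code page.
text = "Привет"  # "Hello" in Russian

raw = text.encode("cp1251")                       # bytes that travel over the wire
garbled = raw.decode("cp1256", errors="replace")  # receiver guesses the wrong standard
restored = raw.decode("cp1251")                   # receiver uses the right standard

print(raw.hex())   # the same bytes in both cases
print(garbled)     # unrelated characters, not the Russian greeting
print(restored)    # the original text again
```

The bytes never change. Only the table used to read them does, and that is the whole problem.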
Now, that's wonderful in theory, but it turns out to be quite a difficult task to achieve in practice, because the Consortium wanted to have backwards compatibility, too. The idea was to convert text to Unicode, then convert it back into your country's old standard, and suffer no losses of characters or meaning at all.
Shigetaka Kurita, Father of Emoji
The first set consisted of barely 176 Emoji, each with a size of 12x12 pixels. A modest collection, compared to the thousands of glyphs we have available right now. But despite the limited catalog, Emoji took off in Japan and swept across the nation, because they allowed users to send images that required the same bandwidth as text. Tiny, ugly and basic images, but images nonetheless.
At some point someone must have stopped and said “Wait, that's a picture of a rice bowl”. And this is when the Unicode Consortium had to choose. Either leave the Emoji out of the Unicode standard and risk sending incomplete messages every time someone converts from the Japan-specific standard to Unicode, or add the Emoji on top of what was already developed by the Consortium. Needless to say, the folk behind Unicode chose the latter.
Okay, everything is falling into place now. The Emoji exist, the Unicode standard gave them to the world and they are suddenly viral, right? Well, no.
iOS 5 was the first iOS with Emoji Keyboard
Then Apple came along. The company decided that it wanted to sell its iPhone in Japan, so it had to include the (fairly well-hidden) option of typing Emoji, so that Japanese users could enjoy the same smiley face support they'd had up until then. And this is when Emoji took off.
Okay, I get it, but how do Emoji work?
Emoji work exactly the same way as regular text – a Unicode code point corresponds to every character in the Unicode catalog, Emoji included. When a device sends a message, it sends a series of Unicode code points. When another device receives said message, it interprets the code points and displays letters, numbers and Emoji.
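A quick way to check this is to ask Python for the code points of a short message. This is only an illustration of the principle, not of how any particular messaging app is built, and the variable names are mine.

```python
# "Hi" followed by a space and the Slightly Smiling Face Emoji (U+1F642)
message = "Hi \U0001F642"

# What the sender conceptually transmits: one code point per character
code_points = [f"U+{ord(ch):04X}" for ch in message]
print(code_points)

# What the receiver does: turn the code points back into characters
received = "".join(chr(int(cp[2:], 16)) for cp in code_points)
print(received == message)
```

The letters and the Emoji go through exactly the same treatment. The Emoji just happens to have a bigger number.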
It's actually a bit more complicated than that, but this is the basic principle. However, if what I said is true, Emoji should look exactly the same on all platforms, right? But we all know that's not the case. Then how can I be correct?
The answer is rather simple. Every platform has its own Emoji font of sorts. It's just like setting your IM client to display incoming messages in Comic Sans MS, while the user on the other end is typing them in Arial. The same code points are sent and received, but they are displayed in a different style. It just so happens that, when it comes to text, the standard fonts look much the same everywhere, while Emoji have no standard font that every platform relies on. Instead, different fonts are created by different companies.
And just like the letter "A" differs in style between Lucida and Times New Roman, so do Emoji look different on iOS and Android.
If every Emoji is a single character, why does Twitter count the Police Officer emoji as two?
Ah, a very good question. See, the “Police Officer” Emoji is actually one character. But you probably want to type in a police officer with dark skin, right? Well, there's this thing called “Skin Tone Modifier”, which is a whole separate Emoji.
Basically, instead of typing just “Police Officer”, you'd type “Police Officer” + “Dark Skin”. And since the skin tone modifier is a separate Emoji, you actually type in two characters.
What about the Female Police Officer with Dark Skin, though? This takes a whopping total of five characters! Here we have something else at play, called Zero Width Joiner (ZWJ), which is a Unicode character by itself.
These female officers cost us five Twitter characters each.
ZWJ was first used in languages that had to join characters in a different way, depending on other characters before and after them. Arabic would be a great example, as a lot of characters there look different in combination with other characters.
Same goes for Emoji. The ZWJ glues the Police Officer and the Female Sign into one picture. So, what is actually typed in Twitter is “Police Officer”, “Dark Skin Modifier”, ZWJ, “Female Sign”, “Variation Selector-16”. You could think of ZWJ as the plus sign of Emoji.
If two Emoji can exist on their own, they need a ZWJ between them to display as one picture, instead of two. This is called a ZWJ Sequence. The Skin Tone Modifier needs no ZWJ of its own: it simply follows the Emoji it colors, just like it does outside of a sequence. The last character, Variation Selector-16, is invisible and tells the platform to treat the Female Sign as an Emoji rather than a plain text symbol (more on it in a moment). This makes for a total of five Unicode characters and makes you limit your thoughts about that female cop to 135 symbols, instead of the 139 you expected.
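If you'd rather count the characters yourself, the sketch below builds both officers by hand and asks Python's unicodedata module to name every piece. The constant names and the describe helper are mine, and the order of the pieces is the one I understand the Unicode emoji data to list for this sequence. Also note that len() counts code points, which matches the counting described above; Twitter's actual rules are its own business.

```python
import unicodedata

POLICE_OFFICER = "\U0001F46E"
DARK_SKIN = "\U0001F3FF"   # skin tone modifier
ZWJ = "\u200D"             # Zero Width Joiner, invisible
FEMALE_SIGN = "\u2640"
VS16 = "\uFE0F"            # Variation Selector-16, invisible

officer_dark = POLICE_OFFICER + DARK_SKIN
female_officer_dark = POLICE_OFFICER + DARK_SKIN + ZWJ + FEMALE_SIGN + VS16


def describe(text):
    """Return the official Unicode name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


print(len(officer_dark), describe(officer_dark))
print(len(female_officer_dark), describe(female_officer_dark))
```

A platform that knows the sequence draws one picture. A platform that doesn't will usually fall back to showing the pieces side by side.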
We couldn't find a ZWJ image, because it's invisible. So, here are some cute hamster Emoji.
And if that wasn't enough to make your mind explode, there are also Emoji that can be displayed as text. Take the "Heavy Black Heart" Emoji, for example. It does look red on all platforms, but this doesn't mean that it got the wrong name. In fact, it's based on the solid heart text symbol, which is black by default (❤︎). So, platforms need to know that we want the Emoji and not its text representation.
In order to do that, the "Heavy Black Heart" Emoji consists of two Unicode characters. One is the heart we already covered, and the other is an invisible symbol, called "Variation Selector-16". Said invisible character basically tells the system that the preceding character should be displayed as Emoji and not text. Therefore, ❤︎ + "Variation Selector-16" = ❤️.
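Here is the same arithmetic as a small Python sketch. The variable names are mine, and I've added the heart's text-style sibling, "Variation Selector-15", which the paragraph above doesn't mention but which asks for the opposite (plain text) look.

```python
import unicodedata

HEART = "\u2764"   # HEAVY BLACK HEART, the plain text symbol
VS15 = "\uFE0E"    # Variation Selector-15: "show me the text version" (my addition)
VS16 = "\uFE0F"    # Variation Selector-16: "show me the Emoji version"

text_heart = HEART + VS15
emoji_heart = HEART + VS16

# One visible heart, but two code points
print(len(emoji_heart))
print([unicodedata.name(ch) for ch in emoji_heart])
```

How a bare heart with no selector is drawn is up to the platform, which is why the selector is there in the first place.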
And this is still not the end of it. Emoji country flags are a whole different subject. In fact, they're not even flags, but a combination of two Unicode characters, called "Regional Indicator Symbols". These are, basically, the letters from A to Z, each surrounded by a rectangle of dots. And when you want to type in the flag of the United States of America, you're basically typing the letters U and S, each surrounded by dots, which is two symbols. The Unicode Consortium chose this approach so it wouldn't have to deal with new flags popping up left, right and center as new countries form and old countries change flags when they change regimes. The Consortium has covered all possible two-letter country code combinations and lets Unicode-based platforms deal with the rest.
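The letter-to-indicator mapping is simple enough to sketch in a few lines of Python. The flag function below is a hypothetical helper of mine; it relies only on the fact that the 26 Regional Indicator Symbols sit in one contiguous block starting at U+1F1E6 (the indicator for "A").

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # code point of the Regional Indicator Symbol for "A"


def flag(country_code):
    """Turn a two-letter country code such as "US" into its flag sequence."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= ch <= "Z" for ch in code):
        raise ValueError("expected a two-letter code made of A-Z")
    # Shift each letter by its distance from "A" into the indicator block
    return "".join(chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code)


us = flag("US")
print(us, len(us))  # shown as one flag where supported, but it is two code points
```

Note that the function will happily build a pair for a code no country uses. Whether anything resembling a flag shows up is entirely the platform's call.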
Crazy, huh? That's the price of Unicode being universally applicable to anyone, though. And if you ask me, it's well worth it.
The original Emoji set.
Why did I need to know all this?
Because knowledge is power, I guess. And next time someone gets angry that their shooting range invite was displayed with a water pistol, you will know how it happened. Not to mention that this explains how the Love Hotel Emoji got there in the first place. Hint: such establishments are pretty popular in Japan.