
solarplexuss:

"Today, let's have a fun discussion about a claim that appears in many written materials: half-width alphanumeric characters occupy 1 byte when stored, while full-width characters occupy 2 bytes. Let's explore this topic together."

Solar: "Half-width alphanumeric characters refer to those letters like the English letter "a" and the numeral "1" that are displayed in the command prompt window or on this Kakuyomu platform. If half-width alphanumeric data is represented by one byte, it means that the data for the letter "a" or the numeral "1" is also stored in one byte, which is equivalent to 8 bits. Well, that's only natural, isn't it? 😊"

Aletha: "If you have 8 bits of memory, you can store 0 or 1 in each bit, and you can store 8 of them together, like 00100000 or 10000001.

Furthermore, the values stored in memory correspond to specific numbers. For example,

00000000 corresponds to the number 0,

00000001 corresponds to the number 1,

00000010 corresponds to the number 2,

and so on, up to 11111111, which corresponds to the number 255."

Solar: "I understand that numerical data (numbers from 0 to 255) is stored in one byte, but I'm not sure what it means that a half-width character like "a" takes up one byte or 8 bits of data. Hmm, I'm not quite sure...

If we carefully observe the shape of the half-width character "a"...

Take a good look. Do you think we can express this half-width character "a" using only 0s and 1s for 8 bits?"

solarplexuss:

"Ummm (^^♪...

The half-width character "a" has a data size of one byte, huh?

There's a beautiful curve in this "a" character, isn't there?

I feel like it wouldn't be possible to express this smooth curved shape with just 0 and 1 using only 8 bits.

I wonder how it works?"

Solar:

"Let's say that a single half-width alphanumeric character is represented on a computer screen by a dot display frame like this:

□□□□□□□
□□□□□□□
□□□□□□□
□□□□□□□
□□□□□□□

(Of course, there are more squares and dots in reality.)

To represent a half-width alphanumeric character, we'll fill in the squares of this dot frame with black.

For example, the character "1" would look like this:

□□□■□□□
□□□■□□□
□□□■□□□
□□□■□□□
□□□■□□□

Does it look like a "1" to you?

The character "a" would look like this:

■■■■■■■
□□□□□□■
■■■■■■■
■□□□□□■
■■■■■■■

Does it look like an "a"?"

Is everything okay up to this point?

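Each of the 35 squares in the frame (5 rows of 7) can be either □ or ■, so the number of ways to fill in the frame with ■ is 2 to the power of 35. The patterns run like this:

□□□□□□□
□□□□□□□
□□□□□□□
□□□□□□□
□□□□□□□

□□□□□□□
□□□□□□□
□□□□□□□
□□□□□□□
□□□□□□■

□□□□□□□
□□□□□□□
□□□□□□□
□□□□□□□
□□□□□■□

□□□□□□□
□□□□□□□
□□□□□□□
□□□□□□□
□□□□□■■

...

■■■■■■■
■■■■■■■
■■■■■■■
■■■■■■■
■■■■■■□

■■■■■■■
■■■■■■■
■■■■■■■
■■■■■■■
■■■■■■■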

👇

Let's take the five rows of the frame, which are stacked vertically, and line them up horizontally (the digits mark where each row ends):

□□□□□□□1□□□□□□□2□□□□□□□3□□□□□□□4□□□□□□□5

In other words, the frame

□□□□□□□
□□□□□□□
□□□□□□□
□□□□□□□
□□□□□□□

is equivalent to the following when its rows are lined up horizontally:

□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□

That is, it is equivalent to 35 squares arranged in a single row.

Pattern to fill in with the symbol marked with ■:

□□□□□□□

□□□□□□■

□□□□□□□

□□□□□■□

□□□□□□□

□□□□□■■

■■■■■■■

Can be rearranged as:

□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□

□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□■

□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□■□

□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□■■

■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■

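If we represent ■ as 1 and □ as 0, the patterns filled with ■ and □ earlier can be written with 7 digits per row:

0000000
0000000
0000000
0000000
0000000

0000000
0000000
0000000
0000000
0000001

0000000
0000000
0000000
0000000
0000010

0000000
0000000
0000000
0000000
0000011

...

1111111
1111111
1111111
1111111
1111111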

Lined up horizontally, these 2 to the power of 35 patterns can ultimately be written as 35-bit strings, running from all 0s up to all 1s:

00000000000000000000000000000000000 (35 bits)

00000000000000000000000000000000001 (35 bits)

00000000000000000000000000000000010 (35 bits)

00000000000000000000000000000000011 (35 bits)

11111111111111111111111111111111111 (35 bits)

Now draw the half-width alphanumeric character "a" as

■■■■■■■
□□□□□□■
■■■■■■■
■□□□□□■
■■■■■■■

and write □ as 0 and ■ as 1. The pattern becomes

1111111
0000001
1111111
1000001
1111111

and, when the rows are lined up horizontally,

it can be expressed as 11111110000001111111110000011111111.

In other words, representing the half-width alphanumeric character "a" using 0 and 1 requires a 35-bit space to store the sequence of 0s and 1s, and it cannot be fully expressed using 8 bits alone.

This means that with a small data capacity of only 8 bits, or an 8-bit space, it is not possible to display the character "a" on the screen.

Therefore, even though the half-width alphanumeric character "a" is considered a 1-byte character, it is not directly represented using only 8 bits of 0s and 1s.

Solar: "In that case, it's possible to consider that numbers are assigned to half-width alphanumeric characters such as a, b, c, and stored in 1-byte memory.

Maybe the numbers corresponding to the half-width alphanumeric characters a, b, c, and so on are something like this:

01100001

01100010

01100011

By using these numbers, image data for half-width alphanumeric characters a, b, c, and so on can be called up. That may be the mechanism."

In other words,

when we say that half-width alphanumeric data is stored in 1 byte, it does not mean that the image of the half-width alphanumeric character a is directly represented by 1 byte of 0s and 1s. Rather, the number (or ID) assigned to that image is stored in 1 byte (8 bits) of memory, and for the character a that number is 01100001.

solarplexuss:

"To put it another way,

Somewhere, there is image data of half-width alphanumeric characters a, b, and c that has been created and saved.

This image data has been assigned numbers corresponding to half-width alphanumeric characters a, b, and c.

So, if you want to call up the image data for half-width alphanumeric character a, you just need to use the number assigned to half-width alphanumeric character a.

At that time,

if you use the %c conversion specifier in printf to display the number,

the character corresponding to that number will be displayed on the command prompt screen."

Solar: "Oh, maybe the number assigned to half-width alphanumeric character "a" is actually the ASCII code?"

Solarplexuss: "If that's the case, then since one byte can only store 256 values from 0 to 255, the images of half-width alphanumeric characters can only be assigned numbers from 0 to 255, so there can be at most 256 half-width alphanumeric characters.

Right, right (´▽｀*)/?"

Solar: "Actually, there weren't 256 ASCII codes, were there... There were 128, numbered 0 to 127, right? And only some of those are used for displaying characters: ASCII code 32 is a space, and ASCII codes 33 to 126 correspond to visible characters, I think. The rest are control codes."

Solarplexuss: "How do you know that? That's amazing!"

Solar: "Actually, I've seen it somewhere before. Where was it again? Anyway, when I looked at the range of numerical values that can be stored in a signed char, which is typically -128 to 127, I realized that 127 corresponds to the largest ASCII code."

Solarplexuss: "Similarly, we can understand what it means for full-width character data to be stored in 2 bytes. Full-width characters include hiragana, katakana, kanji, and many other kinds of characters. There are about 10,000 kanji characters alone, right? When full-width character data is stored in 2 bytes, the 16 zeros and ones do not directly represent the image of the character, just as with half-width characters. Instead, a number is assigned to the image of each hiragana, katakana, and kanji character, and that number is stored in 2 bytes of memory like this: 0001000101100000."

"Two bytes can store numerical values from 0 to 65535. By corresponding image data to values from 0 to 65535, full-width characters, which are also 2 bytes long, can only have a maximum of 65536 characters. Isn't it great? What do you think of this idea?"

Solar: "That's right. But maybe the distinction between one-byte and two-byte characters isn't strictly necessary. The number assigned to each character's image could just as well be stored in, say, 10 bytes of memory. Maybe the distinction is a remnant of the early days of computing, when memory capacity was extremely limited. Perhaps back then, storing even two-byte character codes took up a significant share of a computer's memory. So, maybe at first, only the half-width alphanumeric characters whose numbers (codes) fit in one byte were used, such as ASCII code 32 (space) and the characters corresponding to ASCII codes 33 to 126, right?

For example, let's say the computer's memory capacity was only 10 bytes. To display the characters a, b, and c on the command prompt screen, you would use the ASCII codes 01100001, 01100010, and 01100011 to call up the image data for each letter. Storing 01100001, 01100010, and 01100011 in the computer's memory would already use 3 of those 10 bytes."

Solarplexuss: "Wow, that's true."

Solar: "If the numbers (codes) corresponding to the image data for half-width alphanumeric characters were stored in two bytes of memory instead of one, storing 0000000001100001 (2 bytes), 0000000001100010 (2 bytes), and 0000000001100011 (2 bytes) would use up 6 bytes of the 10-byte capacity. That's why the number corresponding to each character's image was stored in one byte of memory, even though this limited the number of characters that could be displayed to at most 256.

But we would like to use not only the English letters and symbols corresponding to ASCII codes, but also hiragana, katakana, kanji, and other characters.

So, once more memory was available, we began using two-byte numbers, made up of 16 zeros and ones, to correspond to the image data for hiragana, katakana, and kanji characters. For example, a number such as 0001101010100011 is assigned to an image, and that number is used to display the hiragana, katakana, or kanji character.

Nowadays, there is enough memory capacity that each code could probably be stored in 10 bytes of memory. However, that would waste memory, so we probably wouldn't do that. If a program uses too much memory, the computer might run short of memory and freeze."