Perl questions regarding unpack() and the v flag in printf()

I am trying to accomplish the following:

For an arbitrary Perl string (whether or not it is internally encoded in UTF-8, and whether or not it has the UTF-8 flag set), scan the string from left to right, and for every character, print the Unicode code point for that character in hex format. To make myself absolutely clear: I do not want to print UTF-8 byte sequences or something; I just would like to print the Unicode code point for every character in the string.

# Prints the following to the console (the console is UTF8):
# αω
# 3B1.3C9
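
A minimal sketch that produces this output, assuming $Text holds the two characters α (U+03B1) and ω (U+03C9); the sample string and the binmode call are assumptions chosen for illustration:

```perl
use strict;
use warnings;

# Assumed sample string: GREEK SMALL LETTER ALPHA and OMEGA.
my $Text = "\x{3B1}\x{3C9}";

# Encode output as UTF-8 so a UTF-8 console displays the characters.
binmode(STDOUT, ':encoding(UTF-8)');

print "$Text\n";

# %vX formats one hex integer per character, joined by ".".
my $hex = sprintf "%vX", $Text;
print "$hex\n";
```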

Then I have seen some examples, but without reasonable explanations, which made me doubt that my solution is correct, and now I have got questions regarding my own solution as well as the examples.

1) Perl's documentation about the v flag in (...)printf says:

"This flag tells Perl to interpret the supplied string as a vector of integers, one for each character in the string. [...]"

It does not say what it exactly means by "a vector of integers", though. When looking at the output of my example, it seems that those integers are the Unicode code points, but I would like to have this confirmed by somebody who knows for sure.

Hence the question:

1) Can we be sure that every integer which is pulled from the string that way is the respective character's Unicode code point (and not some other byte sequence)?

Secondly, regarding an example which I have found (slightly modified; I can't remember where I got it from, maybe from the Perl docs):

# Prints the following to the console (the console is UTF8):
# αω
# 3B1.3C9

Being a C and assembly guy, I just don't get why somebody would write the printf statement like shown in the example. According to my understanding, the respective line is syntactically equivalent to:

for $_ (unpack('C0A*', $Text)) {
    printf "%vX\n", $Text;
}

As far as I have understood, unpack() takes $Text, unpacks it (whatever that means in detail) and returns a list which in this case has one element, namely the unpacked string. Then $_ runs through that list with one element (without being used anywhere), hence the block (i.e. the printf()) is executed once. In summary, the only action which is done by the above snippet is executing

printf "%vX\n", $Text;

one time.
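
This reading can be checked with a short sketch. The sample value of $Text and the names @parts and $runs are assumptions made for illustration:

```perl
use strict;
use warnings;

my $Text = "\x{3B1}\x{3C9}";          # assumed sample string

# 'A*' yields a single string, so the returned list has one element.
my @parts = unpack('C0A*', $Text);
my $count = scalar @parts;
print "elements: $count\n";

# Count how often the loop body runs; $_ is aliased but never used.
my $runs = 0;
for $_ (@parts) {
    printf "%vX\n", $Text;
    $runs++;
}
print "runs: $runs\n";
```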

Hence the question:

2) What could be the reason for wrapping this into a for loop like shown in the example?

Final questions:

3) If the answer to question 1) is "yes", why do most examples I have seen use unpack() after all?

4) In the three line snippet above, the parentheses which surround the unpack() are necessary (leaving them away leads to syntax errors). In contrast, in the example, the unpack() does not need to be enclosed in parentheses (but it does not harm if they are added nevertheless). Could anybody explain the reason?

Edit / Update in reply to ikegami's answer below:

Of course, I know that strings are sequences of integers. But

a) There are many different encodings for those integers, and the bytes which are in a certain string's memory area depend on the encoding, i.e. if I have two strings which contain exactly the same character sequence, but I store them in memory using different encodings, the byte sequences at the strings' memory locations are different.

b) I strongly suppose that (besides Unicode) there are many other systems / standards which map characters to integers / code points. For example, the Unicode code point 0x3B1 is the Greek letter α, but in some other system, it may be the German letter Ö.

Under these circumstances, the question makes perfect sense IMHO, but I possibly should be more precise and reword it:

If I have a string $Text which only contains characters which are Unicode code points, and if I then execute

printf "%vX\n", $Text;

will it print the Unicode code point in hex for every character under all circumstances, notably (but not limited to):

regardless of Perl's actual internal encoding of the string

regardless of the string's UTF-8 flag

whether or not use feature 'unicode_strings' is active

If the answer is yes, what sense do all the examples make which are using unpack(), notably the example above? By the way, I now have remembered where I got that one from: the original form is in Perl's pack() documentation, in the section about the C0 and U0 mode. Since they are using

In this case, if each character of the string is a UCP, then sprintf '%vX' will print those UCPs in hex.
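
As a rough sketch of that condition: the same characters stored in either internal format give the same output, while a string whose characters are UTF-8 bytes (because it was encoded first) gives the byte values instead. The sample strings and variable names are assumptions for illustration; utf8::upgrade, utf8::downgrade and Encode::encode are the standard functions.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same two characters (U+00E9, U+00F1), stored in both internal formats.
my $downgraded = "\xE9\xF1";
my $upgraded   = "\xE9\xF1";
utf8::downgrade($downgraded);   # UTF-8 flag off
utf8::upgrade($upgraded);       # UTF-8 flag on

my $hex_down = sprintf '%vX', $downgraded;
my $hex_up   = sprintf '%vX', $upgraded;
printf "flag=%d %s\n", (utf8::is_utf8($downgraded) ? 1 : 0), $hex_down;
printf "flag=%d %s\n", (utf8::is_utf8($upgraded)   ? 1 : 0), $hex_up;

# Here each character of the string is a UTF-8 byte, not a code point.
my $bytes     = encode('UTF-8', "\x{3B1}\x{3C9}");
my $hex_bytes = sprintf '%vX', $bytes;
print "$hex_bytes\n";
```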

I just don't get why somebody would write the printf statement like shown in the example.

Neither do I. for can be used as a topicalizer, meaning

for ($s) {
s/^\s+//;
s/\s+\z//;
}

is equivalent to

$s =~ s/^\s+//;
$s =~ s/\s+\z//;

But it's not used that way here.

In the three line snippet above, the parentheses which surround the unpack() are necessary (leaving them away leads to syntax errors). In contrast, in the example, the unpack() does not need to be enclosed in parentheses

You mention you come from a C background. Perl is just like C in this respect. Specifically,

The conditional or loop expression of flow control statements must be in parens. In Perl, the syntax for a foreach loop is for (EXPR) BLOCK [ continue BLOCK ].

A STATEMENT can be an EXPR.

For example,

while (f()) { } # Allowed in C and Perl.
while f() { } # Not allowed in C or Perl.
f(); # Allowed in C and Perl.
(((((f()))))); # Allowed in C and Perl.
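
The same distinction applies to Perl's two forms of for, shown here as a small sketch (the names @items and @out are made up for illustration). The block form is a compound statement and needs parentheses around its list; the statement-modifier form is an ordinary expression followed by for LIST, so parentheses around the list are optional.

```perl
use strict;
use warnings;

my @items = (1, 2, 3);
my @out;

# Compound statement: the list must be in parentheses.
for my $x (@items) {
    push @out, $x * 2;
}

# Statement modifier: parentheses around the list are optional.
push @out, $_ * 2 for @items;
push @out, $_ * 2 for (@items);

print "@out\n";
```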