Extracting lowercase Russian words from UTF-8 text

Question 1

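Your range uses code points above 255 (\x{0430}), which only match when Perl holds the string as decoded characters in its internal Unicode format. Your strings do not seem to have been converted into that format. Since the text here lives in the script itself, you need the use utf8; pragma. This works for me: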

#!/usr/bin/perl -w

use strict;
use warnings;
use utf8;

binmode(STDOUT, ":utf8"); # Encode output as UTF-8; avoids the "Wide character in print" warning

while(<DATA>) {
    print lc($1)."\n" while /\b([\x{0430}-\x{044F}]{3,})\b/g;
}
__DATA__
Все смешалось в доме Облонских. Жена узнала, что муж был.
в связи с бывшею в их доме француженкою-гувернанткой, и объявила мужу, что не может жить с ним в одном доме.
Положение это продолжалось уже третий день и мучительно чувствовалось и самими супругами, и всеми членами семьи, и домочадцами.

A more correct way is to work with the characters themselves rather than numeric code point ranges. Also, if you are reading from a file, you may need to decode the input so that Perl treats it as UTF-8 characters:

#!/usr/bin/perl -w

use strict;
use warnings;
use utf8;

binmode(STDOUT, ":utf8");

while(<>) {
    utf8::decode($_); # Decode UTF-8 bytes into Perl's internal character format
    print lc($1)."\n" while /\b([а-яА-ЯёЁ]{3,})\b/g;
}

Input file:

Однажды в студёную зимнуюю пору... ёёёёЁЁЁ йййЙЙЙЙ
Приветт, земляк!

With use utf8 enabled and the input decoded, lc() knows how to lowercase Cyrillic letters.

(ё and Ё are listed separately because they carry a diaeresis and do not fall inside the а-я and А-Я ranges.)
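To see why ё and Ё need to be listed on their own, here is a minimal sketch that prints the code points involved. It assumes the script itself is saved as UTF-8; the loop and the two checks are illustrative only and are not part of the answer's script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                      # the source below contains literal Cyrillic

binmode(STDOUT, ':utf8');

# Print the code point of each letter to show where Ё and ё sit
# relative to А-Я (U+0410..U+042F) and а-я (U+0430..U+044F).
for my $ch (qw(А Я а я Ё ё)) {
    printf "%s U+%04X\n", $ch, ord($ch);
}

# ё (U+0451) lies just past я, so the plain range misses it.
print 'ё' =~ /^[а-я]$/  ? "in range\n" : "not in range\n";

# Listing it explicitly in the class makes it match.
print 'ё' =~ /^[а-яё]$/ ? "matched\n"  : "not matched\n";
```

Ё is U+0401 and ё is U+0451, so both sit outside the two contiguous blocks that the ranges cover.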

Question 2

You need to set your STDIN and STDOUT to UTF-8:

binmode STDOUT, ':utf8';
binmode STDIN, ':utf8';

Your regex should work after this.
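As a rough illustration of why the layer matters, the sketch below compares the same word as raw UTF-8 bytes and as decoded characters. It is an assumption-laden stand-in: an in-memory filehandle opened with :encoding(UTF-8) plays the role of a decoded STDIN, and the variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

binmode(STDOUT, ':utf8');

# Simulate raw UTF-8 input as it arrives from a handle with no decoding layer.
my $bytes = encode('UTF-8', 'привет');

# Undecoded: every element of the string is a byte (0..255),
# so nothing can fall inside the Cyrillic code point range.
my $raw_hit = $bytes =~ /[\x{0430}-\x{044F}]{3,}/ ? 1 : 0;

# Read the same data through a UTF-8 decoding layer
# (an in-memory handle stands in for STDIN here).
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open failed: $!";
my $line = <$fh>;
close($fh);
my $decoded_hit = $line =~ /[\x{0430}-\x{044F}]{3,}/ ? 1 : 0;

printf "raw:     length %d, match %d\n", length($bytes), $raw_hit;
printf "decoded: length %d, match %d\n", length($line),  $decoded_hit;
```

The raw string has length 12 and does not match, while the decoded one has length 6 and does.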

That said, I would use a combination of Unicode property tests instead of an explicit range:

\b(((?=\p{Cyrillic})\p{Lowercase_Letter}){3,})\b
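Here is a small sketch of that property-based pattern in use. The helper name lowercase_cyrillic_words is hypothetical, the input is assumed to be already decoded, and the inner group is written as non-capturing so that only the whole word is captured.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

binmode(STDOUT, ':utf8');

# Hypothetical helper: returns every all-lowercase Cyrillic word
# of three or more letters found in an already-decoded string.
sub lowercase_cyrillic_words {
    my ($text) = @_;
    my @words;
    # The lookahead requires a Cyrillic character, and
    # \p{Lowercase_Letter} then requires it to be lowercase.
    while ($text =~ /\b((?:(?=\p{Cyrillic})\p{Lowercase_Letter}){3,})\b/g) {
        push @words, $1;
    }
    return @words;
}

print "$_\n" for lowercase_cyrillic_words('Однажды в студёную зимнюю пору, hello');
```

With this input the capitalized word, the one-letter word and the Latin word are skipped, and ё is matched without having to be listed separately.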