Need to split Unicode string

Question 1

How about:

use utf8;
use feature 'say';         # needed for say
binmode(STDOUT, ':utf8');  # encode the output as UTF-8
my $str = "ভাৰত is a famous country. দিল্লী is the capital of ভাৰত";
$str =~ s/([\x{0980}-\x{09FF}])(?=[\x{0980}-\x{09FF}])/$1 /g;
say $str;

Output:

ভ া ৰ ত is a famous country. দ ি ল ্ ল ী is the capital of ভ া ৰ ত

To use it in your program, just change the while loop to:

while(<>) {
    s/([\x{0980}-\x{09FF}])(?=[\x{0980}-\x{09FF}])/$1 /g;
    print $_;
}
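
As a self-contained variant, here is a minimal sketch that wraps the same substitution in a sub so it can be called from any loop. The name space_bengali is purely illustrative, and the output layer assumes a UTF-8 terminal.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;

# Assumption: the terminal expects UTF-8 output.
binmode(STDOUT, ':encoding(UTF-8)');

# space_bengali is a hypothetical helper name, not part of any module.
# It adds a space after every character of the Bengali block (U+0980..U+09FF)
# that is directly followed by another character of that block.
sub space_bengali {
    my ($text) = @_;
    $text =~ s/([\x{0980}-\x{09FF}])(?=[\x{0980}-\x{09FF}])/$1 /g;
    return $text;
}

print space_bengali("ভাৰত is a famous country."), "\n";
```

Inside a while(<>) loop you would call it as print space_bengali($_); provided the input has been decoded from UTF-8 first.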

But I think what you actually wish to do is:

my %corresp = (
    'ভ' => 'Bh',
    'া' => 'a',
    'ৰ' => 'ra',
    'ত' => 't',
);
my $str = "ভাৰত is a famous country. দিল্লী is the capital of ভাৰত";
$str =~ s/([\x{0980}-\x{09FF}])/exists($corresp{$1}) ? $corresp{$1} : $1/eg;
say $str;

Output:

Bharat is a famous country. দিল্লী is the capital of Bharat

NB: It's up to you to build the real correspondence hash. I don't know anything about Assamese characters.

Question 2

You can use \p{...} and \P{...} which will allow you to match or not match particular character classes as specified in perluniprops.
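
As a quick illustration of the difference between the two, the sketch below counts matches of each in a short sample. The sub names are made up for this example. Note that \P{Latin} also matches the space, which is why whitespace has to be excluded separately.

```perl
use strict;
use warnings;
use utf8;

# count_latin and count_non_latin are hypothetical helper names.
sub count_latin {
    my ($s) = @_;
    my $n = () = $s =~ /\p{Latin}/g;   # characters in the Latin script
    return $n;
}

sub count_non_latin {
    my ($s) = @_;
    my $n = () = $s =~ /\P{Latin}/g;   # everything else, including spaces
    return $n;
}

my $sample = "ভাৰত is";
# "i" and "s" are Latin; the four Bengali-block characters and the space are not.
print "Latin: ", count_latin($sample), ", non-Latin: ", count_non_latin($sample), "\n";
```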

I'm using the negated class [^\p{Latin}\s], which matches any character that is neither Latin nor whitespace, so spaces are left alone:

#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);

use utf8;
binmode(STDOUT, ':utf8');  # "use utf8;" only covers the source code; output needs its own layer

my $string = "ভাৰত is a famous country";
$string =~ s/([^\p{Latin}\s])/$1 /g;  # Put a space after every non-Latin, non-space char
say $string;

This will print out:

ভ া ৰ ত  is a famous country

The only problem is the double space after ত: the substitution adds a space after the last Bengali character, which is already followed by one.
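
One possible way around the double space, sketched here under the assumption that a space should only be added when the next character is not already whitespace, is to add a look-ahead. The sub name space_non_latin is just for illustration.

```perl
use strict;
use warnings;
use utf8;

binmode(STDOUT, ':encoding(UTF-8)');

# space_non_latin is a hypothetical helper name.
sub space_non_latin {
    my ($text) = @_;
    # (?=\S) requires a non-whitespace character to follow, so nothing is
    # inserted before an existing space or at the end of the string.
    $text =~ s/([^\p{Latin}\s])(?=\S)/$1 /g;
    return $text;
}

print space_non_latin("ভাৰত is a famous country"), "\n";
# prints: ভ া ৰ ত is a famous country
```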

Question 3

It's doing exactly what you tell it to: @a=split('') splits the entire line, because nothing tells it to split only the first word. You first need to identify the substring you want to split, and then split it:

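#!/usr/bin/perl
use strict;
use warnings;
use utf8;

binmode(STDIN,  ':utf8');
binmode(STDOUT, ':utf8');
binmode(STDERR, ':utf8');

while(<>)
{
    chomp;
    ## find the first word, capture it as $1 and delete it from the line
    if (s/^(\S+)\s+//) {
        my @a = split(//, $1);
        ## Print your joined string and the rest of the line
        print join(" ", @a) . " $_\n";
    } else {
        ## no first word followed by whitespace: print the line unchanged
        ## (without this check, $1 would still hold the previous line's match)
        print "$_\n";
    }
}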

Question 4

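Add something like

$str =~ s/(\p{Latin}) (?=[\p{Latin}.,;:!?])/$1/g;

which is intended to remove the space between Latin word characters, using a look-ahead. Note that the assertion has to be (?=...); a look-behind (?<=...) placed after the space would test the space itself and never match. \p{Latin} is used rather than \w because \w also matches Bengali letters. It is not 100% reliable.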
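
A minimal sketch of this clean-up step follows, assuming a look-ahead (?=...) and \p{Latin} in place of \w (since \w also matches Bengali letters). The sub name rejoin_latin is hypothetical. It spaces out every character first and then re-joins the Latin ones.

```perl
use strict;
use warnings;
use utf8;

binmode(STDOUT, ':encoding(UTF-8)');

# rejoin_latin is a hypothetical helper name.
sub rejoin_latin {
    my ($text) = @_;
    # Remove a single space that sits between a Latin letter and a following
    # Latin letter or punctuation mark; runs of several spaces are kept.
    $text =~ s/(\p{Latin}) (?=[\p{Latin}.,;:!?])/$1/g;
    return $text;
}

# Space out every character, then undo the spacing for the Latin text.
my $spaced = join(' ', split(//, "ভাৰত is famous."));
print rejoin_latin($spaced), "\n";
```

The original word boundaries survive as runs of three spaces, because only a single space between two Latin characters is removed.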