Question

I wrote a Python script to fetch all of my Gmail messages. I have hundreds of thousands of old emails, of which about 10,000 were unread.

After successfully fetching all of my email, I found that Gmail had marked every fetched email as "read". This is disastrous for me, since I need to go through only the unread emails.

How can I recover the information about which emails were unread? I dumped each mail object to a file; the core of my code is shown below:

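import email
import imaplib

# user and pwd hold the account credentials (defined elsewhere)
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user, pwd)
m.select("[Gmail]/All Mail")
resp, items = m.uid('search', None, 'ALL')
uids = items[0].split()
for uid in uids:
    # this fetch is what ended up marking each message as read
    resp, data = m.uid('fetch', uid, "(RFC822)")
    email_body = data[0][1]
    mail = email.message_from_string(email_body)
    dumpobj(uid, mail)  # my helper that writes the mail object to a file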

I am hoping there is either an option to undo this in Gmail, or a member inside the stored mail objects that reflects the seen-state information.

For anyone looking to prevent this headache, consider this answer here. This does not work for me, however, since the damage has already been done.
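
One commonly used way to avoid the problem in the first place (not necessarily the approach of the linked answer) is to open the mailbox read-only. The sketch below assumes a logged-in imaplib connection; the helper name open_readonly and the host in the usage comment are made up for illustration.

```python
import imaplib


def open_readonly(conn, mailbox="INBOX"):
    """Open `mailbox` read-only and return the message count reported by the server.

    `conn` is assumed to be a logged-in imaplib.IMAP4 or IMAP4_SSL object.
    """
    # readonly=True makes imaplib send EXAMINE instead of SELECT, so the
    # session is not allowed to change flags such as \Seen.
    typ, data = conn.select(mailbox, readonly=True)
    if typ != "OK":
        raise RuntimeError("could not open {0!r}: {1!r}".format(mailbox, data))
    return int(data[0])


# Usage (hypothetical host and credentials):
# conn = imaplib.IMAP4_SSL("imap.example.com")
# conn.login("user", "password")
# count = open_readonly(conn)
```

The trade-off is that a read-only session cannot change flags at all, so it is not suitable if the same connection later needs to restore or set them.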

Edit: I have written the following function to recursively "grep" all strings in an object, and applied it to a dumped email object using the following keywords:

regex = "(?i)((marked)|(seen)|(unread)|(read)|(flag)|(delivered)|(status)|(sate))"

So far, no results (only an unrelated "Delivered-To"). Which other keywords could I try?

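def grep_object(obj, regex, cycle=None, matched=None):
    import re
    # Create fresh sets on each top-level call; mutable default
    # arguments would be shared between calls.
    if cycle is None:
        cycle = set()
    if matched is None:
        matched = set()
    if id(obj) in cycle:
        return matched
    cycle.add(id(obj))
    if isinstance(obj, basestring):  # Python 2 name; use str on Python 3
        if re.search(regex, obj):
            matched.add(obj)
        return matched

    def grep_dict(adict):
        # Search both keys and values of anything dict-like.
        try:
            for key, value in adict.iteritems():  # .items() on Python 3
                grep_object(key, regex, cycle, matched)
                grep_object(value, regex, cycle, matched)
        except Exception:
            pass

    grep_dict(obj)
    grep_dict(getattr(obj, '__dict__', None))
    try:
        for elm in obj:
            grep_object(elm, regex, cycle, matched)
    except Exception:
        pass
    return matched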

grep_object(mail_object, regex)

No correct solution

OTHER TIPS

I'm having a similar problem (not with Gmail). The hardest part for me was building a reproducible test case, which I finally managed to do (see below).

In terms of the Seen flag, I now gather it goes like this:

  • If a message is new/unseen, an IMAP FETCH of its FLAGS does not include \Seen (in my test the flag list comes back empty).
  • If you SELECT a mailbox (INBOX), the server can answer with an UNSEEN response code. Per RFC 3501 this is the sequence number of the first message without the \Seen flag, not a list; to list all such messages, issue SEARCH UNSEEN.
  • In my test case, if you fetch, say, the headers of a message with BODY.PEEK[HEADER], \Seen is not set; if you fetch them with BODY[HEADER], \Seen is set.
  • In my test case, fetching (RFC822) also doesn't set \Seen (unlike your case with Gmail). RFC 3501 describes RFC822 as equivalent to BODY[], which does set \Seen, so Gmail's behavior appears to be the standard one.
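
The points above can be sketched as a helper that fetches a message together with its flags while leaving \Seen untouched. This is a minimal sketch, assuming a logged-in imaplib connection with a mailbox already selected; the function names are made up for illustration, and the exact layout of the FETCH response can vary between servers.

```python
import imaplib


def has_seen_flag(fetch_line):
    """Return True if a FETCH response line such as b'1 (FLAGS (\\Seen))' lists \\Seen."""
    # imaplib.ParseFlags returns a tuple of flag names as bytes, e.g. (b'\\Seen',)
    flags = imaplib.ParseFlags(fetch_line)
    return any(flag.lower() == b"\\seen" for flag in flags)


def fetch_without_marking(conn, uid):
    """Fetch flags and full message for one UID without setting \\Seen.

    Returns (was_seen, raw_message_bytes).
    """
    # BODY.PEEK[] returns the same content as BODY[] but does not set \Seen.
    typ, data = conn.uid("fetch", uid, "(FLAGS BODY.PEEK[])")
    if typ != "OK":
        raise RuntimeError("fetch failed for UID {0!r}".format(uid))
    meta = b""
    raw_message = None
    for item in data:
        if isinstance(item, tuple):
            # (response text before the literal, literal message bytes)
            meta += item[0]
            raw_message = item[1]
        elif isinstance(item, bytes):
            # Trailing text; FLAGS may appear here on some servers.
            meta += b" " + item
    if raw_message is None:
        raise RuntimeError("no message body returned for UID {0!r}".format(uid))
    return has_seen_flag(meta), raw_message
```

Saving the returned was_seen value next to each dumped message would preserve the read/unread state independently of what the server does afterwards.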

In the test case, I run pprint.pprint(inspect.getmembers(mail)) (in lieu of your dumpobj(uid, mail)), but only after I'm certain \Seen has been set. The output is posted in mail_object_inspect.txt and, as far as I can see, none of the readable fields mention 'new/read/seen' or anything similar. Furthermore, mail.as_string() prints:

'From: jesse@example.com\nTo: user@example.com\nSubject: This is a test message!\n\nHello. I am executive assistant to the director of\nBear Stearns, a failed investment Bank.  I have\naccess to USD6,000,000. ...\n'

Even worse, there is no mention of "fields" anywhere in the imaplib code (grep -L prints a file name only if the file contains no case-insensitive match for "field"):

$ grep -L -i field /usr/lib/python{2.7,3.2}/imaplib.py
/usr/lib/python2.7/imaplib.py
/usr/lib/python3.2/imaplib.py

... so I guess that information was not saved with your dumps.
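
The likely reason is that \Seen is mailbox metadata kept by the server and reported in the FLAGS item of a FETCH response; it is not part of the RFC822 text that email.message_from_string parses. As a small illustration using the test message shown above (the helper name header_names is made up), the parsed object exposes only what is in the raw text:

```python
import email

# Raw RFC822 text of the test message (body shortened).
RAW_MESSAGE = (
    "From: jesse@example.com\n"
    "To: user@example.com\n"
    "Subject: This is a test message!\n"
    "\n"
    "Hello. I am executive assistant to the director of\n"
)


def header_names(raw_message):
    """Return the header names found in the raw message text."""
    return email.message_from_string(raw_message).keys()


print(header_names(RAW_MESSAGE))  # ['From', 'To', 'Subject']
```

Nothing in that list can carry the seen state unless the server or a client wrote it into the message headers, which plain IMAP flags do not do.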


Here is a bit on reconstructing the test case. The hardest part was finding a small IMAP server that can be run quickly with some arbitrary users and emails, without having to install a ton of stuff on your system. I finally found one: trivial-server.pl, the example file of Perl's Net::IMAP::Server; tested on Ubuntu 11.04.

The test case is pasted in this gist, with two files (with many comments) that I'll try to post abridged:

  • trivial-serverB.pl - Perl (v5.10.1) Net::IMAP::Server server (has a terminal output paste at end of file with a telnet client session)
  • testimap.py - Python 2.7/3.2 imaplib client (has a terminal output paste at end of file, of itself operating with the server)

trivial-serverB.pl

First, make sure you have Net::IMAP::Server - note, it has many dependencies, so the below command may take a while to install:

sudo perl -MCPAN -e 'install Net::IMAP::Server'

Then, in the directory where you got trivial-serverB.pl, create a subdirectory with SSL certificates:

mkdir certs
openssl req \
  -x509 -nodes -days 365 \
  -subj '/C=US/ST=Oregon/L=Portland/CN=localhost' \
  -newkey rsa:1024 -keyout certs/server-key.pem -out certs/server-cert.pem

Finally, run the server with administrative privileges:

sudo perl trivial-serverB.pl

Note that trivial-serverB.pl contains a hack that lets a client connect without SSL. Here is trivial-serverB.pl:

#!/usr/bin/perl

use v5.10.1;
use feature qw(say);
use Net::IMAP::Server;

package Demo::IMAP::Hack;
$INC{'Demo/IMAP/Hack.pm'} = 1;

sub capabilityb {
  my $self = shift;
  print STDERR "Capabilitin'\n";
  my $base = $self->server->capability;
  my @words = split " ", $base;
  @words = grep {$_ ne "STARTTLS"} @words
    if $self->is_encrypted;
  unless ($self->auth) {
    my $auth = $self->auth || $self->server->auth_class->new;
    my @auth = $auth->sasl_provides;
    # hack:
    #unless ($self->is_encrypted) {
    #  # Lack of encryption makes us turn off all plaintext auth
    #  push @words, "LOGINDISABLED";
    #  @auth = grep {$_ ne "PLAIN"} @auth;
    #}
    push @words, map {"AUTH=$_"} @auth;
  }
  return join(" ", @words);
}

package Demo::IMAP::Auth;
$INC{'Demo/IMAP/Auth.pm'} = 1;
use base 'Net::IMAP::Server::DefaultAuth';
sub auth_plain {
    my ( $self, $user, $pass ) = @_;
    # XXX DO AUTH CHECK
    $self->user($user);
    return 1;
}

package Demo::IMAP::Model;
$INC{'Demo/IMAP/Model.pm'} = 1;
use base 'Net::IMAP::Server::DefaultModel';
sub init {
    my $self = shift;
    $self->root( Demo::IMAP::Mailbox->new() );
    $self->root->add_child( name => "INBOX" );
}

###########################################
package Demo::IMAP::Mailbox;
use base qw/Net::IMAP::Server::Mailbox/;
use Data::Dumper;

my $data = <<'EOF';
From: jesse@example.com
To: user@example.com
Subject: This is a test message!

Hello. I am executive assistant to the director of
Bear Stearns, a failed investment Bank.  I have
access to USD6,000,000. ...
EOF
my $msg = Net::IMAP::Server::Message->new($data);
sub load_data {
    my $self = shift;
    $self->add_message($msg);
}
my %ports = ( port => 143, ssl_port => 993 );
$ports{$_} *= 10 for grep {$> > 0} keys %ports;

$myserv = Net::IMAP::Server->new(
    auth_class  => "Demo::IMAP::Auth",
    model_class => "Demo::IMAP::Model",
    user        => 'nobody',
    log_level   => 3, # at least 3 to output 'CONNECT TCP Peer: ...' message; 4 to output IMAP commands too
    %ports,
);

# apparently, this overload MUST be after the new?! here:
{
no strict 'refs';
*Net::IMAP::Server::Connection::capability = \&Demo::IMAP::Hack::capabilityb;
}

# https://stackoverflow.com/questions/27206371/printing-addresses-of-perl-object-methods
say " -", $myserv->can('validate'), " -", $myserv->can('capability'), " -", \&Net::IMAP::Server::Connection::capability, " -", \&Demo::IMAP::Hack::capabilityb;

$myserv->run();

testimap.py

With the server above running in one terminal, in another terminal you can just do:

python testimap.py

The code simply reads the flags and content of the one (and only) message the server above presents, and eventually restores the message to unseen by removing the \Seen flag.

import sys
if sys.version_info[0] < 3: # python 2.7
  def uttc(x):
    return x
else:                       # python 3+
  def uttc(x):
    return x.decode("utf-8")
import imaplib
import email
import pprint,inspect

imap_user = 'nobody'
imap_password = 'whatever'
imap_server = 'localhost'
conn = imaplib.IMAP4(imap_server)
conn.debug = 3

try:
  (retcode, capabilities) = conn.login(imap_user, imap_password)
except:
  print(sys.exc_info()[1])
  sys.exit(1)

# not conn.select(readonly=1), else we cannot modify the \Seen flag later
conn.select() # Select inbox or default namespace
(retcode, messages) = conn.search(None, '(UNSEEN)')
if retcode == 'OK':
  for num in uttc(messages[0]).split(' '):
    if not(num):
      print("No messages available: num is `{0}`!".format(num))
      break
    print('Processing message: {0}'.format(num))

    typ, data = conn.fetch(num,'(FLAGS)')
    isSeen = ( "Seen" in uttc(data[0]) )
    print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))

    print('Peeking headers, message: {0} '.format(num))
    typ, data = conn.fetch(num,'(BODY.PEEK[HEADER])')
    pprint.pprint(data)

    typ, data = conn.fetch(num,'(FLAGS)')
    isSeen = ( "Seen" in uttc(data[0]) )
    print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))

    print('Get RFC822 body, message: {0} '.format(num))
    typ, data = conn.fetch(num,'(RFC822)')
    mail = email.message_from_string(uttc(data[0][1]))
    #pprint.pprint(inspect.getmembers(mail))

    typ, data = conn.fetch(num,'(FLAGS)')
    isSeen = ( "Seen" in uttc(data[0]) )
    print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))

    print('Get headers, message: {0} '.format(num))
    typ, data = conn.fetch(num,'(BODY[HEADER])') # note, FLAGS (\\Seen) is now in data, even if not explicitly requested!
    pprint.pprint(data)

    print('Get RFC822 body, message: {0} '.format(num))
    typ, data = conn.fetch(num,'(RFC822)')
    mail = email.message_from_string(uttc(data[0][1]))
    pprint.pprint(inspect.getmembers(mail)) # this is in mail_object_inspect.txt
    pprint.pprint(mail.as_string())

    typ, data = conn.fetch(num,'(FLAGS)')
    isSeen = ( "Seen" in uttc(data[0]) )
    print('Got flags: {2}: {0} .. {1}'.format(typ,data, # Seen: OK .. ['1 (FLAGS (\\Seen))']
            "Seen" if isSeen else "NEW"))

    conn.select() # select again, to see flags server side
    # * OK [UNSEEN 0] # no more unseen messages (if there was only one msg in folder)

    print('Restoring flag to unseen/new, message: {0} '.format(num))
    ret, data = conn.store(num,'-FLAGS','\\Seen')
    if ret == 'OK':
      print("Set back to unseen; Got OK: {0}{1}{2}".format(data,'\n',30*'-'))
      print(mail)

      typ, data = conn.fetch(num,'(FLAGS)')
      isSeen = ( "Seen" in uttc(data[0]) )
      print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. [b'1 (FLAGS ())']
              "Seen" if isSeen else "NEW"))

conn.close()
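
The same -FLAGS \Seen store can be applied to many messages at once. The following is a rough sketch, assuming you already have a list of UIDs that should be unread (which is exactly what is missing in the question's case) and a logged-in connection with the mailbox selected read-write. The function names and the batch size of 500 are arbitrary choices for illustration.

```python
def uid_batches(uids, size):
    """Yield comma-separated UID sets of at most `size` UIDs each."""
    for start in range(0, len(uids), size):
        yield ",".join(uids[start:start + size])


def mark_unseen(conn, uids, batch_size=500):
    """Remove \\Seen from every UID in `uids` (a list of str).

    Returns the number of STORE commands sent.
    """
    sent = 0
    for uid_set in uid_batches(uids, batch_size):
        # UID STORE <set> -FLAGS (\Seen) clears the flag for the whole set.
        typ, data = conn.uid("STORE", uid_set, "-FLAGS", "(\\Seen)")
        if typ != "OK":
            raise RuntimeError("STORE failed for {0}: {1!r}".format(uid_set, data))
        sent += 1
    return sent
```

Batching keeps each command line short; with around 10,000 UIDs, a single comma-separated set could exceed what some servers accept in one command.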

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow