Scrape Emails From Gmail via Drush

The following steps scrape email addresses from a Gmail account. First, we archive all emails into a single .mbox file. Then we run our custom Drush command to scrape every address into a CSV file named output-emails-<date>.csv. Finally, we use a tool such as BriteVerify.com to keep only the valid addresses.

Archive All Emails

Go to Google Takeout and create an archive file (.mbox) from your Gmail account.

Scrape with Drush

Once you have the archive (.mbox) file, run the following Drush command:

drush scrap_email --file-name=path/to/name-of-file.mbox

This creates the file output-emails-<date>.csv in your current directory with all the addresses it found, where <date> is the current date in month-day-year form.
See the Appendix below for the source of the scrap_email Drush command, so you can install it on your own machine.
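For a quick look at what the archive contains before installing the Drush command, the same regular expression can be run from the shell. This is only a sketch: extract_emails is a hypothetical helper name, and it assumes a grep that supports the -o flag (GNU and BSD grep both do). It does not apply the extra filtering that the Drush command performs.

```shell
# Hypothetical helper (not part of the Drush command): print every unique
# address found in the file given as $1, one per line.
# Usage: extract_emails path/to/name-of-file.mbox > raw-emails.txt
extract_emails() {
    # Same pattern the Drush command uses; -o prints only the matched text.
    grep -Eo '[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+' "$1" | sort -u
}
```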

Verify Emails

Our custom Drush scrape command is not perfect and picks up some bad addresses. To clean them out, we used the BriteVerify tool.

Appendix

Here is the full Drush command for scraping emails. Put it in a file named scrap.drush.inc. For installation instructions, see the post Implementing Custom Drush Commands.


<?php
// Same as error_reporting(E_ALL);
//ini_set('error_reporting', E_ALL);
ini_set('memory_limit', '850M');
set_time_limit(0);

function scrap_drush_command()
{
    $items = array();
    $items['scrap_email'] = array(
        'description' => "Scrapes all emails from a Google archive (.mbox) and stores them in output-emails-<date>.csv in the current dir",
        'arguments' => array(),
        'options' => array(
            'file-name' => 'Path to the Google archive file (.mbox). It can be relative to the current dir',
        ),
        'examples' => array(
            'drush scrap_email --file-name=my-gmail.mbox' => 'Scrapes all emails from my-gmail.mbox and stores them in output-emails-<date>.csv in the current dir',
        ),
        'aliases' => array('semail'),
        'bootstrap' => DRUSH_BOOTSTRAP_DRUSH, // No bootstrap at all.
    );
    return $items;
}

function drush_scrap_email()
{
    $filepath = drush_get_option('file-name');
    if (!file_exists($filepath)) {
        $filepath = getcwd() . '/' . $filepath;
        if (!file_exists($filepath)) {
            drush_die('File - ' . $filepath . ' doesn\'t exist', 0);
        }
    }

    drush_log('begin scraping...', 'ok');

    // Read the whole archive into memory; the memory_limit set above must be large enough for it.
    $f = fopen($filepath, 'rb') or die("Couldn't get handle for " . $filepath);
    $data = '';
    while (($line = fgets($f, 4096)) !== false) {
        $data .= $line;
    }
    fclose($f);

    drush_log('done reading string of size: ' . mb_strlen($data, '8bit') . '... start searching','ok');

    $pattern = "/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/";
    preg_match_all($pattern, $data, $matches);

    $all_emails = array_unique(array_values($matches[0]));
    $all_emails_filtered = array_filter($all_emails, 'filter_bad_emails');

    print_r($all_emails_filtered);
    drush_log('Count:' . count($all_emails_filtered),'ok');

    drush_log('writing...','ok');

    $date = date('m-j-y');
    $filename = 'output-emails-'.$date.'.csv';
    $filepath = getcwd() . '/' . $filename;

    $file = fopen($filepath, "w") or die("Couldn't get handle for " . $filepath);
    if ($file) {
        foreach($all_emails_filtered as $email){
            fputcsv($file, array($email));
        }
    }

    fclose($file);
    drush_print('done');
}

function filter_bad_emails($email)
{
    $char = $email[0];
    $email_tokens = explode('@', $email);
    $domain_name = array_pop($email_tokens);
    $ext_tokens = explode('.', $domain_name);
    $ext = array_pop($ext_tokens);
    if ($char == '-' || $char == '_' || $char == '.' || is_numeric($char) || (strlen($email) > 30) || (strlen($ext) > 4) || is_numeric($ext) || ($ext == 'c') || ($ext == 'n') || ($domain_name == 'mail.gmail.com')) {
        return false;
    } else {
        return true;
    }
}
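The rules in filter_bad_emails() can also be tried out on a plain list of addresses from the shell. The sketch below is an approximation, not a port: the function name is reused only for illustration, and PHP's is_numeric() check on the extension is simplified to "digits only".

```shell
# Approximate shell version of the PHP filter above: reads one address per
# line on stdin and prints only the ones that pass every rule.
filter_bad_emails() {
    awk '{
        email = $0
        if (email == "") next
        n = split(email, parts, "@")
        domain = parts[n]                    # text after the last "@"
        m = split(domain, labels, "[.]")
        ext = labels[m]                      # text after the last "."
        if (email ~ /^[-_.0-9]/) next        # starts with -, _, . or a digit
        if (length(email) > 30) next         # longer than 30 characters
        if (length(ext) > 4) next            # extension longer than 4 characters
        if (ext ~ /^[0-9]+$/) next           # numeric extension (simplified)
        if (ext == "c" || ext == "n") next   # file names such as main.c
        if (domain == "mail.gmail.com") next # Gmail message IDs
        print email
    }'
}
```

For example, piping a raw address list through filter_bad_emails drops entries such as -bad@example.com or anything ending in @mail.gmail.com.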

Implementing Custom Drush Commands

In this post, we cover how to install and create your first custom Drush command.

Install Custom Drush Commands

There are two steps to make Drush aware of your custom commands:

  1. Copy the example drushrc.php (shipped as example.drushrc.php in /path/to/drush/examples) to $HOME/.drush/drushrc.php, if it is not already present.
  2. In drushrc.php, specify the directory containing your custom Drush commands:
    $options['include'] = '/path/to/my/drush/commands';

This tells Drush to load commands from the directory where your custom Drush command files will reside. Let's create one.
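The two steps can be scripted roughly as follows. This is a sketch under assumptions: register_drush_commands is a hypothetical helper, it writes a minimal drushrc.php instead of copying the shipped example, and it assumes any existing drushrc.php does not end with a closing ?> tag (appending after one would break the file).

```shell
# Hypothetical helper: register a custom command directory in drushrc.php.
#   $1 = home directory to configure
#   $2 = directory that will hold the *.drush.inc files
# Usage: register_drush_commands "$HOME" /path/to/my/drush/commands
register_drush_commands() {
    drush_dir="$1/.drush"
    rc_file="$drush_dir/drushrc.php"
    mkdir -p "$drush_dir" "$2"
    # Start a new PHP config file only when none exists yet.
    [ -f "$rc_file" ] || printf '<?php\n' > "$rc_file"
    # Append the include option that points Drush at the command directory.
    printf "\$options['include'] = '%s';\n" "$2" >> "$rc_file"
}
```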

Implementing Custom Drush Command

Implementing a Drush command can be broken into three steps, followed by a test run:

  1. Create File
    Once Drush is aware of the location of our custom commands, we can create any file ending in '.drush.inc', because Drush loads all files with that extension. Let's say we have the file smile.drush.inc.
  2. Declare Command
    Next, let's declare the command in smile.drush.inc as follows:

    function smile_drush_command() {
        $items = array();
        $items['make-me-smile'] = array(
            'description' => "Makes a Happy Smile.",
            'arguments' => array(
                'type' => 'The type of the smile (half_moon, polite, etc.)',
            ),
            'options' => array(
                'time' => 'specify time of smile (e.g. 10,30 in sec)',
            ),
            'examples' => array(
                'drush smile polite --time=10' => 'Make a great smile that cheers you up for rest of the day.',
            ),
            'aliases' => array('smile'),
            'bootstrap' => DRUSH_BOOTSTRAP_DRUSH, // No bootstrap at all.
        );
        return $items;
    }
    

    This declares our custom Drush command, make-me-smile. Note that the file name (smile) takes the place of "hook" in the hook_drush_command() declaration. The 'bootstrap' key specifies the level Drush bootstraps to before running the command. Other bootstrap levels include DRUSH_BOOTSTRAP_NONE, DRUSH_BOOTSTRAP_DRUPAL_ROOT, etc.; see /path/to/drush/includes/bootstrap.inc for the full list and descriptions.

  3. Implement Command
    After we declare the custom command, let's implement it:

    function drush_make_me_smile($type = 'polite'){
        drush_print('Smiling....type:'.$type);
    }
    

    The function name is derived from the command name: it takes the form drush_COMMAND_NAME, where every '-' in COMMAND-NAME is replaced with '_', so make-me-smile becomes drush_make_me_smile. Drush's documented convention also places the command file name (smile, from smile.drush.inc) right after the drush_ prefix, which would give drush_smile_make_me_smile.

  4. Test Run

    $ drush smile --help
    $ drush smile fake
    Smiling....type:fake
    $ drush make-me-smile test
    Smiling....type:test
    

    Here, we run our new custom command in two ways: by alias and by full name.
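The naming rule from step 3 (prefix drush_, turn every '-' into '_') is easy to get wrong by hand, so here is a tiny sketch of it. drush_callback_name is a hypothetical helper that follows the rule exactly as this post states it; it is not something Drush itself provides.

```shell
# Hypothetical helper: derive the callback function name from a command name
# using the rule from step 3.
# Usage: drush_callback_name make-me-smile
drush_callback_name() {
    printf 'drush_%s\n' "$(printf '%s' "$1" | sed 's/-/_/g')"
}
```

Running it with make-me-smile prints drush_make_me_smile, the function implemented in step 3.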

    Troubleshooting

    1. Location of Drush installation

    run:

    which drush
    

    This displays the path of the drush executable. Then check which directory it links to.
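Checking where the executable links to can be done with readlink. A minimal sketch, assuming the drush found on your path is a symbolic link one level deep; link_target is a hypothetical helper name.

```shell
# Hypothetical helper: print what a symbolic link points to, or the path
# itself when it is not a link.
# Usage: link_target "$(which drush)"
link_target() {
    if [ -L "$1" ]; then
        readlink "$1"
    else
        printf '%s\n' "$1"
    fi
}
```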

How To Install Drush

Drush is a command-line utility for installing, maintaining, and troubleshooting the Drupal platform. This post logs the steps for installing Drush on Mac (OS X 10.7.5), Linux (Ubuntu), and Windows.

Linux & Mac

Install Drush

To install the dev (most current) version of Drush:
1. Clone the Drush git repository

sudo git clone https://github.com/drush-ops/drush.git /root/tools/drush

2. Put Drush executable in the search path:

sudo ln -s /root/tools/drush/drush /usr/local/bin/drush

If you don't know the search locations, look at the $PATH variable, which lists all the search locations for executables:

echo $PATH
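Rather than reading the $PATH output by eye, a small check can confirm that the directory holding the drush link is actually searched. This is a sketch; dir_in_path is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when directory $1 is listed in $PATH.
# Usage: dir_in_path /usr/local/bin && echo "searched"
dir_in_path() {
    # Wrapping both sides in ":" lets the first and last entries match too.
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```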

Drush uses Composer to install its dependencies, so let's install Composer as follows:

sudo curl -sS https://getcomposer.org/installer | sudo php

This downloads Composer and displays the location of the downloaded file, which you will need in the next step:

mv dir/downloaded/composer.phar /usr/bin/composer

Here, you move Composer into a directory listed in $PATH, so it can be found. Finally, let's install Drush:

cd /path/to/drush
composer install

Done! Test it by running 'drush --version', which should display the current version.

Upgrade Drush From Legacy Install

1. Clone Drush git repository

sudo git clone https://github.com/drush-ops/drush.git drush

2. Find where the executable is currently used:

which drush

This displays the path of the current drush executable. Go to that directory, rename or delete the old executable, and create a new link to the version of Drush cloned in step 1:

sudo ln -s /Users/margots/DevTools/drush/drush drush

Here, the path points to the new version of Drush cloned in step 1.

Done!
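The rename-and-relink step can be wrapped in one function so the old executable is never lost. A sketch only: relink_drush is a hypothetical helper, the .bak suffix is an arbitrary choice, and you may need sudo depending on where the link lives.

```shell
# Hypothetical helper: replace an existing drush entry with a symbolic link
# to the new checkout, keeping the old entry as a backup.
#   $1 = path of the current drush executable (as shown by "which drush")
#   $2 = path of the drush script inside the new clone
relink_drush() {
    link_path=$1
    new_target=$2
    # Keep the old entry (file or link) instead of deleting it.
    if [ -e "$link_path" ] || [ -L "$link_path" ]; then
        mv "$link_path" "$link_path.bak"
    fi
    ln -s "$new_target" "$link_path"
}
```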


Old Way

Prerequisites

  • wget -or-
  • unzip -or-
  • git

Step-1: Installing Prerequisites.

Verify that unzip is installed by running 'unzip' from the command line. If it isn't installed:

sudo apt-get install unzip

Verify that wget is installed by running 'wget' from the command line. If it isn't installed:

curl -O http://ftp.gnu.org/gnu/wget/wget-1.14.tar.gz
sudo tar -xzf wget-1.14.tar.gz
cd wget-1.14
sudo ./configure --with-ssl=openssl
sudo make
sudo make install
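Instead of running each prerequisite by hand to see whether it exists, the checks can be done in one pass. A minimal sketch, with missing_tools as a hypothetical helper name; it relies only on the shell's built-in command -v.

```shell
# Hypothetical helper: print the name of every listed tool that is not
# installed (prints nothing when all of them are found).
# Usage: missing_tools wget unzip git
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```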

Step-2: Installing Drush with PEAR

To verify that PEAR (PHP Extension and Application Repository) is installed, type 'pear version' in the command line. If it isn't installed:
For Mac:

cd /usr/local
sudo wget http://pear.php.net/go-pear.phar
sudo php -d detect_unicode=0 go-pear.phar
sudo pear upgrade --force pear
sudo pear upgrade --force Console_Getopt Console_Table
sudo pear upgrade-all

For Ubuntu:

sudo apt-get install php-pear
sudo pear upgrade --force pear

Step-3: Installing Drush

To install drush:

sudo pear channel-discover pear.drush.org
sudo pear install drush/drush
which drush
drush

If you see the message 'Drush needs to download a library from [..]Console_Table-1.x.x.tgz[..]' followed by an error, then:

sudo rm -Rf ~/.drush

Upgrade drush

To upgrade drush:

sudo pear upgrade drush

Windows

Prerequisites

  • Cygwin

Installing Drush

Run Windows installer for Drush listed at http://drush.ws/drush_windows_installer

Configure Drush for Cygwin

To run Drush from Cygwin, we mount the path and then create a shell alias. To mount the path to Drush in Cygwin, add the following to /etc/fstab:

C:\ProgramData\Drush\drush.php /cygdrive/c/ProgramData/Drush/drush.php binary,posix=0,user 0 0

Next, we create the alias drush by adding the following to .bashrc in your home directory:

alias drush='/cygdrive/c/ProgramData/Drush/drush.php'

Afterwards, reload the Cygwin shell and run 'drush --version'. It should display the Drush version, which is a good way to verify that Drush is working.

Troubleshooting

1. Tip

If you run into any issues while trying to install some package, make sure you run ‘sudo apt-get update’

2. Unable to load autoload.php. Drush now requires Composer in order to install

This happens after upgrading Drush. The solution is to install Composer as follows:

sudo curl -sS https://getcomposer.org/installer | sudo php

This downloads Composer and displays the location of the downloaded file, which you will need in the next step:

mv dir/downloaded/composer.phar /usr/bin/composer

Here, you move Composer into a directory listed in $PATH, so it can be found. Finally, let's install Drush:

cd /path/to/drush
composer install

This should solve the problem.

MS-DOS style path detected: C:\path\drush.php
Preferred POSIX equivalent is: /cygdrive/c/path/drush.php

You have to mount the path and then refer to Drush through it in Cygwin. See the step Configure Drush for Cygwin above.
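The mapping in the warning above is mechanical: the drive letter is lowercased and placed under /cygdrive, and backslashes become forward slashes. Cygwin ships cygpath for this (cygpath -u); the sketch below only illustrates the rule, and to_cygdrive is a hypothetical helper that assumes an absolute path starting with a drive letter and a colon.

```shell
# Hypothetical helper: convert an absolute MS-DOS style path to its
# /cygdrive equivalent. Quote the argument so the backslashes survive.
# Usage: to_cygdrive 'C:\path\drush.php'
to_cygdrive() {
    win_path=$1
    # First character is the drive letter; lowercase it.
    drive=$(printf '%s' "$win_path" | cut -c1 | tr 'A-Z' 'a-z')
    # Everything after "C:" with backslashes turned into forward slashes.
    rest=$(printf '%s' "$win_path" | cut -c3- | tr '\\' '/')
    printf '/cygdrive/%s%s\n' "$drive" "$rest"
}
```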

Warning: The lock file is not up to date with the latest changes in composer.json. You may be getting outdated dependencies. Run update to update them.

To fix it, run:

composer update --lock

This updates the lock file and gets rid of the warning, so you can proceed with the installation.

[RuntimeException] vendor does not exist and could not be created

This message appears when the Drush directory (or the directory where you run the Composer installer) doesn't have write permissions. To fix this:

sudo chmod -R 777 DIR

Unable to send e-mail drupal

To stop Drupal from sending email, add the following line to php.ini:

sendmail_path = /bin/true

[UnexpectedValueException] Could not parse version constraint ^2.6.3: Invalid version string

To solve this error, update the currently installed Composer:

composer self-update

the requested PHP extension pcntl is missing from your system drush

This error came up when installing Drush via Composer with 'composer install'. To solve it, I deleted composer.lock and reran the installer with 'composer install --dev'.

References

http://duntuk.com/how-install-drush-github-after-drupal-project-removal
https://drupal.org/node/1674222