Wednesday, April 16, 2014

String Operations: The Essential Function StringSplit

Split a String and Gather its Words by Their First Letter

Let's process a nice sonnet I just read in Ray Kurzweil's excellent book on what he thinks is the heart of cognitive science and AI, How to Create a Mind: The Secret of Human Thought Revealed.
To start I copy and paste the text from the web between quotation marks.

In[377]:= sonnet73=
"That time of year thou mayst in me behold,
When yellow leaves, or none, or few, do hang
Upon those boughs which shake against the cold,
Bare ruined choirs, where late the sweet birds sang.
In me thou seest the twilight of such day,
As after sunset fadeth in the west,
Which by and by black night doth take away,
Death's second self, that seals up all in rest.
In me thou seest the glowing of such fire,
That on the ashes of his youth doth lie,
As the death-bed whereon it must expire,
Consumed with that which it was nourished by.
This thou perceiv'st, which makes thy love more strong,
To love that well, which thou must leave ere long.";

Can we use ToLowerCase without Map? Yes, since it's Listable.

In[380]:= sonnet73LowerCase=ToLowerCase@%377

Out[380]= that time of year thou mayst in me behold,
when yellow leaves, or none, or few, do hang
upon those boughs which shake against the cold,
bare ruined choirs, where late the sweet birds sang.
in me thou seest the twilight of such day,
as after sunset fadeth in the west,
which by and by black night doth take away,
death's second self, that seals up all in rest.
in me thou seest the glowing of such fire,
that on the ashes of his youth doth lie,
as the death-bed whereon it must expire,
consumed with that which it was nourished by.
this thou perceiv'st, which makes thy love more strong,
to love that well, which thou must leave ere long.

We need StringSplit, not SplitBy, since we are working with a String. StringSplit is a powerful and essential function when importing and transforming large files to the format you need. Examples in the Doc Center like this one show its versatility:

In[1]:= StringSplit["a-b:c-d:e-f-g",{":","-"}]

Out[1]= {a,b,c,d,e,f,g}

Using WordBoundary as the pattern test for where to split works, but leaves much more "noise". Here is the output. Using InputForm is often handy to see what is going in while processing Strings.

In[409]:= StringSplit[sonnet73LowerCase,WordBoundary]//InputForm

Out[409]//InputForm=
{"that", " ", "time", " ", "of", " ", "year", " ", "thou", " ", "mayst", " ", "in",
 " ", "me", " ", "behold", ",\n", "when", " ", "yellow", " ", "leaves", ", ", "or",
 " ", "none", ", ", "or", " ", "few", ", ", "do", " ", "hang", "\n", "upon", " ",
 "those", " ", "boughs", " ", "which", " ", "shake", " ", "against", " ", "the", " ",
 "cold", ",\n", "bare", " ", "ruined", " ", "choirs", ", ", "where", " ", "late", " ",
 "the", " ", "sweet", " ", "birds", " ", "sang", ".\n", "in", " ", "me", " ", "thou",
 " ", "seest", " ", "the", " ", "twilight", " ", "of", " ", "such", " ", "day", ",\n",
 "as", " ", "after", " ", "sunset", " ", "fadeth", " ", "in", " ", "the", " ", "west",
 ",\n", "which", " ", "by", " ", "and", " ", "by", " ", "black", " ", "night", " ",
 "doth", " ", "take", " ", "away", ",\n", "death", "'", "s", " ", "second", " ",
 "self", ", ", "that", " ", "seals", " ", "up", " ", "all", " ", "in", " ", "rest",
 ".\n", "in", " ", "me", " ", "thou", " ", "seest", " ", "the", " ", "glowing", " ",
 "of", " ", "such", " ", "fire", ",\n", "that", " ", "on", " ", "the", " ", "ashes",
 " ", "of", " ", "his", " ", "youth", " ", "doth", " ", "lie", ",\n", "as", " ", "the",
 " ", "death", "-", "bed", " ", "whereon", " ", "it", " ", "must", " ", "expire",
 ",\n", "consumed", " ", "with", " ", "that", " ", "which", " ", "it", " ", "was", " ",
 "nourished", " ", "by", ".\n", "this", " ", "thou", " ", "perceiv", "'", "st", ", ",
 "which", " ", "makes", " ", "thy", " ", "love", " ", "more", " ", "strong", ",\n",
 "to", " ", "love", " ", "that", " ", "well", ", ", "which", " ", "thou", " ", "must",
 " ", "leave", " ", "ere", " ", "long", "."}

What a mess! And using this lengthy construct to DeleteCases of punctuation still left an imperfect result (which I won't show in case you're about to eat).

%//DeleteCases[#,""|" "|","|", "|" ,"|"'"|"s"|"st"|",\n"|",\n"|".\n"|"\n"|"-"|"."|" . "|"  . "]&

Compare the result from proper use of StringSplit.

In[405]:= sonnet73Split=StringSplit[sonnet73LowerCase,{Whitespace,".",","}]//DeleteCases[#,""]&

Out[405]= {that,time,of,year,thou,mayst,in,me,behold,when,yellow,leaves,or,none,or,few,do,hang,upon,those,boughs,which,shake,against,the,cold,bare,ruined,choirs,where,late,the,sweet,birds,sang,in,me,thou,seest,the,twilight,of,such,day,as,after,sunset,fadeth,in,the,west,which,by,and,by,black,night,doth,take,away,death's,second,self,that,seals,up,all,in,rest,in,me,thou,seest,the,glowing,of,such,fire,that,on,the,ashes,of,his,youth,doth,lie,as,the,death-bed,whereon,it,must,expire,consumed,with,that,which,it,was,nourished,by,this,thou,perceiv'st,which,makes,thy,love,more,strong,to,love,that,well,which,thou,must,leave,ere,long}

Here GatherBy groups sub-lists by the first character of each word.

In[411]:= sonnet73SplitGathered=GatherBy[sonnet73Split,First@Characters@#&]

Out[411]= {{that,time,thou,those,the,the,thou,the,twilight,the,take,that,thou,the,that,the,the,that,this,thou,thy,to,that,thou},{of,or,or,of,of,on,of},{year,yellow,youth},{mayst,me,me,me,must,makes,more,must},{in,in,in,in,in,it,it},{behold,boughs,bare,birds,by,by,black,by},{when,which,where,west,which,whereon,with,which,was,which,well,which},{leaves,late,lie,love,love,leave,long},{none,night,nourished},{few,fadeth,fire},{do,day,doth,death's,doth,death-bed},{hang,his},{upon,up},{shake,sweet,sang,seest,such,sunset,second,self,seals,seest,such,strong},{against,as,after,and,away,all,ashes,as},{cold,choirs,consumed},{ruined,rest},{glowing},{expire,ere},{perceiv'st}}

To see the Lists sorted by their initial letter, we should use SortBy, but this didn't work.

In[412]:= SortBy[%,First@Characters@#&]

We need to add another First to apply the sorting function to the First word in each sublist.

In[413]:= SortBy[%,First@Characters@First@#&]

Out[413]= {{against,as,after,and,away,all,ashes,as},{behold,boughs,bare,birds,by,by,black,by},{cold,choirs,consumed},{do,day,doth,death's,doth,death-bed},{expire,ere},{few,fadeth,fire},{glowing},{hang,his},{in,in,in,in,in,it,it},{leaves,late,lie,love,love,leave,long},{mayst,me,me,me,must,makes,more,must},{none,night,nourished},{of,or,or,of,of,on,of},{perceiv'st},{ruined,rest},{shake,sweet,sang,seest,such,sunset,second,self,seals,seest,such,strong},{that,time,thou,those,the,the,thou,the,twilight,the,take,that,thou,the,that,the,the,that,this,thou,thy,to,that,thou},{upon,up},{when,which,where,west,which,whereon,with,which,was,which,well,which},{year,yellow,youth}}

File Operations: Add the Total Byte Size of Files in Folders

File Operations: Add the Total Byte Size of Files in Folders

Mostly from the Carnegie-Mellon Pronouncing Dictionary (link available here: http://www.speech.cs.cmu.edu/cgi-bin/cmudict), I have created 138,418 files that I wanted to group into 3 folders of equal size. The files are in sub-folders according to the first letter of their filename, and a first pass at dividing them equally by dividing the Length of the original List by 3 split the last two Lists in the middle of files beginning with "p". I want to verify that the 3 sub-folders contain files of equal size or even them out. This involves some file operations and Select.

In[312]:= fileNames=FileNames["*.zip",{"C:\\Users\\kwcarlso\\Documents\\Kris\\Megapedia-Local\\Reference-English\\Target Files\\Reference-English"}]

Here and below I've abbreviated the output with "{filename, ..., filename}".

Out[312]= {C:\Users\kwcarlso\Documents\Kris\Megapedia-Local\Reference-English\Target Files\Reference-English\a.zip, ... ,C:\Users\kwcarlso\Documents\Kris\Megapedia-Local\Reference-English\Target Files\Reference-English\z.zip}

How to best tally the file byte size in each sub-folder? I considered using GatherBy to group the FileNames into the same groups as the sub-folders, and that probably would have worked, but would have been complicated to use three predicates, each with a range of letters in it. Discretion is always the better part of valor in programming, even to the point of just adding up the byte sizes by hand if this task were a one-off and to let me get on to the next task.

I decided to apply the principle of "divide-and-conquer" and use Select to group each set of files individually, then add byte sizes for each result. The predicate in Select is a set-theoretic operation, which indicated using a set-theoretic function, MemberQ. An original design principle of Mathematica's pre-cursor, Symbolic Manipulation Program (SMP), was to transparently map to all common mathematical functions and syntax, and it is simplest to do just that if possible. I use the infix functional style for MemberQ because it seems a bit more readable than functional bracketed style (but that's a trivial personal preference).

My first attempt failed since I forgot that CharacterRange is case-sensitive. I changed "A" and "F" to "a" and "f" and it worked perfectly. We need the final #& since the Select test works by simply applying the predicate to each element of the first argument List that you provide it.

In[316]:= fileNamesAF=Select[fileNames,CharacterRange["a","f"]~MemberQ~First@Characters@FileNameTake@#&]

Out[316]= {C:\Users\kwcarlso\Documents\Kris\Megapedia-Local\Reference-English\Target Files\Reference-English\a.zip, ..., C:\Users\kwcarlso\Documents\Kris\Megapedia-Local\Reference-English\Target Files\Reference-English\f.zip}

In[317]:= fileNamesGP=Select[fileNames,CharacterRange["g","p"]~MemberQ~First@Characters@FileNameTake@#&]

Out[317]= {C:\Users\kwcarlso\Documents\Kris\Megapedia-Local\Reference-English\Target Files\Reference-English\g.zip, ..., C:\Users\kwcarlso\Documents\Kris\Megapedia-Local\Reference-English\Target Files\Reference-English\p.zip}

In[318]:= fileNamesQZ=Select[fileNames,CharacterRange["q","z"]~MemberQ~First@Characters@FileNameTake@#&]
Out[318]= {C:\Users\kwcarlso\Documents\Kris\Megapedia-Local\Reference-English\Target Files\Reference-English\q.zip, ..., C:\Users\kwcarlso\Documents\Kris\Megapedia-Local\Reference-English\Target Files\Reference-English\z.zip}

At this point all we need to do is Map FileByteCount onto the filename in each folder, which means at Level 2 only (note the brackets around 2).

In[319]:= Map[FileByteCount,{fileNamesAF,fileNamesGP,fileNamesQZ},{2}]

Out[319]= {{11154319,14865868,17087067,12155371,7279606,8313716},{8802167,9846927,5719874,2458146,6091626,8605940,14729204,4588698,4529191,13363263},{795821,11363572,22685379,9138068,3902314,3644541,6431739,119495,1027761,1314296}}

Replacing List with Plus to add the numbers in each List is a common operation. When referring to Output I often use the line number instead of % since I can then change the function and re-Evaluate without modifiying it (to %%, %3, or then using the line number).

In[321]:= Plus@@#&/@%319

Out[321]= {70855947,78735036,60422986}

The result told me that moving the files beginning with "P" from the middle set into the last set would even those last two groups out.

Sunday, April 13, 2014

Mathematical Search Engine & Favorite Math Books

This specialized search engine searches all Mathematica sites (e.g. MathWorld (13K), Wolfram Demonstrations (9.5K), Wolfram Functions (307K), selected forums (e.g. MathGroup, Mathematica.StackExchange), this blog, top math sites, and then the web at large.

Mathematica Bookstore at Amazon

Recommended Math Books


Friday, April 11, 2014

An Advanced Predicate and Analysis of How It Works with Trace and Timing

Wagner gives a different version of allodd@x that is more efficient than the one shown here. This function stops testing as soon as it finds a number that is not odd. Contrary to his predisposition, Wagner uses a procedural loop, "de-constructing," as he calls it, the List to operate on its Parts.

In[269]:= allOddProcedural@aList_List:=Module[{i},
For[i=1,i<=Length@aList,i++,If[!OddQ[aList[[i]]],Break[]]];i>Length@aList]

But I bet OddQ is Listable and therefore its C compilation would be much faster than the For loop. First let's compare the Wagner allOddQ from above with his procedural one.

Wagner's function Breaks after OddQ returns False when testing the third argument, 6 (near the end of the Trace). Note the 100-fold difference in Timing!

In[275]:= list2={3,5,6,7};

In[276]:= allOdd@list2//Trace

Out[276]= {{list2,{3,5,6,7}},allOdd[{3,5,6,7}],Length[Select[{3,5,6,7},OddQ]]==Length[{3,5,6,7}],{{Select[{3,5,6,7},OddQ],{OddQ[3],True},{OddQ[5],True},{OddQ[6],False},{OddQ[7],True},{3,5,7}},Length[{3,5,7}],3},{Length[{3,5,6,7}],4},3==4,False}

In[277]:= allOddProcedural@list2//Trace

Out[277]= {{list2,{3,5,6,7}},allOddProcedural[{3,5,6,7}],Module[{i$},For[i$=1,i$<=Length[{3,5,6,7}],i$++,If[!OddQ[{3,5,6,7}[[i$]]],Break[]]];i$>Length[{3,5,6,7}]],{For[i$41099=1,i$41099<=Length[{3,5,6,7}],i$41099++,If[!OddQ[{3,5,6,7}[[i$41099]]],Break[]]];i$41099>Length[{3,5,6,7}],{For[i$41099=1,i$41099<=Length[{3,5,6,7}],i$41099++,If[!OddQ[{3,5,6,7}[[i$41099]]],Break[]]],{i$41099=1,1},{{i$41099,1},{Length[{3,5,6,7}],4},1<=4,True},{{{{{i$41099,1},{3,5,6,7}[[1]],3},OddQ[3],True},!True,False},If[False,Break[]],Null},{i$41099++,{i$41099,1},{i$41099=2,2},1},{{i$41099,2},{Length[{3,5,6,7}],4},2<=4,True},{{{{{i$41099,2},{3,5,6,7}[[2]],5},OddQ[5],True},!True,False},If[False,Break[]],Null},{i$41099++,{i$41099,2},{i$41099=3,3},2},{{i$41099,3},{Length[{3,5,6,7}],4},3<=4,True},{{{{{i$41099,3},{3,5,6,7}[[3]],6},OddQ[6],False},!False,True},If[True,Break[]],Break[]}},{{i$41099,3},{Length[{3,5,6,7}],4},3>4,False},False},False}

In[262]:= list1=Range[1,10^7,2];

Timing[#@list1]&/@{allOddProcedural,allodd}

{{9.297660,True},{0.109201,<<1>>}}

Why Use Built-in Functions? They're Already Written, De-bugged, and Fast

Now let's try my premise - first, OddQ is Listable as I thought.

In[260]:= Attributes@OddQ

Out[260]= {Listable,Protected}

Let's write the simple function:

In[279]:= allOddFunctional@aList_List:=If[!OddQ@aList,i>Length@aList]

In[280]:= Timing[allOddFunctional@list1]

Out[280]= {0.093601,<<1>>}

allOddFunctional is 10X faster than Wagner's procedural function, even though it appears to check every argument:

In[281]:= allOddFunctional@list2//Trace

Out[281]= {{list2,{3,5,6,7}},allOddFunctional[{3,5,6,7}],If[!OddQ[{3,5,6,7}],i>Length[{3,5,6,7}]],{{OddQ[{3,5,6,7}],{OddQ[3],OddQ[5],OddQ[6],OddQ[7]},{OddQ[3],True},{OddQ[5],True},{OddQ[6],False},{OddQ[7],True},{True,True,False,True}},!{True,True,False,True}},If[!{True,True,False,True},i>Length[{3,5,6,7}]]}

The lesson is simply that built-in functions are almost always faster than ones we make if for no other reason than they are compiled into C.

Here Wagner shows the use of While instead of For to stop when it finds an odd number.

allodd@aList_List:=Module[{i=1},While[i<=Length@aList&&OddQ@aList[[i]],i++];i>Length@aList]

Here is a Dropbox link to Wagner's book. High kudos to "Mr Wizard Todd" for asking McGraw-Hill for the right to distribute Wagner's out-of-print book. Here's his post on the StackExchange Mathematica forum.

https://www.dropbox.com/s/kllwg6y44p8va5g/Wagner%20All%20Parts-RC.pdf

Here is a link to a compressed file: http://www.verbeia.com/mathematica/PowerProgMa.zip


More Complex Predicates: allOddQ, allIdenticalQ, subsetQ

"Mr Wizard Todd", Manfred Plagmann, and Sophia Scheibe have done Mathematica users a nice favor by getting permission from McGraw-Hill to distribute licensed copies of David Wagner' s superb book, Power Programming in Mathematica. (Dropbox link: https://www.dropbox.com/s/kllwg6y44p8va5g/Wagner%20All%20Parts-RC.pdf, compressed file: http://www.verbeia.com/mathematica/PowerProgMa.zip)

Here are some nice examples of predicates from Wagner (re - written in my syntax).Test a list to see if all of its entries are odd integers.

In[216]:= allOdd@aList_List:=Length@Select[aList,OddQ]==Length@aList

In[217]:= allOdd@{2,3,5,6}

Out[217]= False

In[218]:= allOdd@{3,5,7,9}

Out[218]= True

This predicate tests a List to see if its parts are identical. Count returns all Parts that are Equal to the First Part.

In[219]:= identicalListPartsQ@aList_List:=Count[aList, First@aList] == Length@aList

In[220]:= identicalListPartsQ@{a,b,c,4}

Out[220]= False

In[221]:= identicalListPartsQ@{a,a,a,a}

Out[221]= True

SameQ vs. Equal  and Testing for Lists

SubsetQ determines if its first argument is a subset of its second argument and returns True or False. Wagner's function did not include a List test and he notes on Equal vs.SameQ : "The use of === rather than == makes SubsetQ return False if either set1 or set2 is not a list. Try it with Equal."

I use the Head test: set1_List, which rejects a non-List argument before evaluation. And more importantly, SameQ should always be used to test non-numerical equivalence. In fact using Equal here can give screwy results in part because of its usage as the mathematical "equals" (=) in Mathematica's syntax for equations.

When using abbreviated operators, we should first determine their Precedence:

In[235]:= Precedence/@{Union,Equal,SameQ}

Out[235]= {300.,290.,290.}

Since Union is slightly stickier than SameQ, the Union will be performed before the SameQ test.

In[251]:= subsetQ[set1_,set2_]:=set1∪set2===set2

In[252]:= subsetQ[{a,b,c},{a,b,d}]

Out[252]= False

In[254]:= subsetQ[{a,b},{a,b,2}]

Out[254]= False

Union Sorts Its Result

Whoops! Wagner's function didn't work. Let's find out why:

In[257]:= subsetQ[{a,b},{a,b,2}]//Trace

Out[257]= {subsetQ[{a,b},{a,b,2}],{a,b}∪{a,b,2}==={a,b,2},{{a,b}∪{a,b,2},{2,a,b}},{2,a,b}==={a,b,2},False}

The fact that Union returns a sorted List fouls up the comparison. Let's try a fix with Sort:

In[258]:= Clear@subsetQ;subsetQ[set1_,set2_]:=set1∪set2===Sort@set2

And it works - you see in the last step Sort fixes the order.

In[259]:= subsetQ[{a,b},{a,b,2}]//Trace

Out[259]= {subsetQ[{a,b},{a,b,2}],{a,b}∪{a,b,2}===Sort[{a,b,2}],{{a,b}∪{a,b,2},{2,a,b}},{Sort[{a,b,2}],{2,a,b}},{2,a,b}==={2,a,b},True}

Greater (>) is a "non-Q" predicate so the last statement (lacking a return-suppressing semi-colon), returns True or False.

Create Predicates to Test Strings vs Numerics: Test for File Extensions

Elsewhere I tell why predicates are important to use, show how to find all predicates in Mathematica including ones not suffixed with "Q"and create ones of your own to test numerical expressions. Here I cover predicates for testing Strings, and create some tests for file types, as judged by their extensions. Note that FileExtension does not capture the period before the extension.

Equal (==) and SameQ (===) are valid predicates for Strings as well as numerics, and work fine for simple applications.

In[206]:= uncusFileQ@fileName_String := FileExtension@fileName == "unc"

In[207]:= uncusFileQ@"afile.unc"

Out[207]= True

In[208]:= uncusFileQ@"afile.txt"

Out[208]= False

In[209]:= textFileQ@fileName_String := FileExtension@fileName == "txt"

In[215]:= textFileQ@"sampleFile.txt"

Out[215]= True

To test for two alternative file extensions is a little trickier. Since we are testing for String equivalence between String patterns, we use the general String predicate function StringMatchQ and Alternatives (|), not logical Or (||). An example is testing for HTML files, since their extensions come in .htm and .html flavors.

In[210]:= Clear@htmlFileQ;
htmlFileQ@fileName_String := StringMatchQ[FileExtension@fileName, "html" | "htm"]

In[212]:= htmlFileQ@"www.jognog.com/index.html"

Out[212]= True

In[213]:= htmlFileQ@"www.jognog.com/index.htm"

Out[213]= True

In[214]:= htmlFileQ@"www.jognog.com/sitemap.xml"

Out[214]= False


Friday, March 21, 2014

String Processing: Compose Last Name + First Initial and Create Email Address

String Processing: Compose First Name + First Initial and Create Email Addresses

I've imported a List of teacher's names. Now I need to create usernames and email addresses for them to upload to an app. Here is the List; note there are nested Lists within it.

In[226]:= teacherNames={{{"Bazan, Dorothy","Bleakney, Tom"},{"Calorio, Marie","Cardoso, Brian","Carpenito, Courtney","Carton, Jacqueline"},{"Coffey, Tara","Conaty, Thomas"}},{{"Davenport, Joseph","DiSabatino, Jennifer"},{"Doran, Robert","Dsida, Terri"},"Eramo, Paula","Needle, Julie",{"O'Connell, Marisa","O'Neil, Christopher"}}};

The nesting is just going to get in the way so let's remove extra braces with Flatten. Incidentally, you need not hesitate to build up deeply nested Lists in a function and then Flatten them all at once, which is very fast. To keep seeing these names as Strings, we Postfix InputForm to display the quotation marks.

In[227]:= teacherNamesFlattened=Flatten@teacherNames;teacherNamesFlattened//InputForm

Out[227]//InputForm=
{"Bazan, Dorothy", "Bleakney, Tom", "Calorio, Marie", "Cardoso, Brian",
 "Carpenito, Courtney", "Carton, Jacqueline", "Coffey, Tara", "Conaty, Thomas",
 "Davenport, Joseph", "DiSabatino, Jennifer", "Doran, Robert", "Dsida, Terri",
 "Eramo, Paula", "Needle, Julie", "O'Connell, Marisa", "O'Neil, Christopher"}

Let's lose the apostrophes. It's as simple as using a replacement Rule sending apostrophe to no character ("" ' "" -> ""). Without InputForm we don't see the quotation marks, but unless we use ToExpression to convert it, a String remains a String forever.

In[228]:= teacherNamesFlattenedCleaned=StringReplace[teacherNamesFlattened,"'"->""]
Out[228]= {Bazan, Dorothy,Bleakney, Tom,Calorio, Marie,Cardoso, Brian,Carpenito, Courtney,Carton, Jacqueline,Coffey, Tara,Conaty, Thomas,Davenport, Joseph,DiSabatino, Jennifer,Doran, Robert,Dsida, Terri,Eramo, Paula,Needle, Julie,OConnell, Marisa,ONeil, Christopher}

To compose usernames, we'll use a standard format, for instance, lastName + firstInitial. We use StringReplace, which takes one or more replacement Rules as arguments. We construct a pattern using deconstructed pieces of the first and last name. Note that named Blank patterns work in this type of String pattern matching.

Since I'm now going to perform several operations on the source List, for clarity, I will use Postfix with Function syntax and avoid deeply nested, hard-to-read code. The rationale behind this syntax is in A New Mathematica Programming Style Part 2..

This code works, but what is going wrong?

In[229]:= userNames=teacherNamesFlattenedCleaned//StringReplace[#,lastName__<>", "<>firstName__:>lastName<>First@Characters@firstName]&

During evaluation of In[229]:= StringJoin::string: String expected at position 1 in lastName__<>, <>firstName__. >>
During evaluation of In[229]:= StringJoin::string: String expected at position 3 in lastName__<>, <>firstName__. >>

Out[229]= {BazanD,BleakneyT,CalorioM,CardosoB,CarpenitoC,CartonJ,CoffeyT,ConatyT,DavenportJ,DiSabatinoJ,DoranR,DsidaT,EramoP,NeedleJ,OConnellM,ONeilC}

The error is due to using StringJoin ("<>") correctly to compose the deconstructed Strings but incorrectly to deconstruct them with patterns. Using the correct operator, StringExpression ("~~") solves the problem. Also let's convert the usernames to all lower case. Since ToLowerCase is Listable, we don't need to Map it over all the usernames.

In[230]:= userNames=teacherNamesFlattenedCleaned//StringReplace[#,lastName__~~", "~~firstName__:>lastName<>First@Characters@firstName]&//ToLowerCase

Out[230]= {bazand,bleakneyt,caloriom,cardosob,carpenitoc,cartonj,coffeyt,conatyt,davenportj,disabatinoj,doranr,dsidat,eramop,needlej,oconnellm,oneilc}

We can just as easily compose usernames with a reversed form, firstInitial + lastName.

In[231]:= userNames2=teacherNamesFlattenedCleaned//StringReplace[#,lastName__~~", "~~firstName__:>First@Characters@firstName<>lastName]&//ToLowerCase

Out[231]= {dbazan,tbleakney,mcalorio,bcardoso,ccarpenito,jcarton,tcoffey,tconaty,jdavenport,jdisabatino,rdoran,tdsida,peramo,jneedle,moconnell,coneil}

To compose email addresses, in a similar fashion, we'll use a template form, firstName.lastName@schoolDomain.org.

In[232]:= emailAddresses=teacherNamesFlattenedCleaned//StringReplace[#,lastName__~~", "~~firstName__:>firstName<>"."<>lastName<>"@hometownschools.org"]&//ToLowerCase

Out[232]= {dorothy.bazan@hometownschools.org,tom.bleakney@hometownschools.org,marie.calorio@hometownschools.org,brian.cardoso@hometownschools.org,courtney.carpenito@hometownschools.org,jacqueline.carton@hometownschools.org,tara.coffey@hometownschools.org,thomas.conaty@hometownschools.org,joseph.davenport@hometownschools.org,jennifer.disabatino@hometownschools.org,robert.doran@hometownschools.org,terri.dsida@hometownschools.org,paula.eramo@hometownschools.org,julie.needle@hometownschools.org,marisa.oconnell@hometownschools.org,christopher.oneil@hometownschools.org}

Now I'd like to make one List with both username and email address. Here is a typical use of Thread, where using the abbreviated operator for List, instead of Thread[List[userNames2,emailAddresses]] makes the code simple to understand.

In[233]:= namesAndEmails=Thread[{userNames2,emailAddresses}]

Out[233]= {{dbazan,dorothy.bazan@hometownschools.org},{tbleakney,tom.bleakney@hometownschools.org},{mcalorio,marie.calorio@hometownschools.org},{bcardoso,brian.cardoso@hometownschools.org},{ccarpenito,courtney.carpenito@hometownschools.org},{jcarton,jacqueline.carton@hometownschools.org},{tcoffey,tara.coffey@hometownschools.org},{tconaty,thomas.conaty@hometownschools.org},{jdavenport,joseph.davenport@hometownschools.org},{jdisabatino,jennifer.disabatino@hometownschools.org},{rdoran,robert.doran@hometownschools.org},{tdsida,terri.dsida@hometownschools.org},{peramo,paula.eramo@hometownschools.org},{jneedle,julie.needle@hometownschools.org},{moconnell,marisa.oconnell@hometownschools.org},{coneil,christopher.oneil@hometownschools.org}}

Finally I'll Export the List to a file in my Documents directory. It is safest to use FileNameJoin to compose a filename, since it will catch syntax errors as well as make sure it's in the correct format for your operating system.

In[234]:= Export[FileNameJoin@{$UserDocumentsDirectory,"HomeTownUserNames.xlsx"}, namesAndEmails]

Out[234]= C:\Users\kwcarlso\Documents\HomeTownUserNames.xlsx