We illustrate several ways to separate the following string into substrings.
```
i1 : s = "This is an example of a string.\nIt contains some letters, spaces, and punctuation.\r\nIt also contains some new line characters.\r\nIn fact, for some reason, both Unix-style\nand Windows-style\r\nnew line characters are present."

o1 = This is an example of a string.
     It contains some letters, spaces, and punctuation.
     It also contains some new line characters.
     In fact, for some reason, both Unix-style
     and Windows-style
     new line characters are present.
```
The command separate(s) breaks s at every occurrence of "\r\n" or "\n".
```
i2 : separate(s)

o2 = {This is an example of a string., It contains some letters, spaces, and punctuation., It also contains some new line characters., In fact, for some reason, both Unix-style, and Windows-style, new line characters are present.}

o2 : List
```
This is equivalent to using the lines function.
```
i3 : lines s

o3 = {This is an example of a string., It contains some letters, spaces, and punctuation., It also contains some new line characters., In fact, for some reason, both Unix-style, and Windows-style, new line characters are present.}

o3 : List
```
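The equivalence can also be checked directly. The following is a minimal sketch, assuming a fresh Macaulay2 session; it redefines s so that it runs on its own, then compares the two lists with == and counts the substrings.

```macaulay2
-- The same string as in i1, redefined so this snippet is self-contained
s = "This is an example of a string.\nIt contains some letters, spaces, and punctuation.\r\nIt also contains some new line characters.\r\nIn fact, for some reason, both Unix-style\nand Windows-style\r\nnew line characters are present.";
-- separate(s) and lines(s) both split at "\n" and at "\r\n"
pieces = separate(s);
pieces == lines(s)
-- s has six lines, so there are six substrings
#pieces
```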
Instead of breaking at new line characters, we can specify which character to break at. For instance, we can separate at every comma:
```
i4 : separate(",", s)

o4 = {This is an example of a string.,  spaces,  and punctuation.                         ,  for some reason,  both Unix-style                }
      It contains some letters                  It also contains some new line characters.                    and Windows-style
                                                In fact                                                       new line characters are present.

o4 : List
```
or at every space:
```
i5 : separate(" ", s)

o5 = {This, is, an, example, of, a, string., contains, some, letters,, spaces,, and, punctuation., also, contains, some, new, line, characters., fact,, for, some, reason,, both, Unix-style, Windows-style, line, characters, are, present.}
                                    It                                               It                                             In                                            and         new

o5 : List
```
In the last two examples, line breaks appear inside the output substrings, since we are no longer separating at them. (The console prints them as actual line breaks, not as escape characters.)
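To confirm that the line breaks really are kept inside the substrings, the sketch below (again assuming a fresh session, with s redefined) separates at commas and then applies lines to the first piece.

```macaulay2
-- The same string as in i1, redefined so this snippet is self-contained
s = "This is an example of a string.\nIt contains some letters, spaces, and punctuation.\r\nIt also contains some new line characters.\r\nIn fact, for some reason, both Unix-style\nand Windows-style\r\nnew line characters are present.";
-- Separate at commas only; "\n" and "\r\n" are now ordinary characters
pieces = separate(",", s);
-- s contains four commas, so there are five substrings
#pieces
-- The first substring still contains a "\n", so lines splits it in two
lines(pieces#0)
```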
Now let’s try breaking at the string "om". This occurs three times in our string (in three uses of the word "some"), so s is separated into four substrings. The separating characters "om" do not appear in any of the substrings.
```
i6 : t = separate("om", s)

o6 = {This is an example of a string., e letters, spaces, and punctuation., e new line characters., e reason, both Unix-style       }
      It contains s                    It also contains s                   In fact, for s          and Windows-style
                                                                                                    new line characters are present.

o6 : List
```
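Both claims above (four substrings, and no remaining "om") can be checked programmatically. This is a minimal sketch, assuming a fresh session; it uses match, which tests whether a regular expression matches somewhere in a string, together with any.

```macaulay2
-- The same string as in i1, redefined so this snippet is self-contained
s = "This is an example of a string.\nIt contains some letters, spaces, and punctuation.\r\nIt also contains some new line characters.\r\nIn fact, for some reason, both Unix-style\nand Windows-style\r\nnew line characters are present.";
pieces = separate("om", s);
-- "om" occurs three times in s, so there are four pieces
#pieces
-- No piece contains the separator "om"
any(pieces, p -> match("om", p))
```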
We can recover the original string using the demark function.
```
i7 : demark("om", t)

o7 = This is an example of a string.
     It contains some letters, spaces, and punctuation.
     It also contains some new line characters.
     In fact, for some reason, both Unix-style
     and Windows-style
     new line characters are present.
```
In general, s = demark(x, separate(x, s)). The exception to this rule is that demark("\n", separate(s)) isn't necessarily equal to s: this expression replaces any "\r\n" line breaks in s with "\n" characters.
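Both halves of this statement can be checked directly. The sketch below assumes a fresh session: it round-trips s through separate and demark for two explicit separators, then shows that rejoining separate(s) with "\n" loses one character per "\r\n" line break (s contains three of them).

```macaulay2
-- The same string as in i1, redefined so this snippet is self-contained
s = "This is an example of a string.\nIt contains some letters, spaces, and punctuation.\r\nIt also contains some new line characters.\r\nIn fact, for some reason, both Unix-style\nand Windows-style\r\nnew line characters are present.";
-- With an explicit separator x, demark(x, separate(x, s)) rebuilds s
demark("om", separate("om", s)) == s
demark(",", separate(",", s)) == s
-- separate(s) also consumes "\r\n", so rejoining with "\n" drops each "\r"
u = demark("\n", separate(s));
u == s
-- s contains three "\r\n" line breaks, so u is three characters shorter
#s - #u
```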
To use a string longer than 2 characters to separate, and for much greater flexibility and control in specifying separation rules, see separateRegexp.
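As a minimal sketch of the regular-expression approach, assuming separateRegexp(pattern, string) is available in your Macaulay2 version, the hypothetical string str below is split at each comma or semicolon together with any spaces around it. A separator like this cannot be expressed as a fixed string of at most 2 characters.

```macaulay2
-- Hypothetical example string, chosen only for illustration
str = "alpha, beta;gamma , delta";
-- Pattern: optional spaces, then "," or ";", then optional spaces
separateRegexp(" *[,;] *", str)
```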