LJ LFs

Apr. 8th, 2017 03:48 pm
[personal profile] codeswitcher
WTELF. I am trying to – yes, yes, I know – do regexes on dumps of the LJ Inbox, and I just discovered that something Very Weird is going on with the linefeeds.

Mostly everything is fine. Except the part that isn't: the actual message contents seem to have line breaks between, er, lines (at least, everything that renders them thinks they're line ends: View Source in Firefox, more at the command line, BBEdit), except that perl isn't seeing them as line breaks. Perl thinks (okay, while (my $line = <INBOXFILE>) returns) they're all one big happy line with some sort of line break inside it. Running a regex that matches just the HTML at the end ($line =~ /<div class=\'actions\'>/) against that string doesn't just return the last line of the message, it returns almost the whole damned message.

To confirm this, I wrote a perl script to simply number the lines of the raw HTML. It returned something like the following (heavily redacted because privacy and not escaping all the damned angle brackets):
[...]

522:                 <td class="item">

523:                     <div class="InboxItem_Controls"><a href='http://www.livejournal.com/inbox/?page=1&bookmark_off=3320'><img src='../../l-stat.livejournal.net/img/flag_off.gif' width='16' height='18' class='InboxItem_Bookmark' border='0' /></a> 

524:                 <a href="http://www.livejournal.com/inbox/?page=1&expand=3320"><img src="../../l-stat.livejournal.net/img/expand.gif%3Fv=8234" class="InboxItem_Expand" border="0" onclick="return false" /></a>

525:                 </div>

526:                     <span class="InboxItem_Title InboxItem_Unread" id="all_Title_3320"><div class='pkg'><div style='width: 60px; float: left;'><img src="../../l-userpic.livejournal.com/19583817/961489" width="50" align="top" /></div><div>Re: (no subject)<br />from HTMLHTMLHTML...

527:                     

528:                 <div class="InboxItem_Content" style="display: block;">Yes, exactly that one!  Thank you so much.  blah blah blah it's an awesome post, and I'm so glad you helped me find it again.
<br />
<br />randomusername
<br />
<br />--- I wrote:
<br />> This one? URLgoeshere
<br />> <br />> --- randomuername wrote:
<br />> > Sorry for bothering you, but was it you that wrote the awesome post about the blah blah blah?  I can't find it despite googling a lot, so I thought I'd ask... <div class='actions'> <a href='http://www.livejournal.com/inbox/compose.bml?mode=reply&msgid=80595787'>Reply</a> | <a href='http://www.livejournal.com/friends/add.bml?user=randomusername'>Add as friend</a> | <a href='http://www.livejournal.com/inbox/markspam.bml?msgid=80595787' class='mark-spam'>Mark as Spam</a></div></div>

529:                 

530:                     </td>

[...]

What I think should happen is that only the "line" that starts <br />> > Sorry for bothering you etc should be returned by matching on the attribute of that tag at the end. What I'm getting is everything here marked 528 and in bold.
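
One way to check what perl is actually seeing, sketched under an assumption this post doesn't confirm: that the breaks inside the message body are bare carriage returns (\r) while the rest of the page ends its lines with \n. The readline operator only splits on $/, which defaults to "\n", so any other kind of break stays inside the record. The helper names and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Count each kind of line terminator in a string.
# Returns a hash ref with the keys crlf, lf and cr.
sub count_line_endings {
    my ($text) = @_;
    my %count = (crlf => 0, lf => 0, cr => 0);
    # Alternation order matters: try "\r\n" before the single characters.
    while ($text =~ /(\r\n|\n|\r)/g) {
        my $kind = $1 eq "\r\n" ? 'crlf' : $1 eq "\n" ? 'lf' : 'cr';
        $count{$kind}++;
    }
    return \%count;
}

# Read records the way `while (my $line = <FH>)` does,
# using an in-memory filehandle instead of a real file.
sub read_records {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
    my @records = <$fh>;
    close($fh);
    return @records;
}

# Hypothetical sample (an assumption, not the real dump):
# page lines end in "\n", message lines in bare "\r".
my $sample = "<td class=\"item\">\n"
           . "<div class=\"InboxItem_Content\">Yes, exactly that one!\r"
           . "<br />\r"
           . "<br />randomusername <div class='actions'>Reply</div></div>\n"
           . "</td>\n";

my @records = read_records($sample);
my $counts  = count_line_endings($sample);
printf "records seen by perl: %d\n", scalar @records;
printf "CRLF: %d, LF: %d, bare CR: %d\n", @{$counts}{qw(crlf lf cr)};
```

Against a real dump, opening the file with the :raw layer (open(my $fh, '<:raw', $path)) should keep any I/O layer from translating line endings before they get counted.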

As always, the questions are:

1) Why is it doing this to me? (Is there something actually different about those linefeeds? My text editor seems to think they're identical.)
2) What is doing this to me? (Is this perl being weird?)
3) How do I make it stop? (Can I convince perl not to do what it's doing and do something that makes more sense?)
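
If the line endings do turn out to be mixed, one possible way to make it stop is to skip line-at-a-time reading, take the text as one string, and split it on \R, which (in perl 5.10 and later) matches \r\n, \n, bare \r and a few other vertical-whitespace characters. This is a sketch, not a tested fix for the real inbox dumps; split_any_eol, lines_with_actions and the sample string are hypothetical names and data.

```perl
use strict;
use warnings;
use 5.010;    # \R needs perl 5.10 or later

# Split text into lines on any line ending ("\r\n", "\n", bare "\r", ...).
sub split_any_eol {
    my ($text) = @_;
    return split /\R/, $text;
}

# Return only the lines that contain the actions div.
sub lines_with_actions {
    my ($text) = @_;
    return grep { /<div class='actions'>/ } split_any_eol($text);
}

# Hypothetical sample (an assumption, not the real dump):
# page lines end in "\n", message lines in bare "\r".
my $sample = "<td class=\"item\">\n"
           . "<div class=\"InboxItem_Content\">Yes, exactly that one!\r"
           . "<br />\r"
           . "<br />> > Sorry for bothering you <div class='actions'>Reply</div></div>\n"
           . "</td>\n";

# Number every line, whatever its terminator was.
my @lines = split_any_eol($sample);
printf "%d: %s\n", $_ + 1, $lines[$_] for 0 .. $#lines;

# Only the last line of the message should match now.
print "match: $_\n" for lines_with_actions($sample);
```

$/ is a plain string rather than a regex, so it can't be told to accept several different terminators at once; that's why this splits after reading instead of changing how the file is read.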

Advice welcome. Meanwhile, I'm necessarily just going to live with it. Don't have time to wrestle this alligator on my way across the swamp.
[personal profile] codeswitcher
I wrote some scripts to automate the granting of access filter privileges in Dreamwidth, for recent importees from Livejournal.

They're in perl, and use wget, and are for the command line, and require that you figure out how to come up with a cookie file.

Instructions are in the README.md.

They are here on Github.

There are many improvements that could be made, but I think I should move on to trying to address some of the other outstanding importation problems.

I so clearly have no idea how to use github. Suggestions of every sort are welcome, including just which license I should release this under.
[identity profile] codeswitcher.livejournal.com
Enabling plus addressing in Cpanel should be easy, what with all the layers of email filtering it presents to the domain owner. The way to enable plus addressing is to filter incoming email on the basis of to whom it's addressed. If you're fred@domain.tld, and want plus addressing to work, all forms of fred+arbitrarystring@domain.tld need to get routed to fred@domain.tld. That's a job for email filtering.
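
The mapping itself is small. Here's a minimal perl sketch of just that piece, assuming the address handed in is the envelope recipient; strip_plus_tag is a hypothetical name, and nothing here shows how a mail filter would call it or hand over the address.

```perl
use strict;
use warnings;

# Map a plus address to its base mailbox:
#   fred+arbitrarystring@domain.tld -> fred@domain.tld
# Addresses without a plus tag are returned unchanged.
sub strip_plus_tag {
    my ($address) = @_;
    # Local part up to the first "+", then the tag, then "@domain".
    if ($address =~ /\A([^+\@]+)\+[^\@]*(\@.+)\z/) {
        return $1 . $2;
    }
    return $address;
}

print strip_plus_tag($_), "\n"
    for 'fred+arbitrarystring@domain.tld', 'fred@domain.tld';
```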

Alas, Cpanel presents the user with no way to filter incoming email on the basis of whom it's really addressed to, i.e. the recipient specified in the envelope of the message. This is the address you really want for filtering, because without it, BCC and email list messages aren't delivered when you try to filter on the recipient. To add insult to injury, an "Envelope-to" field is added to messages at delivery -- after all filtering is done.

Cpanel is exim under the hood, and, frankly, the best thing about cpanel+exim is that it allows you to call out to some other real filtering program, if available.

And procmail, which is such a real filtering program, is on many linuxes by default. And it, in turn, can call perl so you can really get something accomplished.

So the answer to "How can I filter by the Envelope-To in cpanel+exim?" is "You can't. But you can call something that can."

( Instructions. Cut for epic length. )
