Those of you who have been keeping dream journals for many years, how do you archive/tag/index your journals?
I’ve been keeping a dream journal for about 5 years now, and it’s just a mess. Mine are stored electronically, but if you have a strategy for paper/physical journals too that would be of interest. I started with one ‘note’ for a month of dreams, because they were not very detailed at first. This progressed to a note/entry per night. Then I swapped to another online note-keeping bit of software, so half are in one, half in another. It’s a gigantic mess that’s growing daily, and I have no ability to go back and locate particular dreams, lucids, or dream themes/characters.
Anyone got any good strategies?
Some kind of tagging system maybe? (Not very good at this organisation thing.)
I recently did a full backup of my dream journals here on LD4all.
I have 23 topics with 10 pages each. Fortunately, the moderator tools give me a way to see all 10 pages on one screen. From there, it is easy to save the entire topic as an HTML file. My partner made a script to extract the actual text of the posts from the local copies. It works well; the encoding bug in the output has even been fixed.
This was so I could easily search through the text for how many times certain words were used. I can also easily search for a date, which can be hard to do with the current search tool here.
My old physical DJs are manually tagged with major dream signs. I have been thinking of transferring them into a digital version. Maybe that script will finally make a format that can fuse the pre-LD4all DJs with the ones from here.
What I do at the moment is have a file with a section for each dream (html markup), and each dream has a date. Searching for particular words can be done with the usual text search function. I also have indexes into it for where interesting features occur.
I think I would like to do more systematic tagging of entries, so that I could do search or statistical analysis by tag.
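A minimal sketch of what search and counting by tag could look like in Ruby, assuming each entry is stored as a date, some text and a list of tags. The Entry struct, the sample entries and the tag names are all made up for illustration.

```ruby
require 'date'

# Hypothetical entry format: a date, the dream text, and a list of tags.
Entry = Struct.new(:date, :text, :tags)

# Sample data, invented for this sketch.
ENTRIES = [
  Entry.new(Date.new(2015, 3, 1), 'Flying over a forest.', %w[flying forest lucid]),
  Entry.new(Date.new(2015, 3, 2), 'Late for an exam.', %w[school]),
  Entry.new(Date.new(2015, 3, 5), 'Flying above my old school.', %w[flying school])
].freeze

# Search: all entries carrying a given tag.
def entries_tagged(entries, tag)
  entries.select { |entry| entry.tags.include?(tag) }
end

# Statistics: how many entries use each tag.
def tag_counts(entries)
  entries.flat_map(&:tags).each_with_object(Hash.new(0)) do |tag, counts|
    counts[tag] += 1
  end
end
```

With entries in this shape, entries_tagged(ENTRIES, 'flying') gives the two flying dreams, and tag_counts(ENTRIES) gives a tag-to-count hash that could feed any further statistics.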
I have on my “to do” list to use a print on demand service to print a paperback book version of my dream journal. In principle, this just needs setting up some style templates for how I want it printed, and then sending off a PDF to be printed.
The dream art could do with being linked to the dream descriptions, too…
FiXato
(BeardGrabber's pappa, Dutch coder in Norway)
4
Actually, my work-in-progress tool also created the local copies. It doesn’t rely on the mod’s splitter tool. Theoretically, using a dump from that could reduce the number of page requests, but it would require moderator interaction, and would lack things like avatars.
Basically, what it does is:
1. Take a list of URLs.
2. Send an HTTP POST request to log in, if the page requires authentication (using the Ruby gem (library) Faraday).
3. Send an HTTP GET request for the document.
4. Store the result in a locally cached (HTML) file.
5. Parse the result with Nokogiri (a Ruby gem for parsing HTML/XML documents).
6. Extract individual posts from that document using CSS selectors, as well as some details such as avatars (which it can also locally cache), post date, username, user profile link, original post link, etc., and store them in memory in a structured Hash variable.
7. Find the ‘Next’ link for the next page in the journal, and repeat the previous couple of steps until there is no Next link left.
8. Move on to the next DJ URL in the list.
9. Go through the in-memory data to export the contents as a simple plain-text file, as well as a fairly simple, but well-structured HTML document.
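The page-following part of the steps above could be sketched roughly like this. It is not the actual tool: the fetch and parse callables are hypothetical stand-ins for the Faraday request and the Nokogiri extraction, so the sketch runs with the standard library only, and the method names are made up.

```ruby
# Follows 'Next' links through one journal topic and collects its posts.
# fetch: callable taking a URL and returning the page's HTML (hypothetical;
#        stands in for the HTTP request plus local cache).
# parse: callable taking HTML and returning { posts: [...], next_url: ... }
#        (hypothetical; stands in for the CSS-selector extraction).
def crawl_topic(start_url, fetch:, parse:)
  posts = []
  seen = {}
  url = start_url
  # Stop at the last page (no Next link), or if the links ever loop back.
  while url && !seen[url]
    seen[url] = true
    page = parse.call(fetch.call(url))
    posts.concat(page[:posts])
    url = page[:next_url]
  end
  posts
end

# Runs crawl_topic over every journal URL and keeps the posts per journal.
def crawl_all(urls, fetch:, parse:)
  urls.each_with_object({}) do |url, journals|
    journals[url] = crawl_topic(url, fetch: fetch, parse: parse)
  end
end
```

Passing the fetch and parse steps in as arguments is just a design choice for the sketch: it keeps the loop testable without hitting a server, and a different forum would only need a different parse step.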
As this tool relies on the HTML version of the viewtopic pages, there are some limitations:
The tool currently only supports LD4All (though I’m trying to make it modular enough to fairly easily add extra extraction classes). Unfortunately, LD4All’s HTML isn’t the most structured, as it predates the common usage of classes and ids that can easily be used to select specific elements. For instance, I currently rely on finding the user profile link on the left side of the post, then finding its first table ancestor, from there looking up that table’s first table data ancestor, and then taking its next (sibling) element, just to find the post container. I could probably look for .postdetails instead, but this way allows me to filter down to just the posts by a specific (set of) user(s) with a CSS selector, rather than still having to traverse them all.
Since it relies on plain HTML, it doesn’t quite support the more Web 2.0 approach of loading content through JavaScript. If a page uses an AJAX call to add extra content to the page, the tool currently wouldn’t see it. I’d have to implement something like PhantomJS or Watir to add support for that.
Also, since it relies on viewtopic, which only displays the processed HTML output of the post, you won’t get the source of your post, including the raw BBCodes used. This shouldn’t be a problem if you just want a local archive, but it would make it harder to import your posts into a different forum that also supports BBCode, as you’d have to write/use an HTML->BBCode converter.
Theoretically I could work around this by extracting the original BBCode by querying each individual post via posting.php in quote mode, but that would be quite a lot of extra page requests, which I don’t want to do as I don’t want to stress the server.
I do already have support to reduce server load a bit by waiting a few seconds between each non-cache hit, but I’d still prefer not to stress webservers without their owners’ permission. I’d want to ask Qu for permission first, before implementing that.
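A rough sketch of that wait-between-requests idea, under a few assumptions: the cache here is a simple in-memory Hash rather than local HTML files, the 3-second default is an arbitrary pick, and the PoliteFetcher class plus its downloader and sleeper arguments are hypothetical names for illustration.

```ruby
# Caches fetched pages in memory and pauses between real requests.
# downloader and sleeper are hypothetical callables, injected so the
# sketch runs without network access or real waiting.
class PoliteFetcher
  attr_reader :requests_made

  def initialize(downloader:, delay_seconds: 3, sleeper: ->(seconds) { sleep(seconds) })
    @downloader = downloader
    @delay_seconds = delay_seconds
    @sleeper = sleeper
    @cache = {}
    @requests_made = 0
  end

  # Returns the cached body if there is one; otherwise waits (except
  # before the very first request) and then downloads the page.
  def fetch(url)
    return @cache[url] if @cache.key?(url)

    @sleeper.call(@delay_seconds) if @requests_made.positive?
    @requests_made += 1
    @cache[url] = @downloader.call(url)
  end
end
```

Cache hits return immediately, so re-running an export over already-downloaded pages costs the server nothing and involves no waiting.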
Ideally a forum or site would provide a JSON or XML feed with just the bare minimum of data (topic id, topic title, topic starter, post subject, post username, post body (both in formatted HTML as well as source BBCode), user avatar, user signature, post date, post id), preferably with a parameter to adjust the number of posts per page, to reduce the number of queries needed. This however would require server-side work with access to the database, which usually isn’t an option. (And if you do have access to the database, you can just as well make a database dump.)
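To make that a bit more concrete, here is one possible shape for such a JSON feed, built with Ruby’s standard json library. No forum actually offers this feed as far as this thread goes; the key names, the posts_per_page field and every value are placeholders invented for the sketch.

```ruby
require 'json'

# Hypothetical feed for one page of a topic; all values are placeholders.
FEED = {
  topic_id: 1,
  topic_title: 'Example dream journal',
  topic_starter: 'ExampleUser',
  posts_per_page: 50,
  posts: [
    {
      post_id: 10,
      post_subject: 'Pink lines',
      post_username: 'ExampleUser',
      post_date: '2015-03-01T08:00:00Z',
      post_body_html: '<span style="color: blue">I realise that it is a dream.</span>',
      post_body_bbcode: '[color=blue]I realise that it is a dream.[/color]',
      user_avatar: 'https://example.org/avatars/1.png',
      user_signature: ''
    }
  ]
}.freeze

# Serialises the feed the way a server might send it.
def feed_json
  JSON.pretty_generate(FEED)
end

# Parses a feed back into plain Ruby hashes, as an archiving tool would.
def parse_feed(json)
  JSON.parse(json)
end
```

Having both the HTML and the BBCode body in each post is what would make the separate posting.php requests unnecessary.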
The tagging is currently done in a very naive way, by post-processing the HTML document the previous script generated. In the final version this tagging can probably be done at the same time as the HTML document is generated, to save some reprocessing resources.
As I said, the tagging is fairly naive atm; just simple string/CSS selector matches. For instance, if a post includes the LD acronym tag, it assumes it’s a Lucid Dream, same with keyphrases such as ‘become lucid’ or ‘lose lucidity’. Unfortunately this leads to quite a few false positives, but that feature was more of a proof of concept, and a way to quickly filter down posts to try and spot patterns.
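A small sketch of that kind of naive matching, including how a false positive slips through. This is not the tool’s actual code: the method name is made up, and the acronyms argument is a hypothetical stand-in for whatever the CSS selector match on acronym tags would return.

```ruby
# Keyphrases mentioned above; matching is case-insensitive.
LUCID_KEYPHRASES = ['become lucid', 'lose lucidity'].freeze

# Returns the tags for one post.
# text:     the plain text of the post.
# acronyms: texts of the acronym tags found in the post's HTML
#           (hypothetical input for this sketch).
def naive_tags(text, acronyms: [])
  tags = []
  lowered = text.downcase
  lucid = acronyms.include?('LD') ||
          LUCID_KEYPHRASES.any? { |phrase| lowered.include?(phrase) }
  tags << 'lucid' if lucid
  tags
end
```

For example, naive_tags('I hope to become lucid tonight.') still returns ['lucid'], even though no lucid dream is described, which is exactly the sort of false positive you get from plain string matching.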
I’m currently refactoring and cleaning up the code, but I’ll probably post the source code on GitHub once I’m done.
The online note service(s) I’ve used have an extract/backup to HTML, so your tool will probably work, with a few tweaks. I did have the habit of tagging lucids in blue text, so instead of searching for [LD] or BBCode, it can probably search for chosen HTML tags instead. I will have a play about with the backup/export functions and see what it dumps out.
Sounds awesome, cool project, FiXato!
@Susan_Y - It sounds like you manually write/update an index. What do you put in the index, if you don’t mind me asking? Brief description? Title? Tags?
@Siiw - Admittedly it was your post on colours in dreams that sparked ideas of better archiving and indexing of my DJs. Wow, 23 topics x 10 pages! I need to get something sorted out, pronto!
FiXato
Yeah, that shouldn’t be a problem. Searching with this CSS selector would do the trick:
.postbody span[style^="color: blue"]
Example of searching from Ruby’s interactive console using Nokogiri:
irb(main):005:0> doc.css('.postbody span[style^="color: blue"]').first
=> #<Nokogiri::XML::Element:0x2ab9a45e819c name="span" attributes=[#<Nokogiri::XML::Attr:0x2ab9a45edc78 name="style" value="color: blue">] children=[#<Nokogiri::XML::Text:0x2ab9a45ec8dc "Pink lines show up in the air on the pictures, and I realise that it is a dream. We start walking through the forest, and I sing loudly. It lasts for only a few moments before I wake up.">]>
(Dream fragment posted with permission)
You just have to be consistent in only using it for LD fragments, and not to colour some other part blue as well. Unless you can combine it with a keyword that always exists inside the blue text section.
FiXato
Hmm, I just realised the print-version of topics could also be a good way to extract post contents, especially as you can also get the entire topic on a single page that way. The downside though is that posts don’t contain profile links, nor avatars, both of which imho are quite handy if you want to include comments from others in your archive.
For a pure DJ-only archive, it would definitely speed up the archive process though, with a minimal amount of server stress.
If profile links were included too, additional resources such as avatars could be uniquely indexed separately afterwards.
FiXato
I guess I’d have to rewrite my tool for the new forum.
I had actually kinda forgotten about the tool till now. I should clean up the code, and add support for this new one so I can actually publish the code.
Does the new forum already have some kind of personal backup option perhaps?
I used to keep a dream journal. I have a normal journal where I write poetry. I could never write dreams in those because they were such precious journals that I had spent a lot of money on, and I didn’t want to mess them up. I thought my words about my dreams wouldn’t be good enough.