Generating Mind Maps from OU/OpenLearn Structured Authoring XML Documents

One of the really useful things about publishing documents in a structured way is that we can treat the document as a database, or generate an outline view of it automatically.

Whilst looking through the OU Structured Authoring XML docs for things I could reliably extract in order to configure a course custom search engine (Notes on Custom Course Search Engines Derived from OU Structured Authoring Documents), I put together a quick script to generate a course mind map based around the course structure.

It struck me that as structured document/XML views of OpenLearn material are available, I could do the same for OpenLearn docs. So here’s an example. If you visit the OpenLearn site, you should be able to find several modules derived from the old OU course T175. From the first page proper of each derived module (URLs have the form http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398868&direct=1), it is possible to grab a copy of the source XML document for the unit by rewriting the URL to include the setting &content=1: for example, http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398868&content=1
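
As a minimal sketch of that URL rewrite, the following Python 3 snippet swaps the direct=1 setting for content=1 using the standard library. The function name source_xml_url is made up for illustration, and dropping the direct parameter simply mirrors the example URLs above rather than any documented requirement.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def source_xml_url(page_url):
    """Rewrite an OpenLearn unit page URL so it asks for the source XML.

    Hypothetical helper: it keeps the id parameter, drops any existing
    direct/content settings, and appends content=1.
    """
    parts = urlsplit(page_url)
    params = [(k, v) for k, v in parse_qsl(parts.query)
              if k not in ("direct", "content")]
    params.append(("content", "1"))
    return urlunsplit(parts._replace(query=urlencode(params)))

page = "http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398868&direct=1"
print(source_xml_url(page))
# http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398868&content=1
```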

Having downloaded the XML files for each of the T175 derived modules on OpenLearn into a single folder, I put together a quick script to mine the structure of each document and pull out the learning outcomes for each unit, as well as the headings of each section and subsection. The resulting mindmap provides an outline of the course as a whole. It can be used to give a macroscopic view over the whole course, and could also be made available to people following the unit as a resource around which to organise their notes or annotations.

Download a copy of the T175 on OpenLearn Outline Freemind/.mm mindmap

If we could find a way of getting the OpenLearn page URLs for each section, we could add them in as links within the mindmap, thus allowing it to be used as a navigation surface. (See also MindMap Navigation for Online Courses in this regard.)
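
If the page URLs were available, attaching them should be straightforward, because Freemind node elements accept a LINK attribute. The sketch below assumes a lookup table from section title to URL; the section_urls dictionary, the example.com addresses and the add_linked_node helper are all invented for illustration. It uses the standard library ElementTree so that it runs without lxml, though the same calls exist in lxml.etree.

```python
import xml.etree.ElementTree as etree

def add_linked_node(parent, text, url=None):
    """Add a Freemind node, with a LINK attribute if a URL is known."""
    node = etree.SubElement(parent, "node")
    node.set("TEXT", text)
    if url:
        node.set("LINK", url)
    return node

mm = etree.Element("map")
mm.set("version", "0.9.0")
root = etree.SubElement(mm, "node")
root.set("TEXT", "Example course")

# Hypothetical mapping from section title to page URL
section_urls = {"Section 1": "http://example.com/section1"}

for title in ["Section 1", "Section 2"]:
    add_linked_node(root, title, section_urls.get(title))

print(etree.tostring(mm, encoding="unicode"))
```

Sections without a known URL are still added, just without a link, so the outline stays complete even if the URL lookup is only partial.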

Here’s a copy of the Python script I ran over the folder to generate the Freemind mindmap definition file (filetype .mm) based on the section and subsection elements used to structure the document.

# DEPENDENCIES
## We're going to load files in from a course related directory
import os
## Quick hack approach - use lxml parser to parse SA XML files
from lxml import etree
# We may find it handy to generate timestamps...
import time


# CONFIGURATION

## The directory the course XML files are in (separate directory for each course for now) 
SA_XMLfiledir='data'
## We can get copies of the XML versions of Structured Authoring documents
## that are rendered in the VLE by adding &content=1 to the end of the URL
## [via Colin Chambers]
## eg http://learn.open.ac.uk/mod/oucontent/view.php?id=526433&content=1


# UTILITIES

#lxml flatten routine - grab text from across subelements
#via http://stackoverflow.com/questions/5757201/help-or-advice-me-get-started-with-lxml/5899005#5899005
def flatten(el):           
    result = [ (el.text or "") ]
    for sel in el:
        result.append(flatten(sel))
        result.append(sel.tail or "")
    return "".join(result)

#Quick and dirty handler for saving XML trees as files
def xmlFileSave(fn,xml):
	# Serialise the tree and write it out in binary mode
	txt = etree.tostring(xml, pretty_print=True)
	#fout.write('<?xml version="1.0" encoding="UTF-8" ?>\n')
	with open(fn,'wb') as fout:
		fout.write(txt)


#GENERATE A FREEMIND MINDMAP FROM A SINGLE T151 SA DOCUMENT
## The structure of the T151 course lends itself to a mindmap/tree style visualisation
## Essentially what we are doing here is recreating an outline view of the course that was originally used in the course design phase
def freemindRoot(page):
	tree = etree.parse('/'.join([SA_XMLfiledir,page]))
	courseRoot = tree.getroot()
	mm=etree.Element("map")
	mm.set("version", "0.9.0")
	root=etree.SubElement(mm,"node")
	root.set("CREATED",str(int(time.time())))
	root.set("STYLE","fork")
	#We probably need to bear in mind escaping the text strings?
	#courseRoot: The course title is not represented consistently in the T151 SA docs, so we need to flatten it
	title=flatten(courseRoot.find('CourseTitle'))
	root.set("TEXT",title)
	
	## Grab a listing of the SA files in the target directory
	listing = os.listdir(SA_XMLfiledir)

	#For each SA doc, we need to handle it separately
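	for page in listing:
		#Skip anything that isn't an XML file (eg hidden system files)
		if not page.endswith('.xml'): continue
		print('Page '+page)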
		#Week 0 and Week 10 are special cases and don't follow the standard teaching week layout
		if page!='week0.xml' and page!='week10.xml':
			tree = etree.parse('/'.join([SA_XMLfiledir,page]))
			courseRoot = tree.getroot()
			parsePage(courseRoot,root)
	return mm

def learningOutcomes(courseRoot,root):
	mmlos=etree.SubElement(root,"node")
	mmlos.set("TEXT","Learning Outcomes")
	mmlos.set("FOLDED","true")
	
	los=courseRoot.findall('.//FrontMatter/LearningOutcomes/LearningOutcome')
	for lo in los:
		mmsession=etree.SubElement(mmlos,"node")
		mmsession.set("TEXT",flatten(lo))

def parsePage(courseRoot,root):
	unitTitle=courseRoot.find('.//Unit/UnitTitle')

	mmweek=etree.SubElement(root,"node")
	mmweek.set("TEXT",flatten(unitTitle))
	mmweek.set("FOLDED","true")

	learningOutcomes(courseRoot,mmweek)
	
	sessions=courseRoot.findall('.//Unit/Session')
	for session in sessions:
		title=flatten(session.find('.//Title'))
		mmsession=etree.SubElement(mmweek,"node")
		mmsession.set("TEXT",title)
		mmsession.set("FOLDED","true")
		subsessions=session.findall('.//Section')
		for subsession in subsessions:
			heading=subsession.find('.//Title')
			if heading is not None:
				title=flatten(heading)
				mmsubsession=etree.SubElement(mmsession,"node")
				mmsubsession.set("TEXT",title)
				mmsubsession.set("FOLDED","true")


mm=freemindRoot('t175_1.xml')
print(etree.tostring(mm, pretty_print=True, encoding='unicode'))
## The reports directory needs to exist before the file is saved
xmlFileSave('reports/test_t175_full.mm',mm)
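
The flatten() routine in the script matters because titles sometimes contain inline markup, in which case the text attribute of an element only returns the text before the first child element. Here is a small demonstration, using the standard library ElementTree (the same calls work in lxml) and a made-up CourseTitle fragment rather than real course XML.

```python
import xml.etree.ElementTree as etree

def flatten(el):
    """Collect the text of an element and all of its descendants."""
    result = [el.text or ""]
    for sel in el:
        result.append(flatten(sel))
        result.append(sel.tail or "")
    return "".join(result)

# Invented example: a title with inline markup in the middle
title = etree.fromstring("<CourseTitle>T175 <i>Networked</i> living</CourseTitle>")

print(repr(title.text))   # 'T175 '
print(flatten(title))     # T175 Networked living
```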

If you try to run it over other OpenLearn materials, you may need to tweak the parser slightly. For example, some documents may make use of InnerSection elements, or Header rather than Title elements.
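
One possible way of making the parser more tolerant is to look for any of several candidate element names rather than a single one. The sketch below is an assumption about how that might be done, not a tested recipe for any particular OpenLearn unit: the element names come from the note above, while the sample XML, HEADING_TAGS, SECTION_TAGS, find_heading and iter_sections are invented for illustration. It uses the standard library ElementTree, but the same calls exist in lxml.etree.

```python
import xml.etree.ElementTree as etree

HEADING_TAGS = ("Title", "Header")
SECTION_TAGS = ("Section", "InnerSection")

def find_heading(el):
    """Return the first direct-child heading element, or None."""
    for tag in HEADING_TAGS:
        heading = el.find(tag)
        if heading is not None:
            return heading
    return None

def iter_sections(el):
    """Yield Section-like descendants of el in document order."""
    for sub in el.iter():
        if sub is not el and sub.tag in SECTION_TAGS:
            yield sub

# Made-up sample, not taken from a real unit
sample = etree.fromstring(
    "<Session><Title>Session 1</Title>"
    "<Section><Title>1.1 Intro</Title>"
    "<InnerSection><Header>Aside</Header></InnerSection>"
    "</Section>"
    "<Section><Paragraph>No heading here</Paragraph></Section>"
    "</Session>"
)

headings = []
for section in iter_sections(sample):
    heading = find_heading(section)
    if heading is not None:
        headings.append("".join(heading.itertext()))

print(headings)  # ['1.1 Intro', 'Aside']
```

Note that find_heading only looks at direct children, whereas the script above searches all descendants with './/Title'. Looking at direct children avoids a section picking up the heading of a nested section, but it will miss headings that are wrapped in some other element.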

If you do try using the above script to generate mindmaps/outlines of other OpenLearn courses, please let me know how you got on in the comments below (eg whether you needed to tweak the script, or whether you found other structural elements that could be pulled into the mindmap).
