Alright. Welcome back to Computer Science S75. So tonight is on MVC and XML in the context of PHP, and we'll also talk a bit tonight about the first project, Project Zero, whose PDF spec is now online, and we'll also talk about the CS50 appliance and about Linux and about the LAMP stack more generally. So, what was MVC from last week? What was the purpose served by MVC? What does MVC stand for? I'll take anything. Yeah. Connor. Model View Controller. Sorry. Model View Controller. Okay. Model View Controller. Good. So someone else. What does that mean? And what was the point of introducing this acronym? Yeah. It means dividing all your files up into different stacks and layers... Okay. and getting things you use, for example, more often, from a single source and then using it in multiple views. Okay. Good. So reuse of code is one of the motivations for MVC. Views, recall, had to do with the display of information, and so you could share code among your views, particularly for headers and for footers, and we didn't really dive into much detail yet about controllers, or models, even, so today we'll look a little more at the first two, views and controllers, so that you have some possible design patterns in mind when you tackle Project Zero. And also in tonight's section, which will focus predominantly on Project Zero, you'll be given some direction and some tips on what to do for that project. So just to set the stage, Project Zero is about Pizza ML, Pizza Markup Language. And the motivation for this is as follows. Some years ago there was a favorite pizzeria of ours down the road, a shop called Three Aces, and unfortunately it closed, but we hung on to its menu because it's a wonderful example of a messy data set.
And the goal for this project is to actually implement your own e-commerce-like website for this pizzeria with which customers can order pizzas and subs and salads and the like, but the fellow who owned this shop, we've hypothesized, wasn't necessarily very technical and doesn't really need the overkill of a full-fledged database like MySQL or certainly not Oracle or the like, and in fact in terms of updating the menu, he is fine with something simple. Even using something simple like TextEdit on a Mac or Notepad.exe on a PC. He also doesn't expect a huge amount of traffic, so he wants to just be able to run this site off of an old server that they were given, sitting under a desk somewhere. So in short, resources are few, the load is low, and so the motivation at hand is to come up with a database design of sorts that can nonetheless support a pizza ordering website. So tonight we'll talk about XML. XML is a text file format similar in spirit to HTML but, at the risk of oversimplifying, it's like a make-your-own HTML type language, where you create your own tags. You create your own attributes, so this will be an opportunity to design your own representation of this pizzeria's menu, and you'll have to implement in PHP a system whereby users can visit this site and can navigate the menu and can add things to their cart, change quantities, delete things, and ultimately "check out." And all of this will require some maintenance of state, so recall last week our discussion of the session object in PHP and cookies more generally. So being able to store things from page to page will be crucial for this particular challenge at hand. So more on that toward the end of tonight's lecture, but keep in mind that the goal will indeed be this pizzeria. So what's the context in which we're going to do this? So the CS50 appliance is just a virtual machine.
Per the announcement on the course's home page, we're going to post it very late tonight or by tomorrow morning once we finish adding a few final touches to what's going to be called version 17A, and the directions for this will be on the course's website. In the meantime, on the projects page on the course's website is the specification for the project. So you can dive in tonight and think through it, read through it, and then tomorrow you'll be able to download this and dive in hands on. So the appliance is an installation of a Linux flavor called Fedora. For those unfamiliar, Linux exists in many different forms: Ubuntu, Debian, Red Hat, CentOS, and all sorts of others. We happen to be using Fedora, which is an RPM-based system, but that's mostly irrelevant for our purposes because what really matters about the appliance is that it has a LAMP stack, with PHP and Apache and MySQL all preinstalled and configured for you, but you'll still have root access with which to tinker with the configuration files we've talked about thus far, httpd.conf, php.ini, and the like, and you'll find that it's a very natural real-world environment so that even though it's running on your own laptop or desktop it's representative of most any commercial host or VPS, virtual private server, that you yourself might subscribe to. So you'll be able to experience exactly what you would if you were actually paying for some third-party hosting. So, we'll do some more demos and I'll walk you through some commands for those unfamiliar over the course of today. So here's a better picture of MVC than I certainly drew with my ASCII art last week. And the idea ultimately behind MVC is that you have the brains of an operation known as a controller, and it might be one controller, it might be multiple controllers, and the controller talks to things like a model.
Now, for now, tonight, we'll largely wave our hands at the model, but we'll come back to this in the context of databases in particular, but for now you can think of the model as being even something like your XML file, so more on that to come. And then views are what handle the actual aesthetics. So once you have, for instance, come up with the data that you want to show to the user, typically you pass it as one or more arguments to this thing called a view, and the view is then rendered with that dynamically provided data. And we'll see a couple more code examples about this later, but really the key takeaway here is the directions of these lines. There is no lateral line, left to right, and the idea, as will become clear, is that your models, which again refer to databases, real world entities, student objects, pizza objects, order objects and the like, should not have any direct communication with the display of that information. It's really the controller that should query a database, or even an XML file, get some data, store it in variables, then pass those variables to the views for actual aesthetic rendering. And so we'll see again some concrete examples here tonight, and we'll revisit this topology over time. But first some preliminaries. How many people here have used source control before, if you know what the expression means? Okay. So a little bit over here, a little over here. Okay, so a couple of you. We won't dwell too much on this in lecture because the project specification itself has some of the familiar commands, but source control is generally about keeping backups of your files in the form of differentials, changes from one version to another.
If I put the question this way, how many of you, when writing essays, writing resumes, writing any kinds of documents that you want to make changes to, just so that you have copies of the old version, change your file from something.doc to something-old.doc, or something-july1.doc, or something-july2.doc, right. We all have probably adopted our own sort of versioning schemes just by making copies of files and renaming them. But what's the downside of this approach? Yeah. [Inaudible] rename it [inaudible]. Yeah. If you rename the file, you lose some of its history, and you can sort of figure out what change you made between version 3 and version 4 by opening them both up side by side, but you the human have to kind of figure out what changes you made. If you're a bit more advanced with Microsoft Word, realize that it has a track changes feature which allows you to keep one file, but record at least your most recent change history. That's similar in spirit to source control, but if instead you've been doing it with the old school, sort of intuitive approach of just recopying and renaming files, well, we can do better than this, particularly for large files. Because if you're making a copy of a file just so that you keep a backup of it around with a different name, you're literally duplicating all of the data between those files, which might not be a big deal for essays and whatnot, but if it's a video file or if it's simply a very large essay with lots of diagrams or pictures, you could start churning through kilobytes or megabytes. So we can do better. So in the world of programming there's this general notion of source control, which refers to the ability to write a program, for instance, Hello.php, and then if you want to make a change to it, you can first commit the current version of your code to what's called a source code repository, and that repository is what essentially stores all of the backups that you yourself have decided to commit.
You never change the name, your file always stays as Hello.php, but you can roll back in time through certain commands to say, actually give me the version from yesterday, or two days ago, or a month ago, and the source control software that you're using figures out how to do all of this. You don't do any of the renamings yourself. So, beyond the motivation for this being just to clean up one's backup systems, it's also very compelling for people to collaborate on code. So even though for this course you'll be working solo on the projects, it's not unreasonable to expect that you're eventually going to work with a friend on some project, a colleague on some project, and so what version control also allows you to do is this. In general, this right-hand line here, the black one, represents what we'll call a master branch. A master branch is like the default repository into which everyone on the team saves their code. But suppose you decide some night, "Oh, I'm going to try to implement some brand new feature for my code," and you don't think that it's just going to take a few minutes. It could take an hour, could take a day, could take a week, a month. So you don't necessarily want to start writing changes and then pushing those changes to the repository if your changes are only partially complete. Right, if your code is broken, it's not going to really hurt you, but you don't want to break the entire project for everyone else. So what source control allows you to do is to fork off what's called a branch.
So on the left in yellow here is what we might call a development branch, and any developer, yourself included, can create a separate branch, which you can think of as a copy of the main repository, but you can make all of your changes, you can push changes to the server, but they stay separate, a separate branch of the tree so to speak, and only once you have decided, you know what, I've finished this feature, I've debugged it, I've tested it for correctness and everything is good, and you want to merge it back into the so-called master branch, then you can actually do exactly that. Merge from yellow back into master, at which point everyone else has access to your feature. So in short, this branching is not so applicable for our purposes in the course since again you'll be working solo, but that's yet another feature of this. So source code control is incredibly common in the software world these days, and perhaps one of the most popular tools is something called Git these days. Now for those familiar, in the Windows world Visual Studio has source control built in and it just comes with Visual Studio. There are other tools out there like CVS and RCS and SVN which you might have heard of before. All of those are rather dated these days, and what's gotten more popular are things like Git, and Mercurial, and Bazaar. These are all distributed source code tools, and what this simply means is that you have not only a central server to save or to push all of your changes to, you also have a local copy in your own directory. So it means you can save different versions of your code on your own laptop or desktop without pushing them to the server, or you can also push them to the server. This is in contrast to things like SVN, which was perhaps, up until recently, one of the most popular tools for version control. The problem with SVN is that there's only a central repository.
If you want to say, I want to make a checkpoint here of sorts, and I want to save this version of my code, if you're sitting on the train or you're in Starbucks with no internet access, you can't do that, because there's only a server. So these distributed tools like Git, Mercurial, and Bazaar are definitely the way to go these days since you have the best of both worlds. So here's a quick review of commands that will be repeated in Project Zero's spec, and we're trying to introduce this really for your own benefit so that, frankly, in the middle of the night, when you're chasing down some bug or trying to implement some new feature, and you realize, damn, that was a really bad idea, I just broke my entire code, all you have to do is run a command like git checkout, and you'll roll back to the previous version without having to undo manually all of the changes you've made. But the general workflow is as follows. Either a code repository already exists in the world and you, so to speak, clone it, and then you've got a copy of it and you can proceed to do your work there. For our purposes in the course, you're not going to clone an existing repository, because your project doesn't exist yet. Rather, you're going to create an empty directory, as I'll start to do tonight. You're going to run a command like git init, which will initialize that directory as being for a source code repository, then you'll type git add, which will add files to that repository. And then you'll run git commit, which saves that checkpoint, so to speak. And we'll do this once ourselves. And then git push, meanwhile, is a little different. Git push, where do you think it pushes the code to, if not your local repository? Yeah. Yep. Exactly. The server. So where is the server? We'll come back to that in just a moment.
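As a rough sketch of the local workflow just described (init, add, commit, then rolling back a bad change with checkout), something like the following could be typed at a prompt. The scratch directory, the file contents, the commit message, and the author identity are all placeholders chosen for illustration, not anything taken from the spec.

```shell
# Sketch only: a throwaway directory stands in for a real project folder.
repo=$(mktemp -d)
cd "$repo"

git init -q                      # initialize the directory as a repository

# A first version of the file (the contents are just an example).
printf '%s\n' '<?php echo "Hello, world"; ?>' > hello.php

git add hello.php                # add the file to the repository's staging area

# Save the checkpoint. The -c options supply a placeholder identity so the
# sketch does not depend on any global git configuration.
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "First version of hello.php"

# Simulate a late-night change that breaks the file.
echo 'oops' > hello.php

# Restore hello.php from the last committed version.
git checkout -- hello.php
cat hello.php
```

Nothing here talks to a server; pushing would additionally require a remote repository to have been configured.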
Git pull is the opposite of that, so if you are using a central server, you can pull changes down, which is useful if you own multiple computers and you want to work on this laptop, but then the next day at work, you want to work on another computer but you don't want to have to put your code on a USB drive or something silly like that. You can just run git pull on the second computer and that will copy down the changes you pushed from the first computer, and git branch refers to this idea of branching that we discussed via the picture a moment ago. So where is this central server? What we'll encourage you to do in the spec is to actually sign up for an account on this free service. So Bitbucket is a freely available service that allows you to create Git repositories, both public and private. Public generally refers to open source software that you want to make available to the world, and the biggest example of this type of site is GitHub, if you're familiar with them by name or by practice. What's nice though about Bitbucket is that even though they're not as popular, they offer unlimited free and private repositories to people associated with the university, whereas GitHub charges for private repositories, at least for a non-trivial number. So we'll encourage you to do this so that you have a central place to store your code not only for the semester, but really for posterity, and particularly since it'll let you keep your intellectual property private. But more on that in the project's specification. Any questions? So if I were to ask, what are two upsides of using source control? Sort of quiz like; can you give me one or two of them? Yeah. If you break something you can roll back to something that you had before. Good. So if you break something, you can roll back to a version you had before, or... And all renaming is done automatically for you.
Good, so all of what would be the renamings of files is done for you sort of automatically so that the file names never change, and how about something else, even though it's not as applicable to the course? You can store things on a server, so if your laptop breaks, you're still going to have [inaudible]. Oh, so that's another one. So the ability to push to a central server means if your laptop breaks or you're just elsewhere on a different computer, you still have access. This is good. We're past two. Yeah. Easy collaboration. Yeah. Easy collaboration, as well. So especially after the course if you dive into some project and you want to work with a buddy on it, this is the way to do it, whether with a public or private central repository using a service like Bitbucket, GitHub, or the like. Alright. So excellent. More on that in the spec itself. So this notion of MVC. How did we get to this point? Well let me go ahead and pull up one of last week's examples where we left off and go to Lecture two source, which gave us these examples here. Oh, so actually, let me take a step back, since I'm now inside of the environment of the appliance. So the [inaudible] of the appliance, again, is this virtual machine. It's an installation of Linux that you can run on your own Mac, on your own PC, or even on your own Linux computer if that's what you're running. And what I simply did on my Mac is I opened up a program a moment ago called VMware Fusion. VMware Fusion is a hypervisor, and a hypervisor is a program that lets you play, that is, run, virtual machines like the appliance. For the course, we have site licenses for VMware Fusion if you are a Mac user, and there are instructions on the course's website on how to access this when you download the appliance itself.
For PC users, Windows or Linux, there exists a free version of this same tool called VMware Player, which will allow you to run the CS50 appliance and all sorts of other appliances that people make available as well. So I ran Fusion, I imported the appliance by going to File, Open, essentially, again more on that in the actual instructions. And then I double-clicked it and Linux booted up. And I simply have it full-screened here so that we can see everything within the scope of just one window, and it looks a little similar to Windows, or even Mac OS. We've got a menu here at the bottom. We've got some icons for common programs here, among which is a terminal window. The terminal window we're going to start getting acquainted with all the more, and this is sort of your old school, DOS-style command prompt where you can type individual commands, but this is very much where you would typically live in the web world when setting up a server or uploading files when you don't have, for instance, a GUI that some web hosting company provides you with. And there's also a text editor, Chrome, and more down there. So let me go ahead and do this. Let me go ahead and open up a terminal window by clicking the little black icon, and that gives me a window that looks like this here, and you'll notice that my name, and your name soon, is just John Harvard, and I am logged into the appliance and in parentheses there is a tilde. And for those familiar with Linux, what does the tilde represent? Okay. Connor. Axle. It could be the home of [inaudible]. Exactly, the home directory so to speak. So every user on a Linux computer, and also Mac OS and Windows these days, has a home directory, and the home directory is just where you put your stuff. On a Mac, it's called your user name. On Windows, it's also called your user name, and it's usually in /Users on a Mac or in C:\Documents and Settings on a Windows computer and the like, same idea.
So inside of this appliance, we have our own home directory, inside of which might be zero or more files and folders. So I'm going to go ahead and type a quick command here to get us up and running on the web. But first, let me do this. Let me open up Chrome inside of the appliance, and Google comes up here by default. But instead I want to go to http://localhost just like last week. And what does localhost refer to exactly? Yeah. Whatever is actually on [inaudible]. Exactly, whatever is actually on the server. So I'm inside of the appliance right now, so localhost is, as the name implies, the local host, the local computer. And I go ahead and hit enter here, and recall that we did this last time, hello world, which I tossed into that default, excuse me, directory known as /var/www/html. That's simply where many servers' files go by default, but this wasn't of much interest, it was just an index.html page. So let's instead now navigate our way to slash, tilde, jharvard, also like we did last week, and hit enter. But this time, this week, I've undone a lot of the changes we made last week so that we can start fresh and walk through the process of getting this set up. So this is bad. /~jharvard was not found on the server. So John Harvard has no website. So what directory is probably missing in this case that would explain why John Harvard has no website? Yeah. Public html. Yeah. So if I do an ls here, notice that I have a few directories in my home directory by default, one of which I downloaded in advance. I've got Desktop and Downloads just like a Mac or a PC these days. Logs, which happen to refer to web server logs. And source, which I downloaded in advance for lectures, with pre-created lecture examples. But I'm obviously missing public_html. So let's create that. I can do this in a couple of ways. The sort of real technical way to do it is mkdir public_html.
This is just a convention that many, many, many Apache web servers use for the default directory for a user's homepage. You can change this to anything you want if you edit httpd.conf, but this again is the typical default. So we'll play along, and I'm going to go ahead and hit enter. Now just to be clear, let me go ahead and minimize my terminal window for just a moment. If you're more comfortable in a GUI, realize I could just do this. I could double-click on Home and that would open up a more familiar file browser. Then I can do something like File, Create Folder, and I could have done something like this, create folder, public_html, but you should get comfortable with the command line since ultimately it'll be much more powerful and is where you sometimes need to be on a web server to configure things. So now if I type ls for list, I see public_html. So now, how do I open public_html or navigate into it? Yeah. [Inaudible] Okay. Sure, close. Change directory. So cd public_html, and you'll be able to type faster than you think thanks to keyboard shortcuts. So if I start typing pub and then I get a little bored typing it out, you can often hit tab to do what's called tab completion, which will just save you time on a Linux computer. Then I'm going to do ls for list. There's obviously nothing in there. So now I'm going to go ahead and type gedit index.html, and this is going to open up a text editor that can be used for all sorts of languages, one of which is HTML. And let me go ahead, in a tab that opens here, and do DOCTYPE html, then html, and notice it's going to do some tag completion for me. Head tag. Let me go ahead and do a title tag, now, inside of that, "Hello World." Now let me go down here and do a body and Hello World. Save this, and now let me go ahead and close this. And now notice, once I'm back at my command prompt, if I do ls, I see index.html. So let's see what the result here is. Now let me go back to my browser.
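The steps just walked through could be sketched at a prompt roughly as follows. This is a sketch under assumptions: a scratch directory stands in for John Harvard's home directory, and a here-document stands in for typing the page into gedit.

```shell
# Sketch only: a scratch directory stands in for the home directory (~).
home=$(mktemp -d)
cd "$home"

mkdir public_html        # the directory Apache conventionally serves for a user
cd public_html

# Write the page that the lecture types into the editor by hand.
cat > index.html <<'EOF'
<!DOCTYPE html>
<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    Hello World
  </body>
</html>
EOF

ls                       # lists index.html
```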
Let me go back to the same URL, tilde jharvard. Do I need to do this, whoops, this? No. So no, because again, because of a setting in httpd.conf, the default is assumed to be index.html, or if you're running PHP, it will also look for index.php, as well as a couple of other file names. So now I hit enter. Damn. I did something wrong. So first of all, what is the HTTP status code in question here? Yeah. [Inaudible]. Four oh three. It's barely visible in Chrome these days. But if you look at the tab up top, you see 403; 403 means forbidden, which has something to do with file permissions, so I screwed up somewhere. So let me go back to my prompt, and how do I go about diagnosing this? Let's not solve it yet. What did we do last week to diagnose this? ls -al. Good. ls -al, and this will show us a couple of things. A, it will show us all files, including what are called dot files, which are simply files whose names start with a period, and those are just generally hidden by default. They're not really hidden, it's just that by default ls won't show them to you, so it's a common convention. And now here we have, on the left, because I did -l, a long listing. So besides seeing index.html and dot, and dot dot, I see the permissions for each of those entries. What are the permissions for index.html right now implying? Yeah. Only you are allowed to read and write [inaudible]. Exactly. Only I am allowed to read and write the file right now because you only see an RW on the very left, and recall that the leftmost R, W, and X, if present, refer to the owner. The next triple, R or W or X, refers to the group, which in my case is going to be students because John Harvard's supposed to be a student. And then the last three refer to whom? Other. Other. So everyone else in the world, which is pretty relevant in the world of the web because you're obviously trying to get other people on the internet, in this case, even yourself in a browser, to view those files.
So we need to give more than just me readability to this file. Not writeability, but readability. So does anyone recall the command for doing this last time? Yeah. Chmod [inaudible]. Good. chmod for change mode, a+r, followed by the file name, enter, and ls -l now shows me indeed an R for everyone, including my group. My group's not really that relevant, but my group is a subset of the whole world, so that's fine, as well. Now, because you'll see it online, let's do this slightly more technically. Let me undo that and say a-r and now do ls -l, and now actually, now I really screwed up. Now even I can't view it. So chmod u+r for index.html, and now I have my readability back. U for user. So let's do this a little more technically so that you've seen it before. I'm going to go ahead and open up a little scratch pad here, a text editor, just so we have a place to see this, and let me go ahead and do this. So we just did a+r, and I also did, just now, u+r, and so forth. This is a nice convention. But it would seem to allow you to toggle just read or writeability on a given group, yourself, your actual group, or the world. We can actually be a little more powerful here. So let me actually go back to a listing that looks like this: drw-------. So, the d denotes what, typically? Directory. So I'm going to get rid of that now that we've seen it, because we're talking about a file. So this is what we saw a moment ago when we got forbidden. Now the first three characters that follow are for the owner, the next three are group, and the last three are everyone. Alright, so that's just a recap of where we started. But it turns out that I could have accomplished the same thing with chmod 644 index.html; 644. Well, where does it come from? Well, it turns out that there's a little cheat sheet here. Four typically denotes R, 2 denotes write, and 1 denotes X, for executability. So these are what are actually called octal numbers for the following reason.
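The symbolic chmod steps just performed (a+r, then a-r, then u+r) could be replayed roughly like this. It is only a sketch: the scratch directory and the small mode helper, which trims a long listing down to its ten-character permission string, are assumptions added for illustration.

```shell
# Sketch only: work in a scratch directory rather than a real public_html.
work=$(mktemp -d)
cd "$work"
touch index.html

# Hypothetical helper: print just the permission string from a long listing.
mode() { ls -l "$1" | cut -c1-10; }

chmod 600 index.html      # owner read/write only, the state behind the 403
start=$(mode index.html)

chmod a+r index.html      # add read for owner, group, and other
after_add=$(mode index.html)

chmod a-r index.html      # remove read for everyone, the owner included
after_remove=$(mode index.html)

chmod u+r index.html      # give read back to the owner (u for user)
after_restore=$(mode index.html)

echo "$start $after_add $after_remove $after_restore"
```

Note that the last step only restores the owner's own readability; the world still cannot read the file until a+r (or 644) is applied again.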
If you add these things up, 4 and 2, obviously what do you get? Nothing technical about this. You get 6. So why does 644 represent the exact same string that we created earlier with the shortcuts, like a+r and u+r and the like? Well, in 644, this 6 maps to the first three hyphens, this 4 maps to the next three hyphens, and the last 4 maps to the last three hyphens, and so 4 means readability for everyone, 4 means readability for my group, and 6 means readability and writeability for me. And it's octal [inaudible] because you can count up as high as 7 if you add all of these numbers together, 4 plus 2 plus 1, and the lowest number you can count from is zero, so we essentially have a range of zero to 7, a range of 8 total values. Okay. And actually, you want a really geeky aside? Why does this actually work? These hyphens are actually placeholders for bits. So in fact if you want to give someone, the whole world rather, readability, you would do this. You would set that to 1. If you want to also let the whole world write your file, you would do this, and then you would leave this. Rather, let's say you just want to give the whole world readability. Well, what's this number in binary, if you convert it to decimal? Four. Right? So this is the ones place, this is the twos place, this is the fours place, if you know binary. So this represents 4, which is why readability is represented by 4, writeability is represented by 2, and executability is represented by 1. Okay, a random technical aside that you don't need to understand at a low level, but hopefully now just with these numbers, you can express any set of permissions that you actually want to. So let's actually summarize with a quick cheat sheet now. So for my HTML file, I wanted 644. However, 604 would work as well, but generally just keeping the last two the same is fine. What other types of files would you probably want to be 644?
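The arithmetic behind 644 can be checked with a few lines of shell arithmetic. This is a sketch for illustration; the variable names and the temporary file are assumptions, not part of the lecture's demo.

```shell
# Each permission is one bit: read is the fours place, write the twos place,
# execute the ones place.
r=$((1 << 2))   # 4
w=$((1 << 1))   # 2
x=$((1 << 0))   # 1

owner=$((r + w))   # 6: read and write
group=$r           # 4: read only
other=$r           # 4: read only
octal="$owner$group$other"
echo "$octal"                        # prints 644

# Apply the computed mode to a temporary file and look at the result.
f=$(mktemp)
chmod "$octal" "$f"
ls -l "$f" | cut -c1-10              # prints -rw-r--r--

# The largest single digit is all three bits set.
all=$((r + w + x))
echo "$all"                          # prints 7
```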
In other words, world readable, but also writeable by you? What kinds of files in the context of the web? Surely there's something else. Yeah. [Inaudible] PHP, so good thought, but we actually don't need our PHP files to be world readable because only the web server needs to be able to read our PHP files in order to interpret them, after which it will spit out world readable HTML. So in fact for security, and recall our discussion a couple of weeks ago about suPHP, substitute user PHP. In the interest of security, on a well-designed system, PHP files should really be 600, so that only you can read and write them. But that's fine, because recall that a piece of software like suPHP runs as you, so the implication is that 600 is fine. So by contrast, what kinds of files also need to be 644? Yeah. [ Inaudible ] Good. So all the other stuff, so 644 for this, CSS, 644 for this, JavaScript file, 644 for this, PNG, 644 for this. What about a directory? What should a directory be? Yeah. [ Inaudible ] Okay. So good thought. So readable by the world but not writeable, so something like 644, so readable. So it turns out this doesn't work for a directory. So directories are sort of conceptually different from files, certainly. And with directories, in order to do something with them, there are two permissions involved. One is readability. Can you see what's inside of it? So that's reasonable to want. But the other is executability. Can you even open it by double-clicking on it, or by typing cd? So it turns out that executability in the context of directories is necessary if you want to let the world into that directory. So I would actually refine our definition here, beyond just giving everyone executability, which we might be tempted to do by just adding 1 to each of these numbers, because recall that 1 represents executability.
I'm going to propose that, you know what, the world needs to be able to get into a directory, like a subdirectory in my web server, but they don't need to be able to poke around and see all of the files. If they know the name of the file, and it's readable, fine. But I don't want to give them what's called the directory listing, which shows them all of the files in a given folder. To allow that listing you would use 755; to not allow it, you would use 711. So in general, this cheat sheet that we just made here is a pretty exhaustive list of the relevant file formats that we'll be using for something like Project Zero or One or Two, as well as for directories. There are exceptions, but this is a pretty good rule of thumb. Alright. So back then to where we left off here with MVC. So last time; let me go ahead and unzip some files that I just downloaded. So last time, recall, we had these directories. We had [inaudible] examples, we had some login examples, and we had MVC, and recall where we started with MVC. Let me go ahead and open up index.php for MVC version zero, and notice that this was simple, but there's no programming logic whatsoever. Even though this is called index.php, there's no PHP code in it, and that's fine, but it's just raw HTML. But recall this website; let me go ahead and pull this up. Let me go into jharvard. And here's the difference. So now we're in lecture, and frankly I'd kind of like to be able to browse the directory so I don't have to guess or remember the names of all of the files we've created. So let me go back here to my public_html directory and do ls -l here, and let me make public_html 755, because when I do that, now watch when I reload. Whoops. Now I'm getting an error with index.html. Let's actually get rid of index.html and reload this. Now I get the directory listing. That's 755. What we saw a moment ago was not. That was 711, or 700. So now I'm in MVC, now I'm in version zero, and voila.
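The permission cheat sheet from this discussion could be applied along the following lines. This is a sketch under assumptions: the scratch directory and every file name in it are hypothetical stand-ins, and it only sets and prints the modes; whether a browser actually sees a directory listing also depends on how the web server is configured and on whether an index file is present.

```shell
# Sketch only: a scratch directory stands in for the home directory.
site=$(mktemp -d)
cd "$site"
mkdir public_html
cd public_html

# Hypothetical files of each kind mentioned in the cheat sheet.
touch index.html styles.css scripts.js logo.png hello.php

chmod 644 index.html styles.css scripts.js logo.png   # world readable
chmod 600 hello.php                                   # owner only

cd ..

# 711: the world may enter the directory but not list its contents.
chmod 711 public_html
no_listing=$(ls -ld public_html | cut -c1-10)

# 755: the world may enter the directory and list its contents.
chmod 755 public_html
with_listing=$(ls -ld public_html | cut -c1-10)

echo "$no_listing $with_listing"   # prints drwx--x--x drwxr-xr-x
```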
This was the incredibly underwhelming website we started with last week using just HTML. But what was the problem? Well, if I click on lectures, this leads me to a file called lectures.php, and if I look at that file at a prompt, let me go ahead and do that. MVC zero, and let me go ahead and open up lectures.php. Notice that this file looks awfully familiar, right. It's a little small, so let me zoom in. There's no PHP content in here either. It's pretty much a copy and paste, but I changed what the menu was. Then when you click on lectures and get to lecture zero and lecture 1, well, what did those files look like? Pretty much identical. In other words, we started with a very clean, very simple site, but a huge amount of copy-paste. And what's the downside of this approach, of just doing each page from scratch? Let's pick on this design a little bit. Yeah. Connor. I mean if you want to change the whole site, it's going to require you to modify each individual page. Exactly. If you want to modify the site, even some silly aesthetic like the title or the color or something like that, you have to go through and edit every file, which, eh, was not a big deal as of last week. There were four files. But what about when lecture 2 comes out, and lecture 3 and lecture 4 and 5 and 6 and 7? Now we have a whole lot of files that we actually have to edit. What else is bad about this? Yeah. Well it takes a long time to do and it takes up a lot of space. Yeah. It just takes up more space, and even though, you know, disks are cheap these days, this is just wasting space really, for no good reason. It takes more time, arguably. You minimally have to copy and paste the previous code, or rewrite it from scratch. And yet the only content that's been changing really has been the menu here, the unordered list, and the title of the page. Everything else in this example thus far has been structurally identical.
So we can do better than this, and so we progressively built up from this. So let's do a quick recap here. Let me go into version 1, and recall that version 1 did something a little different, and we don't need to look at the actual code if you recall this quick summary in the README. What did we introduce here in terms of the files that was compelling? We had this, yeah. You stored the header and the footer. Yeah. So we factored out the header and the footer. The HTML tag, the title tag, the head tag, and all of that. We factored out the close body tag and the close HTML tag, so at least then each file only needed the new content. So let me do that. Let me open up index.php now, and let me just do this at a command line for time's sake, and now this was index.php. No redundancy whatsoever. This is all fresh content, and we've additionally required header.php and footer.php. And similarly, all of the other PHP files that I want the user to visit include that particular header and footer. So this was better. But what was not perfect about this? Version 2 got even better, 3 got even better, 4 got even better. We went through a few of them. So what were some of the faults of this earlier design? You can't change things in the header, like the title. Exactly. Right now, if I'm requiring header.php, whatever's in there is what's going to get spit out. So I've presumably hardcoded one title for the whole site. There's no dynamism, at least for the stuff that I factored out. I've paid a price for this factoring by losing the ability to at least customize the title. So that seems a step backward. But how did we fix it last Monday? How did I give myself back the ability to control the title dynamically? You used a function. Yeah. I used a function, instead.
So instead of using require, which is built into PHP and does exactly that, it includes or requires that file right then and there, I instead wrote my own functions called render header and render footer that, in addition to knowing which file I wanted to render, header or footer, also let me pass in what data type? Variable. A variable, but what type of variable? An array. An array. Exactly. And it was an associative array at that, and let me go in here. If I open up now version 2's example. Whoops. Yeah, this is it. In version 2's example, it takes an array as its argument and specifies that the title key should have a value of this, so that now I have the ability to change the values of variables inside of that header. If I look at the render header function, let me go ahead and open up helpers.php, which is the file in which these are defined. Notice that they work fairly simply. They take an argument; data is the parameter name here. It then extracts data and then requires the file. So the only line that should be obvious right now is this last line, require, which requires header.php. What did the previous line do? What does extract do? Yeah. It actually takes the key-value pairs out of the array and makes them into variables that you can use. Exactly. It takes the key-value pairs out of the array that was passed in and just makes them actual local variables. So if you had a key of title, you now get, after calling extract, $title as an actual variable. Now why? Who cares? Well if I look at my header.php file, notice that it looks like this up top; this is nice and simple with $title. Now if I didn't do extract, that's not a key ingredient here. I can still access things, but instead I would have to do this, $data["title"], with the key in square brackets, to get that value back. So why do you think I called extract in the first place? Why bother calling extract?
If you had to guess. I'll take guesses. Okay, if you think you know; oh, yeah. [Inaudible] Yeah. Good. Okay. One point. Yeah. Very honestly, it's the simplicity of it. So one, I just didn't want to type dollar sign, data, quote unquote, all over the place, because it's just not necessary, especially if I've created these local variables within the function; it should be sufficient to just type their names. Yeah. [Inaudible] In this case, I've just defined one key-value pair, so it's a little silly to bother indexing into an array. But even then, if I had more key-value pairs, it really is just the simplicity again that appeals. And what was your name, again? Me? Yeah. Isaac. Isaac. That's right. Sorry, I forgot. Yeah. Also, if you ever change the data, you'll have to go through and change all of these arrays. Yeah. So, that's a fair point. Maybe it's more of a corner case, but if you ever change the name of that local variable, now you have to change it everywhere, as well. Sure, that's reasonable. So just a little bit cleaner. But let's resolve one other open question here. If we go back to the render header function, what is the deal with the equal sign and the empty array function call? What's that doing for me? Yeah. Just creating an array. Just creating an array when? Or under what circumstances? Isaac? Taking an argument. It is taking an argument, and that argument's called data, but if I'm passing in an array, why am I saying equal sign array? Yeah. It defines that if data doesn't have anything, it creates a completely empty array as the input for data. Exactly. If I call render header without any arguments, which I might if I just don't need to pass in any arguments, and in fact that happens with render footer, I at least want $data to be of type array so that when I call extract, extract doesn't freak out because I've passed it some null value.
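The two helper functions just described can be sketched as follows. This is a minimal reconstruction from the description above, not the lecture's actual helpers.php: the function spellings renderHeader and renderFooter, the template contents, and the use of a temporary directory (so the example runs on its own) are all assumptions.

```php
<?php
// Assumption for this sketch: the two templates are written to a temp
// directory so the example is self-contained. In the lecture they are
// simply header.php and footer.php alongside helpers.php.
$templateDir = sys_get_temp_dir() . '/mvc_v2_' . uniqid();
mkdir($templateDir);
file_put_contents(
    $templateDir . '/header.php',
    "<html><head><title><?php echo \$title ?></title></head><body>\n"
);
file_put_contents($templateDir . '/footer.php', "</body></html>\n");

// $data defaults to an empty array, so extract() always receives an array
// even when the caller passes nothing.
function renderHeader($data = array())
{
    global $templateDir;
    // Each key/value pair becomes a local variable: 'title' => ... becomes $title.
    extract($data);
    require $templateDir . '/header.php';
}

function renderFooter($data = array())
{
    global $templateDir;
    extract($data);
    require $templateDir . '/footer.php';
}

renderHeader(array('title' => 'Lectures'));
echo "<ul><li>Lecture 0</li><li>Lecture 1</li></ul>\n";
renderFooter();  // no arguments: the default empty array is used
```

Because the template is required from inside the function, it sees the function's local scope, which is why the extracted $title is visible to header.php.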
Extract expects an array, so this is just a one-line clever way of ensuring that I'm passing in an array, because otherwise I would have to do something like, if is_array($data), then call extract, and you know, that's fine, totally correct and reasonable. But I can do it a little more elegantly by ensuring that data itself has a default value. So, the equal sign just implies a default value there. Alright, any questions on those? No. Alright. So how much further did we go? Well in version three, we did something a little bit different. I still have footer.php, I still have header.php, I still have helpers.php. But there was an opportunity for refinement here. In the previous example, I had render header and render footer, and why is that arguably not the best design? Yeah. Well you could just do one function called render and then pass in what you want to render. Exactly, right. So this isn't horrible. But the moment I introduce like a render middle, or a render top right function, or more and more specific rendering functions, it feels like there's an opportunity to factor that code out and generalize it as just a generic render function. So recall that in version three, we did exactly that. In helpers.php, I just trimmed my code and added a couple new features here, so that render now takes two arguments: the name of the template, so to speak, to render, the view so to speak, and the data, and it renders that particular template by passing in the data that was given. So a little sanity check here. First, I'm figuring out the path to the file, and why am I doing this first line? $path equals $template dot php. Why not just force the caller to specify header.php, footer.php? Yeah. To save space. It's easier to type render header, instead of render header.php. Yeah. Exactly. As simple as that. Rather than saying render header.php, it's just reasonable to want to say render "header." What else?
If I ever change the language that I am using for my templates, for instance, if I want to support a templating engine, there's one called Smarty. There are other templating engines that give you all sorts of functionality in your views. We're not getting there yet, but it'd be nice if I can at least support different types of templates and let the function figure out what they are. So in short, I'm just checking to make sure that the path exists after appending the dot php, then I do the exact same thing as before. So we then cleaned things up ever so slightly further. That was version three. If we go into four, notice what we introduced this time. Why did I introduce these subdirectories here? I claim that it's an improvement to organize things in subdirectories. But why is this compelling? Yeah. I don't know exactly if this applies in this example, but you should really keep all your files in other directories that the client can't visit directly. Exactly. So this was a step toward a better security model, to be honest. This isn't all the way there. This is only half a step toward it. But I've organized now my templates into literally a templates directory. This would often be called the views directory as well. And the motivation there being that the user should never really visit footer.php or header.php directly. Why? Well, they're going to get a partial page, either the top or the bottom and nothing else. So I just don't want them visiting that, in principle. And then helpers.php, similarly, I don't want them visiting that file either, because it just has helper functions. It's not going to actually spit anything out, and worst case, maybe I have some passwords or something in there, and I don't want the risk of them visiting that file directly, so I'm going to tuck it into a directory called includes. Now this does not necessarily protect us fully.
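The generic render function from version three, as described above, might look roughly like this. It is a sketch under stated assumptions rather than the lecture's actual code: the templates are written to a temporary directory so the example is self-contained, and their contents are invented for illustration.

```php
<?php
// Assumption for this sketch: templates live in $templateDir, a temp
// directory created here so the example runs on its own.
$templateDir = sys_get_temp_dir() . '/mvc_v3_' . uniqid();
mkdir($templateDir);
file_put_contents($templateDir . '/header.php', "<title><?php echo \$title ?></title>\n");
file_put_contents($templateDir . '/footer.php', "</body></html>\n");

function render($template, $data = array())
{
    global $templateDir;
    // Callers write render('header'); the extension is appended here, so
    // support for another template format could later be added in one place.
    $path = $templateDir . '/' . $template . '.php';

    // Only render templates that actually exist.
    if (file_exists($path)) {
        // Caveat: a key named 'path' or 'template' in $data would overwrite
        // the local variables above, so such keys should be avoided.
        extract($data);
        require $path;
    }
}

render('header', array('title' => 'Lectures'));
render('footer');
render('missing');  // no such template, so nothing is rendered
```

One function now covers every template, so adding a new view means adding a file, not another render function.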
As an aside, recall that Apache and other web servers support configuration files inside individual directories, so I could, if I really want to be clever, have an .htaccess file in my public_html directory that says do not let the outside world into the includes or the templates directory. But the risk there is that if I screw up, or if the webmaster disables .htaccess files, all of a sudden that protection could be lost. So a better approach is what we did half a step later. So that was version four, and in version five, what I instead did was this layout. I created a whole new level of hierarchy whereby I still had my includes folder, I still had my templates folder, but what's different this time, versus the previous example? Yeah. [Inaudible] Exactly. Now I have an html directory, the idea of which is that only the web-accessible stuff, including not just my PHP files, but my GIFs, my JPEGs, JavaScript, CSS, all the stuff that has to be public, goes into that directory. Or not even necessarily public, but it has to be uniquely addressable via a URL in the browser so that the user can actually visit it. Now recall last week our discussion of httpd.conf. Recall that that has a directive called DocumentRoot. So let me go ahead and open that at /etc/httpd/conf/httpd.conf and let me scroll down to DocumentRoot. Recall this thing that we looked at last time where we had this directive here, /var/www/html. So this is that document root we keep alluding to. This is why we had to put that silly little index.html in that folder last week. Well okay, how do I now make this particular example accessible on the web? Well I would have to do something now like DocumentRoot /home/jharvard/public_html/html, but this isn't quite right, right? Because if I do this for my document root, specifying that this html directory is the one that should define the base of the web server, you can't go any higher than that. That's what the document root does.
It means you can go to that folder and anything in it, but you can't go higher. And that's exactly the protection we want. I'm kind of being an idiot here, though, right. I haven't really achieved what I want to achieve. Why? Yeah. By default, it's public_html. Right. So by default, public_html is literally public. So you can create as much hierarchy inside of it as you want; the world can still access the contents of that directory. So unfortunately, when you're on a shared web host that gives you just a home page with a tilde involved in the URL, like tilde jharvard, you cannot achieve this sort of ideal security model, because all of your files by nature have to be public. So instead, we need to have either a better web host or our own administrative access to the machine so we can configure this a little more properly. So for instance, what I really want to do now is the following. Let's steal an idea from lecture zero this summer whereby we implement our own virtual hosts. And I'm going to say, you know what, rather than use public_html, let me give jharvard a vhosts directory, for virtual hosts, and in there, I'm going to give him the added ability to do something like this, project zero/html. In other words, I'm going to propose that we configure the server in such a way that John Harvard can create a vhosts directory inside of which he can create a project directory: project zero, project one, project two, foo, bar, baz, it doesn't matter what he wants to call the project. All of his code can then go in this directory. And if you want to secure your website in this more optimal fashion, you just make sure that the web-accessible stuff goes in the html directory, and your includes directory and your templates directory go where? In here? Or in here? A or B? B. Okay. B. Right. You want them at the same level as the html directory, not inside of it. Now there's a problem here.
If I have to edit my web server's configuration for every project this summer, then when we get to project one, what's going to break? Project zero. When we get to project two, what's going to break? Project one, right. Because we keep changing this, and a web server by nature can only have one document root. However, there is a way we can work around this. Let me undo these changes and let me open up a file whose name is going to change by the time the appliance is posted tomorrow. But I'm going to go in there temporarily under its current name; let's go into /etc/httpd/conf.d and pull up this temporary file here. Whoops. Let's look at this file. So this is a modified httpd.conf, let me zoom back in to make it larger, that does a few things for us. So let's focus on the juicy part first. In line 7, we have the following comment that I wrote: virtual hosts will override httpd.conf defaults. So what is this saying? Well, NameVirtualHost, whatever that is, VirtualHost *:80, whatever that is, UseCanonicalName, whatever that is, and then it starts to get interesting. And the most interesting line is 20; 20 is saying that rather than define one document root for the entire server, let's define a virtual document root dynamically by using essentially a variable. So notice what I've done here is I'm saying always look in /home/jharvard/vhosts for your virtual hosts on this server, but this name here can actually vary. It can be project zero, it can be project one, project two, foo, bar, baz, whatever you want, but thereafter always dive into an html directory for security's sake. Yeah. So that means we have [inaudible] the number of instances for the [inaudible]? Can you say that again? So we are creating the instances of [inaudible], I mean, Apache? So this file will be preconfigured for you, but what it's doing is giving John Harvard specifically the ability to have any number of virtual hosts on the server.
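For orientation, a configuration of the kind just described might look roughly like the fragment below. It is a sketch, not the appliance's actual file: it assumes Apache 2.2 with mod_vhost_alias loaded (the module that provides VirtualDocumentRoot), and it shows only the directives named in lecture.

```apache
# Virtual hosts will override httpd.conf defaults.
NameVirtualHost *:80

<VirtualHost *:80>
    # Use the hostname the browser asked for, not the server's own name.
    UseCanonicalName Off

    # %0 is replaced by the full requested hostname, so a request for
    # http://project0/ would be served from /home/jharvard/vhosts/project0/html.
    VirtualDocumentRoot /home/jharvard/vhosts/%0/html
</VirtualHost>
```

The hostname project0 in the comment is only an example; any directory created under vhosts becomes a host name in the same way.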
So there is one weakness here in that it is hardcoded for John Harvard; however, if we really put some effort into it, we could generalize this further for multiple users. But for now, we just need a single user on this particular VM, and the key is that %0 can represent any directory that we create in that vhosts directory. Meanwhile, recall that we saw a logs directory in John Harvard's directory. Why is that there? Typically logs are somewhere else on the file system, but for convenience, pedagogically, if you want to be able to access your web server's log, see where you messed up, see what the error message actually is, we've simply told the server, Apache, put logs instead in jharvard's home directory, in a folder called logs, so that they're accessible to you. And then up here, we see some familiar syntax. Eighty is referring to what, probably? Yeah. Port 80. Port 80, which is HTTP. The star represents anything, and what it means here is any IP address or any name, so foo, bar, baz, project zero, project 1. And then NameVirtualHost is just a directive that you need to include when you want to enable virtual hosting based on names in the first place, where the name is inferred from that Host header that you get from the browser. So what does this mean in real terms? Well let me go over here into the appliance and let me pull up John Harvard's home directory, which again has this. And let me go ahead and make a directory called vhosts and let me chmod it 711 so that the world can get in there but can't poke around. Let me then make a directory called project zero, and let me go ahead and chmod that, and then let me go ahead and make a directory, html, and notice I'm doing this very quickly by scrolling up and down. You can go through your history with your up and down arrows.
So let me go ahead and make this folder, and then chmod it 711, so now we have a vhosts directory, a project zero directory, and an html directory, and let's just do this here. So, geany index.html, "I am in here." So it's not actually HTML, but that will suffice for a test. Let me do an ls -l. What command should I type to make sure this is actually world readable? Chmod 644. Good, chmod 644, or I could do the simpler a+r to give everyone read privileges. ls -l now confirms that this seems to be correct. So now let's go over to my browser and do http://project zero/ enter. Problem. Chrome was trying to be helpful; it thinks project zero.biz exists, or .us, but that's clearly wrong. What's broken here? We've got to integrate everything we've talked about thus far, from lecture zero onward. [Inaudible] hosts file? Yeah. Something about the hosts file. So even if you're not familiar with that, roll back your mind to lecture zero when we talked about DNS and we told the story of going from my laptop to Google.com and back and the various steps involved. And one of the first steps involved in that process was what? Well you can give your local host a name, so... Good. And let me answer the first question. One of the steps involved was DNS, because what's happening here is I've presumptuously created a folder, a vhost called project zero, but I haven't told the world about it, so I either need to go buy a domain name named project zero, and that's not very realistic, or I at least need to trick my own computer, the appliance in this case, into thinking that a project zero domain name exists. Now I can do this with a very heavy-handed approach by actually editing a DNS server somewhere on Harvard's network or on my computer. But we can do this more simply on a Linux computer, or Mac OS, or Windows. There's generally a text file you can edit that allows you to do this.
So I'm going to type sudo, which is substitute user do, and that means run this command as an administrator by default, and I'm going to go ahead and open up /etc/hosts, etc for et cetera, enter, and that's going to go ahead and open up this file, which looks like this by default. So by default, a computer typically has an IP address that's not public at all. It's called the loopback address, and it is always 127.0.0.1, and that is an IP address that all of your computers have that refers to the computer itself. So it's a loopback in that recursive sense. So if I want to create a project zero host name, I can actually just say, alright, associate the project zero host name with that same IP address. Create an alias for my current computer that's not just called localhost, that's not just called appliance, which we the staff created, but is also called project zero. These are all just now synonyms. So now let me go to my browser and reload this, and voila, project zero is born on the Internet. Well not really the Internet. Who else in the world can see this? No one at the moment, right? We're not only within the confines of my browser, we're also within the confines of the appliance, and it's the only /etc/hosts file in the world that has this change, right now at least. So what happens is that very first story we told in lecture zero, about how my computer asks the browser, asks the operating system, asks the world, for the IP address of the host name that I've typed in; now Linux in this case intervenes and says, oh wait a minute, I have that answer for you right here in this text file called /etc/hosts. Use 127.0.0.1. And then meanwhile, how does the web server know which vhost to serve index.html for, because suppose I had project 1 and project 2, how does the server know? But wait, actually, didn't you put inside the project zero, "I am in here," not "tell the world"? Oh I did. Damnit. [Laughter] Very good point. That is because, okay, stand by; well played.
I had disabled this before class. Notice when you rename a file to .off, that usually breaks it. So let me get rid of that, and now let me do service httpd restart. You almost let me get away with that. How's that? Alright. So now the story is correct. So, how does Apache know which vhost directory to go into to get me an index.html file? Where is that information coming from? So it's that same [inaudible] you showed us a few moments before with the virtual host and the [inaudible]. Okay, from, to some project zero. But how does the server know what vhost was requested? Thinking back to lecture zero. The GET request. The GET request. So we've only defined one vhost now, in fairness, but in a minute I could create project 1, project 2; I just have to type mkdir a bunch of times. So recall that multiple host names can live at the same IP address. And this is true in the web hosting world and also in our little fake world of the appliance, where we're pretending to be a web hosting company with one IP address and multiple host names. So where is that coming from? Well let me open Chrome's developer tools. Let me open up the network tab as we've done before. Let me reload this page, click on here, and actually... Oh, actually as a development tip, notice that I got 304, which does not sound good; I wanted to get 200. That's just because of caching. The web server said, mmm-mm, you already got this from me, it hasn't changed, keep the copy you have. So I'm going to instead hold shift and reload; in general, clearing your cache or holding shift is your friend with many browsers. Now I get a 200, and what I care about here is the request headers, which have this key line, reminding the server what host was requested. So we've come full circle to lecture zero. We told the DNS server story then; now we've actually hijacked the DNS story to insert our own answers to DNS queries. So why is this useful? Why is this relevant?
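To make the role of the Host header concrete, here is a small PHP sketch of the idea behind name-based virtual hosting: pick a document root from the host name the browser sent. This is an illustration only, not how Apache is implemented; the function name docroot_for_host and the host name project0 are hypothetical.

```php
<?php
// Hypothetical illustration of name-based virtual hosting: map the Host
// request header to a document root under a vhosts directory.
function docroot_for_host(string $hostHeader, string $base = '/home/jharvard/vhosts'): string
{
    // Drop an optional :port suffix and normalise case.
    $host = strtolower(preg_replace('/:\d+$/', '', trim($hostHeader)));

    // Accept only plain host names, so something like "../etc" cannot
    // escape the vhosts directory.
    if (!preg_match('/^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$/', $host)) {
        throw new InvalidArgumentException("bad Host header: $hostHeader");
    }
    return "$base/$host/html";
}

// Two host names at the same IP address resolve to two different directories.
echo docroot_for_host('project0'), "\n";
echo docroot_for_host('project1:80'), "\n";
```

The IP address is the same for every request here (127.0.0.1); only the Host header differs, and that alone selects the directory.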
Well one, as a developer, it's going to allow you over the course of the summer to create three different projects on the same server in the same master directory without having to create like three different appliances or something crazy like that, or without having to turn off project zero to turn on project 1 and so forth. Plus, this is also wonderfully representative of exactly what you would find on a VPS, a virtual private server, out there, if they do support virtual hosts in this way and they don't force you to have a URL with a tilde. So even though we started the story with public_html and tilde jharvard in the URL, that's a hideous URL. Plus, it does not let us implement this more security-conscious directory structure. So as you'll see in the spec for project zero and 1 and 2, you will be asked to make sure your site works within a vhost like this, which is perfect because this is the way it should be done in the real world anyway. The only customization we've done is just move the vhosts directory into John Harvard's home directory so that it's easily accessible, instead of being hidden somewhere else on the file system. Phew. That was a lot. Any questions? Alright, let's take our MVC one step further before we take a break and then come back on XML. So let me go into the code that we've been looking at, and I think we left off with five here. Yep. In which we had this directory structure. So the website itself aesthetically is not at all different. All we've been doing is hammering on the design. So let me now go into the sixth iteration of this where I do this layout. Pretty much the same, but I've decided, you know what, I'm going to have some best practices here, and I'm going to create a views directory which represents all of the aesthetics of my site, and I'm going to introduce a slightly different version of index.php that I'm going to start calling the controller. So in this model here, we now have something much closer to a true MVC architecture.
Still no M, no models just yet. It's all static content, but now I have a controller, and then most everything else, everything in the views directory, is the V in MVC. So let's take a look at how this works and why this design is a little bit cleaner still. In my html directory, there's nothing now except index.php. However, if I had GIFs and JPEGs and CSS, they would also go in this directory. But the key point here is that there is only one entry point now to my entire site, via index.php. There is no longer a lecture0.php, lecture1.php in this publicly accessible directory. Well why is that? Well let's go ahead and open up index.php and see what it looks like. Well in index.php, notice that we have a bit of complexity, but it's not too syntactically different from what we've done before. So first I'm requiring this helpers file. Notice I'm using a relative path. This is a common mistake when people ship software that they've developed on their own machine. Never, ever, ever use hard-coded paths that start with a slash, for instance, because it's just going to break when you move it to another server. As best you can, use relative directories so that if you move the whole folder, everything works on another server. What am I doing here? Well it looks like I'm going to infer from an HTTP parameter called page which page the user wants to see. So I'm going to store that in a variable called page, and if it's not passed in, I'm going to assume the index, so to speak. And now I have a switch statement, which is like an if, else if, else if, else if statement, and how does this work? Well, if the page that the user has requested is the index file, go ahead and do the following. First render the header with this title, then render the index with no arguments, then render the footer. And now notice I'm using paths here. I still left off the file extension because it just looks ugly to say .php all over the place. But I did generalize my render function as I had before.
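A controller along the lines being described could be sketched like this, including the lectures and lecture cases that the walkthrough turns to next. Several things here are assumptions made for a self-contained example: render() is a stand-in that only reports what would be rendered, the logic is wrapped in a function named controller so it can be exercised directly, and the page titles are invented.

```php
<?php
// Stand-in for the render() helper from helpers.php so this sketch runs on
// its own: it only reports which view would be rendered with which data.
function render($template, $data = array())
{
    echo $template, ' ', json_encode($data), "\n";
}

// The controller logic, wrapped in a function (an assumption for this
// sketch; the lecture's index.php reads $_GET at top level).
function controller(array $get)
{
    // Which page was requested? Default to the index.
    $page = isset($get['page']) ? $get['page'] : 'index';

    switch ($page) {
        case 'index':
            render('templates/header', array('title' => 'Home'));
            render('index');
            render('templates/footer');
            break;

        case 'lectures':
            render('templates/header', array('title' => 'Lectures'));
            render('lectures');
            render('templates/footer');
            break;

        case 'lecture':
            // n is only relevant when page == 'lecture'.
            $n = isset($get['n']) ? $get['n'] : '';
            render('templates/header', array('title' => 'Lecture ' . $n));
            render('lecture', array('n' => $n));
            render('templates/footer');
            break;
    }
}

controller($_GET);
```

With a single entry point, adding a page means adding a case and a view, and no new file has to appear in the publicly accessible directory.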
Meanwhile, if I instead request the lecture... Actually, let's skip down to lectures. If I instead request the lectures page, do something almost the same: the template for the header, render that with this title, then render the lectures template, or rather the lectures view, then render the templates/footer template. So now lecture is a little more interesting. It's always kind of bothered me that I had a lecture0.php and a lecture1.php. Because even if I'm adhering to these better practices of factoring out the header and the footer, I really wasn't practicing what I'm preaching with lecture 1 and lecture 2 and lecture 3 and lecture 4; it feels like I'm going to get a lot of redundancy eventually there. So I'm thinking ahead now and introducing another parameter, apparently, called what? Yeah. Called N. N. So just a number, N, presumably, so that now my URLs, they might not look as pretty, but we can actually fix that with mod_rewrite, which we looked at last time. But for now, just assume that there are apparently two parameters, page and N, and N is only relevant if page equals equals lecture. But if it is lecture, then I can do this. First render the header with the title of; oh, and this is interesting, lecture dot N, but we'll come back to this in just a moment. And then, lecture N, so render that template with an input of N, and then render the footer. So let's take a look now at these views. Let me go into the terminal window again. There's nothing else in html, so I need to go back with dot dot; let me go into my views directory and let me open up my index. Well this one is incredibly uninteresting; all it does is have my main menu. But notice what's not there anymore. What redundancy have I eliminated? What used to be in this file? Very early on, it had an html tag, body tag, head tag, title tag, all that stuff. We ripped that out. But what was still left at the top and bottom of every file? You still have things that include or run other [inaudible].
Exactly. I had the require statements, or the render statements, at the top and the bottom of every one of these files. Now that's gone. So my views are even simpler. So that's the index. Well I had a couple other views. Let me go ahead and open up lectures.php. That's similarly simple. So that's pretty nice. And now let me go into lecture.php. And now this is kind of interesting because not only is it super short, notice that over here what am I echoing out? N, dynamically. And so long as I've standardized the name of my slides, the name of my movie file, video file, and so forth, I can generate even that dynamically. But there's a flaw. I screwed up somewhere. What did I forget to do somewhere in this pipeline? There's an XSS attack here somewhere. Which again, we'll come back to at the end of the semester, but we did talk about it briefly last time. And the time before. Yeah. If someone, you didn't make the [inaudible] are safe [inaudible]. Exactly. Where did N come from? N came from the GET string, and I did not call htmlspecialchars or anything like it, either in the previous file or in this template, so in short I'm vulnerable to that same kind of scripting attack where someone might paste some JavaScript code into my site without me realizing it, which can steal my cookies or other such things. So I should minimally be calling htmlspecialchars here, or before even passing it in. But frankly my templates are going to start to look atrocious if everywhere I have htmlspecialchars, htmlspecialchars, htmlspecialchars, so another design opportunity here: it feels like maybe in my render function I should first be iterating over the array, escaping everything then, and then passing it into the template. So again another design opportunity.
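The design opportunity just mentioned, escaping everything in the array once before the template sees it, could be sketched as below. The helper name escape_all is hypothetical; a render function could call it on $data just before extract.

```php
<?php
// Hypothetical helper: escape every string in $data for safe output in HTML,
// so individual templates need not call htmlspecialchars themselves.
function escape_all(array $data): array
{
    $safe = array();
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $safe[$key] = escape_all($value);  // recurse into nested arrays
        } elseif (is_string($value)) {
            $safe[$key] = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        } else {
            $safe[$key] = $value;              // ints, bools, null: left as-is
        }
    }
    return $safe;
}

// A hostile value for n, as it might arrive in the GET string.
$data = escape_all(array('n' => '<script>alert(1)</script>'));
echo $data['n'], "\n";  // the tags are now harmless entities
```

The trade-off of escaping centrally is that a view which genuinely needs to output raw HTML would need some way to opt out, so this is one possible design rather than the only one.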
So bear in mind for project zero the axes along which we evaluate assignments: not just scope and correctness and style; design is the most subjective of the four, and design speaks to issues like this. If you're copying and pasting the same kind of code again and again, you can take even these lecture examples to the next step, the next level, and refine them even further. Alright. Let's take a look, I think, at one last one here. Let me go into seven and introduce one other trick. Or actually, let me open up version six to show you how ugly an approach it actually was, MVC. So let me close Chrome's inspector. Let me go into version six, html. Oh, and just to be clear, even though I've chmod'd John Harvard's directory 755 for the sake of lecture so we can click around, realize that if we actually take the vhost approach, it's the html directory that could be made to be the virtual host's document root. So again, we're just navigating now for convenience in public_html. So now notice, when I click on version six's link, notice what the URL looks like: ?page=lectures. That's not all that pretty, but even worse, when I click on lecture zero, then it really gets ugly. Alright. It's not wrong, and in fact, this is a true use of HTTP, but frankly this is Web 2.0 that we're in now, right? And most URLs are much sexier than this. It's just slashes and words, for instance. So frankly, a sexier URL than ?page=lecture&n=0 would be something like /lecture/0. So something like that just feels cleaner. There's no fundamental difference, but it just is the way things are going, and various web frameworks make this easy, or easier. So it turns out you can indeed do this with this feature we looked at last week called mod_rewrite. So now here, let me go ahead and open up version seven, so let me go back to version seven here, html, and let me click on lectures.
And now this is broken because I need to add one other thing in version seven; we need this here. So, we can actually fake those prettier URLs and still use the exact same code by using this tool called mod_rewrite. When did we use mod_rewrite before, or why? Yeah. [ Inaudible ] As a redirect. So that was one such trick. We wanted to redirect from, like, http to https, or we wanted to standardize on www.something.com instead of just something.com, and so forth. So anytime you want to manipulate URLs, mod_rewrite's a powerful tool. So here I have an .htaccess file that first turns on that so-called rewrite engine. I now need a RewriteBase just for the sake of these lecture examples: because I'm not in /, because I'm instead in this silly MVC/7/html directory, I need to trick the server into thinking that where I am is the root directory of the server. And that's just, again, for lecture purposes here. But the last two lines are generalizable. I first have a rewrite rule that says if the user has requested effectively /lectures, and literally just lectures (the caret, ^, means start from the beginning of the string; the dollar sign, $, means match to the end of the string), so if literally the URL is something, something, something/lectures and that's it, reroute the user's request to index.php?page=lectures, and that's it. So in other words, even though the user came to you via /lectures, actually respond to them as though they went to this partial URL. So this does not redirect the user. Notice there's no R there. There's no 301. There's no 302. This is instead doing a behind-the-scenes internal redirect because there's no http there either. Because it's a local URL, the web server does not need to redirect the user with a Location header; it can do it all inside the web server itself. Meanwhile, if instead the user visited /lecture/whatever this is. Now this is just a placeholder.
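For reference, an .htaccess file of the kind being described might look like the fragment below. Treat it as a reconstruction rather than the file on screen: the RewriteBase path is an assumption for illustration, and the parameter names page and n are taken from the URLs discussed earlier.

```apache
RewriteEngine On

# Lecture-only: this path is assumed for illustration; use wherever the site actually lives.
RewriteBase /~jharvard/mvc/7/html/

# /lectures  ->  index.php?page=lectures   (internal rewrite, no R flag, so no redirect)
RewriteRule ^lectures$ index.php?page=lectures

# /lecture/<anything>  ->  index.php?page=lecture&n=<anything>
RewriteRule ^lecture/(.*)$ index.php?page=lecture&n=$1
```

The parentheses in the second pattern capture whatever follows lecture/, and $1 refers back to that captured text.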
Dot means anything, star means zero or more anythings. We then go ahead and redirect the user to index., sorry, route the user, don't redirect them, to index.php?page=lecture&n=$1, and what does $1 refer to? Yeah. Whatever the [inaudible] you typed in after the [inaudible]. Exactly. Whatever the user typed in after the lecture/. Hopefully, it's numeric. Hopefully it's zero or one or so forth. There's nothing stopping the user from typing in foo or bar, so lecture/foo, lecture/bar, in which case they will be passed to this page where n equals foo or n equals bar. But the onus then should really be on index.php to just ignore invalid things. We could do some error checking there to make sure that it's actually a number so that the site doesn't err. Or worst case, we throw a 404 and just say file not found, which would be reasonable as well. So in short, mod_rewrite is incredibly powerful and allows you to do these kinds of tricks, and this is very common with things like MediaWiki, WordPress, Drupal, all of these various packages that you can download and run on your own web server. To get the sort of pretty URLs that the world is now familiar with, it generally reduces to techniques like this. Because otherwise /lectures/1 would generally refer to a directory or something like that, when that's clearly not the case here. Alright. Any questions? Alright. That was a lot. Why don't we go ahead and take a five-minute break. When we come back, we'll dive into XML. Alright. We are back. So you've met Elan and Peter. But I just wanted to introduce a third member of the team who's with us today. Hi. I'm Chris Gerber. I am one of the [inaudible] teaching fellows. I'm an IT manager by day. I work on my master's at the Extension School at night, and I'm looking forward to working with everyone. Excellent. Welcome aboard.
So Chris will be here tonight for office hours, which will be our first opportunity for one-on-one Q&A about material we've covered thus far and about questions you might have on the spec, if you want to take some time either during or after class to read through the PDF that's already online. But toward the end of lecture today, we'll also walk in part through the spec so that you have a sense of what awaits and what kinds of questions you might have. So definitely take advantage of these opportunities in person for help, both in section and in office hours, and for those of you who are distant, as you saw from Wellie's note, one of our own distant TFs will also be making available some opportunities online. And even if you're local, you're welcome to tune in to those online sessions as well. So that was a whole bunch on MVC, PHP, Apache. Realize that some of the more mundane minutiae, like typing the commands and whatnot, will be recapped generally in the spec as needed. But any questions? Conceptually, mechanically, or otherwise? Otherwise, I'll assume you're quite comfortable now with GET and POST, and the session object, and MVC. Yeah. Okay. So let's forge ahead then into XML, which actually has nothing fundamentally to do with PHP, but for which PHP has wonderful built-in support that just makes working with XML files easy. We will see toward the end of the semester, when we look at Ajax and JavaScript, that we'll come full circle to some of the same ideas we're about to discuss here. So you'll find that XML is a topic that comes up in all sorts of domains. So here is a simple example of an XML document. It kind of looks like HTML except, again, it's sort of make-your-own tags. And in this case, we've made tags like order, sold to, person, last name, first name. Everything here happens to be in uppercase; however, lowercase is perfectly fine. HTML typically should be all lowercase these days, but for XML, you simply want to be self-consistent.
So this is apparently some kind of representation of what? This is a computer's way of representing what, apparently? Your purchase of the "Harry Potter and the Order of the Phoenix" book. Exactly. Your purchase of "Harry Potter," and that is literally the day I bought it from Amazon years ago. So, what's your name again? I'm Jack. Jack. That's right. Okay. So as Jack pointed out, this represents a book that was sold. So why might this be relevant in the real world? Well, it's not so much a data format that someone like Amazon themselves would use for storing data in their own databases. They would need much faster performance, which you would get from something like MySQL, or Oracle, or a real database engine, which we'll start looking at in a couple of days. But with XML, you might very well use it to exchange data between two parties. For instance, you might use it if Amazon has partnerships with third parties, as they do, to exchange data in a non-proprietary, non-binary format so that if that third party has a database of its own, but it's not necessarily identical or even the same make or model as Amazon's, they can talk in a database-neutral way. And so XML is actually very common in the world of RPC, remote procedure calls, a topic generally known as web services, or really web-based APIs. And also common these days is JSON, JavaScript Object Notation, which we'll look at in a week or two's time. So for now, we have some XML. Notice a few key features of it. One, it's hierarchical, at least aesthetically. Just as with HTML, you don't need to hit the space bar or carriage return or anything like that. But if you do, you can see that there is indeed a hierarchical nature to this document. You can see too that we have some basic building blocks. Henceforth, something like open bracket, order, close bracket is what we'll call an element. It's not just a tag.
An element includes everything inside of that start tag and end tag. So the element here is order and it has some children, so to speak. So sold to is a child element of order. What's another child of order in this example? Yeah. Jack. Sold on. Sold on. And you can infer it from the indentation and also from the fact that it's color-coded here. But again, if you actually piece through the open tags and close tags, it is indeed at the same level as sold to, as is item. So apparently the element called order has at least three children that we can see here. Now technically, just as an aside, order might have more children than are pictured here. Technically, there is some white space right after order there, and there's white space to the left of sold to, so arguably, that's another child. And we'll come back to this in a bit. In XML, when you represent an XML document in memory, with an in-memory data structure like a tree, it is indeed the case that white space might matter even though fundamentally it shouldn't matter for the data transfer here. So more on that in just a bit. So, "sold to" has a child called "person." Person has two children called "last name" and "first name." But person has something else, just like HTML: an attribute with a value, in this case, which should be quoted just as I've done here. Single or double quotes are fine, so long as you are consistent. And then elements can have zero or more attributes, but those attributes must be uniquely named. You can't have two ID attributes. So very quickly we're going to run into a question here of attributes versus children. When or why should you make a piece of data an attribute, do you think, versus a child element? And again, we've only done a quick definition of XML here, but you've been using attributes and elements in HTML for some time. Yeah. When they're related or unrelated to the other elements. Okay. So when they're related or unrelated to the other elements.
So in this case, ID is an attribute because it doesn't really have something to do with first name and last name. And that's reasonable. It's sort of conceptually distinct. So that's reasonable. What else might motivate making ID an attribute, as we have here, as opposed to a child element? And to be clear, I mean what if, instead of writing ID equals quote unquote one, two, three, I had instead, after the close bracket, hit enter, and then done open bracket, ID, close bracket, one, two, three, open bracket, slash ID, close bracket. That's all I mean by a child element. Sorry. I should have been more specific. When it's sort of immutable data, it goes along with that child. Ah. Good. So a little different there: when the data is immutable, and really a leaf of sorts, where it itself cannot have any notion conceptually of children. So when you are an identifier like a number, a one, two, three, we're never going to be able to tease that apart in the same way that you could tease out a person into a first name and a last name, maybe even a middle name. So there's some notion of hierarchy there, potentially. But an ID, that's it. When are you ever going to tease an ID apart into something more if it's just a number to begin with? So in that sense, it's very reasonable to make ID an attribute because you don't need to extend its definition later, and indeed that's where the X in XML comes from. It's Extensible Markup Language, and that really speaks to the ability to [inaudible] and explode it into more semantic detail by adding more and more tags just so that you can mark up more pieces of data. In fact, why did I use last name and first name instead of just name, which seems also pretty reasonable? Why have last name and first name here? Yeah. In case you want to, I don't know, like, use the more specific information when you're, like, sorting by last name. Okay. Good.
Maybe we want to use that more specific information for things like sorting by last name, sorting by first name, or even simple customizations. When you get those form letters from companies, "Dear David Malan" feels very cold, and I realize that it's still a machine doing it when it says just "Dear David," but at least it's a little more human-like, and it's harder to do that if you just had a single name field, because there are certainly people out there who don't just have first names and last names; there are middle names, there are two middle names, three middle names. It's not clear necessarily where the break is between someone's given name and their family name, for instance. So doing it in this way helps with that. Alright. So what else is interesting about this? We have tags, as are present in HTML, end tags and the like. We have some text. So hopefully syntactically this is all pretty familiar. And it's nice. It's a little freeing in that we can now use our own tags. But here's what I mean about the X, extensibility. Suppose in the future we decided, you know what, we want to go ahead and add a bit more detail to a person. So notice I still have person, and last name and first name. But I've decided to introduce the initial, J in my case, for this person. And I also want to insert their address. What's compelling here is that even though the XML document has grown in size, I have not broken the structure of anything that was there before. So XML is compelling in that, if this is something being used by Amazon and some random third-party partner of theirs, then in theory, if you've written your software well, you expect that XML documents might have more elements than you care about, but not in such a way that they'll break the document.
Hopefully, if Amazon tomorrow decides, you know what, we're going to start giving our third parties middle initials as well as people's addresses, which frankly they should have done the first time, that hopefully won't break any code that other people have written to ingest this XML. Worst case, the initial and the address will be ignored, but at least it won't break any existing workflow, and that's one of the salient characteristics of XML. Well, let's take another example here. So how about a students database of sorts? Again, not good for long-term storage of lots and lots of students, but for something short like a pizzeria menu, something like XML might suffice, or even a small database of students. So what characteristics do we have here? We have even more features in this document, which is what it's meant to convey. So at the top here is the so-called XML declaration. Very tragically, this looks a little bit like... PHP. PHP, because of the dang open bracket, question mark. So whereas you can include HTML in PHP files raw and have it spit out, you cannot include an XML declaration raw in PHP files (at least when PHP's short open tags are enabled) because it will be confused by the PHP interpreter as being a PHP tag. So instead you have to resort to a hack like using echo or print for this one feature of XML. But just realize there's that corner case. Technically, though, the XML declaration is not required. It's simply optional, but it says things like what encoding have you used in terms of character set? And what version of XML are you using? In this case, one. So, now we have this thing. This is obviously a comment. This is part of the XML spec. You can have comments in it. And in fact, when you build up a tree in memory (you wouldn't do this yourself; a browser would do this, or a parser would do this on the server), when you build up the tree that represents this hierarchical document, you will get nodes in that tree that represent comments.
So they're typically ignored in terms of rendering. You don't see HTML comments in a page, but they are there and they are stored in RAM as a node in a tree. So this thing is the so-called root element of the document. An XML document, like an HTML document, can have one and only one root element. In the world of HTML, what is the root element called? HTML, right. In the world of XML it can be called anything you want so long as there's one and only one element at that level. If you want to put something else at that level, it frankly has to be in another XML file. That's simply the way the spec works. So student is a child of students. It has an ID, in this case, of 001. Their name here is Jim Bob. Status, a graduate. Dorm, this looks a little weird. What does this resemble? And what does this mean, do you think? Jack. [ Inaudible ] Okay. Good. So it's a tag that's opened and closed all at once, and that typically implies that it's empty. The element can be there but there's no data for it, so it's an empty element, and here it makes sense that this student might not have a dorm because she's a graduate student, so there's just no address. So maybe this could be removed altogether, but the fact that it's there is not necessarily a bad thing. In fact, the receiving party might prefer that it at least be there to make ever so clear that there is no dorm, not that we forgot to include the dorm. This might resemble an HR tag or a BR tag; HTML has empty elements too, even though in HTML you don't have to close them with a slash. If you're familiar with XHTML, which is an XML-compliant version of HTML where attribute values must be quoted and all tags must be closed even if they're empty like this, you might have seen this syntax before. Okay. Now we have a major. And here's a curious thing. Why is Jim Bob's major Computer Science &amp; Music? And what does this remind you of? Yeah. Isaac. Double major.
Okay. It does imply a double major. But what's with this cryptic thing that's never going to show up on our Harvard diploma, &amp;? What is that all about? Yeah. [ Inaudible ] The next... Not next line, actually. Not a bad guess, though. [ Inaudible ] Exactly. So this is an HTML, this is an XML entity, which is present here because ampersand is a special character in XML. It demarcates the start of an entity, and you've probably seen in HTML at least one entity. &nbsp;, non-breaking space, is one special one. There are others in HTML and XML: &gt; for greater than, &lt; for less than. There are some special character sequences in HTML and XML called entities that always start with ampersands and always end with semicolons, and the word between is some identifier. amp here means ampersand. So, even though it's a little coincidental that there's an ampersand followed by amp, this means that this is an entity representing a single ampersand, which here represents the word and. Similarly, if we wanted to put a less-than character as part of my major for whatever weird crazy reason, I would need to do &lt;. So, in short, in XML documents, if you have raw ampersands in your document, you must escape them in this way here. Okay. What else is interesting here? CDATA. We'll come back to that. We've got some H1 in here; we've got another student ID. So let's tease these apart a little more specifically. So I'll fly through some of these things just because they're kind of mind-numbing details, but they're more formal definitions of these various pieces of XML files. So the XML declaration, shown again at top right there, represents again an optional piece of data that essentially informs whatever program's reading the XML document what version you're using, what encoding, and the like. So elements, we talked about in the context of the example.
But there are a few rules. So one, they always need a start tag and an end tag. If the start tag has attributes, the end tag does not have attributes. Just like HTML. There are some constraints on names which you should bear in mind for PizzaML, for project zero. They have to start with a letter or an underscore. They cannot start with numbers, which is one of the minor annoyances with XML, but they can have a few other characters in them as well. And again, so that you don't have to scribble everything down, these slides are on the course's website as always. So, content models. This is something worth keeping in mind because it again allows you to sort of model different design decisions. If you have an element, an element can have different stuff inside of it, that is, nested inside of it. In the first case, you might have element content, whereby you have an element like student and then inside of it is another element, in this case called status, and then that's it. So this is so-called element content. You might have PCDATA, parsed character data. So Jim Bob, that was an example from before too; the child of name in this case is PCDATA, parsed character data. What does that mean? Well, we'll come back to that in a moment. But for now, just assume it means text. How about mixed content? This is kind of a weird example that you should not do in practice, but it does convey the idea. You have a name element, and then you have Jim Bob, but Jim Bob this time has an initial and we wanted to semantically tag his middle name for some reason, but not his first or last name. So again, bad design, but it does hint at how you can technically commingle PCDATA with elements. So this is a mixed content model because you don't just have text, you don't just have an element, you have both intermingled. This is more real if you think about HTML now. In HTML it's very common to have a paragraph tag and then you start writing your paragraph.
But then in the middle of your paragraph, you enter the boldface tag or the italics tag and then you close that tag; that too would be a mixed content model. You have text and tags commingled. Then no content, like dorm, like Jack pointed out. So we have those several content models. Those are our options for modeling. Attributes, again. Pretty straightforward. Very reminiscent of HTML. They must start with a letter or underscore and only contain letters and so forth. They must be quoted here in this case. HTML is not as rigorous about this, HTML5 at least, but XML still expects you to quote things either with single quotes or double quotes. And then again, this last bullet is the most scary: you cannot have ampersands or less-than characters. You must escape those with entities. Alright, PCDATA just represents text. So let's fly by this. And now entities. So now we see a more formal definition here. So they're, again, used to escape characters that might otherwise cause breakages in XML files. In XML you get five entities for free: ampersand, less than, greater than, apostrophe, and double quote, and that's it; only those five exist. In fact, &nbsp; is not on the list. So if you want something like that you have to define it yourself, and you will not likely need to do this for this project unless you really want to get fancy. This is the syntax in an XML file for how you can declare your own entities. So in the world of HTML, someone somewhere wrote out this entity for non-breaking space, and what does 160 likely refer to? Where does that come from? So that's a decimal number, 160, and that is simply the numeric code that represents the character we know as a non-breaking space. It's not something you can type on the keyboard, but rather that's its numeric code, so it's uniquely identifiable. Alright, any questions? And actually you've seen HTML entities before, right.
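The rules about entities can be checked quickly with PHP's own XML support. The snippet below is a hedged sketch, not the lecture's code: the element names major, name, and note are made up for illustration, and it uses simplexml_load_string, which returns false when a string is not well-formed XML.

```php
<?php
libxml_use_internal_errors(true);   // collect parse errors instead of printing warnings

// Escaped ampersand: parses fine, and the parser hands back a plain "&".
$ok = simplexml_load_string('<major>Computer Science &amp; Music</major>');
echo (string) $ok, "\n";            // Computer Science & Music

// Raw ampersand: not well-formed, so parsing fails.
$raw = simplexml_load_string('<major>Computer Science & Music</major>');
var_dump($raw === false);           // bool(true)

// &nbsp; is an HTML entity, not one of XML's five, so it fails unless declared.
$nbsp = simplexml_load_string('<name>Jim&nbsp;Bob</name>');
var_dump($nbsp === false);          // bool(true)

// The numeric form works without any declaration: 160 is the character's code.
$numeric = simplexml_load_string('<name>Jim&#160;Bob</name>');
var_dump($numeric !== false);       // bool(true)

// When generating XML from arbitrary text, escape it first.
$text = 'A & B < C';
$note = '<note>' . htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</note>';
echo $note, "\n";                   // <note>A &amp; B &lt; C</note>
```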
If you've ever looked up something like a copyright symbol or a weird triangle or circle or some symbol to use in a web page, you've probably used one of these entities, albeit in HTML. So character data's a little different. Recall we had this snippet at the top right here for CDATA. This will not always be necessary, but sometimes it is, if you want to have raw HTML or JavaScript or any piece of data that might have scary characters like open brackets and ampersands that it just doesn't make sense to escape. For instance, suppose you wanted, for whatever reason, to store JavaScript inside of an XML file. It's not unreasonable to store a piece of data in another piece of data. But JavaScript has arithmetic functions and things like less-thans and greater-thans. Right. It would be kind of ridiculous if you had to rewrite your JavaScript code to change things like if x is less than y, then do this. If you had to rewrite that as if x &lt; y, right, that's not JavaScript. That's a weird amalgam of XML and JavaScript. So CDATA exists, for instance, for that purpose, where you can tell an XML parser, a program that reads an XML file, much like you yourselves will write: ignore the following stuff, do not parse it. It's CDATA, character data, not parsed character data, PCDATA. And you can just say to the parser, just suck in everything you're about to see without worrying about its syntactic validity; I'll worry about that. So in this case, I'm using it not for JavaScript. But if you want to store HTML in an XML file, you don't want your HTML tags to be conflated with XML tags, which they could be. Because again, in XML you make your own tags. Just because they're HTML doesn't mean you can't use the same names for your tags.
So if you wanted to include some HTML in an XML document, you have to unfortunately wrap the characters with open angle bracket, bang, open square bracket, CDATA, open square bracket, that is, <![CDATA[, and then at the end, close square bracket, close square bracket, close angle bracket, that is, ]]>. Now why in the world did they choose that sequence of characters? Probably because no one in their right mind would ever need to or want to type something like that out in the real world, right. Frankly, it took me years to just memorize it. But now I've got it. And it's only something you need, again, in these kinds of cases. So is it relevant to project zero? Maybe. If you decide for whatever reason that you want to embed some HTML markup, it could be relevant. But odds are this is not a feature you will need, since again you're writing your XML files from scratch and have full discretion over what to include or not. And comments are as they are in the world of HTML. Alright. Any questions on the definitions of XML? There's a little more sophistication to it than make-up-your-own-tags. But at the end of the day, it's very reminiscent of the HTML world with which you're familiar. Alright. So let's now actually use this. I'm going to go ahead and introduce, by way of examples, an API that comes with PHP called SimpleXML. It's pretty good. This API is just a suite of functions that comes with PHP 5 that make it easier to parse XML, so that you do not need to write software that reads in open bracket, word, space, word, equals; you don't have to parse an individual XML file by hand. You can just say, here's a file of XML, or here's a big string of XML, give me a tree that represents this XML. So there is a function in PHP that comes with this API that will literally hand you a pointer or a reference to the root node of a hierarchical tree that has been constructed in memory. What do we mean by tree? Well, we mean this thing called the DOM. So here is a much simpler example at top left of some XML.
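Both ideas from this stretch, the CDATA wrapper and "here's a big string of XML, give me a tree," fit in one short sketch. The student record below is illustrative only, and the element name about is an assumption (not necessarily what was on the slide); simplexml_load_string and the LIBXML_NOCDATA option are part of PHP's standard XML support.

```php
<?php
// A made-up record whose <about> element holds raw HTML inside a CDATA section.
$xml = <<<XML
<student id="001">
  <name>Jim Bob</name>
  <about><![CDATA[<h1>Hi!</h1> I'm Jim & I like 1 < 2.]]></about>
</student>
XML;

// LIBXML_NOCDATA asks the parser to hand CDATA back as ordinary text nodes.
$student = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

echo $student->name, "\n";             // Jim Bob
echo $student['id'], "\n";             // 001
echo (string) $student->about, "\n";   // <h1>Hi!</h1> I'm Jim & I like 1 < 2.
```

Child elements are reached with the arrow syntax and attributes with array syntax; nothing inside the CDATA section had to be escaped, even though it contains a tag, a raw ampersand, and a less-than sign.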
Let me zoom in. And notice that it's simple because there are only a few elements here. So just try to wrap your mind around this. We have a students root element, a student child with an ID attribute of one, and then two children beneath called name and status. So what do I mean when I say there's a function that will parse this string of XML and hand you back a reference to a node in a tree? What it means is that PHP, and this function I'm referring to, is going to build up a picture like this here. So at the very top, we have first what we'll call the document node. This is not the root element. So ironically, XML has root elements, but when you build a tree, the root element is not the root of the tree. Now why is that? Well, it's motivated by the fact that even in my simple example, what did I have above the root element? Comment. Yeah. A comment. And that's valid. You can do that. But if you can have comments at the top of your file before the root element, and you want to build a tree out of this thing, you now need a new root for the data structure in memory so that you can hang on to that comment and on to the root element, and so that special root is called the document node, depicted here as a rectangle. So the two children of document here, as implied by the downward-pointing arrows, are a comment on the left and then an element node on the right, the element for the students tag that we saw in the XML fragment. And the order matters, at least when you draw it: left means it's the first child, right means it's the second child. Now what are the children of students? I claim that it has three children. And just as a sanity check, what are the three children of students? Let me put the XML on the screen as well. What are the three children of students? And here is where I said we'd come back to this sort of weirder interpretation of children. Yeah, Isaac. Name and status. Name and status are children of student. But how about the children of students?
Student name and [inaudible]. Well, one for three. So it's not name and status, because name and status are again children of student, singular. So there's clearly at least one child of students, right. Everyone would probably agree that student is a child of students. But I claim that in some interpretation there are actually three children of students. What are the other two, Jack? The space before and after the student. Yeah. Exactly. So, we've kind of learned as a species to ignore white space in this case when programming. But notice that this is really a backslash n. This is probably space bar, space bar, or backslash t. So there's some chunk of white space there that the user has hit on his or her keyboard. So if we kind of clump that all together as one chunk of text, that's arguably a node in the tree. And it's what we'll call a text node. And indeed that's what we've drawn here. The text node here is backslash n, backslash t. Now technically, that could be two nodes, two text nodes, one with backslash n, one with backslash t. That would also be legitimate. However, almost all XML parsers should join together adjacent text nodes into just one. So it's safe to assume you'll just get this one as I've drawn here. Now we have the student element, and then over to the right, what should be this element here that's slightly off screen? What's the content of this node? So it means, what comes after student? What's that character I'm pointing at with the green? Yeah. Jack. It's a new line. Just a new line. And indeed if we scroll over to the right, that's what I claim is in that node. Alright. You don't see it because it's white space, but it is in fact there if you hit it on the keyboard. Now what's the deal with ID? Now, this is just an artist's rendition. But I decided to draw the attribute as sort of a horizontal thing hanging off of the student element. Why is that?
Well, it's not correct to draw an attribute as a child, because it itself cannot have children, so I arbitrarily, and just because it fit nicely on the screen, drew the attribute to the right hand side there instead of down. So that's just fundamentally different, but again this is just an interpretation of a DOM. It doesn't mean this thing literally hangs off to the right in memory. Alright. So what are the children of student? This one's got a lot. It's got some white space, name, white space, status, white space. And at the end of the day, we're going to throw away all of this white space. But realize that truly underneath the hood there are these nodes having been built up in RAM when reading this XML file. Lastly, it turns out that name itself has a child, and this is one of the real motivations for talking about text nodes. Finally, we have a text node with actual interesting content, Jim Bob, that is a child of name, and status has a child, graduate, which is the status of Jim Bob in school. Okay. So in short: you have XML, either in a file or in a string variable in PHP. You call a function in the SimpleXML API that's going to parse that string, or that file. What are you going to get back as the return value? You're going to get back the address or reference or pointer-- think of it however you'd like for now-- to that document node that you can then traverse, recursively, hierarchically, however you want to navigate this document and pluck out the data. So what's the relevance to project zero? You're going to have to come up with a model for your pizza menu where maybe you have a menu element and maybe you've got a pizzas element and a sandwiches element or something like that, and you're going to need a way programmatically to traverse that tree and figure out what types of pizzas are there, what kinds of subs are there, what kinds of salads are there, and the like. So this is where we're going with this conversation.
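The tree described above, white space nodes included, can be inspected directly. What follows is only a sketch: the XML string is a reconstruction of the students example as described in lecture (the comment text and exact indentation are assumptions), and it uses PHP's DOM extension (DOMDocument) rather than SimpleXML, because SimpleXML does not expose white space text nodes as children.

```php
<?php
// Hypothetical reconstruction of the students example; the comment text
// and the exact indentation are assumptions, not the lecture's file.
$xml = "<!-- a comment -->\n"
     . "<students>\n\t<student id=\"1\">\n"
     . "\t\t<name>Jim Bob</name>\n"
     . "\t\t<status>graduate</status>\n"
     . "\t</student>\n</students>";

$doc = new DOMDocument();   // $doc plays the role of the document node
$doc->loadXML($xml);

// Children of the document node: the comment, then the root element.
$top = [];
foreach ($doc->childNodes as $node) {
    $top[] = $node->nodeName;
}

// Children of <students>: a white space text node, <student>, another text node.
$kids = [];
foreach ($doc->documentElement->childNodes as $node) {
    $kids[] = $node->nodeName;
}

// Children of <student>: white space, name, white space, status, white space.
$student = $doc->getElementsByTagName('student')->item(0);
$grandkids = [];
foreach ($student->childNodes as $node) {
    $grandkids[] = $node->nodeName;
}

print_r($top);
print_r($kids);
print_r($grandkids);
```

Text nodes report their name as #text and comments as #comment, so the second list shows the three children of students claimed in lecture.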
So besides DOM, and, oh, I didn't even use the buzz word. DOM, document object model. This was DOM. So document object model refers to the in-memory representation of an XML document or an HTML document in this sort of tree fashion. And again, this is something we'll come back to in the context of JavaScript, since it's even more common these days to traverse DOM structures [inaudible] client side using JavaScript. So let's take an example now. RSS. How many of you actually read RSS feeds? Okay. So a few of you. Those are XML files. The root element of an RSS feed is open bracket rss close bracket. And it's just XML, the rest of it. Now there's a whole bunch of child elements and descendants and some weird attributes in there so that you can store links and pictures and all of that these days. But at the end of the day, RSS is just an XML file, and the specification for this language actually lives, coincidentally, at the law school, at that URL, if you'd like to read up on it. But here's a representative snippet. This is an RSS feed with no actual content, but it shows you the minimum required elements that you must have in an RSS feed. You must have the rss element at top. You must have a channel element as a child; a channel must have a title, a description, and a link, and then it must have one or more item elements, or rather zero or more item elements, where an item represents a news story, something that's in the RSS feed. And for those unfamiliar, RSS and RSS readers are about syndicating news in a standard machine-readable format so that you can use Google Reader or Safari or Chrome or IE to read the day's news without going to ten different websites to read it there in the usual way. So an item has a few elements as children: a GUID, which is a unique identifier that is supposed to uniquely identify that article or that item, a title, a link, a description, a category, and a publication date.
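The skeleton just described can be written out and parsed in a few lines. This is a sketch with placeholder values only: every title, URL, and date below is made up, and the version attribute is an assumption about what a typical RSS 2.0 feed carries.

```php
<?php
// A hypothetical, content-free feed with only the elements named in lecture.
// All values are placeholders; example.com URLs are not real feeds.
$rss = <<<XML
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <description>Placeholder description</description>
    <link>http://example.com/</link>
    <item>
      <guid>http://example.com/story-1</guid>
      <title>Placeholder story</title>
      <link>http://example.com/story-1</link>
      <description>Placeholder summary</description>
      <category>News</category>
      <pubDate>Mon, 01 Jan 2001 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
XML;

// SimpleXML hands back an object for the root element, here <rss>.
$feed = simplexml_load_string($rss);

echo $feed->getName(), "\n";                     // name of the root element
echo $feed->channel->title, "\n";                // channel's title child
echo count($feed->channel->item), " item(s)\n";  // number of item children
```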
So this is an RSS feed, and let's go ahead now and do something interesting with this. Let me go into the appliance, where I have an example here in the XML directory, and let's open up lectures.xml. So in advance of tonight, I decided to make my own XML file that looks like this: a lectures root element, then lecture children; each lecture has an attribute called number with a value, and then below that is a title, and a date's element, and then there's a resources element, and then there's resource, resource, resource, depending on how many resources we gave out that night. And each resource can have a format. So in other words, this is just something I came up with arbitrarily, but I gave some thought to its design so that I'd have minimal redundancy and also the ability to distribute certain handouts in multiple formats, PDF, or zip, or anything else like that. So for instance, in the very first lecture, we gave out some slides, and here is the URL to those slides. It's a long URL, but it ends in .pdf. We also gave out the syllabus that day. And again, realize that I have put a format element as a child of a resource simply because if we wanted to release, like, a Word document for the syllabus, we could, just by adding another format without calling it a separate resource: same resource, different format. So again, this is representative of a typical XML design decision, so as to accommodate that kind of versatility. And if we scroll down, we'll see, indeed, here's a good one. Once we get to source code, if you've downloaded stuff from the course's website yet, recall that we distribute source code as an index, which means you can browse it in your browser, as a PDF, and as a zip. So here's an actual example of distributing multiple formats for the same conceptual resource, in this case source code. In short, I could have done this any number of other ways. My names for tags are completely arbitrary. This is not some lecture standard that the world has agreed on.
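To make the "same resource, different format" idea concrete, here is a sketch of what one lecture might look like under that design. The actual lectures.xml is not reproduced in the transcript, so the name and type attributes, the titles, and the URLs below are all hypothetical.

```php
<?php
// Hypothetical layout in the spirit of lectures.xml; attribute names
// (name, type), the title, and the URLs are assumptions for illustration.
$xml = <<<XML
<lectures>
  <lecture number="0">
    <title>Placeholder Title</title>
    <resources>
      <resource name="slides">
        <format type="pdf">http://example.com/lecture0.pdf</format>
      </resource>
      <resource name="source code">
        <format type="index">http://example.com/src0/</format>
        <format type="pdf">http://example.com/src0.pdf</format>
        <format type="zip">http://example.com/src0.zip</format>
      </resource>
    </resources>
  </lecture>
</lectures>
XML;

$dom = simplexml_load_string($xml);

// Map each resource to the list of formats it is offered in.
$formats = [];
foreach ($dom->lecture->resources->resource as $resource) {
    $types = [];
    foreach ($resource->format as $format) {
        $types[] = (string) $format['type'];
    }
    $formats[(string) $resource['name']] = $types;
}

print_r($formats);
```

Adding a Word version of a handout under this layout would mean adding one more format element, not a new resource.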
This is just little old me's format for XML; similarly, you will need to come up with your own menu.xml file for this pizzeria. But what I can do now that I have this in XML is something like this. In my XML directory, I have the lectures.php file, which looks like this. For the most part, this is just an HTML5 page with a hard-coded title and then an unordered list, and here is that function I promised existed earlier. And there are a few variants of this. This is simplexml_load_file; there's also simplexml_load_string, and there are a few others. There's also an object-oriented version of this, but I'm using the procedural function. And notice what I'm doing. I'm calling simplexml_load_file on, quote unquote, lectures.xml. And that, frankly, couldn't get more explicit: that is loading that file. What is it returning? It's returning to me a reference to a tree, that hierarchical picture we just looked at, and just to remind myself of that fact, I'm calling my variable $dom. So now $dom represents the root of that tree, and because this tree's arrows point downward from that root, I can access any other node in the tree just by stepping downwards, down, down, down. So how do I step downward? Well, what's nice about the SimpleXML API is that frankly they make it really simple. I mean, that's literally where it gets its name. So here's a foreach statement: foreach over $dom->xpath, quote unquote, /lectures/lecture. So XPath is the XML Path Language. It's a feature of the XML world that allows you to traverse hierarchical data structures, namely XML files, using paths that look like file system paths, C:\Program Files\Word, whatever, so using paths like that to navigate a hierarchical document. So in this case, the xpath function returns an array of nodes that match your XPath expression. So what am I asking for? I'm saying give me all of the lecture elements that are children of what? Lectures.
The lectures element, which is a child of nothing. The leading slash means that's the root element. So if I get back one or more lectures, this foreach loop, like we saw last week, is going to iterate over that array, and on each iteration it's going to update the value of $lecture to be that lecture, and so on each iteration of this loop we're going to print an LI and a close LI, inside of which is going to be $lecture->title. So this too refers to the SimpleXML API, for when you just want to start at a node and go one level deeper without doing, like, a search. XPath is allowing me to search the whole tree. But if I'm at a lecture and I want to get a lecture's title, recall that title is its child in the document we just showed. This will grab its title from the node. So if this is a little unclear in code, let's just look at this in a browser and then we can walk through it. Here we go. That's the XML file. Let me open up the PHP file, voila. My PHP file has this dynamically generated list of lecture titles, which looks really simple, and underneath the hood, notice, is only HTML, and as I disclaimed last week, it's not beautifully pretty printed: my PHP code is, so my HTML doesn't have to be. But all of this came from where? From that XML file. So let's go back to the XML file, lectures.xml. How did I print the title? Well, notice that I used XPath to get all of the lecture elements in an array by saying xpath /lectures/lecture, and that gives me an array of size what, apparently? If you recall from the previous unordered list, how many bullets were there a moment ago? Yeah. Four. There were four. Why is that? Well, if we scroll through this, tonight we're at lecture three, zero indexed, so tonight's the fourth lecture, so we have lecture, lecture, lecture, lecture. That's why we got back an array of size four, and on each iteration of that foreach loop, what was I printing? I was printing each lecture's title.
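The loop just walked through can be sketched as follows. This is not the lecture's lectures.php verbatim: the XML is inlined with placeholder titles so the snippet stands alone (the lecture called simplexml_load_file on lectures.xml), and the htmlspecialchars call is an addition for safe output.

```php
<?php
// Stand-in for lectures.xml with four lectures; titles are placeholders.
$xml = <<<XML
<lectures>
  <lecture number="0"><title>Placeholder A</title></lecture>
  <lecture number="1"><title>Placeholder B</title></lecture>
  <lecture number="2"><title>Placeholder C</title></lecture>
  <lecture number="3"><title>Placeholder D</title></lecture>
</lectures>
XML;

// The lecture used simplexml_load_file("lectures.xml") at this point.
$dom = simplexml_load_string($xml);

// xpath() returns an array of matching nodes; the leading slash anchors
// the path at the top of the document.
$lectures = $dom->xpath("/lectures/lecture");

$html = "<ul>\n";
foreach ($lectures as $lecture) {
    // ->title steps one level down from the current lecture node.
    $html .= "<li>" . htmlspecialchars((string) $lecture->title) . "</li>\n";
}
$html .= "</ul>\n";

echo $html;
```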
But what is title? Title is a child of what kind of element? Lecture. So I simply have to say $lecture->title, and that gives me the title. So this is again where SimpleXML gets its name. It's frankly pretty simple to navigate an XML document using these arrows and these XPath expressions, because you can pretty much address any part of the document that you want and really easily loop over it, which will again be useful for dynamically generating a pizza menu on a screen out of one single XML file. Now what else can we do here? Number. What if I wanted to display the lecture number for each thing? So suppose over here, let me go back to my web page. This is not all that useful, because I kind of have forgotten in my unordered list what the lecture number is. So let's go ahead and do that. Let me go in and first remind you that number is an attribute and title is a child. So our syntax is going to have to be slightly different. So I'm going to go ahead and do this. Inside of my LI, I'm going to go ahead and say $lecture, and then I want to go ahead and print out the number. So how do I do this? $lecture, open bracket, quote unquote N, or what was it, number or N? Number, quote unquote, close bracket, and then I'll do, like, a colon here just for formatting purposes. So now notice that I'm then going to print the title. So let's see what the end result is. Let me go back to my browser, reload the page, and now we have that. So we have really just two pieces of syntax. One is for attributes, which apparently involves square bracket notation, just as though it were an associative array, but it's not really. It's an XML attribute. And the arrow notation allows me to traverse things hierarchically. It turns out XPath, though, is more powerful. Suppose that for whatever reason I only wanted to select, for this page, lecture number 3. I need to apply some kind of filter.
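The two pieces of syntax, square brackets for attributes and arrows for children, look like this side by side. A small sketch on a single hypothetical lecture element; the title is a placeholder.

```php
<?php
// One hypothetical lecture element, parsed on its own for illustration.
$lecture = simplexml_load_string(
    '<lecture number="3"><title>Placeholder Title</title></lecture>'
);

// Square brackets read an attribute; the arrow reads a child element.
$number = $lecture['number'];
$title  = $lecture->title;

// Both come back as SimpleXMLElement objects, not plain strings, so it is
// safest to cast before comparing or storing them.
$line = "<li>" . (string) $number . ": " . (string) $title . "</li>";

echo $line, "\n";
```

This is the sense in which the bracket syntax is "not really" an associative array: the value is an object that merely behaves like a string when printed.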
Now, I could do it the old school way: just do your XPath expression, get back an array with all the lectures, which in this case is four, and then just have an if condition. And I could do something correct but a little tedious, like if $lecture["number"] equals equals three, then I can go ahead and do something. So I could do that. But it's a little inefficient, and I'm asking for all of the lecture elements even though I only care about one. So it turns out XPath can do this for me. I can introduce the notion of a predicate with a square bracket inside of my XPath expression, and then I can actually add a qualifier like number equals quote unquote three, close bracket. And now, if I save the file and go back to my page and hit reload, I have screwed up: no lecture number three, because I forgot to say it's an attribute. Okay. Sorry, you don't just say number. At, for attribute: @number equals three. Now let's go back and reload. There it is. So now I've only selected one node, not four, with that XPath expression. So let's try to generalize this. It turns out in XPath you can express yourself in this way. This is sort of the canonical, if overwhelmingly cryptic, definition of what's called a location path. So a location path is that whole expression in quotes. And each of these things between the slashes is what's called a step: step, step, step. A step would be part of the location path here. In this case, there are two steps: this one, because there is a slash there, and then this whole thing. So axis, let's come back to; node test, let's come back to; but here's predicate. We just introduced the idea of a predicate. In this case, I'm filtering on @number equals quote unquote zero, which is hopefully going to give me lecture zero, so now the whole thing is a location path. Now what's an axis? Thus far, we've been using the default axis, which is child.
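Here is a sketch contrasting the if-based filter with the predicate, including the forgotten-@ mistake. The XML is a stand-in for lectures.xml with placeholder titles.

```php
<?php
// Stand-in for lectures.xml; titles are placeholders.
$xml = <<<XML
<lectures>
  <lecture number="0"><title>Placeholder A</title></lecture>
  <lecture number="1"><title>Placeholder B</title></lecture>
  <lecture number="2"><title>Placeholder C</title></lecture>
  <lecture number="3"><title>Placeholder D</title></lecture>
</lectures>
XML;
$dom = simplexml_load_string($xml);

// Old school: fetch every lecture, then filter in PHP with an if.
$byHand = [];
foreach ($dom->xpath('/lectures/lecture') as $lecture) {
    if ((string) $lecture['number'] === '3') {
        $byHand[] = $lecture;
    }
}

// Predicate: let XPath do the filtering. The @ marks number as an attribute.
$byPredicate = $dom->xpath('/lectures/lecture[@number="3"]');

// Without the @, XPath looks for a child element called number, and no
// lecture has one, so nothing matches.
$forgotAt = $dom->xpath('/lectures/lecture[number="3"]');

echo count($byHand), " ", count($byPredicate), " ", count($forgotAt), "\n";
echo $byPredicate[0]->title, "\n";
```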
So if you don't specify the double colons with the word child in front of it, XPath just assumes you want child. But it turns out there are other axes. There's parent::. There's following-sibling:: and preceding-sibling::. There's preceding::. There's following::. There's descendant::. In other words, you can start at a given node and identify a whole bunch of nodes above it, to the left of it, to the right of it, next to it, below it; thus far we've just been going one level below, along the so-called child axis, just because it's the most common. In fact, the at sign here is actually shorthand notation for another axis, and that shorthand notation, when exploded, is actually as verbose as this. So this is identical: attribute::number equals quote unquote three. It's just, my God, no one ever wants to type that; the at sign is a nice shorthand notation for it. However, if I really want to be pedantic, I can do this really properly, child::lectures, child::lecture, and that now is the same thing. So how does this really work fundamentally? Well, when you start an XPath expression, and you use it in something like PHP, the first thing, the slash, says start at the very top of the document. Fortunately, there's only one thing to match there, the root element. So hopefully child::lectures matches the root element. Next, the slash means, okay, start from that result set, that node set containing all of the results of that expression /lectures, which is one node, and now grow or filter the resulting node set to include anything that's a child called lecture of the lectures element. It's a node test in the sense that lecture is the name of the node that you want to match. You could change it to a star and that would give you any children of lectures.
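The abbreviated and fully spelled-out forms can be checked against each other. A sketch on the same stand-in lectures data (placeholder titles), with a few of the other axes exercised as well:

```php
<?php
// Stand-in for lectures.xml; titles are placeholders.
$xml = <<<XML
<lectures>
  <lecture number="0"><title>Placeholder A</title></lecture>
  <lecture number="1"><title>Placeholder B</title></lecture>
  <lecture number="2"><title>Placeholder C</title></lecture>
  <lecture number="3"><title>Placeholder D</title></lecture>
</lectures>
XML;
$dom = simplexml_load_string($xml);

// Abbreviated form versus the same location path with explicit axes.
$short = $dom->xpath('/lectures/lecture[@number="0"]');
$long  = $dom->xpath('/child::lectures/child::lecture[attribute::number="0"]');

// A star as the node test matches any child element of lectures.
$any = $dom->xpath('/lectures/*');

// Other axes: siblings after lecture 1, and every title below the top.
$after  = $dom->xpath('/lectures/lecture[@number="1"]/following-sibling::lecture');
$titles = $dom->xpath('/descendant::title');

// Multiple predicates on one step: number is "0" AND a title child exists.
$both = $dom->xpath('/lectures/lecture[@number="0"][title]');

echo count($short), " ", count($long), " ", count($any), " ",
     count($after), " ", count($titles), " ", count($both), "\n";
```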
I only want the ones called lecture, even though frankly there are no other children, because I wrote the file, but that's really just a design decision I made. So at this point in the story, I have in my node set all lecture elements that are children of a lectures element that's a child of the document itself. Lastly, you have this predicate. The predicate says, okay, you've got all these lecture nodes, now go ahead and include only the ones that match this predicate, whereby their attribute called number equals quote unquote zero. And notice a few details here. The zero is quoted because it's a string. Almost everything in XML is a string, even if it looks like a number, because an XML file is a text file. And the equal sign is a single equal sign; it's not equals equals as it would be in most programming languages. Also realize you can have multiple predicates. You could have had a predicate here. You could have one over here and so forth. You just think of these location paths ultimately as these steps that are either adding or subtracting nodes from the answer, and what's returned in PHP is an array of the final results. Phew. That was a lot, too. Any questions? Alright. So it's time for the pizza ML discussion. So in pizza ML, as you'll see in the spec, there are going to be a few components to it. Let me go ahead and open this up in a browser, and the specification for this one, like the others; that's what the place used to look like before they shut down, sadly. It's delicious pizza. What you'll see is that the specs will typically have you walk through a few steps. The boxes at left are meant to be check boxes, so that if you print it out you can at least physically have the gratification of saying done, done, done, done, as you walk through the spec. There'll be some recommended reading here that sometimes will teach new stuff, sometimes will recap old stuff, and realize there's plenty of free stuff available. The course's syllabus has some recommended books.
None of them are required, so really they're there for your own edification if of interest. Plenty of free content exists on the web. You'll next see in the spec a number of instructions for how to download and install the CS50 appliance. In particular, if you're a Mac user, there is one step you can do tonight. When you go to the URL that's cited in this section of the spec, it has instructions for how to install the appliance. To be clear, the appliance will be posted by morning, but you can download VMware itself sooner than that. If you're a Mac user, you need VMware Fusion. Unfortunately, they only offer a 30-day trial, but we can give you access for longer than 30 days, so go ahead and follow the instructions on the page there and we'll explain how to get access to the site license that we have. Windows and Linux users have it easier in that they actually have a totally free version called VMware Player, which does the same thing, but it's for Windows and Linux, and the instructions are also there. So you can download and install the hypervisor tonight or tomorrow. The Fusion folks, though, will have to wait to hear back from us as to the licensing issue. And the appliance itself will be up by morning. And then we'll have a little more tutorial here on setting up a vhost, so the discussion we had earlier about creating the vhost directory and so forth, all of that is documented here so that you know how to create a project zero vhost. Bitbucket: there are instructions for how to get yourself set up with Bitbucket for your own benefit, so that you don't run the risk of losing content accidentally. And then the real guts of the spec, delicious XML, walks you through the menu itself. So, I'm going to defer to section and to Peter's walkthrough of the same, but you'll see here an allusion to a whole bunch of categories of food, which are all present here in the file. So let me pull up here the menu for Three Aces.
So this is literally the, a PDF of the menu that they used to have and you'll see that they have pizzas and salads and wraps and so forth. But one of these things that's kind of curious is that if you go into the store, you pick up this menu, it's actually pretty obvious to human how to order a pizza or a salad or the like. But if you're the unfortunate programmer who has to model this same set of data in a database, or in our case in a simple database like an XML file, you'll realize that these pizza guys really were not being semantically consistent throughout their menu. Which is to say, even though the left hand side looks like a list of pizzas all of which are conceptually the same, just pizzas with different options, if you scroll down further-- let me zoom in here, and Chrome is making this PDF a little weird. Let's open it in preview instead. Let's scroll down to the menu. So you'll notice that you have pizza with cheese, with onions, with peppers, with broccoli, dah, dah, dah, dah, dah, and then at the bottom, extra cheese. So this is one of these corner cases where this does not belong semantically in the same column of information. Why? Because no one is going to order extra cheese for $1.25 without a pizza, right? So there's this corner case here that you need to sort of think about, well how do you tolerate something like extra cheese? Is it a check box, a radio button, is it you type it in the notes. So this is one of those corner cases to run into. Here, too, we have pizza was the header for this section here and it comes also in small and large. Now notice, these are the kinds of things, at least if you're me you really notice, they capitalize small and large differently here. Down here, spaghetti or ziti is a category of itself and then the descriptions of the line items are with this, with that, or you can have this crazy thing which isn't really quite the same. We've got homemade lasagna. So it's things like this. 
If the category now in your website is called homemade lasagna, ravioli, or manicotti, that's not really just category, right? That's like at least a radio button between three different types of pasta. So again, what looks pretty straight forward to humans, once you start thinking about as a developer and as a designer, it's not obvious how you would represent this information. So let's take a quick stab at modeling this data just to get you thinking along the right lines, and then again with Peter in Section, he'll have an opportunity to discuss other design opportunities, or watch online after the fact, as well as in office hours tonight with Chris and later on Wednesday. So I need to represent, create a file, as you'll see in the spec called menu.xml. Now the design of that file is totally up to you, so long as it adheres to certain bulleted requirements in the spec. So I need minimally like a root element. It could be called menu, it could be called anything you want. Probably shouldn't be called pizzas, because as we've seen, they sell things besides pizzas. So let's just take a group stab for just a moment here, what could be one of the first children of a menu element. Category. Okay. So maybe a category, so let's do that. Open bracket category, and now one of the categories, recall, let's go back to it, was pizzas, salads, grinders, then the weird one, homemade lasagna, ravioli, or manicotti. So what more needs to be in this category element? Do I want to close my angle bracket now? Do I want an attribute? Do I want to move on to children? Again, no right answer here. But let's just brainstorm. [Inaudible] An attribute, okay. So maybe name equals pizzas. Maybe. This; not unreasonable. So close category. And then in here, what goes inside of a category? Well let me look back at the PDF. If it's pizzas, tomato and cheese. Alright. So, do I want to say this, though? Maybe size equals small, right? So this is where it gets non-obvious. 
Right, the XML language is super simple. Children, attributes, that's it. You don't have much expressive capability here. But it's definitely non-obvious how to do this. Now give me an argument against having a pizza element here. Why might this be a bad idea? If you have multiple categories with pizzas and salads, you would want to be consistent within those categories, so call it like item [inaudible]. Good. Good. I mean, if we're going through the trouble of sort of generalizing the notion of categories as we just did, and then refining the definition of a category with a name attribute that is category specific, why are we then hard-coding pizza, which only belongs up there? So we've sort of generalized the category but then hard-coded the specifics. So maybe better is indeed, let's be self-consistent, and let's maybe say item name equals tomato and cheese, and then I have to somehow embody the notion of size, for instance, and I don't really know how to do that, but I probably don't want to do open bracket large, right? I probably want to do something like size and then maybe type equals large, or something. Again, no right answer. But the fact that I'm kind of pausing and erasing and starting over, it should be the same kind of thought process you probably engage in, and already this is broken. What a wonderful surprise that this menu gave us an example here. [ Inaudible ] Yeah. So, already we have that stupid issue. And this is necessary. If you have this, the simplexml_load_file function will fail. You'll get an error because it's not valid XML. So I would also make this suggestion. Before you spend an hour or more sort of making your menu, do it in baby steps, right? Write a little bit of code, run it through the equivalent of a program like the one I just ran for you involving the lectures, but using your menu, just to make sure that with baby steps you're not creating a monster of a mess to then go back and fix.
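One possible shape for the brainstormed design, together with the baby-steps check, is sketched below. None of this is the required answer: the element and attribute names follow the brainstorm (category, item, size), while the price attribute and the prices themselves are made up. The broken string at the end is only a generic example of malformed XML (a tag left unclosed), not the specific problem hit in lecture.

```php
<?php
// Collect parse errors quietly instead of printing warnings.
libxml_use_internal_errors(true);

// Hypothetical menu.xml fragment; the price attribute and values are invented.
$menu = <<<XML
<menu>
  <category name="Pizzas">
    <item name="Tomato and Cheese">
      <size type="small" price="5.00"/>
      <size type="large" price="8.00"/>
    </item>
  </category>
</menu>
XML;

$dom = simplexml_load_string($menu);

// Baby step: confirm the file parses and can be walked before adding more.
$lines = [];
if ($dom !== false) {
    foreach ($dom->xpath('/menu/category') as $category) {
        foreach ($category->item as $item) {
            foreach ($item->size as $size) {
                $lines[] = "{$category['name']} / {$item['name']} / "
                         . "{$size['type']}: {$size['price']}";
            }
        }
    }
}
print_r($lines);

// Malformed XML (category is never closed) makes the parser return false.
$broken = simplexml_load_string('<menu><category name="Pizzas"></menu>');
var_dump($broken === false);
```

Running a check like this after each small addition to the menu surfaces a parse failure while the change that caused it is still fresh.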
So also realize, too, that this could become incredibly tedious, to have to type out this entire menu. So realize that we specify in the spec that you only need to do a few from each of the categories. So this won't be an exercise in tedium, but you'll have to trip over some of these design decisions, and realize, too, because we specify in the spec that you only need to do three or more items from each of the categories, you can also in fairness avoid certain corner cases like the extra cheese thing if you want. That's really up to you as to how you navigate those waters. There are plenty of things to trip over, though, in the aggregate with this whole menu. But let's do one other example now, unrelated to menus, but related to that example involving RSS. So that was our lecture example. Let me go ahead and open RSS1.php, but let me first highlight this URL here; my browser's not cooperating. Do it this way. Okay. There we go. So first, let me copy that URL; let me then visit it in my browser. And here's the problem with RSS, at least from a developer's perspective. This is not the RSS feed. This is the browser's rendition of an RSS feed in a more user-friendly way, with stuff that's not in the actual RSS file. In fact, if I view the source of this, odds are, depending on the browser, I'll see HTML, and I won't see RSS. So there are ways around this. Let me actually open up my terminal instead. And what I'm going to do is this. I'm going to type wget, for web get, then I'm going to paste in this URL and then hit enter. What's nice about the wget command, which exists on a lot of platforms like Linux, is that now I have a file called technology, and here is the RSS file. Now the specifics of this RSS file are not all that interesting for us tonight. Realize that they do adhere to the pubDate and title and link and description, all those placeholders I promised existed in an RSS file; they're in this one as well, in addition to a bunch of other stuff.
Atom is an alternative file format, and there's namespace support in this thing. But in short, at the end of the day, this is an RSS feed, and it has channel and item, and item, and item, and item, and item. Suppose I wanted to implement my own RSS reader. Well, think back to the format that an RSS feed has, which again looks like this. And knowing just this, I should be able to write this RSS reader pretty easily. Let me go to my RSS1.php for a first pass at this. Here is, I claim, a super simple RSS reader. This time, I call simplexml_load_file, and in PHP you can actually pass a URL where generally a file is expected. This has to be enabled on a system, but it usually is by default. But if it is a URL instead of a file, the function simplexml_load_file will use TCP/IP and HTTP to go get the content and then treat it as though it were a file. Now here's my RSS reader. For each of the item elements that's a child of channel, that's a child of the DOM itself that was returned by parsing that XML file at that URL, go ahead and print out an LI tag followed by an anchor tag with this href. Now here's slightly new syntax, but you can probably infer what's going on. PHP supports variable interpolation, which means if you have double quotes and you put a PHP variable inside of them, that variable's value will be put there. By contrast, if you use single quotes, that will not happen. However, when you have weird syntax like arrows, so it's not just $item, you want $item->link, you need to make clearer to PHP what should be interpolated. For that you use curly braces. So notice I'm using double quotes there and there, but because this is kind of a funky looking variable with the arrow notation and whatnot, I need to put curly braces around it, and those won't show up in the output, but they will tell PHP: interpolate this. And to interpolate, again, means replace this variable with its actual value.
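A first-pass reader along the lines just described might look like the sketch below. It is not RSS1.php itself: the feed is inlined with placeholder stories so the snippet runs without a network connection, whereas the lecture passed a URL to simplexml_load_file.

```php
<?php
// Hypothetical two-item feed; the lecture loaded a live feed by URL instead.
$rss = <<<XML
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <description>Placeholder description</description>
    <link>http://example.com/</link>
    <item>
      <title>First placeholder story</title>
      <link>http://example.com/story-1</link>
    </item>
    <item>
      <title>Second placeholder story</title>
      <link>http://example.com/story-2</link>
    </item>
  </channel>
</rss>
XML;

$dom = simplexml_load_string($rss);

$html = "<ul>\n";
foreach ($dom->channel->item as $item) {
    // Double quotes outside, so the {$...} expressions are interpolated;
    // the single quotes inside are just literal characters in the HTML.
    $html .= "<li><a href='{$item->link}'>{$item->title}</a></li>\n";
}
$html .= "</ul>\n";
echo $html;

// With single quotes as the outer quotes, nothing is interpolated.
$literal = '{$item->title}';
echo $literal, "\n";
```

A reader for real feeds would also want to escape titles and links with htmlspecialchars before printing them; that step is left out here to stay close to the lecture's version.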
Now the fact that there are single quotes there is just a coincidence. That does not mean the variable won't be interpolated. The quotes that matter are the outermost ones, and in this case they're double quotes. So in PHP, using double quotes when you want to put a variable inside is necessary. Alright, after I've printed out that anchor tag with an href attribute, I just print out the title, then I print out the close anchor tag, then the close LI tag. And then the rest of the file is close body, close HTML, and now if I pull up RSS1.php, which I can do here, RSS1.php, we didn't look at them, but there is a super simple RSS reader for, literally, today's New York Times feed, whereby I just visited RSS1.php on my own appliance, which is running Apache. Apache has PHP support installed, which means there's a PHP interpreter there. The web server realizes you want RSS1.php and says, oh, let me pass this file to PHP's interpreter on the local computer. The interpreter reads my file RSS1.php, top to bottom, left to right. As soon as it hits the line with simplexml_load_file and sees that URL, my appliance makes an HTTP request to the New York Times, gets back the XML, it gets parsed by that function, then my loop happens. So all of this happens behind the scenes in, what, like six lines of actual code, and I have my own RSS reader here spitting out the titles and the links. So it can be as simple as that. So what's the point of this lecture example? What's the point of this RSS example? Really to give you the basic building blocks with which to create and traverse your own menu.xml file. But ultimately, that's just one piece of the puzzle. You'll need to have some notion of support for sessions, right? Because if the user wants to not only see the menu, but click links, or fill out forms, adding things to his or her cart, you're going to want to remember what pizzas, what salads they've added to their cart, so what superglobal is appropriate there?
[ Inaudible ] Yeah. $_SESSION. So you'll need to make use of that. You don't need to have logins, so the spec does not require user names and passwords or anything like that. You don't actually need to send the order to a company, the store; you just need to confirm for the user, you could fake for the user, that their order has ultimately been submitted. But you're going to have to maintain some notion of a shopping cart and enable the user to add and remove things from that cart. So ultimately, you'll probably have a few different pages, unless you use one main controller and have multiple views that the controller uses. But again, this is the key aspect of the design component for the project. There is no one right answer here. You're welcome to bounce ideas off of us on the discussion board, whose URL is in the spec and also on the course's home page. But there are a lot of pieces now from lectures zero, one, two, and now three that you can hopefully start to wire together in the interest of making your own pizza ordering website, and along the way hopefully you will trip over a non-trivial number of annoying data issues, which are going to be representative of a class of problems that are out there in the real world when it comes to just making stuff work for companies like this one. Any questions on XML, pizza ML, or the like? Yeah. Is there any difference between using XML and PHP for [inaudible]? Good question. Is there any difference between using PHP and XML for the menu? Really, it's a separation of data and logic.
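As one way to picture the shopping cart, here is a minimal sketch that keeps the cart in $_SESSION as a map from item name to quantity. The helper names (cart_add, cart_set, cart_remove) and the cart's shape are hypothetical, not something the spec prescribes.

```php
<?php
// Hypothetical cart helpers; the cart is an array of item name => quantity.
function cart_add(array &$cart, string $item, int $qty = 1): void
{
    $cart[$item] = ($cart[$item] ?? 0) + $qty;
}

function cart_set(array &$cart, string $item, int $qty): void
{
    if ($qty <= 0) {
        unset($cart[$item]);   // a quantity of zero removes the item
    } else {
        $cart[$item] = $qty;
    }
}

function cart_remove(array &$cart, string $item): void
{
    unset($cart[$item]);
}

// On a web page, session_start() must run before any output so that
// $_SESSION persists from request to request. It is skipped on the
// command line, where $_SESSION below is just an ordinary array.
if (PHP_SAPI !== 'cli') {
    session_start();
}
if (!isset($_SESSION['cart'])) {
    $_SESSION['cart'] = [];
}

cart_add($_SESSION['cart'], 'Tomato and Cheese (small)');
cart_add($_SESSION['cart'], 'Tomato and Cheese (small)', 2);
cart_add($_SESSION['cart'], 'Garden Salad');
cart_set($_SESSION['cart'], 'Tomato and Cheese (small)', 1);
cart_remove($_SESSION['cart'], 'Garden Salad');

print_r($_SESSION['cart']);
```

Keeping the cart logic in plain functions that take the array by reference means the same code works whether one controller or several separate pages call it.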
In theory, you could represent the menu somehow with a whole bunch of variables in PHP, or even a massive PHP associative array, but that's not very easy to maintain and it also doesn't allow you to semantically tag information as much as you can with XML, which again has child support, which has attribute support, and the like, and the short of it is that it's just wrong to model data in that way in PHP since the data should transcend the particular choice of language. So another argument in favor of the XML file is now it's much more portable, could be read by different languages and the like, and so it's just a cleaner way of doing it, rather than hardcoding things for PHP. And moreover what it'll allow us to do longer term is once we introduce an actual database, like MySQL, then you can leave most of your PHP code the same and just change a few lines that relate to the XML file and plugin support for MySQL instead of XML and everything just keeps working. So this separation of data from logic allows you to sort of refine and then build upon it longer term much more cleanly. Good question. Any others? No. Alright. So that was a lot. Why don't we call it a night here? I'll stick around for one-on-one questions. Peter's going to set up and then dive into the walk through of the spec itself. Otherwise, I'll see you on Wednesday.