Tuesday, February 11, 2014

Document: Java and XML (docx)

Modifying the Data
XML from Scratch
The Real World
What’s Next?
Chapter 13. Business-to-Business
The Foobar Public Library
mytechbooks.com
Push Versus Pull
The Real World
What’s Next?
Chapter 14. XML Schema
To DTD or Not To DTD
Java Parallels
What’s Next?
Appendix A. API Reference
A.1 SAX 2.0
A.2 DOM Level 2
A.3 JAXP 1.0
A.4 JDOM 1.0
Appendix B. SAX 2.0 Features and Properties
B.1 Core Features
B.2 Core Properties
Java and XML

page 5
Preface
XML, XML, XML, XML. You can see it on hats and t-shirts, read about it on the cover of every
technical magazine on the planet, and hear it on the radio or the occasional Gregorian chant album...
Well, maybe it hasn't gone quite that far yet, but don't be surprised if it does. XML, the
Extensible Markup Language, has seemed to take over every aspect of technical life, particularly in
the Java™ community. An application is no longer considered an enterprise-level product if XML
isn't being used somewhere. Legacy systems are being accessed at a rate never before seen, and
companies are saving millions and even billions of dollars on system integration, all because of
three little letters. Java developers wake up with fever sweats wondering how they are going to
absorb yet another technology, and the task seems even more daunting when embarked upon; the
road to XML mastery is lined with acronyms: XML, XSL, XPath, RDF, XML Schema, DTD, PI,
XSLT, XSP, JAXP™, SAX, DOM, and more. And there isn't a development manager in the world
who doesn't want his or her team learning about XML today!
When XML became a formal specification at the World Wide Web Consortium in early 1998,
relatively few were running in the streets claiming that the biggest thing since Java itself (arguably
bigger!) had just made its way onto the technology stage. Barely two years later, XML and a
barrage of related technologies for manipulating and constraining XML have become the mainstay
of data representation for Java systems. XML promises to bring to a data format what Java brought
to a programming language: complete portability. In fact, it is only with XML that the promise of
Java is realized; Java's portability has been seriously compromised as proprietary data formats have
been used for years, enabling an application to run on multiple platforms, but not across businesses
in a standardized way. XML promises to fill this gap in complete interoperability for Java programs
by removing these proprietary data formats and allowing systems to communicate using a standard
means of data representation.
This is a book about XML, but it is geared specifically towards Java developers. While both XML
and Java are powerful tools in their own right, it is their marriage that this book is concerned with,
and that gives XML its true power. We will cover the various XML vocabularies, look at creating,
constraining, and transforming XML, and examine all of the APIs for handling XML from Java
code. Additionally, we cover the hot topics that have made XML such a popular solution for
dynamic content, messaging, e-business, and data stores. Through it all, we take a very narrow
view: that of the developer who has to put these tools to work. A candid look at the tools XML
provides is given, and if something is not useful (even if it is popular!), we will address it and move
on. If a particular facet of XML is a hidden gem, we will extract the value of the item and put it to
use. Java and XML is meant to serve as a handbook to help you, and is neither a reference nor a
book geared towards marketing XML.
Finally, the back half of this book is filled with working, practical code. Although available for
download, the purpose of this code is to walk you through creating several XML applications, and
you are encouraged to follow along with the examples rather than skimming the code. We introduce
a new API for manipulating XML from Java as well, and complete coverage and examples are
included. This book is for you, the Java developer, and it is about the real world; it is not a
theoretical or fanciful flight through what is "cool" in the industry. We abandon buzzwords when
possible, and define them clearly when not. All of the code and concepts within this book have been
entered by hand into an editor, prodded and tested, and are intended to aid you on the path to
mastering Java and XML.
Organization
This book is structured in a very particular way: the first half of the book (Chapter 1 through
Chapter 7) focuses on getting you grounded in XML and the core Java APIs for handling XML.
Although these chapters are not glamorous, they should be read in order, and at least skimmed even
if you are familiar with XML. We cover the basics, from creating XML to transforming it. Chapter
8 serves as a halfway point in the book, covering an exciting new API for handling XML within
Java, JDOM. This chapter is a must-read, as the API is being publicly released as this book goes to
production, and this is the reference for JDOM 1.0 (as I wrote the API with Jason Hunter
specifically for solving problems in using Java and XML!). The remainder of the book, Chapter 9
through Chapter 14, focuses on specific XML topics that continually are brought up at conferences
and tutorials I am involved with, and seeks to get you neck-deep in using XML in your applications,
now! Finally, there are two appendixes to wrap up the book. Here's a summary of the contents:
Chapter 1
We look at what all the hype is about, examine the XML alphabet soup, and spend time
discussing why XML is so important to the present and future of enterprise development.
Chapter 2
We start looking at XML by building an XML document from the ground up. Examination
of the major XML constructs, such as elements, attributes, entities, and processing
instructions is included.
Chapter 3
The Simple API for XML (SAX), our first Java API for handling XML, is introduced and
covered in this chapter. The parsing lifecycle is detailed, and the events that can be reported
by SAX and used by developers are demonstrated.
Chapter 4
In this chapter, we look at the two ways to impose constraints on XML documents:
Document Type Definitions (DTDs) and XML Schema. We will dissect the differences and
analyze when one should be used over the other.
Chapter 5
Complementing Chapter 4, this chapter looks at how to use the SAX skills previously
learned to enforce validation constraints, as well as how to react when constraints are not
met by XML documents.
Chapter 6
In this chapter, the Extensible Stylesheet Language (XSL) and the other critical components
for transforming XML from one format into another are introduced. We cover the various
methods available for converting XML into other textual formats, and look at using
formatting objects to convert XML into binary formats.
Chapter 7
Continuing to look at transforming XML documents, we discuss XSL transformation
processors and how they can be used to convert XML into other formats. We also examine
the Document Object Model (DOM) and how it can be used for handling XML data.
Chapter 8
We begin by looking at the Java API for XML Parsing (JAXP), and discuss the importance
of vendor-independence when using XML. I then introduce the JDOM API, discuss the
motivation behind its development, and detail its use, comparing it to SAX and DOM.
Chapter 9
This chapter looks at what a web publishing framework is, why it matters to you, and how to
choose a good one. We then cover the Apache Cocoon framework, taking an in-depth look
at its feature set and how it can be used to serve highly dynamic content over the Web.
Chapter 10
In this chapter, we cover Remote Procedure Calls (RPC), their relevance in distributed
computing as compared to RMI, and how XML makes RPC a viable solution for some
problems. We then look at using XML-RPC Java libraries and building XML-RPC clients
and servers.
Chapter 11
In this chapter, we look at using configuration data in an XML format and why that format
is so important to cross-platform applications, particularly as it relates to distributed
systems.
Chapter 12
Although this topic is covered in part in other chapters, here we look at the process of
generating and mutating XML from Java and how to perform these modifications from
server-side components such as Java servlets, and outline concerns when mutating XML.
Chapter 13
This chapter details a "case study" of creating inter- and intra-business communication
channels using XML as a portable data format. Using multiple languages, we build several
application components for different companies that all interact with each other using XML.
Chapter 14
We revisit XML Schema here, looking at why the XML Schema specification has garnered
so much attention and how reality measures up to the promise of the XML Schema concept,
and examining why Java and XML Schema are such complementary technologies.
Appendix A
This appendix details all the classes, interfaces, and methods available for use in the SAX,
DOM, JAXP, and JDOM APIs.
Appendix B
This appendix details the features and properties available to SAX 2.0 parser
implementations.
Who Should Read This Book?
This entire book is based on the premise that XML is quickly becoming an essential part of Java
programming. The chapters are written to instruct you in the use of XML and Java, and other than
in the introduction, they do not focus on whether you should use XML. I believe that if you are a Java
developer, you should use XML, without question. For this reason, if you are a Java programmer,
want to be a Java programmer, manage Java programmers, or are responsible for or associated with
a Java project, this book is for you. If you want to advance, want to become a better developer, want
to write cleaner code, want to have projects succeed on time and under budget, need to access
legacy data, need to distribute system components, or just want to know what the XML hype is
about, this book is for you.
I tried to make as few assumptions about you as possible; I don't believe in setting the entry point
for XML so high that it is impossible to get started. However, I also believe that if you spent your
money on this book, you want more than the basics. For this reason, I assumed only that you know
the Java language and understand some server-side programming concepts (such as Java servlets
and Enterprise JavaBeans™). If you have never coded Java before or are just getting started with
the language, you may want to read through Learning Java, by Pat Niemeyer and Jonathan
Knudsen (O'Reilly & Associates), before starting this book. I do not assume that you know anything
about XML, and so I start with the basics. However, I do assume that you are willing to work hard
and learn quickly; for this reason, we move rapidly through the basics so that the bulk of the book
can deal with advanced concepts. Material is not repeated unless appropriate, so you may need to
re-read previous sections or be prepared to flip back and forth, as previously covered concepts are
used in later chapters. If you want to learn XML, know some Java, and are prepared to enter some
example code into your favorite editor, you should be able to get through this book without any real
problem.
Software and Versions
This book covers XML 1.0 and the various XML vocabularies in their latest form as of April 2000.
Because various XML specifications that are covered are not final, minor inconsistencies may be
present between printed publications of this book and the current version of the specification in
question.
All of the Java code used is based on the Java 1.1 platform, with the exception of the JDOM 1.0
coverage. This variance with regard to JDOM is noted in the text in Chapter 8, and addressed there.
The Apache Xerces parser, Apache Xalan processor, and Apache FOP libraries were the latest
stable versions available as of April 2000, and the Apache Cocoon web publishing framework used
was Version 1.7.3. The XML-RPC Java libraries used were Version 1.0 beta 3. All software used is
freely available and can be obtained online from http://java.sun.com, http://xml.apache.org, and
http://www.xml-rpc.com.
The source code for the examples in this book, including the com.oreilly.xml utility classes, is
contained completely within the book itself. Both source and binary forms of all examples
(including extensive Javadoc not necessarily included in the text) are available online from
http://www.oreilly.com/catalog/javaxml and http://www.newInstance.com. All of the examples that
could run as servlets, or be converted to run as servlets, can be viewed and used online at
http://www.newInstance.com.
The complete JDOM 1.0 distribution, including the specification, reference implementation, source
code, API documentation, and binary release, is available for download online at
http://www.jdom.org. Additionally, a CVS tree is being set up to host the JDOM code and allow
community contribution and comment. See http://www.jdom.org for details on accessing JDOM
from CVS.
Conventions Used in This Book
I use the following font conventions in this book.
Italic is used for:
• Unix pathnames, filenames, and program names
• Internet addresses, such as domain names and URLs
• New terms where they are defined
Constant Width is used for:
• Command lines and options that should be typed verbatim
• Names and keywords in Java programs, including method names, variable names, and class
names
• XML element names and tags, attribute names, and other XML constructs that appear as
they would within an XML document
Constant Width Bold is used for:
• Additions to code examples
• Parts of code examples that are discussed specifically in the text
Comments and Questions
Please address comments and questions concerning this book to the publisher:
O'Reilly & Associates, Inc.
101 Morris Street
Sebastopol, CA 95472
(800) 998-9938 (in the U.S. or Canada)
(707) 829-0515 (international or local)
(707) 829-0104 (fax)
You can also send us messages electronically. To be put on our mailing list or to request a catalog,
send email to:
info@oreilly.com

To ask technical questions or comment on the book, send email to:
bookquestions@oreilly.com
We have a web site for the book, where we'll list errata and any plans for future editions. You can
access this page at:
http://www.oreilly.com/catalog/javaxml
For more information about this book and others, see the O'Reilly web site at:
http://www.oreilly.com

Acknowledgments
As I look at the stack of pages that comprise the manuscript of this book, it seems absurd to try and
thank all the people involved in making this book in only a few paragraphs. However, as this is
arguably simpler than covering the entire realm of Java and XML in just under 500 pages, I am
certainly willing to attempt it; for those of you I forget, please forgive me in advance!
This book was initiated by a call on Thanksgiving weekend, 1999, from my editor, Mike Loukides,
which came as I was feverishly writing another book for O'Reilly. I was a bit dubious about putting
a book I was very passionate about on hold for six months, but Mike was as adept at convincing me
of the importance of this book as he has been at editing my words and making them useful. As I
look back, this was easily the most enjoyable and exciting thing I have ever done in my technical
career, and I owe much of that experience to Mike; he guided me through a very difficult first few
chapters, allowed me to vent when I had to revise the XML Schema chapter three (yes, three!) times
due to revisions of the specification coming out, and was also an all-around musical guy when I
needed to take a break. Without him, this would certainly not be the high-quality book we both
believe it is.
Additionally, I had a supporting cast of family and friends that made the amount of time and effort
needed to make this book happen possible, and even enjoyable. My mom and dad, who corrected
my grammar daily for eighteen years of my life; my aunt, who was always excited for me even
when she didn't know what I was talking about; Jody Durrett, Carl Henry, and Pam Merryman, who
spent more time making me a good writer than I had any right to expect; Gary and Shirley
Greathouse, who always reminded me to never settle; and my grandparents, Dean and Gladys
McLaughlin, who were always there in the wings supporting me.
I had an incredible group of technical reviewers, who made this book both accurate and relevant:
Marc Loy, Don Weiss, George Reese (who managed to get an entire chapter added in response to
his comments!), Matthew Merlo, and James Duncan Davidson. James in particular was helpful, as
his willingness to correct minor errors and be brutally honest with me was instrumental in
reminding me that I am a developer before I am a writer.
I also owe an incredible debt of gratitude to Jason Hunter, author of Java Servlet Programming
(O'Reilly & Associates). This book, though started in November of 1999, experienced a rebirth in
March of 2000 as Jason and I spent an entire afternoon sitting on a lawn in Santa Clara griping
about the current Java API offerings for XML. The result of this discussion was twofold: first, we
developed the JDOM API, covered in this book (with help and encouragement from James
Davidson at Sun Microsystems). We believe that this API will be instrumental in bringing Java and
XML more in line with each other, as well as keeping the focus of using XML on the Java
programming language and usability, rather than on vague concepts and obscurity. Second, Jason
has become an invaluable friend, and has helped me through the often confusing process of
completing a book and being an O'Reilly author. We spent entirely too many evenings talking for
hours into the night across the country about how to make JDOM and other code samples work in
an intuitive way.
Most importantly, I owe everything in these pages to my wife, Leigh. Miraculously, she has
managed to not kick me out of the house over the last six months, as I have been tired, inaccessible,
and extremely busy almost constantly. The few moments I had with her away from writing and my
full-time consulting job have been what made everything worthwhile. I have missed her terribly,
and am anxious to return to spending time with her, my three basset hounds (Charlie, Molly, and
Daisy), and my labs (Seth and Moses).
And to my grandfather, Robert Earl Burden, who didn't get to see this, you are everything that I
have ever wanted to be; thanks for teaching me that other people's expectations were always lower
than I should be satisfied with.
Chapter 1. Introduction
XML. These three letters have brought shivers to almost every developer in the world today at some
point in the last two years. While those shivers were often fear at another acronym to memorize,
excitement at the promise of a new technology, or annoyance at another source of confusion for
today's developer, they were shivers all the same. Surprisingly, almost every type of response was
well merited with regard to XML. It is another acronym to memorize, and in fact brings with it a
dizzying array of companions: XSL, XSLT, PI, DTD, XHTML, and more. It also brings with it a
huge promise: what Java did for portability of code, XML claims to do for portability of data. Sun
has even been touting the rather ambitious slogan "Java + XML = Portable Code + Portable Data"
in recent months. And yes, XML does bring with it a significant amount of confusion. We will seek
to unravel and demystify XML, without being so abstract and general as to be useless, and without
diving in so deeply that this becomes just another droll specification to wade through. This is a
book for you, the Java developer, who wants to understand the hype and use the tools that XML
brings to the table.
Today's web application now faces a wealth of problems that were not even considered ten years
ago. Systems that are distributed across thousands of miles must perform quickly and flawlessly.
Data from heterogeneous systems, databases, directory services, and applications must be
transferred without a single decimal place being lost. Applications must be able to communicate not
only with other business components, but other business systems altogether, often across companies
as well as technologies. Clients are no longer limited to thick clients, but can be web browsers that
support HTML, mobile phones that support the Wireless Application Protocol (WAP), or handheld
organizers with entirely different markup languages. Data, and the transformation of that data, has
become the crucial centerpiece of every application being developed today.
XML offers a way for programmers to meet all of these requirements. In addition, Java developers
have an arsenal of APIs that enable them to use XML and its many companions without ever
leaving a Java Integrated Development Environment (IDE). If this sounds a little too good to be
true, keep reading. You will walk through the pitfalls of the various Java APIs as well as look at
some of the bleeding-edge developments in the XML specification and the Java APIs for XML.
Through it all, we will take a developer's view. This is not a book about why you should use XML,
but rather how you should use it. If there are offerings in the specification that are not of much use,
details of why will be clearly given and we will move on; if something is of great value, we'll spend
some extra time on it. Throughout, we will focus on using XML as a tool, not using it as a
buzzword or for the sake of having the latest toy. With that in mind, let's begin to talk about what
XML is.
1.1 What Is It?
XML is the Extensible Markup Language. Like its predecessor SGML, XML is a meta-language
used to define other languages. However, XML is much simpler and more straightforward than
SGML. XML is a markup language that specifies neither the tag set nor the grammar for that
language. The tag set for a markup language defines the markup tags that have meaning to a
language parser. For example, HTML has a strict set of tags that are allowed. You may use the tag
<TABLE> but not the tag <CHAIR>. While the first tag has a specific meaning to an application using
the data, and is used to signify the start of a table in HTML, the second tag has no specific meaning,
and although most browsers will ignore it, unexpected things can happen when it appears. That is
because when HTML was defined, the tag set of the language was defined with it. With each new
version of HTML, new tags are defined. However, if a tag is not defined, it may not be used as part
of the markup language without generating an error when the document is parsed. The grammar of
a markup language defines the correct use of the language's tags. Again, let's use HTML as an
example. When using the <TABLE> tag, several attributes may be included, such as the width, the
background color, and the alignment. However, you cannot define the TYPE of the table because the
grammar of HTML does not allow it.
XML, by defining neither the tags nor the grammar, is completely extensible; thus its name. If you
choose to use the tag <TABLE> and then nest within that tag several <CHAIR> tags, you may do so. If
you wish to define a TYPE attribute for the <CHAIR> tag, you may do that also. You could even use
tags named after your children or co-workers if you so desired! To demonstrate, let's take a look at
the XML file shown in Example 1.1.
Example 1.1. A Sample XML File
<?xml version="1.0"?>

<dining-room>
  <table type="round" wood="maple">
    <manufacturer>The Wood Shop</manufacturer>
    <price>$1999.99</price>
  </table>

  <chair wood="maple">
    <quantity>2</quantity>
    <quality>excellent</quality>
    <cushion included="true">
      <color>blue</color>
    </cushion>
  </chair>

  <chair wood="oak">
    <quantity>3</quantity>
    <quality>average</quality>
  </chair>
</dining-room>
If you have never looked at an XML file, but are familiar with HTML or another markup language,
this may look a bit strange to you. That's because the tags and grammar being used are completely
made up. No web page or specification defines the <table>, <chair>, or <cushion> tags (although
one could, just as the XHTML specification defines HTML tags in XML); they are completely
concocted. This is the power of XML: it allows you to define the content of your data in a variety of
ways as long as you conform to the general structure that XML requires. Later we will go into detail
on some additional constraints, but for now it is sufficient to realize that XML is built to allow
flexibility of data formatting.
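To see that a standard parser accepts these made-up tags without complaint, here is a minimal sketch that reads a trimmed copy of Example 1.1 through the JAXP and DOM classes bundled with current JDKs. The class name DiningRoomReader and its helper methods are hypothetical, invented for this illustration, and the sketch assumes a modern Java runtime rather than the Java 1.1 platform used elsewhere in the book.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of the book's com.oreilly.xml utilities.
class DiningRoomReader {

    // A trimmed copy of Example 1.1, kept in a string so the sketch is self-contained.
    static final String DINING_ROOM =
        "<?xml version=\"1.0\"?>"
        + "<dining-room>"
        + "<table type=\"round\" wood=\"maple\">"
        + "<manufacturer>The Wood Shop</manufacturer>"
        + "<price>$1999.99</price>"
        + "</table>"
        + "<chair wood=\"maple\"><quantity>2</quantity><quality>excellent</quality>"
        + "<cushion included=\"true\"><color>blue</color></cushion></chair>"
        + "<chair wood=\"oak\"><quantity>3</quantity><quality>average</quality></chair>"
        + "</dining-room>";

    // Parses the text into a DOM tree; no DTD or schema is involved.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Returns the text of the first <price> element in the document.
    static String tablePrice(String xml) {
        Element price = (Element) parse(xml).getElementsByTagName("price").item(0);
        return price.getTextContent();
    }

    // Adds up the <quantity> value of every <chair> element.
    static int totalChairs(String xml) {
        NodeList chairs = parse(xml).getElementsByTagName("chair");
        int total = 0;
        for (int i = 0; i < chairs.getLength(); i++) {
            Element chair = (Element) chairs.item(i);
            String quantity = chair.getElementsByTagName("quantity").item(0).getTextContent();
            total += Integer.parseInt(quantity.trim());
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Table price: " + tablePrice(DINING_ROOM));
        System.out.println("Total chairs: " + totalChairs(DINING_ROOM));
    }
}
```

Nothing in the parser knows what a chair or a cushion is; the meaning of each tag lives entirely in the application code that asks for it.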
Java and XML

page 13
Although this flexibility is one of XML's strongest points, it also creates one of its greatest
weaknesses: because XML documents can be processed in so many different ways and for so many
different purposes, there are a large number of XML-related standards to handle translation and
specification of data. These additional acronyms, and their constant pairing with XML itself, often
confuse what XML is and what it is not. More often than not, when you hear "XML," the speaker is
not referring specifically to the Extensible Markup Language, but to all or part of the suite of XML
tools. Although sometimes these will be referred to separately, be aware that "XML" does not just
mean XML; more often it means "XML and all the great ways there are to manipulate and use it."
With those preliminaries out of the way, we are ready to define some of the most common XML
acronyms and give short descriptions of each. These will be fundamental to everything else in the
book, so keep this chapter marked for reference. These descriptions should start to help you
understand how the XML suite of tools fits together, what XML is, and what it isn't. Discussion of
publishing engines, applications, and tools for XML is avoided; these are discussed later when we
talk about specific XML topics. Rather, this section only refers to specifications and
recommendations in various stages of consideration. Most of these are initiatives of the W3C, the
World Wide Web Consortium. This group defines standards for the XML community that help
provide a common base of knowledge for this technology, much as Sun provides standards for Java
and related APIs. For more on the W3C, visit http://www.w3.org on the Web.
1.1.1 XML
XML, of course, is the root of all these three- and four-letter acronyms. It defines the core language
itself and provides a metadata-type framework. XML by itself is of limited value; it defines only
that framework. However, all of the various technologies that rest upon XML provide developers
and content managers unprecedented flexibility in data management and transmission. XML is
currently a completed W3C Recommendation, meaning it is final and will not change until another
version is released. For the complete XML 1.0 Specification, see http://www.w3.org/TR/REC-xml/.
As this specification is tough to read through for even the XML-savvy, an excellent annotated
version of the specification is available at http://www.xml.com.
As we will spend lots of time going into detail on this subject in future chapters, there are only two
basic concepts you need to understand about XML documents right now. The first is that any XML
document must be well-formed to be of any use and to be parsed correctly. A well-formed
document is one that has every tag closed that is opened, has no tags nested out of order, and is
syntactically correct in regard to the specification. You may be wondering: didn't we say that XML
has no syntax rules? Not exactly; we said that it did not have any grammatical rules. While the
document can define its own tags and attributes, it still must conform to a general set of principles.
These principles are then used by XML-aware applications and parsers to make sense of the
document and perform some action with the data, such as finding the price of a chair or creating a
PDF file from the data within a document. We will discuss these details in greater depth in Chapter
2.
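As a rough illustration of the well-formedness rule just described, the sketch below hands a few small documents to a non-validating JAXP parser and reports whether each one parses. The class WellFormedCheck and the sample strings are assumptions made for this example; only the javax.xml.parsers and org.xml.sax types come from the standard JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, written only for this illustration.
class WellFormedCheck {

    // Rethrows errors instead of letting the parser's default handler print them.
    static final ErrorHandler RETHROW = new ErrorHandler() {
        @Override
        public void warning(SAXParseException e) {
            // Warnings do not affect well-formedness, so they are ignored here.
        }

        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e;
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            throw e;
        }
    };

    // Returns true when the text parses as XML, false when the parser rejects it.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(RETHROW);
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or input reading failed", e);
        }
    }

    public static void main(String[] args) {
        String closedAndNested = "<chair><quantity>2</quantity></chair>";
        String nestedOutOfOrder = "<chair><quantity>2</chair></quantity>";
        String neverClosed = "<chair><quantity>2</quantity>";
        System.out.println("closed and nested:   " + isWellFormed(closedAndNested));
        System.out.println("nested out of order: " + isWellFormed(nestedOutOfOrder));
        System.out.println("never closed:        " + isWellFormed(neverClosed));
    }
}
```

The first document uses tags no specification defines and still parses, while the other two break the general structural rules and are rejected before any application logic can run.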
The second basic concept concerning XML documents is that they can be, but are not required to
be, valid. A valid document is one that conforms to its document type definition (DTD), which we'll
talk about in a moment. Simply put, a DTD defines the grammar and tag set for a specific XML
formatting. If a document specifies a DTD and follows that DTD's rules, it is said to be a valid
XML document. XML documents can also be constrained by a schema, a new way of dictating
XML format that will replace DTDs. When a document conforms to a schema, it can be said to be
schema valid. Don't worry if this isn't all clear yet; we have a long way to go, and we will look at
each of these XML-related specifications. First, though, there are some acronyms and specifications
that are used within an XML document. Let's take a look at these now.
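Before moving on, the difference between a well-formed document and a valid one can be sketched in a few lines. The DTD below is invented for this illustration (the book has not yet defined one for the dining room example), and ValidityCheck is a hypothetical class name. The sketch assumes the JAXP parser bundled with a current JDK, with validation switched on through setValidating(true).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, written only for this illustration.
class ValidityCheck {

    // An assumed DTD: a chair holds a quantity and a quality, and must name its wood.
    static final String DTD =
        "<!DOCTYPE chair [\n"
        + "  <!ELEMENT chair (quantity, quality)>\n"
        + "  <!ATTLIST chair wood CDATA #REQUIRED>\n"
        + "  <!ELEMENT quantity (#PCDATA)>\n"
        + "  <!ELEMENT quality (#PCDATA)>\n"
        + "]>\n";

    // Validity problems arrive through error(), so every callback except warning() rethrows.
    static final ErrorHandler RETHROW = new ErrorHandler() {
        @Override
        public void warning(SAXParseException e) {
            // Ignored in this sketch.
        }

        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e;
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            throw e;
        }
    };

    // Returns true only when the document is well-formed and obeys its DTD.
    static boolean isValid(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // ask the parser to check the document against its DTD
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(RETHROW);
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or input reading failed", e);
        }
    }

    public static void main(String[] args) {
        String follows = DTD
            + "<chair wood=\"oak\"><quantity>3</quantity><quality>average</quality></chair>";
        String breaks = DTD
            + "<chair wood=\"oak\"><quantity>3</quantity><cushion/></chair>";
        System.out.println("follows the DTD: " + isValid(follows));
        System.out.println("breaks the DTD:  " + isValid(breaks));
    }
}
```

Both documents are well-formed, but the second uses a cushion element that the assumed DTD never declares, so only the first counts as valid.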
