
5. STRINGS - Algorithms, 4th Edition by Robert Sedgewick and Kevin PDF

66 Pages·2014·4.65 MB·English

Preview 5. STRINGS - Algorithms, 4th Edition by Robert Sedgewick and Kevin

Algorithms, Fourth Edition
Robert Sedgewick | Kevin Wayne
http://algs4.cs.princeton.edu

5.1 String Sorts
‣ strings in Java
‣ key-indexed counting
‣ LSD radix sort
‣ MSD radix sort
‣ 3-way radix quicksort
‣ suffix arrays

String processing

String. Sequence of characters.

Important fundamental abstraction.
・ Genomic sequences.
・ Information processing.
・ Communication systems (e.g., email).
・ Programming systems (e.g., Java programs).
・ …

“The digital information that underlies biochemistry, cell biology, and development can be represented by a simple string of G's, A's, T's and C's. This string is the root data structure of an organism's biology.” (M. V. Olson)

The char data type

C char data type. Typically an 8-bit integer.
・ Supports 7-bit ASCII.
・ Can represent at most 256 characters.

[Figure: Unicode code points U+0041, U+00E1, U+2202, U+1D50A]

From section 6.5, Data Compression (p. 667):

ASCII encoding. When you HexDump a bitstream that contains ASCII-encoded characters, the table below is useful for reference. Given a 2-digit hex number, use the first hex digit as a row index and the second hex digit as a column reference to find the character that it encodes. For example, 31 encodes the digit 1, 4A encodes the letter J, and so forth. This table is for 7-bit ASCII, so the first hex digit must be 7 or less.

     0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0   NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
1   DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
2   SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
3   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
4   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
5   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
6   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
7   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL
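The row-and-column lookup described above can be sketched in a few lines of Java. This is only an illustration: the class name AsciiLookup and the method decode are hypothetical and do not come from the slides or the book. It relies on the fact that the row index times 16 plus the column index is the ASCII code.

```java
final class AsciiLookup {
    // Hypothetical helper (not from the slides): decodes a 2-digit hex
    // string by treating the first digit as the table row and the second
    // digit as the table column.
    static char decode(String hex) {
        if (hex.length() != 2)
            throw new IllegalArgumentException("expected exactly 2 hex digits");
        int row = Character.digit(hex.charAt(0), 16); // -1 if not a hex digit
        int col = Character.digit(hex.charAt(1), 16);
        if (row < 0 || col < 0)
            throw new IllegalArgumentException("not a hex digit");
        if (row > 7)
            throw new IllegalArgumentException("7-bit ASCII: first digit must be 7 or less");
        return (char) (row * 16 + col);
    }

    public static void main(String[] args) {
        System.out.println(decode("31")); // the digit 1
        System.out.println(decode("4A")); // the letter J
    }
}
```

Rejecting a first digit above 7 mirrors the restriction stated for the 7-bit table; a lookup for 8-bit or Unicode values would need a different range check.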
Hex numbers starting with 0 and 1 (and the numbers 20 and 7F) in the hexadecimal-to-ASCII conversion table correspond to non-printing control characters. Many of the control characters are left over from the days when physical devices like typewriters were controlled by ASCII input; the table highlights a few that you might see in dumps. For example, SP is the space character, NUL is the null character, LF is line-feed, and CR is carriage-return.

In summary, working with data compression requires us to reorient our thinking about standard input and standard output to include binary encoding of data. BinaryStdIn and BinaryStdOut provide the methods that we need. They provide a way for you to make a clear distinction in your client programs between writing out information intended for file storage and data transmission (that will be read by programs) and printing information (that is likely to be read by humans).

Java char data type. A 16-bit unsigned integer.
・ Supports original 16-bit Unicode.
・ Supports 21-bit Unicode 3.0 (awkwardly).

[Figure: some Unicode characters; "I ♥ Unicode"; U+0041]

The String data type

String data type in Java. Immutable sequence of characters.
Length. Number of characters.
Indexing. Get the ith character.
Concatenation. Concatenate one string to the end of another.

index   0  1  2  3  4  5  6  7  8  9 10 11
s       A  T  T  A  C  K  A  T  D  A  W  N

s.length()            12
s.charAt(3)           A
s.substring(7, 11)    TDAW

Fundamental constant-time String operations

The String data type: immutability

Q. Why immutable?
A. All the usual reasons.
・ Can use as keys in symbol table.
・ Don't need to defensively copy.
・ Ensures consistent state.
・ Supports concurrency.
・ Improves security.

    public class FileInputStream
    {
        private String filename;

        public FileInputStream(String filename)
        {
            if (!allowedToReadFile(filename))
                throw new SecurityException();
            this.filename = filename;
        }
        ...
    }

An attacker could bypass security if the string type were mutable.

The String data type: representation

Representation (Java 7). Immutable char[] array + cache of hash.

operation        Java           running time
length           s.length()     1
indexing         s.charAt(i)    1
concatenation    s + t          M + N
⋮                ⋮

String performance trap

Q. How to build a long string, one character at a time?

    public static String reverse(String s)
    {
        String rev = "";
        for (int i = s.length() - 1; i >= 0; i--)
            rev += s.charAt(i);            // quadratic time
        return rev;
    }

A. Use StringBuilder data type (mutable char[] array).

    public static String reverse(String s)
    {
        StringBuilder rev = new StringBuilder();
        for (int i = s.length() - 1; i >= 0; i--)
            rev.append(s.charAt(i));       // linear time
        return rev.toString();
    }

Comparing two strings

Q. How many character compares to compare two strings of length W?

index   0  1  2  3  4  5  6  7
        p  r  e  f  e  t  c  h
        p  r  e  f  i  x  e  s

Running time. Proportional to length of longest common prefix.
・ Proportional to W in the worst case.
・ But, often sublinear in W.
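To make the running-time claim concrete, here is a minimal sketch of a character-by-character comparison that stops at the first mismatch. The class name PrefixCompare and the methods lcp and compare are illustrative names, not taken from the slides, and the code is not a description of how java.lang.String.compareTo is implemented.

```java
final class PrefixCompare {
    // Length of the longest common prefix of a and b.
    // The loop runs once per matching character, so the work is
    // proportional to the prefix length, not to the full string length.
    static int lcp(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++)
            if (a.charAt(i) != b.charAt(i)) return i;
        return n;
    }

    // Negative, zero, or positive as a is less than, equal to,
    // or greater than b in lexicographic order.
    static int compare(String a, String b) {
        int d = lcp(a, b);
        if (d < a.length() && d < b.length())
            return a.charAt(d) - b.charAt(d); // first differing character decides
        return a.length() - b.length();       // one string is a prefix of the other
    }

    public static void main(String[] args) {
        System.out.println(lcp("prefetch", "prefixes"));
        System.out.println(compare("prefetch", "prefixes") < 0);
    }
}
```

For the two strings on the slide, the loop examines the four matching characters p, r, e, f and then stops at index 4, where e and i differ. Two equal strings of length W are the worst case, since every character must be examined.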

Description:
Sequence of characters (mutable). Underlying implementation. Resizing char[] array and length. Remark. LSD string sort: Java implementation key-indexed counting